From iklam at openjdk.org Wed Mar 1 00:58:04 2023
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 1 Mar 2023 00:58:04 GMT
Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks

On Tue, 28 Feb 2023 13:02:22 GMT, Matthias Baesken wrote:

> > It may be more profitable (and less work) if we change these variables to `const` when CDS is disabled:
> >
> > ```
> > extern bool DumpSharedSpaces;
> > extern bool DynamicDumpSharedSpaces;
> > extern bool RequireSharedSpaces;
> > extern "C" {
> > // Make sure UseSharedSpaces is accessible to the serviceability agent.
> > extern JNIEXPORT jboolean UseSharedSpaces;
> > }
> > ```
> >
> > But some changes may be needed in SA.
>
> Hi Ioi, do you think this should be done for all 4 bools? By the way, when adding `const` to UseSharedSpaces I run into the following error when compiling with gcc. There seems to be an issue when `const` is used together with JNIEXPORT; any idea why?
> `globalDefinitions.cpp:50:26: error: 'visibility' attribute ignored [-Werror=attributes]`

I think it should be done for all four bools.

UseSharedSpaces is accessed by SA in src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/VM.java. I would suggest adding the lines between <<<<< and >>>>> below:

    public boolean isSharingEnabled() {
      if (sharingEnabled == null) {
    <<<<<
        Boolean sharingConfigured = getSharingConfigured(); // with similar "lookup" logic as below ...
        if (!sharingConfigured.booleanValue()) {
          sharingEnabled = sharingConfigured; // INCLUDE_CDS is disabled
          return false;
        }
    >>>>>
        Address address = VM.getVM().getDebugger().lookup(null, "UseSharedSpaces");
        if (address == null && getOS().equals("win32")) {
          // On Win32 symbols are prefixed with the dll name. So look for
          // UseSharedSpaces as a symbol in jvm.dll.
          address = VM.getVM().getDebugger().lookup(null, "jvm!UseSharedSpaces");
        }
        sharingEnabled = address.getJBooleanAt(0);
      }
      return sharingEnabled.booleanValue();
    }

Then we can change globalDefinitions.cpp to something like:

    #if INCLUDE_CDS
    // Old CDS options
    bool DumpSharedSpaces;
    bool DynamicDumpSharedSpaces;
    bool RequireSharedSpaces;
    extern "C" {
    JNIEXPORT jboolean UseSharedSpaces = true;    // for SA
    JNIEXPORT jboolean SharingConfigured = true;  // for SA
    }
    #else
    extern "C" {
    JNIEXPORT jboolean SharingConfigured = false; // for SA
    }
    #endif

You may need to test with a build that has SA enabled but CDS disabled. I am not sure if this is a supported combination.

-------------

PR: https://git.openjdk.org/jdk/pull/12691

From kbarrett at openjdk.org Wed Mar 1 02:49:39 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 1 Mar 2023 02:49:39 GMT
Subject: RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting [v2]

> Please review this change to the implementation of the Windows-specific option
> UseOSErrorReporting, toward allowing crash reporting functions to be declared
> noreturn. VMError::report_and_die no longer conditionally returns if the
> Windows-only option UseOSErrorReporting is true.
>
> The Windows-only sections of report_and_die now call RaiseFailFastException
> (https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception),
> which immediately invokes WER (Windows Error Reporting) if it is enabled,
> without executing structured exception handlers. If WER is not enabled, it
> just immediately terminates the program. Thus, we no longer return to walk up
> the structured exception handler chain to pop out at the top as unhandled in
> order to invoke WER.
>
> This permits declaring report_and_die as [[noreturn]], once some functions
> from the os class are also so declared. Also adding that attribute as
> appropriate to other functions in the os class.
> This of course assumes the use of [[noreturn]] in HotSpot code is approved (JDK-8302124).
>
> There is a pre-existing bug that I'll be reporting separately. If
> UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an
> empty .mdmp file. We shouldn't create that file when UseOSErrorReporting is true.
>
> Testing:
> mach5 tier1-3
>
> Manual testing with the following, to verify desired behavior.
>
> -XX:ErrorHandlerTest=N
>   1: assertion failure
>   2: guarantee failure
>   14: SIGSEGV
>   15: divide by zero
>
> path/to/bin/java \
>   -XX:+UnlockDiagnosticVMOptions \
>   -XX:+ErrorLogSecondaryErrorDetails \
>   -XX:+UseOSErrorReporting \
>   -XX:ErrorHandlerTest=1 \
>   TestDebug.java
>
> --- TestDebug.java ---
> import java.lang.String;
> public class TestDebug {
>     static private volatile String dummy;
>     public static void main(String[] args) throws Exception {
>         while (true) {
>             dummy = new String("foo bar");
>         }
>     }
> }
> --- end TestDebug.java ---
>
> The state of WER can be examined and modified using the PowerShell commands
> {Get,Enable,Disable}-WindowsErrorReporting.
>
> The state of reporting WER-captured errors can be examined and modified using
> Control Panel > Security and Maintenance > Maintenance : Report Problems [on,off]
>
> With Report Problems off, reports are placed in
> c:\ProgramData\Microsoft\Windows\WER\ReportArchive
>
> I verified that executing the above test with WER enabled adds an entry in
> that directory, but not when it's disabled. Also, nothing is added there when
> the test is run with -XX:-UseOSErrorReporting.
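A minimal C++ sketch of why a conditional return is incompatible with `[[noreturn]]`: a function declared `[[noreturn]]` must never return to its caller (returning is undefined behavior), so every path has to end in something that cannot come back, such as `std::abort()` here or the fail-fast call described above on Windows. `report_and_die_sketch` and `checked_div` are hypothetical names for illustration, not HotSpot code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for a crash reporter (not HotSpot's VMError).
// [[noreturn]] promises that control never comes back to the caller, so
// every path must end in something that cannot return, here std::abort().
// A "return if some option is set" path would make this undefined behavior.
[[noreturn]] void report_and_die_sketch(const char* msg) {
  std::fprintf(stderr, "fatal error: %s\n", msg);
  std::abort();
}

// Because the callee is [[noreturn]], the compiler knows the last statement
// cannot fall through, so no dummy return value is needed after it.
int checked_div(int a, int b) {
  if (b != 0) {
    return a / b;
  }
  report_and_die_sketch("division by zero");
}
```

The benefit shows up in callers like `checked_div`: without the attribute, compilers warn about control reaching the end of a non-void function, and code after the call must pretend it can continue.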
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision:

  remove failfast cuttoff of secondary errors

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12759/files
 - new: https://git.openjdk.org/jdk/pull/12759/files/2e39a1b8..e3717037

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12759&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12759&range=00-01

Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/12759.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12759/head:pull/12759

PR: https://git.openjdk.org/jdk/pull/12759

From duke at openjdk.org Wed Mar 1 04:11:05 2023
From: duke at openjdk.org (Amit Kumar)
Date: Wed, 1 Mar 2023 04:11:05 GMT
Subject: RFR: 8303210: [linux, Windows] Enable UseSystemMemoryBarrier by default if possible [v3]
References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com>
Message-ID: <97IuCUIDtTNr5sodrGE3PcPUugt3UT2eZPqvCmqNhWo=.26c6c856-d3a3-4541-8b7e-7b2654e16ec6@github.com>

On Tue, 28 Feb 2023 05:26:22 GMT, Martin Doerr wrote:

>> I'd like to enable UseSystemMemoryBarrier by default on supported Operating Systems in order to improve performance of thread state transitions (I/O, JNI, foreign function calls, JIT compiler threads, etc.). See JBS issue for more details.
>> Unfortunately, the feature was not yet implemented on all platforms. I added the code, but need the platform maintainers to check if it can be used reliably (and ideally if the performance improves). It's easy to switch it off again in case of problems.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Improve logging.
The build and tier1 tests pass with this patch, except for one new failure (it could be just a random failure, unrelated to this patch): compiler/jsr292/ContinuousCallSiteTargetChange.java

But @TheRealMDoerr, I'm not sure how to perform the regression testing; I will catch up with Tyler on it and post the result.

-------------

PR: https://git.openjdk.org/jdk/pull/12753

From kbarrett at openjdk.org Wed Mar 1 04:46:53 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 1 Mar 2023 04:46:53 GMT
Subject: RFR: 8303418: Improve parameter and variable names in BitMap

Please review this change to names in BitMap.

- Parameters that designate a bit in a BitMap are named "bit".
- Parameters that designate a word in the underlying BitMap storage are named "word".
- Parameters that designate a range are named "beg" and "end", respectively.

Added helper function `flipped_word` for use by `get_next_bit_impl`, replacing
the odd overload for `map`.

In `get_next_bit_impl`, prefixed the variables "index" and "limit" with
"word_" to make clear the units.
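To illustrate the naming scheme (`bit` vs `word`, half-open `[beg, end)` ranges) and the role of a `flipped_word`-style helper, here is a simplified, standalone C++ sketch of a next-bit search. It is not HotSpot's `BitMap::get_next_bit_impl`: the `std::vector` storage, the `get_next_bit` signature and the `find_clear` parameter are assumptions for illustration. The idea is that inverting each word when searching for clear bits lets one loop serve both searches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified sketch, not HotSpot's BitMap. Naming follows the RFR:
// "bit" is a bit index, "word" is an index into the storage words,
// and [beg, end) is a half-open range of bit indices.
using bm_word_t = uint64_t;
constexpr size_t BitsPerWord = 64;

constexpr size_t to_word(size_t bit) { return bit / BitsPerWord; }
constexpr size_t bit_in_word(size_t bit) { return bit % BitsPerWord; }

// Analogue of a flipped_word helper: when searching for clear bits, invert
// the word so that both searches reduce to "find the lowest 1 bit".
inline bm_word_t flipped_word(const std::vector<bm_word_t>& map, size_t word, bool find_clear) {
  return find_clear ? ~map[word] : map[word];
}

// Returns the first bit in [beg, end) that is set (or clear, if find_clear),
// or end if there is no such bit.
size_t get_next_bit(const std::vector<bm_word_t>& map, size_t beg, size_t end, bool find_clear) {
  assert(end <= map.size() * BitsPerWord);
  if (beg >= end) return end;
  size_t word_index = to_word(beg);
  size_t word_limit = to_word(end - 1) + 1;  // one past the last word to examine

  // First word: shift out the bits below beg.
  bm_word_t cword = flipped_word(map, word_index, find_clear) >> bit_in_word(beg);
  if (cword != 0) {
    // __builtin_ctzll (GCC/Clang) counts trailing zeros; cword is non-zero here.
    size_t result = beg + static_cast<size_t>(__builtin_ctzll(cword));
    return result < end ? result : end;
  }
  // Remaining words: scan whole words until a candidate bit is found.
  for (++word_index; word_index < word_limit; ++word_index) {
    cword = flipped_word(map, word_index, find_clear);
    if (cword != 0) {
      size_t result = word_index * BitsPerWord + static_cast<size_t>(__builtin_ctzll(cword));
      return result < end ? result : end;  // clamp bits found past end
    }
  }
  return end;
}
```

Keeping `word_index`/`word_limit` in word units and `beg`/`end`/`result` in bit units is exactly the distinction the renaming makes explicit.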
Testing:
mach5 tier1

-------------

Commit messages:
 - improve names in get_next_bit_impl
 - cleanup parameter names

Changes: https://git.openjdk.org/jdk/pull/12798/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12798&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8303418
  Stats: 53 lines in 3 files changed: 5 ins; 1 del; 47 mod
  Patch: https://git.openjdk.org/jdk/pull/12798.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12798/head:pull/12798

PR: https://git.openjdk.org/jdk/pull/12798

From mdoerr at openjdk.org Wed Mar 1 05:20:05 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 05:20:05 GMT
Subject: RFR: 8303210: [linux, Windows] Enable UseSystemMemoryBarrier by default if possible [v3]
In-Reply-To: <97IuCUIDtTNr5sodrGE3PcPUugt3UT2eZPqvCmqNhWo=.26c6c856-d3a3-4541-8b7e-7b2654e16ec6@github.com>
References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> <97IuCUIDtTNr5sodrGE3PcPUugt3UT2eZPqvCmqNhWo=.26c6c856-d3a3-4541-8b7e-7b2654e16ec6@github.com>

On Wed, 1 Mar 2023 04:08:44 GMT, Amit Kumar wrote:

> The build and tier1 tests pass with this patch, except for one new failure (it could be just a random failure, unrelated to this patch): compiler/jsr292/ContinuousCallSiteTargetChange.java
>
> But @TheRealMDoerr, I'm not sure how to perform the regression testing; I will catch up with Tyler on it and post the result.

Thanks for checking! "ContinuousCallSiteTargetChange.java" fails often. It's probably unrelated.

I guess I'll change my plan and, for now, only make the feature available rather than enabling it by default, since some drawbacks prevent us from enabling it by default on all platforms.
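For context on the feature under discussion: as far as we know, on Linux a system-wide memory barrier of this kind is built on the `membarrier(2)` system call. The sketch below shows the query / register / issue sequence for the private expedited command. It is an illustration, not HotSpot's code; the helper names are hypothetical, and it assumes Linux headers new enough to define `MEMBARRIER_CMD_PRIVATE_EXPEDITED` (Linux 4.14 or later).

```cpp
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sketch only (hypothetical helper names). glibc provides no membarrier
// wrapper, so the raw system call is used.
static long sys_membarrier(int cmd, unsigned int flags) {
  return syscall(SYS_membarrier, cmd, flags);
}

// Returns true if the kernel supports private expedited membarrier and this
// process has registered for it (registration is required before use).
bool register_private_expedited_membarrier() {
  long cmds = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
  if (cmds < 0) {
    return false;  // e.g. ENOSYS on old kernels, or blocked by a sandbox
  }
  const long needed = MEMBARRIER_CMD_PRIVATE_EXPEDITED |
                      MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED;
  if ((cmds & needed) != needed) {
    return false;
  }
  return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// Issues a barrier on every CPU currently running a thread of this process.
// Only valid after successful registration.
bool system_memory_barrier() {
  return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
}
```

The trade-off is visible in the shape of the API: the frequent fast path (thread state transitions) can drop its own fence, while the rare slow path pays for a barrier that interrupts other CPUs. That is one plausible reason why the benefit can depend on the workload and platform.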
-------------

PR: https://git.openjdk.org/jdk/pull/12753

From mdoerr at openjdk.org Wed Mar 1 05:20:08 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 05:20:08 GMT
Subject: RFR: 8303210: [linux, Windows] Enable UseSystemMemoryBarrier by default if possible [v3]
In-Reply-To: <5fRZQL_fESuAYk3W5hdNEMl8dtcAPrWMUardVietcgI=.c550b7c9-5586-4d1e-83bf-2e0d67d00784@github.com>
References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> <5fRZQL_fESuAYk3W5hdNEMl8dtcAPrWMUardVietcgI=.c550b7c9-5586-4d1e-83bf-2e0d67d00784@github.com>

On Tue, 28 Feb 2023 06:28:42 GMT, David Holmes wrote:

> There is also a 7.9% regression on AArch64 with the Renaissance-Scala-Kmeans benchmark.
>
> And 6% on Renaissance-ParMnemonics on AArch64 (but a 7% improvement on x64 for this one).
>
> This may have to be an opt-in potential optimisation, for which we would need a full-fledged product flag (not experimental or diagnostic).

Thanks a lot for measuring! It's very unfortunate that this nice OS feature performs poorly in some cases. So I'll keep it off by default for now (hoping that it will improve in the future).

-------------

PR: https://git.openjdk.org/jdk/pull/12753

From mdoerr at openjdk.org Wed Mar 1 05:31:34 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 05:31:34 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v4]

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists.
> Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but is not complete. E.g. structs with a size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting only `ARCH.equals("ppc64le")` (CABI.java).
>
> There's another limitation: This PR only accepts structures with a size divisible by 4. (An `IllegalArgumentException` is thrown otherwise.) I think arbitrary sizes are not usable on other platforms either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2.
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to be passed in both an FP register and a GP register or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
>    - The PPC64 ABI requires integer types to be extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing HotSpot code). We need to know the type, or at least the bit width, for that.
>    - Floating-point load/store instructions need the correct width to select the correct IEEE 754 format. On PPC64, values in FP registers are always represented in IEEE 754 double precision, even single-precision ones.
>    - Big Endian also needs the precise size: storing 8 bytes and loading 4 bytes yields a different value than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE).
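The two size-related points in the list above (integer widening to 64 bits, and 8-byte stores followed by 4-byte loads) can be made concrete with a small C++ simulation. The helpers are hypothetical; byte order is applied explicitly, so the results do not depend on the endianness of the machine running the code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-independent simulation (hypothetical helpers): byte order is applied
// explicitly instead of relying on the machine's native layout.
using Slot = std::array<uint8_t, 8>;  // byte image of one 8-byte stack slot

// Widening a 32-bit argument to a 64-bit register/slot image:
// signed types are sign-extended, unsigned types are zero-extended.
uint64_t widen_int32(int32_t v) { return static_cast<uint64_t>(static_cast<int64_t>(v)); }
uint64_t widen_uint32(uint32_t v) { return static_cast<uint64_t>(v); }

// Store a 64-bit value into a slot in little- or big-endian byte order.
Slot store64(uint64_t v, bool big_endian) {
  Slot s{};
  for (int i = 0; i < 8; ++i) {
    int shift = big_endian ? 8 * (7 - i) : 8 * i;
    s[i] = static_cast<uint8_t>(v >> shift);
  }
  return s;
}

// Load 4 bytes from offset 0 of the slot in the given byte order.
uint32_t load32_at0(const Slot& s, bool big_endian) {
  uint32_t r = 0;
  for (int i = 0; i < 4; ++i) {
    int shift = big_endian ? 8 * (3 - i) : 8 * i;
    r |= static_cast<uint32_t>(s[i]) << shift;
  }
  return r;
}
```

For example, `load32_at0(store64(42, false), false)` yields 42, while `load32_at0(store64(42, true), true)` yields 0: on Big Endian the 4 bytes at offset 0 hold the high half of the slot, so the access width must match the value's real size.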
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  HFA: Add support for nested structures. See JDK-8300294.

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12708/files
 - new: https://git.openjdk.org/jdk/pull/12708/files/a4d844f7..b461d80c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=02-03

Stats: 37 lines in 2 files changed: 24 ins; 0 del; 13 mod
Patch: https://git.openjdk.org/jdk/pull/12708.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708

PR: https://git.openjdk.org/jdk/pull/12708

From kbarrett at openjdk.org Wed Mar 1 06:08:33 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 1 Mar 2023 06:08:33 GMT
Subject: RFR: 8303418: Improve parameter and variable names in BitMap [v2]

> Please review this change to names in BitMap.
>
> - Parameters that designate a bit in a BitMap are named "bit".
> - Parameters that designate a word in the underlying BitMap storage are named "word".
> - Parameters that designate a range are named "beg" and "end", respectively.
>
> Added helper function `flipped_word` for use by `get_next_bit_impl`, replacing
> the odd overload for `map`.
>
> In `get_next_bit_impl`, prefixed the variables "index" and "limit" with
> "word_" to make clear the units.
>
> Testing:
> mach5 tier1

Kim Barrett has updated the pull request incrementally with one additional commit since the last revision:

  copyrights

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12798/files
 - new: https://git.openjdk.org/jdk/pull/12798/files/cccf2fc0..59138b0f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12798&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12798&range=00-01

Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/12798.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12798/head:pull/12798

PR: https://git.openjdk.org/jdk/pull/12798

From mdoerr at openjdk.org Wed Mar 1 06:12:54 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 06:12:54 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v5]
Message-ID: <3kCGyBFd4OIVzBcT_6KQIvJIFCGVRTRq2dy3qKP0Piw=.ef00d405-43e7-402a-a1d3-1ebaed3b7830@github.com>

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but is not complete. E.g. structs with a size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting only `ARCH.equals("ppc64le")` (CABI.java).
>
> There's another limitation: This PR only accepts structures with a size divisible by 4. (An `IllegalArgumentException` is thrown otherwise.)
> I think arbitrary sizes are not usable on other platforms either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2.
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to be passed in both an FP register and a GP register or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
>    - The PPC64 ABI requires integer types to be extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing HotSpot code). We need to know the type, or at least the bit width, for that.
>    - Floating-point load/store instructions need the correct width to select the correct IEEE 754 format. On PPC64, values in FP registers are always represented in IEEE 754 double precision, even single-precision ones.
>    - Big Endian also needs the precise size: storing 8 bytes and loading 4 bytes yields a different value than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE).

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Minor cleanup.
-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12708/files
 - new: https://git.openjdk.org/jdk/pull/12708/files/b461d80c..75b5c78f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=03-04

Stats: 6 lines in 3 files changed: 0 ins; 2 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/12708.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708

PR: https://git.openjdk.org/jdk/pull/12708

From mdoerr at openjdk.org Wed Mar 1 06:12:59 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 06:12:59 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]
References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>

On Tue, 28 Feb 2023 16:54:50 GMT, Jorn Vernee wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove size restriction for structs. Add TODO for Big Endian.
>
> src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 343:
>
>> 341:
>> 342: __ flush();
>> 343: // Disassembler::decode((u_char*)start, (u_char*)__ pc(), tty);
>
> Leftover commented code?
>
> (note that the stub can also be disassembled with `-Xlog:foreign+downcall=trace` now)

Removed. Thanks for the hint!

> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 356:
>
>> 354: }
>> 355: #endif
>> 356: //blob->print_on(tty);
>
> Leftover commented code?

Removed.
-------------

PR: https://git.openjdk.org/jdk/pull/12708

From mdoerr at openjdk.org Wed Mar 1 06:13:03 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 06:13:03 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v5]
References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>

On Wed, 22 Feb 2023 20:25:07 GMT, Jorn Vernee wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Minor cleanup.
>
> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 81:
>
>> 79: new VMStorage[] { f1, f2, f3, f4, f5, f6, f7, f8 }, // FP output
>> 80: new VMStorage[] { r0, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12 }, // volatile GP
>> 81: new VMStorage[] { f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13 }, // volatile FP
>
> Note that argument registers are assumed volatile, so they don't have to be duplicated here.

Removed.

> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/linux/LinuxPPC64CallArranger.java line 33:
>
>> 31: * PPC64 CallArranger specialized for Linux ABI.
>> 32: */
>> 33: public class LinuxPPC64CallArranger extends CallArranger {
>
> I don't really see the point in having a separate subclass with `CallArranger` being abstract, unless you are planning to add other implementations later?
>
> (edit: see also later comment in CallArranger https://github.com/openjdk/jdk/pull/12708#discussion_r1120753657)

Yes, AIX support still needs to be implemented. I guess @backwaterred will work on it.
-------------

PR: https://git.openjdk.org/jdk/pull/12708

From mdoerr at openjdk.org Wed Mar 1 06:20:11 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 06:20:11 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]
References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>

On Tue, 28 Feb 2023 19:45:28 GMT, Jorn Vernee wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove size restriction for structs. Add TODO for Big Endian.
>
> src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 133:
>
>> 131: Register callerSP = R2, // C/C++ uses R2 as TOC, but we can reuse it here
>> 132: tmp = R11_scratch1, // same as shuffle_reg
>> 133: call_target_address = R12_scratch2; // same as _abi._scratch2 (ABIv2 requires this reg!)
>
> Do I understand correctly that the ABI requires the register to be used for the call to be `R12`? How does that make a difference? I guess in some cases the callee might want to know the address through which it is called? (so it looks at `R12`)

ABI v2 requires R12 to point to the function entry point. The callee uses it to access constants relative to its own address (for example, to compute its TOC pointer).

> src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 154:
>
>> 152: // (abi_reg_args is abi_minframe plus space for 8 argument register spill slots)
>> 153: assert(_abi._shadow_space_bytes == frame::abi_minframe_size, "expected space according to ABI");
>> 154: int allocated_frame_size = frame::abi_minframe_size + MAX2(_input_registers.length(), 8) * BytesPerWord;
>
> This is hard-coding an assumption about the ABI that's being called. Ok for now.
>
> If it needs to be addressed in the future, it could be done by adding another field to `ABIDescriptor` like `min_stack_arg_bytes`, or something like that (which is set to zero for other ABIs).
> It seems to be different from `shadow_space` since it's also used by the caller to put stack arguments.

Yeah, I think this should be done on demand.

> src/hotspot/cpu/ppc/foreignGlobals_ppc.cpp line 229:
>
>> 227:
>> 228: void ArgumentShuffle::pd_generate(MacroAssembler* masm, VMStorage tmp, int in_stk_bias, int out_stk_bias, const StubLocations& locs) const {
>> 229: Register callerSP = as_Register(tmp); // preset
>
> It looks like `tmp` is being used to hold the caller's SP. I'm guessing this can not be computed the same way as we do on x86 and aarch64? (based on `RBP`, `RFP_BIAS`)
>
> If you want, you could add another register argument to `pd_generate` that is just invalid/unused on other platforms. That way you could use `tmp` for the shuffling instead of having to go through the stack. (looks like `R0` is already used in some cases as a temp register)

There's no BP register. The caller's SP could be loaded from the back chain, but that would still need a register. R0 is always available as a scratch register, so there's no need to pass it. (Note that I'm not going through the stack because of a lack of registers.) Yeah, maybe passing callerSP separately would make it more readable on PPC64. Not sure if it's worth changing.

> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 137:
>
>> 135: ArgumentShuffle arg_shuffle(in_sig_bt, total_in_args, out_sig_bt, total_out_args, &in_conv, &out_conv, shuffle_reg);
>> 136: // The Java call uses the JIT ABI, but we also call C.
>> 137: int out_arg_area = MAX2(frame::jit_out_preserve_size + arg_shuffle.out_arg_bytes(), (int)frame::abi_reg_args_size);
>
> We need `frame::abi_reg_args_size` since we call `on_entry`/`on_exit` which require the stack space I guess?

Correct. C functions are allowed to write into this space, which is part of the caller's frame.
-------------

PR: https://git.openjdk.org/jdk/pull/12708

From mdoerr at openjdk.org Wed Mar 1 06:28:12 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 06:28:12 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]
References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>

On Tue, 28 Feb 2023 20:00:15 GMT, Jorn Vernee wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Remove size restriction for structs. Add TODO for Big Endian.
>
> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 240:
>
>> 238: __ ld(call_target_address, in_bytes(Method::from_compiled_offset()), R19_method);
>> 239: __ mtctr(call_target_address);
>> 240: __ bctrl();
>
> Ok, I see. I guess there is some special purpose register called `CTR` which we are moving to for `bctrl` here. Does ABIv2 require that move to always come from `R12`? (from the comment in downcallLinker).
>
> (I'm trying to understand the requirements for possibly tweaking shared code).

There's no instruction that can use a GP register as a branch target directly. That's why we use CTR. In this case, using R12 is not required, since we call Java and our PPC64 VM code does not rely on it. If we were calling C, using R12 would be required by ABI v2.

> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 347:
>
>> 345: FunctionDescriptor* fd = (FunctionDescriptor*)fd_addr;
>> 346: fd->set_entry(fd_addr + sizeof(FunctionDescriptor));
>> 347: #endif
>
> Had to do a double take. Looks like we're not the only ones using the name `FunctionDescriptor` :)

Yeah, ABI v1 (Big Endian) treats function pointers as pointers to a structure called `FunctionDescriptor`, which contains the entry point plus additional information, like a pointer to a constant table.
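Since HFA classification of nested structures comes up repeatedly in this review, here is a simplified C++ sketch of one way to classify a layout: flatten nested structs and arrays into their scalar leaves, then require that all leaves share one floating-point type. The `Layout` model is hypothetical (it is not `java.lang.foreign.MemoryLayout`), unions and vector types are ignored, and the limit of 8 members reflects our reading of the ELFv2 ABI, so treat it as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified layout model; not java.lang.foreign.MemoryLayout.
enum class Kind { Float, Double, Int, Struct, Array };

struct Layout {
  Kind kind;
  std::vector<Layout> members;  // Struct: fields; Array: one element layout
  size_t count;                 // Array: number of elements (unused otherwise)
};

Layout f32() { return {Kind::Float, {}, 0}; }
Layout f64() { return {Kind::Double, {}, 0}; }
Layout i32() { return {Kind::Int, {}, 0}; }
Layout strct(std::vector<Layout> fields) { return {Kind::Struct, std::move(fields), 0}; }
Layout array(Layout elem, size_t n) { return {Kind::Array, {std::move(elem)}, n}; }

// Walks the layout, flattening nested structs and arrays. Records the
// floating-point kind in `base` and counts leaves in `n`; fails on any
// integer leaf or on a mix of float and double.
static bool collect_fp_leaves(const Layout& l, Kind& base, size_t& n) {
  switch (l.kind) {
    case Kind::Float:
    case Kind::Double:
      if (n > 0 && base != l.kind) return false;
      base = l.kind;
      ++n;
      return true;
    case Kind::Int:
      return false;
    case Kind::Struct:
      for (const Layout& m : l.members) {
        if (!collect_fp_leaves(m, base, n)) return false;
      }
      return true;
    case Kind::Array:
      for (size_t i = 0; i < l.count; ++i) {
        if (!collect_fp_leaves(l.members.at(0), base, n)) return false;
      }
      return true;
  }
  return false;
}

// Assumption: an HFA is a struct whose flattened members are all float or all
// double, with between 1 and 8 members (our reading of the ELFv2 ABI).
bool is_homogeneous_float_aggregate(const Layout& l) {
  if (l.kind != Kind::Struct) return false;
  Kind base = Kind::Float;
  size_t n = 0;
  return collect_fp_leaves(l, base, n) && n >= 1 && n <= 8;
}
```

For example, `struct { struct { double a; } s; double b[3]; }` flattens to four doubles and is classified as an HFA, whereas a struct mixing `float` and `double`, or containing an `int`, is not.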
> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 286:
>
>> 284: // "no partial DW rule": Mark first stack slot to get filled.
>> 285: // Note: Can only happen with forArguments = true.
>> 286: VMStorage overlappingReg = null;
>
> `overlappingReg` is initialized along all branches, so it's not needed to assign `null` here (and then javac will check it is actually assigned before use)

Removed.

> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/TypeClass.java line 66:
>
>> 64: }
>> 65:
>> 66: static boolean isHomogeneousFloatAggregate(MemoryLayout type, boolean useABIv2) {
>
> Note that we had to make some changes to this routine on AArch64, since it didn't properly account for nested structs/unions and arrays. See: https://github.com/openjdk/panama-foreign/pull/780
>
> Just as a heads up, in case PPC needs changes too.

Thanks for the hint! I just added the code because it is known to be needed, is small, and doesn't break any of the existing tests. I still need to test and possibly debug the various cases.

-------------

PR: https://git.openjdk.org/jdk/pull/12708

From mdoerr at openjdk.org Wed Mar 1 06:41:09 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 1 Mar 2023 06:41:09 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]
References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>

On Fri, 24 Feb 2023 07:17:30 GMT, Jorn Vernee wrote:

>> Some more remarks about other issues:
>> - Uploaded my simple reproducer to [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>> - Using oversized loads / stores is problematic. Don't forget that OpenJDK still supports Big Endian platforms (AIX, s390x).
>> - The result of `NativeCallingConvention::calling_convention` is interpreted as size, but it returns the max offset. That's off by one slot. Should I file a bug for that?
>>   (PPC64 is not affected because it doesn't use the result.)
>> - Since the membar on the return path was mentioned: I think it would be good to enable UseSystemMemoryBarrier by default on operating systems which support it. Maybe we should discuss this with @robehn.
>
>> * Uploaded my simple reproducer to [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> Thanks!
>
>> * Using oversized loads / stores is problematic. Don't forget that OpenJDK still supports Big Endian platforms (AIX, s390x).
>
> You're right. I realized that it's also problematic for heap segments, for which we can't do oversized accesses. I am working on another solution that splits up the loads/stores into power-of-two sized chunks: https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:panama-foreign:OOB
> That patch is just a POC at this point though. Also, I don't think it works for BE at the moment (need to flip the offset for BE, I think. Just like we do in Unsafe).
>
>> * The result of `NativeCallingConvention::calling_convention` is interpreted as size, but it returns the max offset. That's off by one slot. Should I file a bug for that? (PPC64 is not affected because it doesn't use the result.)
>
> I'm not sure there's an issue there. Note that the 'max offset' is computed as `reg.offset() + reg.stack_size()`, so that should get us the size we need to allocate for the stack arguments (e.g. 2 ints being passed at offsets 0 and 4 would make the max offset 4 + 4 = 8, which is the size needed for the 2 ints). Computing the max offset instead of just summing the sizes of the stack arguments is needed since stack arguments can be sparsely placed in some cases on Mac/AArch64.
>
>> * Since the membar on the return path was mentioned: I think it would be good to enable UseSystemMemoryBarrier by default on operating systems which support it. Maybe we should discuss this with @robehn.
>
> ~I don't think we've done that much testing with UseSystemMemoryBarrier since it was added~. I'm a bit nervous about turning it on by default since it's currently also used for JNI. Let's see what Robbin thinks.

@JornVernee: Thanks a lot for your detailed review! I have quite a few TODOs, which include:
- Include my tests for the HFA corner cases.
- Try to improve handling of the overlapping registers as you suggested.
- Check nesting of HFA.

There will surely be more when looking into Big Endian support after merging with your recent work on https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:panama-foreign:OOB
We should get rid of oversized accesses on PPC64, too.

Thanks for sharing your plans to intrinsify `linkToNative` in C2 later. I guess we should do more preparation work on all platforms when that gets addressed.

-------------

PR: https://git.openjdk.org/jdk/pull/12708

From fyang at openjdk.org Wed Mar 1 07:03:06 2023
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 1 Mar 2023 07:03:06 GMT
Subject: RFR: 8303210: [linux, Windows] Enable UseSystemMemoryBarrier by default if possible [v3]
References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com>

On Tue, 28 Feb 2023 05:26:22 GMT, Martin Doerr wrote:

>> I'd like to enable UseSystemMemoryBarrier by default on supported Operating Systems in order to improve performance of thread state transitions (I/O, JNI, foreign function calls, JIT compiler threads, etc.). See JBS issue for more details.
>> Unfortunately, the feature was not yet implemented on all platforms. I added the code, but need the platform maintainers to check if it can be used reliably (and ideally if the performance improves). It's easy to switch it off again in case of problems.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
>   Improve logging.
bootcycle and tier1-3 tests are good on the linux-riscv64 platform. I think it will be safer to explicitly include "runtime/globals.hpp":

diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
index d0083f932df..117145ac137 100644
--- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
+++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
@@ -44,6 +44,7 @@
 #include "prims/methodHandles.hpp"
 #include "runtime/continuation.hpp"
 #include "runtime/continuationEntry.inline.hpp"
+#include "runtime/globals.hpp"
 #include "runtime/jniHandles.hpp"
 #include "runtime/safepointMechanism.hpp"
 #include "runtime/sharedRuntime.hpp"
diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
index 3a259d4ba18..ff7b40c2695 100644
--- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
+++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
@@ -45,6 +45,7 @@
 #include "runtime/arguments.hpp"
 #include "runtime/deoptimization.hpp"
 #include "runtime/frame.inline.hpp"
+#include "runtime/globals.hpp"
 #include "runtime/jniHandles.hpp"
 #include "runtime/sharedRuntime.hpp"
 #include "runtime/stubRoutines.hpp"

------------- PR: https://git.openjdk.org/jdk/pull/12753 From stuefe at openjdk.org Wed Mar 1 07:13:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 1 Mar 2023 07:13:08 GMT Subject: RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 02:49:39 GMT, Kim Barrett wrote: >> Please review this change to the implementation of the Windows-specific option >> UseOSErrorReporting, toward allowing crash reporting functions to be declared >> noreturn. VMError::report_and_die no longer conditionally returns if the >> Windows-only option UseOSErrorReporting is true.
>>
>> The Windows-only sections of report_and_die now call RaiseFailFastException
>> (https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception),
>> which immediately invokes WER (Windows Error Reporting) if it is enabled,
>> without executing structured exception handler. If WER is not enabled, it
>> just immediately terminates the program. Thus, we no longer return to walk up
>> the structured exception handler chain to pop out at the top as unhandled in
>> order to invoke WER.
>>
>> This permits declaring report_and_die as [[noreturn]], once some functions
>> from the os class are also so declared. Also adding that attribute as
>> appropriate to other functions in the os class. This of course assumes
>> the use of [[noreturn]] in HotSpot code is approved (JDK-8302124).
>>
>> There is a pre-existing bug that I'll be reporting separately. If
>> UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an
>> empty .mdmp file. We shouldn't create that file when UseOSErrorReporting.
>>
>> Testing:
>> mach5 tier1-3
>>
>> Manual testing with the following, to verify desired behavior.
>>
>> -XX:ErrorHandlerTest=N
>>   1: assertion failure
>>   2: guarantee failure
>>   14: SIGSEGV
>>   15: divide by zero
>> path/to/bin/java \
>>   -XX:+UnlockDiagnosticVMOptions \
>>   -XX:+ErrorLogSecondaryErrorDetails \
>>   -XX:+UseOSErrorReporting \
>>   -XX:ErrorHandlerTest=1 \
>>   TestDebug.java
>>
>> --- TestDebug.java ---
>> import java.lang.String;
>> public class TestDebug {
>>     static private volatile String dummy;
>>     public static void main(String[] args) throws Exception {
>>         while (true) {
>>             dummy = new String("foo bar");
>>         }
>>     }
>> }
>> --- end TestDebug.java ---
>>
>> The state of WER can be examined and modified using Power Shell commands
>> {Get,Enable,Disable}-WindowsErrorReporting.
>> >> The state of reporting WER captured errors can be examined and modified using >> Control Panel > Security and Maintenance > Maintenance : Report Problems [on,off] >> >> With Report Problems off, reports are placed in >> c:\ProgramData\Microsoft\Windows\WER\ReportArchive >> >> I verified that executing the above test with WER enabled adds an entry in >> that directory, but not when it's disabled. Also nothing is added there when >> the test is run with -XX:-UseOSErrorReporting. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove failfast cuttoff of secondary errors LGTM ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.org/jdk/pull/12759 From duke at openjdk.org Wed Mar 1 08:27:08 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Wed, 1 Mar 2023 08:27:08 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v2] In-Reply-To: References: Message-ID: On Tue, 28 Feb 2023 12:41:07 GMT, Jan Kratochvil wrote: >> test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java line 44: >> >>> 42: * Java bytecode drem is defined as C fmod (not C drem==remainder). >>> 43: * GCC since 93ba85fdd253b4b9cf2b9e54e8e5969b1a3db098 has slow fmod(). >>> 44: * Testcase is based on: https://stackoverflow.com/a/55673220/2995591 >> >> That could be a problem. What is the license for code shown on StackOverflow? > > A nice catch. I have tried to get a compatible license, we'll see: https://github.com/cirosantilli/java-cheat/pull/3 It is now resolved, the testcase code is now relicensed as GPLv2+. 
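The quoted benchmark comment distinguishes Java's `drem` (C `fmod`) from C `remainder` (historically `drem`). That distinction is easy to check with a minimal C++ sketch. `java_drem` is a hypothetical wrapper that illustrates the semantics only; it is not the x87 `fprem`-based code in the PR. The infinity cases are the ones relevant to the Windows/amd64 workaround discussed in this thread. Under the C standard's Annex F rules, `fmod(x, ±inf)` is `x` for finite `x`, and an infinite dividend or a zero divisor yields NaN, which matches Java's `%`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Java's % on doubles (the drem bytecode) truncates the implied quotient
// toward zero and keeps the sign of the dividend, which is what C fmod does.
// C remainder() instead rounds the quotient to the nearest integer.
double java_drem(double x, double y) {
  return std::fmod(x, y);
}
```

For example, `java_drem(5.0, 3.0)` is 2.0, while `std::remainder(5.0, 3.0)` is -1.0, because `remainder` rounds the quotient 5/3 to 2.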
------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Wed Mar 1 08:52:06 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Wed, 1 Mar 2023 08:52:06 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v2] In-Reply-To: References: Message-ID: <50HcTSZYkWhmyRTTSpQsJJ9Dv28R_RnY1XflczBBxeg=.7c6c1e71-7129-4bc2-9ab5-f5dde541f0a9@github.com> On Mon, 27 Feb 2023 01:16:31 GMT, David Holmes wrote: >> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: >> >> 8302191: Performance degradation for float/double modulo on Linux > > src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 87: > >> 85: #endif //COMPILER1 >> 86: >> 87: JRT_LEAF(jfloat, SharedRuntime::frem(jfloat x, jfloat y)) > > Nit: extra space before x and y. I agree but FYI it was just copy-pasted from existing source: > `-JRT_LEAF(jfloat, SharedRuntime::frem(jfloat x, jfloat y))` > src/hotspot/share/runtime/sharedRuntime.cpp line 236: > >> 234: const julong double_infinity = CONST64(0x7FF0000000000000); >> 235: >> 236: #ifndef X86 > > I wonder if the WIN64 workaround is actually needed/valid for non-X86 windows? It is true the comment says: > 64-bit Windows on amd64 returns the wrong values for infinity operands. I have left the workaround really just for AMD64 Windows now. I am going to get it regression tested on aarch64 if that is enough. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Wed Mar 1 08:56:44 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Wed, 1 Mar 2023 08:56:44 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v3] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. 
I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Update according to the upstream review by David Holmes. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/d00f1a5a..44ba5348 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=01-02 Stats: 66 lines in 3 files changed: 30 ins; 30 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From eosterlund at openjdk.org Wed Mar 1 09:21:58 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 1 Mar 2023 09:21:58 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v7] In-Reply-To: References: Message-ID: <3Pj7rn9qShfBoskPL0APjSp_uFqSbe59oJOHIwnGZwo=.68853563-53cc-4988-aa26-7f64aabafd58@github.com> > So far, the arraycopy stubs have performed some kind of bulk pre/post barriers for arraycopy, which have been good enough, and allowed the copying itself to be done with plain loads and stores. For generational ZGC, this approach is not good enough, and we need barriers for the actual copying, but instead don't need the pre/post barriers. To prepare the JVM for generational ZGC, we need to add an API for arraycopy barriers.
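The PR description contrasts two barrier shapes for arraycopy. The first uses bulk pre/post barriers around plain loads and stores. The second applies barriers to the copying accesses themselves and needs no pre/post step. Both shapes can be modeled in a few lines of C++. This is a conceptual sketch with hypothetical names (`Ref`, `BarrierCounts`, `copy_with_bulk_barriers`, `copy_with_access_barriers`). It is not the BarrierSetAssembler API the PR adds (the aarch64 stubs quoted in this thread call `bs_asm->copy_load_at`). The counters only make visible how often each kind of barrier runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Ref = uintptr_t;  // stand-in for an object reference

// Counts how often each hypothetical barrier runs.
struct BarrierCounts {
  int pre = 0;
  int post = 0;
  int load = 0;
  int store = 0;
};

// Shape 1: one bulk pre-barrier and one bulk post-barrier around a copy
// done with plain loads and stores.
void copy_with_bulk_barriers(Ref* dst, const Ref* src, size_t n, BarrierCounts& c) {
  c.pre++;
  for (size_t i = 0; i < n; i++) {
    dst[i] = src[i];
  }
  c.post++;
}

// Shape 2: a barrier on every element load and store, no bulk pre/post step.
static Ref load_barrier(const Ref* addr, BarrierCounts& c) {
  c.load++;
  return *addr;  // a real collector might do collector-specific work here
}

static void store_barrier(Ref* addr, Ref value, BarrierCounts& c) {
  c.store++;
  *addr = value;  // a real collector might do collector-specific work here
}

void copy_with_access_barriers(Ref* dst, const Ref* src, size_t n, BarrierCounts& c) {
  for (size_t i = 0; i < n; i++) {
    store_barrier(&dst[i], load_barrier(&src[i], c), c);
  }
}
```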
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Helper stack object aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12670/files - new: https://git.openjdk.org/jdk/pull/12670/files/8d2b8f41..888e4cb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12670&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12670&range=05-06 Stats: 446 lines in 1 file changed: 73 ins; 242 del; 131 mod Patch: https://git.openjdk.org/jdk/pull/12670.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12670/head:pull/12670 PR: https://git.openjdk.org/jdk/pull/12670 From mbaesken at openjdk.org Wed Mar 1 09:24:39 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 1 Mar 2023 09:24:39 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v2] In-Reply-To: References: Message-ID: > The CDS-only code in HotSpot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time when the corresponding configure flags are set. > However, in some places INCLUDE_CDS is missing and should be added. > > One question: should the DumpSharedSpaces code sections (in addition to the UseSharedSpaces code section) be guarded with INCLUDE_CDS macros as well?
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Make SharedSpaces related vars const and false in non CDS mode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12691/files - new: https://git.openjdk.org/jdk/pull/12691/files/6c22a2a7..5dff9614 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12691&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12691&range=00-01 Stats: 46 lines in 4 files changed: 45 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12691.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12691/head:pull/12691 PR: https://git.openjdk.org/jdk/pull/12691 From eosterlund at openjdk.org Wed Mar 1 09:29:12 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 1 Mar 2023 09:29:12 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: <2g2QqIWkC2a7QyVzPkX-QW3LcMrti6-AMwuFPqe752o=.c63ae268-69be-4770-b642-544633c1cf8f@github.com> References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> <2g2QqIWkC2a7QyVzPkX-QW3LcMrti6-AMwuFPqe752o=.c63ae268-69be-4770-b642-544633c1cf8f@github.com> Message-ID: <1NdlCJpW9kP6N1prWI-pXRBFCooYppayETd83JdBEkE=.df3ed4f4-b685-4377-8284-40b4a3cedf5e@github.com> On Tue, 28 Feb 2023 10:34:17 GMT, Andrew Haley wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 720: > >> 718: t4 = r7, t5 = r11, t6 = r12, t7 = r13; >> 719: const Register stride = r14; >> 720: const Register gct1 = r8, gct2 = r9, gct3 = r10; > > Please don't alias rscratch1 and rscratch2. Many macros use them, and this is a bug waiting to happen. I view the GC barriers in spirit as being one of said macros; they just happen to be in a different file.
We could have them in the macroAssembler file, but chose not to clutter it with GC specifics and moved them to a different file instead. The main reason I selected r8 and r9 for the GC temp registers is that we have run out of C ABI caller-saved registers in copy_memory. All other registers in [r0, r17] are already used, and the arraycopy stubs are called with the C ABI, so any callee-saved register is expected to either not be touched or be saved in the callee, making such registers poor choices for temp registers. While a different selection could be made, for example using callee-saved registers such as r20 and r21, I'd have to push the registers in the prologue and pop them in the epilogue of the arraycopy stubs, which seems a bit awkward if it can be avoided. I'm open to doing that, but it doesn't spark joy. So I thought I'd explain the choice here. What do you think? ------------- PR: https://git.openjdk.org/jdk/pull/12670 From eosterlund at openjdk.org Wed Mar 1 09:32:29 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 1 Mar 2023 09:32:29 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> Message-ID: On Tue, 28 Feb 2023 13:48:23 GMT, Andrew Haley wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 812: > >> 810: bs_asm->copy_load_at(_masm, decorators, type, 32, >> 811: v2, v3, Address(__ pre(s, 8 * unit)), >> 812: gct1, gct2, gcvt1); > > All this extreme cut-and-paste manual unrolling is very hard to read, maintain, and review. > I wasn't going to say anything, because it's Erik, and what do I know! But I must push back here, this is too much. > Please consider these style changes.
I updated the PR with a helper stack object encapsulating the choice of GC temp registers, types, decorators, etc, so that each line removes all the noise and becomes more readable. Do you like it? If yes, do you also want your proposed loop constructions in the new more compact form? I'm okay either way. ------------- PR: https://git.openjdk.org/jdk/pull/12670 From kbarrett at openjdk.org Wed Mar 1 09:54:09 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Mar 2023 09:54:09 GMT Subject: RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting [v2] In-Reply-To: <6VenuisSudRu06LiySSGslFxVuvMh9GpY1elHkahExU=.5b06f61d-7635-48b0-b139-da102ceb2fcf@github.com> References: <6VenuisSudRu06LiySSGslFxVuvMh9GpY1elHkahExU=.5b06f61d-7635-48b0-b139-da102ceb2fcf@github.com> Message-ID: On Mon, 27 Feb 2023 21:57:37 GMT, Coleen Phillimore wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> remove failfast cuttoff of secondary errors > > This is a nice solution and thank you for going the extra step of verifying that the code supports WER. Thanks for reviews @coleenp and @tstuefe . 
------------- PR: https://git.openjdk.org/jdk/pull/12759 From eosterlund at openjdk.org Wed Mar 1 09:54:15 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 1 Mar 2023 09:54:15 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: <1NdlCJpW9kP6N1prWI-pXRBFCooYppayETd83JdBEkE=.df3ed4f4-b685-4377-8284-40b4a3cedf5e@github.com> References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> <2g2QqIWkC2a7QyVzPkX-QW3LcMrti6-AMwuFPqe752o=.c63ae268-69be-4770-b642-544633c1cf8f@github.com> <1NdlCJpW9kP6N1prWI-pXRBFCooYppayETd83JdBEkE=.df3ed4f4-b685-4377-8284-40b4a3cedf5e@github.com> Message-ID: On Wed, 1 Mar 2023 09:25:58 GMT, Erik Österlund wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 720: >> >>> 718: t4 = r7, t5 = r11, t6 = r12, t7 = r13; >>> 719: const Register stride = r14; >>> 720: const Register gct1 = r8, gct2 = r9, gct3 = r10; >> >> Please don't alias rscratch1 and rscratch2. Many macros use them, and this is a bug waiting to happen. > > I view the GC barriers in spirit as being one of said macros; they just happen to be in a different file. We could have them in the macroAssembler file, but chose not to clutter it with GC specifics and moved them to a different file instead. > The main reason I selected r8 and r9 for the GC temp registers is that we have run out of C ABI caller-saved registers in copy_memory. All other registers in [r0, r17] are already used, and the arraycopy stubs are called with the C ABI, so any callee-saved register is expected to either not be touched or be saved in the callee, making such registers poor choices for temp registers. While a different selection could be made, for example using callee-saved registers such as r20 and r21, I'd have to push the registers in the prologue and pop them in the epilogue of the arraycopy stubs, which seems a bit awkward if it can be avoided.
I'm open to doing that, but it doesn't spark joy. So I thought I'd explain the choice here. What do you think? Note also that the arraycopy stubs were already using r8 and r9 as temp registers. I just moved their use into the GC barriers, so only the barriers need to deal with their inherent scratchiness, while the client code can now use temp registers that are not scratchy, which was not the case before. ------------- PR: https://git.openjdk.org/jdk/pull/12670 From iklam at openjdk.org Wed Mar 1 10:00:06 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 Mar 2023 10:00:06 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 09:24:39 GMT, Matthias Baesken wrote: >> The CDS-only code in HotSpot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time when the corresponding configure flags are set. >> However, in some places INCLUDE_CDS is missing and should be added. >> >> One question: should the DumpSharedSpaces code sections (in addition to the UseSharedSpaces code section) be guarded with INCLUDE_CDS macros as well? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Make SharedSpaces related vars const and false in non CDS mode src/hotspot/share/utilities/globalDefinitions.hpp line 594: > 592: #else > 593: // in non CDS mode do not export it > 594: extern const bool UseSharedSpaces; How about this? const bool RequireSharedSpaces = false; This will allow the C++ compiler to eliminate all code that depends on this value when `INCLUDE_CDS` is false.
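Ioi's suggestion relies on a property of C++ constants. A namespace-scope `const bool` that is initialized with a constant and visible in the header is a compile-time constant, so `if (UseSharedSpaces)` folds to `if (false)` and the guarded code can be dropped. A declaration such as `extern const bool UseSharedSpaces;` hides the value in another translation unit, so callers still load and test it at run time (barring link-time optimization). Below is a minimal sketch, assuming a non-CDS build (`INCLUDE_CDS` defaults to 0 here) and using a hypothetical caller, `archived_class_count`.

```cpp
#include <cassert>

#ifndef INCLUDE_CDS
#define INCLUDE_CDS 0  // assumption for this sketch: a build without CDS
#endif

#if INCLUDE_CDS
// CDS builds: a real variable whose value is decided at startup.
bool UseSharedSpaces = true;
#else
// Non-CDS builds: a constant visible to every includer, so the compiler
// knows its value and can fold `if (UseSharedSpaces)` away.
const bool UseSharedSpaces = false;
#endif

// Hypothetical caller with a CDS-only branch.
int archived_class_count() {
  if (UseSharedSpaces) {
    return 42;  // stands in for code that reads the CDS archive
  }
  return 0;
}
```

Because a namespace-scope `const` has internal linkage in C++, defining it in a header gives each translation unit its own copy, which is harmless for a constant.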
------------- PR: https://git.openjdk.org/jdk/pull/12691 From kbarrett at openjdk.org Wed Mar 1 10:25:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Mar 2023 10:25:06 GMT Subject: RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting [v3] In-Reply-To: References: Message-ID: > Please review this change to the implementation of the Windows-specific option > UseOSErrorReporting, toward allowing crash reporting functions to be declared > noreturn. VMError::report_and_die no longer conditionally returns if the > Windows-only option UseOSErrorReporting is true. > > The Windows-only sections of report_and_die now call RaiseFailFastException > (https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception), > which immediately invokes WER (Windows Error Reporting) if it is enabled, > without executing structured exception handler. If WER is not enabled, it > just immediately terminates the program. Thus, we no longer return to walk up > the structured exception handler chain to pop out at the top as unhandled in > order to invoke WER. > > This permits declaring report_and_die as [[noreturn]], once some functions > from the os class are also so declared. Also adding that attribute as > appropriate to other functions in the os class. This of course assumes > the use of [[noreturn]] in HotSpot code is approved (JDK-8302124). > > There is a pre-existing bug that I'll be reporting separately. If > UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an > empty .mdmp file. We shouldn't create that file when UseOSErrorReporting. > > Testing: > mach5 tier1-3 > > Manual testing with the following, to verify desired behavior.
> > -XX:ErrorHandlerTest=N > 1: assertion failure > 2: guarantee failure > 14: SIGSEGV > 15: divide by zero > path/to/bin/java \ > -XX:+UnlockDiagnosticVMOptions \ > -XX:+ErrorLogSecondaryErrorDetails \ > -XX:+UseOSErrorReporting \ > -XX:ErrorHandlerTest=1 \ > TestDebug.java > > --- TestDebug.java --- > import java.lang.String; > public class TestDebug { > static private volatile String dummy; > public static void main(String[] args) throws Exception { > while (true) { > dummy = new String("foo bar"); > } > } > } > --- end TestDebug.java --- > > The state of WER can be examined and modified using Power Shell commands > {Get,Enable,Disable}-WindowsErrorReporting. > > The state of reporting WER captured errors can be examined and modified using > Control Panel > Security and Maintenance > Maintenance : Report Problems [on,off] > > With Report Problems off, reports are placed in > c:\ProgramData\Microsoft\Windows\WER\ReportArchive > > I verified that executing the above test with WER enabled adds an entry in > that directory, but not when it's disabled. Also nothing is added there when > the test is run with -XX:-UseOSErrorReporting. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into failfast - remove failfast cuttoff of secondary errors - failfast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12759/files - new: https://git.openjdk.org/jdk/pull/12759/files/e3717037..8a7c86ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12759&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12759&range=01-02 Stats: 3684 lines in 195 files changed: 2505 ins; 453 del; 726 mod Patch: https://git.openjdk.org/jdk/pull/12759.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12759/head:pull/12759 PR: https://git.openjdk.org/jdk/pull/12759 From kbarrett at openjdk.org Wed Mar 1 10:25:07 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Mar 2023 10:25:07 GMT Subject: Integrated: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 10:19:50 GMT, Kim Barrett wrote: > Please review this change to the implementation of the Windows-specific option > UseOSErrorReporting, toward allowing crash reporting functions to be declared > noreturn. VMError::report_and_die no longer conditionally returns if the > Windows-only option UseOSErrorReporting is true. > > The Windows-only sections of report_and_die now call RaiseFailFastException > (https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception), > which immediately invokes WER (Windows Error Reporting) if it is enabled, > without executing structured exception handler. If WER is not enabled, it > just immediately terminates the program. Thus, we no longer return to walk up > the structured exception handler chain to pop out at the top as unhandled in > order to invoke WER. > > This permits declaring report_and_die as [[noreturn]], once some functions > from the os class are also so declared.
Also adding that attribute as > appropriate to other functions in the os class. This of course assumes > the use of [[noreturn]] in HotSpot code is approved (JDK-8302124). > > There is a pre-existing bug that I'll be reporting separately. If > UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an > empty .mdmp file. We shouldn't create that file when UseOSErrorReporting. > > Testing: > mach5 tier1-3 > > Manual testing with the following, to verify desired behavior. > > -XX:ErrorHandlerTest=N > 1: assertion failure > 2: guarantee failure > 14: SIGSEGV > 15: divide by zero > path/to/bin/java \ > -XX:+UnlockDiagnosticVMOptions \ > -XX:+ErrorLogSecondaryErrorDetails \ > -XX:+UseOSErrorReporting \ > -XX:ErrorHandlerTest=1 \ > TestDebug.java > > --- TestDebug.java --- > import java.lang.String; > public class TestDebug { > static private volatile String dummy; > public static void main(String[] args) throws Exception { > while (true) { > dummy = new String("foo bar"); > } > } > } > --- end TestDebug.java --- > > The state of WER can be examined and modified using Power Shell commands > {Get,Enable,Disable}-WindowsErrorReporting. > > The state of reporting WER captured errors can be examined and modified using > Control Panel > Security and Maintenance > Maintenance : Report Problems [on,off] > > With Report Problems off, reports are placed in > c:\ProgramData\Microsoft\Windows\WER\ReportArchive > > I verified that executing the above test with WER enabled adds an entry in > that directory, but not when it's disabled. Also nothing is added there when > the test is run with -XX:-UseOSErrorReporting. This pull request has now been integrated. 
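The description notes that the refactoring "permits declaring report_and_die as [[noreturn]]". The reason is that returning from a function declared `[[noreturn]]` is undefined behavior, so the attribute could only be added once report_and_die stopped conditionally returning under UseOSErrorReporting. A small C++ sketch shows what the attribute buys callers. `report_and_die_sketch` and `checked_divide` are hypothetical names, and `std::abort()` stands in for whatever actually ends the process (RaiseFailFastException on Windows in this PR).

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// [[noreturn]] promises the compiler that control never comes back;
// here that holds because std::abort() terminates the process.
[[noreturn]] void report_and_die_sketch(const char* message) {
  std::fprintf(stderr, "fatal: %s\n", message);
  std::abort();
}

// Because the call below cannot return, no dummy return value is needed
// after it, and compilers do not warn about reaching the end of a
// non-void function.
int checked_divide(int a, int b) {
  if (b != 0) {
    return a / b;
  }
  report_and_die_sketch("division by zero");
}
```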
Changeset: 539a4951 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/539a4951eee914da15a00cbd04ebc6a2c59b8f23 Stats: 51 lines in 5 files changed: 22 ins; 8 del; 21 mod 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting Reviewed-by: coleenp, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/12759 From eosterlund at openjdk.org Wed Mar 1 10:56:31 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 1 Mar 2023 10:56:31 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v8] In-Reply-To: References: Message-ID: > So far, the arraycopy stubs have performed some kind of bulk pre/post barriers for arraycopy, which have been good enough, and allowed the copying itself to be done with plain loads and stores. For generational ZGC, this approach is not good enough, and we need barriers for the actual copying, but instead don't need the pre/post barriers. To prepare the JVM for generational ZGC, we need to add an API for arraycopy barriers.
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Add comment about r15 useage ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12670/files - new: https://git.openjdk.org/jdk/pull/12670/files/888e4cb0..d03a5c29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12670&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12670&range=06-07 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12670.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12670/head:pull/12670 PR: https://git.openjdk.org/jdk/pull/12670 From eosterlund at openjdk.org Wed Mar 1 10:56:33 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 1 Mar 2023 10:56:33 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: <44kCeVWFrSGQEcehVV5G_EowMs26zWcuv5KmDbYv9Mc=.f49cb445-de78-4932-aefc-1839fd0be611@github.com> References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> <44kCeVWFrSGQEcehVV5G_EowMs26zWcuv5KmDbYv9Mc=.f49cb445-de78-4932-aefc-1839fd0be611@github.com> Message-ID: <3nz9PRPYzYVszMpwatVR3TbeFZsdT0jHpZ6lHqgJbSA=.12875265-8044-41c7-8e12-3b2851be8b08@github.com> On Tue, 28 Feb 2023 12:28:02 GMT, Roberto Castañeda Lozano wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1270: > >> 1268: Label copy4, copy8, copy16, copy32, copy80, copy_big, finish; >> 1269: const Register t2 = r5, t3 = r6, t4 = r7, t5 = r11; >> 1270: const Register t6 = r12, t7 = r13, t8 = r14, t9 = r15; > > I find the usage of r15 in `copy_memory` a bit confusing. If I get it right, it is used > 1. as a temporary register (aliased as `t9`) for 16-byte copying (L1407-1437), > 2. as a temporary register (in its raw form, i.e.
not via `t9`) passed explicitly to `copy_memory_small` (L1518-1548), and > 3. as a "special" register passed implicitly to the generated `copy_longs` stubs (L1557-1574). > > I think it would be clearer if `t9` was used instead of `r15` for case 2) and then perhaps a comment was added before calling the generated `copy_longs` stubs mentioning that `t9` cannot be used from then on because it aliases with `r15`. Since r15 is part of the ABI contract across different generated stubs, I'd prefer not to change it to t9: when you scroll through the different functions, it isn't clear that t9 is not just used locally as a temporary register but is used across stubs, even though it isn't passed as a parameter named "t9" to said stubs. I did, however, add a comment explaining when we start using r15 as count, and that we can't use t9 any more after that point. I hope that's okay. ------------- PR: https://git.openjdk.org/jdk/pull/12670 From duke at openjdk.org Wed Mar 1 11:11:48 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Wed, 1 Mar 2023 11:11:48 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v4] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix WIN32 vs. WIN64.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/44ba5348..4b7756f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=02-03 Stats: 15 lines in 2 files changed: 9 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From rrich at openjdk.org Wed Mar 1 11:18:47 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 1 Mar 2023 11:18:47 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl Message-ID: This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. C2i entry barriers can be removed for the same reason. Testing: Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. 
------------- Commit messages: - Remove MacroAssembler::resolve_weak_handle() - Remove keep_alive_offset() and holder_offset() from CLD - Remove MacroAssembler::load_method_holder_cld() - Remove c2i entry barrier - Check dependency for statically bound call Changes: https://git.openjdk.org/jdk/pull/12802/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12802&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8296440 Stats: 367 lines in 26 files changed: 74 ins; 283 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/12802.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12802/head:pull/12802 PR: https://git.openjdk.org/jdk/pull/12802 From mbaesken at openjdk.org Wed Mar 1 11:30:02 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 1 Mar 2023 11:30:02 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 09:57:28 GMT, Ioi Lam wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> Make SharedSpaces related vars const and false in non CDS mode > > src/hotspot/share/utilities/globalDefinitions.hpp line 594: > >> 592: #else >> 593: // in non CDS mode do not export it >> 594: extern const bool UseSharedSpaces; > > How about this? > > > const bool RequireSharedSpaces = false; > > > This will allow the C++ compiler to eliminate all code that depend on this value when `INCLUDE_CDS` is false. Hi Ioi, I set this already some lines above (in the non-INCLUDE_CDS case). ------------- PR: https://git.openjdk.org/jdk/pull/12691 From tschatzl at openjdk.org Wed Mar 1 11:38:14 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 1 Mar 2023 11:38:14 GMT Subject: RFR: 8303418: Improve parameter and variable names in BitMap [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 06:08:33 GMT, Kim Barrett wrote: >> Please review this change to names in BitMap. 
>> >> - Parameters that designate a bit in a BitMap are named "bit". >> - Parameters that designate a word in the underlying BitMap storage are named "word". >> - Parameters that designate a range are named "beg" and "end" resp. >> >> Added helper function `flipped_word` for use by `get_next_bit_impl`, replacing >> the odd overload for `map`. >> >> In `get_next_bit_impl`, prefixed the variables "index" and "limit" with >> "word_" to make clear the units. >> >> Testing: >> mach5 tier1 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > copyrights Lgtm. It would be great to also fix the unusual spacing between method names and the parameter brackets (and the method body brackets) a little, but that can be a different issue. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/12798 From duke at openjdk.org Wed Mar 1 11:59:07 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Wed, 1 Mar 2023 11:59:07 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v2] In-Reply-To: <4A9Uw_YcF6d1bg751wsOJJeLT1Mfx8b6Ne5MUSUvdrs=.de62d0d6-ccdb-402c-94be-800328cc622d@github.com> References: <4A9Uw_YcF6d1bg751wsOJJeLT1Mfx8b6Ne5MUSUvdrs=.de62d0d6-ccdb-402c-94be-800328cc622d@github.com> Message-ID: On Mon, 27 Feb 2023 02:23:07 GMT, David Holmes wrote: >> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: >> >> 8302191: Performance degradation for float/double modulo on Linux > > src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 97: > >> 95: jne 1b \n\ >> 96: " >> 97: :"=t"(retval) > > This doesn't compile on Windows > > c:\sb\prod\1677461706\workspace\open\src\hotspot\cpu\x86\sharedRuntime_x86.cpp(97): error C2059: syntax error: ':' > c:\sb\prod\1677461706\workspace\open\src\hotspot\cpu\x86\sharedRuntime_x86.cpp(114): error C2059: syntax error: ':' > > I assume this asm is gcc based. Thanks for reporting it, I am no longer providing the asm variant on any Windows platform. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Wed Mar 1 12:03:54 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Wed, 1 Mar 2023 12:03:54 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix copyright author. 
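David's compile error comes from GCC-style extended `asm` syntax, which MSVC does not accept. Below is a minimal sketch of one way to keep an x87 `fprem` remainder loop, like the one quoted, only on GCC-compatible compilers on x86, with `std::fmod` as the fallback everywhere else. It is not the code in the PR. The function name `fmod_x87_or_fallback` is hypothetical. The sketch assumes the compiler understands the x87 stack constraints `t` and `u`, which GCC documents.

```cpp
#include <cassert>
#include <cmath>

// Returns the remainder of x / y with the sign of x (the same result as
// std::fmod). On x86 with a GCC-compatible compiler it uses the x87 FPREM
// instruction; everywhere else (e.g. MSVC, AArch64) it falls back to std::fmod.
inline double fmod_x87_or_fallback(double x, double y) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  double result;
  // FPREM produces a partial remainder of ST(0) / ST(1) and sets the C2
  // flag (bit 0x400 of the FPU status word) while the reduction is still
  // incomplete, so repeat until C2 is clear.
  __asm__("1:\n\t"
          "fprem\n\t"
          "fnstsw %%ax\n\t"
          "test $0x400, %%ax\n\t"
          "jne 1b\n\t"
          : "=t"(result)      // result is left in ST(0)
          : "0"(x), "u"(y)    // x starts in ST(0), y in ST(1)
          : "ax", "cc");      // y is not popped by the asm; the compiler pops it
  return result;
#else
  return std::fmod(x, y);
#endif
}
```

For example, `fmod_x87_or_fallback(10.5, 3.0)` yields 1.5 on either path. As with `fmod`, the result takes the sign of the dividend, so `(-7.0, 2.0)` yields -1.0.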
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/4b7756f3..a04ee993 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Wed Mar 1 12:03:59 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Wed, 1 Mar 2023 12:03:59 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v2] In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 01:01:11 GMT, David Holmes wrote: >> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: >> >> 8302191: Performance degradation for float/double modulo on Linux > > test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java line 2: > >> 1: /* >> 2: * Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. > > Oracle did not write this code so it should have your own copyright on it, unless you copied it from other OpenJDK code in which case it could have dual copyright. I have put Azul there now.
------------- PR: https://git.openjdk.org/jdk/pull/12508 From iklam at openjdk.org Wed Mar 1 12:30:05 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 1 Mar 2023 12:30:05 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:26:51 GMT, Matthias Baesken wrote: >> src/hotspot/share/utilities/globalDefinitions.hpp line 594: >> >>> 592: #else >>> 593: // in non CDS mode do not export it >>> 594: extern const bool UseSharedSpaces; >> >> How about this? >> >> >> const bool RequireSharedSpaces = false; >> >> >> This will allow the C++ compiler to eliminate all code that depend on this value when `INCLUDE_CDS` is false. > > Hi Ioi, I set this already some lines above (in the non-INCLUDE_CDS case). You need to remove the `extern` keyword, and add the definition of `= false` in globalDefinitions.hpp. Then, the corresponding definition should be removed from globalDefinitions.cpp. ------------- PR: https://git.openjdk.org/jdk/pull/12691 From duke at openjdk.org Wed Mar 1 12:41:05 2023 From: duke at openjdk.org (Johannes Bechberger) Date: Wed, 1 Mar 2023 12:41:05 GMT Subject: RFR: 8303444: AsyncGetCallTrace obtains too few frames with instrumentation agent Message-ID: This fixes the bug by removing the faulty completeness check for runtime blobs. I tested it using the [trace_validation](https://github.com/parttimenerd/trace_validation) tool successfully as described in the issue. I furthermore ran the [jdk-profiling-tester](https://github.com/parttimenerd/jdk-profiling-tester) to ensure that this fix did not introduce any stability issues and ran the serviceability JTREG tests successfully. 
------------- Commit messages: - Don't check for completeness of runtime stubs Changes: https://git.openjdk.org/jdk/pull/12804/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12804&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303444 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12804.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12804/head:pull/12804 PR: https://git.openjdk.org/jdk/pull/12804 From coleenp at openjdk.org Wed Mar 1 12:42:10 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 1 Mar 2023 12:42:10 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() In-Reply-To: References: Message-ID: <9ZXi9uNa5ETIhldKLCDAYojtXTGEg-5EexLwHNE2zhI=.026815b3-3117-4184-be4c-5fdf42c2655f@github.com> On Tue, 28 Feb 2023 11:11:54 GMT, Afshin Zafari wrote: > The inline and not-inline versions of the method is stress tested to compare the performance difference. The statistics are drawn in the following charts. The vertical axis is in milliseconds. > > ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) > > ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) I don't know about these numbers but if this is not a neutral change for performance based on looking at the code and callers, I don't know what is. src/hotspot/share/oops/instanceKlass.inline.hpp line 188: > 186: } > 187: > 188: inline instanceOop InstanceKlass::allocate_instance(oop java_class, TRAPS) { In moving this, can you eliminate any #includes at the top? And migrate them to .cpp files that might need them. 
------------- PR: https://git.openjdk.org/jdk/pull/12782 From epeter at openjdk.org Wed Mar 1 12:59:09 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 1 Mar 2023 12:59:09 GMT Subject: RFR: 8302144: Move ZeroTLABTest.java to tier3 [v2] In-Reply-To: References: Message-ID: > This is a "Hello World" test for the flags `UseTLAB` and `ZeroTLAB`, running with `-Xcomp`. It takes about 30seconds. > This test does not need to run on `tier1`. So I pushed it out to `tier3`. > > Moved it to tier3, by adding it to hotspot_slow_compiler, which is excluded from tier1 and tier2. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - merge from master - 8302144: Move ZeroTLABTest.java to tier3 ------------- Changes: https://git.openjdk.org/jdk/pull/12620/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12620&range=01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12620.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12620/head:pull/12620 PR: https://git.openjdk.org/jdk/pull/12620 From jvernee at openjdk.org Wed Mar 1 13:06:09 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 1 Mar 2023 13:06:09 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: On Wed, 1 Mar 2023 06:37:45 GMT, Martin Doerr wrote: > There will surely be more when looking into Big Endian support after merging with your recent work on https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:panama-foreign:OOB I will try to move this into the mainline ahead of the JEP PR (process allowing). As well as another testing coverage patch for nested structs/unions: https://github.com/openjdk/panama-foreign/pull/780. 
> Thanks for sharing your plans to intrisify linkToNative in C2 later. I guess we should do more preparation work on all platforms when that gets addressed. WRT this: we had intrinsification in the past as well (https://github.com/openjdk/panama-foreign/pull/609), but it was removed since we also removed 'trivial' calls at some point (and then the intrinsics were not used). Now that trivial calls are back on the menu, it might be interesting to look at re-adding intrinsics as well. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Wed Mar 1 13:06:13 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 1 Mar 2023 13:06:13 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: On Tue, 28 Feb 2023 20:51:50 GMT, Jorn Vernee wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove size restriction for structs. Add TODO for Big Endian. > > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 68: > >> 66: public abstract class CallArranger { >> 67: // Linux PPC64 Little Endian uses ABI v2. >> 68: private static final boolean useABIv2 = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN; > > Now that I'm here. This could be a potentially interesting case for having 2 subclasses of CallArranger: one for `useABIv2 == true` and one for `false`. Yeah, let's wait until we know what changes we need for AIX (and Big Endian linux). 
> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 293: > >> 291: } else { >> 292: overlappingReg = new VMStorage(StorageType.STACK_AND_FLOAT, >> 293: (short) STACK_SLOT_SIZE, (int) stackOffset - 4); > > I think you could remove the mixed VMStorage types here relatively easily by returning a `VMStorage[][]`, where each element is a single element array, but then for the `needOverlapping` case add another element to the array for the extra store (instead of replacing the existing one). > > Then when unboxing a `STRUCT_HFA`, `dup` the result of the `bufferLoad` and then do 2 `vmStore`s (one for each element). > > For boxing, you could just ignore the extra storage, and just `vmLoad` the first one (or, whichever one you like :)) Thanks! I need to find extra time for this. Sounds like a good idea and I may be able to get rid of some nasty code. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From simonis at openjdk.org Wed Mar 1 13:39:04 2023 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 1 Mar 2023 13:39:04 GMT Subject: RFR: 8302783: Improve CRC32C intrinsic with crypto pmull on AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Feb 2023 19:59:24 GMT, Yi-Fan Tsai wrote: > This change adds a pmull-based CRC32C intrinsic, and it is more performant than the existing crc32c-instruction-based intrinsic on Neoverse V1. The benchmark shows 10 - 99% improvement. The improvement comes from the execution throughput of pmull/pmull2 increasing from 1 on Neoverse N1 to 4 on Neoverse V1, while the latency remains 2 and the throughput of CRC32C instructions did not change. > > The pmull-based CRC32C intrinsic is enabled by the existing option UseCryptoPmullForCRC32 which also enables the pmull-based CRC32 intrinsic. The option requires crc32c instructions, eor3 in SHA3, and 64-bit pmull/pmull2 in the Cryptographic Extension.
>
> With this change, there will be only two different CRC32C intrinsics, crc32c and pmull, while there are four CRC32 intrinsics.
>
> The following test has passed.
> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java
>
> The throughput reported by [the micro benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestCRC32C.java) is measured on an EC2 c7g instance. The optimization shows 10 - 99% improvement when the input is at least 384 bytes.
>
> | input       | 64    | 128   | 256   | 384    | 511    | 512    | 1,024  |
> | ----------- | ----- | ----- | ----- | ------ | ------ | ------ | ------ |
> | improvement | 1.60% | 0.00% | 0.00% | 15.24% | 10.76% | 34.32% | 72.39% |
>
> | input       | 2,048  | 4,096  | 8,192  | 16,384 | 32,768 | 65,536 |
> | ----------- | ------ | ------ | ------ | ------ | ------ | ------ |
> | improvement | 84.96% | 92.59% | 96.19% | 98.02% | 99.32% | 98.36% |
>
> Baseline
>
> Benchmark                   (count)   Mode  Cnt       Score      Error   Units
> TestCRC32C.testCRC32CUpdate      64  thrpt   12  196575.739 ± 1824.113  ops/ms
> TestCRC32C.testCRC32CUpdate     128  thrpt   12  123666.570 ±    2.730  ops/ms
> TestCRC32C.testCRC32CUpdate     256  thrpt   12   70188.989 ±    2.002  ops/ms
> TestCRC32C.testCRC32CUpdate     384  thrpt   12   49000.690 ±    1.421  ops/ms
> TestCRC32C.testCRC32CUpdate     511  thrpt   12   34106.279 ±   25.390  ops/ms
> TestCRC32C.testCRC32CUpdate     512  thrpt   12   37638.349 ±    1.039  ops/ms
> TestCRC32C.testCRC32CUpdate    1024  thrpt   12   19526.513 ±    0.439  ops/ms
> TestCRC32C.testCRC32CUpdate    2048  thrpt   12    9951.392 ±    4.803  ops/ms
> TestCRC32C.testCRC32CUpdate    4096  thrpt   12    5023.268 ±    0.240  ops/ms
> TestCRC32C.testCRC32CUpdate    8192  thrpt   12    2523.877 ±    0.062  ops/ms
> TestCRC32C.testCRC32CUpdate   16384  thrpt   12    1265.011 ±    0.047  ops/ms
> TestCRC32C.testCRC32CUpdate   32768  thrpt   12     632.291 ±    0.058  ops/ms
> TestCRC32C.testCRC32CUpdate   65536  thrpt   12     315.396 ±    0.160  ops/ms
>
> Crypto pmull
>
> Benchmark                   (count)   Mode  Cnt       Score      Error   Units
> TestCRC32C.testCRC32CUpdate      64  thrpt   12  199726.599 ±  166.477  ops/ms
> TestCRC32C.testCRC32CUpdate     128  thrpt   12  123669.385 ±    1.821  ops/ms
> TestCRC32C.testCRC32CUpdate     256  thrpt   12   70188.727 ±    1.313  ops/ms
> TestCRC32C.testCRC32CUpdate     384  thrpt   12   56468.837 ±   76.524  ops/ms
> TestCRC32C.testCRC32CUpdate     511  thrpt   12   37777.205 ±  406.431  ops/ms
> TestCRC32C.testCRC32CUpdate     512  thrpt   12   50554.555 ±   17.169  ops/ms
> TestCRC32C.testCRC32CUpdate    1024  thrpt   12   33661.006 ±  140.471  ops/ms
> TestCRC32C.testCRC32CUpdate    2048  thrpt   12   18406.482 ±  205.952  ops/ms
> TestCRC32C.testCRC32CUpdate    4096  thrpt   12    9674.159 ±   20.390  ops/ms
> TestCRC32C.testCRC32CUpdate    8192  thrpt   12    4951.562 ±    6.566  ops/ms
> TestCRC32C.testCRC32CUpdate   16384  thrpt   12    2504.970 ±    1.883  ops/ms
> TestCRC32C.testCRC32CUpdate   32768  thrpt   12    1260.278 ±    0.484  ops/ms
> TestCRC32C.testCRC32CUpdate   65536  thrpt   12     625.608 ±    0.300  ops/ms
Change looks good. Thanks, Volker ------------- Marked as reviewed by simonis (Reviewer). PR: https://git.openjdk.org/jdk/pull/12624 From duke at openjdk.org Wed Mar 1 14:38:03 2023 From: duke at openjdk.org (Johannes Bechberger) Date: Wed, 1 Mar 2023 14:38:03 GMT Subject: RFR: 8303444: AsyncGetCallTrace obtains too few frames with instrumentation agent In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 12:32:19 GMT, Johannes Bechberger wrote: > This fixes the bug by removing the faulty completeness check for runtime blobs. > > I tested it using the [trace_validation](https://github.com/parttimenerd/trace_validation) tool successfully as described in the issue. I furthermore ran the [jdk-profiling-tester](https://github.com/parttimenerd/jdk-profiling-tester) to ensure that this fix did not introduce any stability issues and ran the serviceability JTREG tests successfully. The problem seems to stem from runtime stubs generated for GC load barriers.
These stubs are marked with a `frame_never_safe` value for `_frame_complete_offset`. This then causes `is_frame_complete_at` to always return false for these code blobs. It might be that there were problems in the past (before 2008) with walking over these frames, but neither GetStackTrace nor the current implementation of AsyncGetCallTrace has any problems with them. It might be that walking over these frames was only implemented in recent years. ------------- PR: https://git.openjdk.org/jdk/pull/12804 From jcking at openjdk.org Wed Mar 1 15:53:25 2023 From: jcking at openjdk.org (Justin King) Date: Wed, 1 Mar 2023 15:53:25 GMT Subject: RFR: JDK-8300783: Consolidate byteswap implementations [v14] In-Reply-To: <-b783DPmWbWFeigKf7F7SFYddDKwErM4AdFcfRx01eM=.5794fa08-a0d7-4761-a449-8ebfd639e30d@github.com> References: <-b783DPmWbWFeigKf7F7SFYddDKwErM4AdFcfRx01eM=.5794fa08-a0d7-4761-a449-8ebfd639e30d@github.com> Message-ID: <9Z_vNHvdrpNCZriJVv4fmLju4PTmAMPHeJ5sEJ5OsTI=.6b1e1bbf-b3b4-4fa5-9b1c-c61eb8f0459b@github.com> On Wed, 15 Feb 2023 15:39:14 GMT, Justin King wrote: >> Deduplicate byte swapping implementations by consolidating them into `utilities/byteswap.hpp`, following `std::byteswap` introduced in C++23. Further simplification of `Bytes` will follow in https://github.com/openjdk/jdk/pull/12078. > > Justin King has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 24 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into byteswap > - Update based on review > > Signed-off-by: Justin King > - Fix copyright > > Signed-off-by: Justin King > - Update copyright > > Signed-off-by: Justin King > - Add missing include > > Signed-off-by: Justin King > - Remove unused include > > Signed-off-by: Justin King > - Reorganize tests > > Signed-off-by: Justin King > - Fix test > > Signed-off-by: Justin King > - Merge remote-tracking branch 'upstream/master' into byteswap > - Be restrict on requiring 1, 2, 4, or 8 byte integers > > Signed-off-by: Justin King > - ... and 14 more: https://git.openjdk.org/jdk/compare/a469507d...223d733b Friendly poke for a second reviewer. ------------- PR: https://git.openjdk.org/jdk/pull/12114 From kvn at openjdk.org Wed Mar 1 17:37:16 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 1 Mar 2023 17:37:16 GMT Subject: RFR: 8302144: Move ZeroTLABTest.java to tier3 [v2] In-Reply-To: References: Message-ID: <0J9V3m35smWUpAttCIwF-r86K0Br3a6lsj0Btim0eIg=.606b25c8-b1f2-4358-9c9a-3f4541017fc5@github.com> On Wed, 1 Mar 2023 12:59:09 GMT, Emanuel Peter wrote: >> This is a "Hello World" test for the flags `UseTLAB` and `ZeroTLAB`, running with `-Xcomp`. It takes about 30seconds. >> This test does not need to run on `tier1`. So I pushed it out to `tier3`. >> >> Moved it to tier3, by adding it to hotspot_slow_compiler, which is excluded from tier1 and tier2. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - merge from master > - 8302144: Move ZeroTLABTest.java to tier3 Agree. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/12620 From dnsimon at openjdk.org Wed Mar 1 18:38:19 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Mar 2023 18:38:19 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API Message-ID: This PR extends JVMCI with a new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are:

* Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`.
* Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal.

To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements):

    ResolvedJavaMethod method = ...;
    ExplodeLoop a = method.getAnnotation(ExplodeLoop.class);
    return switch (a.kind()) {
        case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL;
        case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
        ...
    };

The same code using the new API:

    ResolvedJavaMethod method = ...;
    ResolvedJavaType explodeLoopType = ...;
    AnnotationData a = method.getAnnotationDataFor(explodeLoopType);
    return switch (a.getEnum("kind").getName()) {
        case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL;
        case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
        ...
    };

The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
------------- Commit messages: - made AnnotationDataDecoder package-private - add annotation API to JVMCI Changes: https://git.openjdk.org/jdk/pull/12810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303431 Stats: 2666 lines in 33 files changed: 2614 ins; 24 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From redestad at openjdk.org Wed Mar 1 19:15:10 2023 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 1 Mar 2023 19:15:10 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() In-Reply-To: References: Message-ID: <-W9KBy9yRFPK3L5GHdhGkkHE1YRqh--NNuRFgPnpVx4=.e4459e3a-fb59-4f57-a05d-7445abf182aa@github.com> On Tue, 28 Feb 2023 11:11:54 GMT, Afshin Zafari wrote: > The inline and not-inline versions of the method is stress tested to compare the performance difference. The statistics are drawn in the following charts. The vertical axis is in milliseconds. > > ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) > > ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) I'm also skeptical as to the relevance of these numbers. `allocate_instance(oop, TRAPS)` is nowhere near GC code, so my guess is that there's run-to-run variation on whatever the workload is here. I'm not against removing this inlining. I did it as a minor part of another startup optimization after checking that the inlining was size-neutral and noticing that this meant a speed-up in interpreter and C1 for some relatively common JNI calls. It might still be marginally beneficial there, but if there's a measurable impact on compilation time and header complexity then who am I to object. 
------------- PR: https://git.openjdk.org/jdk/pull/12782 From stuefe at openjdk.org Wed Mar 1 21:03:25 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 1 Mar 2023 21:03:25 GMT Subject: RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 12:27:56 GMT, David Holmes wrote: > > So ... this approach skips the VM error reporting and goes straight to WER? > > Strike that - no it doesn't. Okay I have to admit I'm not at all clear on what the existing unwinding process actually looks like, and how the new fix affects that. I don't think there is unwinding as we know it (calling destructors etc). IIRC on x64 there is a chain of data structures that correlate with stack levels, and each data structure is associated with a range of PCs. On crash the code just locates the correct data structure for the crash PC, grabs the exception handler address from that structure and invokes that. If that handler returns "CONTINUE_SEARCH" it walks the chain up to the next data structure and invokes that. With this patch we don't return CONTINUE_SEARCH anymore but end the process right away. ------------- PR: https://git.openjdk.org/jdk/pull/12759 From dholmes at openjdk.org Thu Mar 2 04:05:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Mar 2023 04:05:15 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 12:03:54 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright author. 
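Thomas's description of the x64 handler search can be illustrated with a small, portable C++ model. This is a toy under stated assumptions. It does not use the Windows API, and `FrameEntry`, `Disposition` and `dispatch` are hypothetical names. Each entry covers a range of PCs. The search starts at the entry covering the crash PC and walks outward for as long as handlers return `ContinueSearch`. The `FailFast` disposition stands in for the patch's use of `RaiseFailFastException`, which ends the process without running any outer handlers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical dispositions a handler can return in this toy model.
enum class Disposition { ContinueSearch, Handled, FailFast };

struct FrameEntry {
  std::uintptr_t begin_pc;               // first PC covered (inclusive)
  std::uintptr_t end_pc;                 // end of covered range (exclusive)
  std::string name;                      // label recorded when invoked
  std::function<Disposition()> handler;  // empty if this level has none
};

struct DispatchResult {
  std::vector<std::string> invoked;  // handlers that ran, innermost first
  std::string outcome;               // "handled", "fail-fast" or "unhandled"
};

// 'frames' is ordered innermost (crashing) level first. The search begins
// at the entry whose PC range contains crash_pc and walks outward.
DispatchResult dispatch(const std::vector<FrameEntry>& frames,
                        std::uintptr_t crash_pc) {
  DispatchResult result;
  std::size_t i = 0;
  while (i < frames.size() &&
         !(frames[i].begin_pc <= crash_pc && crash_pc < frames[i].end_pc)) {
    ++i;  // skip entries that do not cover the crash PC
  }
  for (; i < frames.size(); ++i) {
    if (!frames[i].handler) continue;  // no handler at this level
    result.invoked.push_back(frames[i].name);
    switch (frames[i].handler()) {
      case Disposition::ContinueSearch:
        break;  // keep walking up the chain
      case Disposition::Handled:
        result.outcome = "handled";
        return result;
      case Disposition::FailFast:
        result.outcome = "fail-fast";  // process ends; outer handlers never run
        return result;
    }
  }
  result.outcome = "unhandled";  // end of chain reached: the OS takes over
  return result;
}
```

Consider a VM handler that returns `ContinueSearch`. Both it and the outer handler run, and the search ends as unhandled, which is where WER previously took over. If the VM handler fails fast instead, it is the only handler that runs.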
You can't move the _WIN64 workaround to the sharedRuntime_x86.cpp file as that code is also used by Windows-Aarch64. Whether it needs the workaround or not is another matter, but unless proven otherwise we have to assume it does. ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12508 From mdoerr at openjdk.org Thu Mar 2 04:48:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 Mar 2023 04:48:43 GMT Subject: RFR: 8303210: [linux, Windows] Enable UseSystemMemoryBarrier by default if possible [v4] In-Reply-To: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> Message-ID: <_5_YBJ25wS7iSZHP3l3JCkDfsEwJAxOvb5TFYxogKQg=.f7fc4158-7519-4d64-811a-3bfdbe0d53e8@github.com> > I'd like to enable UseSystemMemoryBarrier by default on supported Operating Systems in order to improve performance of thread state transitions (I/O, JNI, foreign function calls, JIT compiler threads, etc.). See JBS issue for more details. > Unfortunately, the feature was not yet implemented on all platforms. I added the code, but need the platform maintainers to check if it can be used reliably (and ideally if the performance improves). It's easy to switch it off again in case of problems. Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: - Make it a regular product flag and keep it off by default. - Add includes for riscv. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12753/files - new: https://git.openjdk.org/jdk/pull/12753/files/4660291f..b7278798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12753&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12753&range=02-03 Stats: 3 lines in 3 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12753.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12753/head:pull/12753 PR: https://git.openjdk.org/jdk/pull/12753 From mdoerr at openjdk.org Thu Mar 2 04:48:45 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 Mar 2023 04:48:45 GMT Subject: RFR: 8303210: [linux, Windows] Enable UseSystemMemoryBarrier by default if possible [v3] In-Reply-To: References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> Message-ID: <4OUteKRXlzDsD5WYteulbdtGRZHLhZimdP-d9IWkEx4=.eb057247-54ba-4156-bd58-90049c36b544@github.com> On Wed, 1 Mar 2023 07:00:34 GMT, Fei Yang wrote: > bootcycle and tier1-3 test good on linux-riscv64 platform. I think it will be safer to explicitly include "runtime/globals.hpp": Thanks for testing! Added the includes. ------------- PR: https://git.openjdk.org/jdk/pull/12753 From mdoerr at openjdk.org Thu Mar 2 04:48:45 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 Mar 2023 04:48:45 GMT Subject: RFR: 8303210: [linux, Windows] Enable UseSystemMemoryBarrier by default if possible [v3] In-Reply-To: References: <9eZo1xYNGhjMSC9lDXKtkO1eyU_H-Veuh1AeP3CPKbg=.69b6c4ff-a3ef-439f-8468-21fec9de1825@github.com> Message-ID: On Tue, 28 Feb 2023 05:26:22 GMT, Martin Doerr wrote: >> I'd like to enable UseSystemMemoryBarrier by default on supported Operating Systems in order to improve performance of thread state transitions (I/O, JNI, foreign function calls, JIT compiler threads, etc.). See JBS issue for more details. >> Unfortunately, the feature was not yet implemented on all platforms. 
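For context, here is a minimal Linux-only sketch of the `membarrier(2)` system call that `UseSystemMemoryBarrier` is assumed to build on. It is not HotSpot's implementation, and the helper names are hypothetical. The sketch queries the supported commands, registers the process for private expedited barriers, and then issues one. It assumes kernel headers that define `MEMBARRIER_CMD_PRIVATE_EXPEDITED` (Linux 4.14 or later). The query can still fail at run time, for example on an older kernel or under a seccomp filter. In that case the sketch reports the feature as unavailable.

```cpp
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Invoke the membarrier system call directly through syscall(2).
static long sys_membarrier(int cmd, unsigned int flags) {
  return syscall(__NR_membarrier, cmd, flags);
}

// Returns true if private expedited barriers are supported and this
// process has successfully registered to use them.
bool init_system_memory_barrier() {
  long supported = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
  if (supported < 0 || (supported & MEMBARRIER_CMD_PRIVATE_EXPEDITED) == 0) {
    return false;  // old kernel, blocked syscall, or command not offered
  }
  // A process must register before issuing private expedited barriers.
  return sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0;
}

// Issues a memory barrier on every running thread of this process.
bool emit_system_memory_barrier() {
  return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0;
}
```

Keeping the probe separate from the barrier mirrors the "enable if possible" idea. The expensive query and registration happen once, and a caller that gets `false` can fall back to ordinary fences.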
I added the code, but need the platform maintainers to check if it can be used reliably (and ideally if the performance improves). It's easy to switch it off again in case of problems. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Improve logging. I have made it a regular product flag and keep it off by default. I'll need a CSR, now. ------------- PR: https://git.openjdk.org/jdk/pull/12753 From mdoerr at openjdk.org Thu Mar 2 06:43:15 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 Mar 2023 06:43:15 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v6] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). 
The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add test for HFA corner cases.
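The HFA corner cases are easier to reason about once the classification step is explicit. The following toy C++ classifier assumes the ELFv2 rule that an HFA has one to eight members, all `float` or all `double`. It works on an already flattened member list. `Member` and `is_hfa` are hypothetical names. Nested structs, padding and the "no partial DW" register assignment are deliberately out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattened view of a struct's scalar members.
enum class Member { Float, Double, Int, Long };

// Assumed ELFv2 rule: 1 to 8 members, all float or all double.
bool is_hfa(const std::vector<Member>& members) {
  constexpr std::size_t kMaxHfaMembers = 8;
  if (members.empty() || members.size() > kMaxHfaMembers) return false;
  const Member first = members.front();
  if (first != Member::Float && first != Member::Double) return false;
  for (Member m : members) {
    if (m != first) return false;  // mixed float/double or an integer member
  }
  return true;
}
```

For example, a struct of three floats classifies as an HFA. A struct that mixes `float` and `double`, or one with nine doubles, does not.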
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/75b5c78f..c96e1120 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=04-05 Stats: 256 lines in 2 files changed: 256 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From epeter at openjdk.org Thu Mar 2 07:25:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 2 Mar 2023 07:25:23 GMT Subject: Integrated: 8302144: Move ZeroTLABTest.java to tier3 In-Reply-To: References: Message-ID: On Fri, 17 Feb 2023 16:09:31 GMT, Emanuel Peter wrote: > This is a "Hello World" test for the flags `UseTLAB` and `ZeroTLAB`, running with `-Xcomp`. It takes about 30 seconds. > This test does not need to run on `tier1`. So I pushed it out to `tier3`. > > Moved it to tier3 by adding it to hotspot_slow_compiler, which is excluded from tier1 and tier2. This pull request has now been integrated.
Changeset: 99f5687e Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/99f5687eb192b249a4a4533578f56b131fb8f234 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8302144: Move ZeroTLABTest.java to tier3 Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/12620 From epeter at openjdk.org Thu Mar 2 07:25:22 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 2 Mar 2023 07:25:22 GMT Subject: RFR: 8302144: Move ZeroTLABTest.java to tier3 [v2] In-Reply-To: <9Zu7yAXxztPf84rwNxjbzcID97P5PRVj2k7WR8iRexE=.24646832-7270-464b-a221-920ef64bfc5a@github.com> References: <9Zu7yAXxztPf84rwNxjbzcID97P5PRVj2k7WR8iRexE=.24646832-7270-464b-a221-920ef64bfc5a@github.com> Message-ID: <_dEm89ldxBNwxu_AKiPkO0dvpNQNjOu53exjemUT8fY=.ac1913e0-b321-40e3-8941-f36428a2af5b@github.com> On Mon, 20 Feb 2023 07:32:34 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - merge from master >> - 8302144: Move ZeroTLABTest.java to tier3 > > Looks good. Thanks @TobiHartmann @vnkozlov for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/12620 From aboldtch at openjdk.org Thu Mar 2 07:45:13 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 2 Mar 2023 07:45:13 GMT Subject: RFR: 8303418: Improve parameter and variable names in BitMap [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 06:08:33 GMT, Kim Barrett wrote: >> Please review this change to names in BitMap. >> >> - Parameters that designate a bit in a BitMap are named "bit". >> - Parameters that designate a word in the underlying BitMap storage are named "word". >> - Parameters that designate a range are named "beg" and "end" resp. >> >> Added helper function `flipped_word` for use by `get_next_bit_impl`, replacing >> the odd overload for `map`. 
>> >> In `get_next_bit_impl`, prefixed the variables "index" and "limit" with >> "word_" to make clear the units. >> >> Testing: >> mach5 tier1 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > copyrights Nice to see a _bit_ more consistency in the parameter names here. Not 100% sold on the use of `bit` for a single bit index. But at least consistency makes it clearer. As long as `beg` and `end` are used for `[beg,end)` ranges then the names are appropriate. If they ever start to deviate from the `std::begin` and `std::end` semantics they should be changed to avoid confusion. ------------- Marked as reviewed by aboldtch (Committer). PR: https://git.openjdk.org/jdk/pull/12798 From duke at openjdk.org Thu Mar 2 07:57:00 2023 From: duke at openjdk.org (Amit Kumar) Date: Thu, 2 Mar 2023 07:57:00 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition Message-ID: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> This PR cleans up some assert statements and specifies the branch condition at the call site itself. The remaining asm_assert methods are inlined as well.
------------- Commit messages: - added inline keyword before function implementation - inlined assert method - relocation code shortening - inline asm_assert - update header and cleanup - removed asm_assert{ne, high, low} - removed asm_assert_eq and asm_assert_static functionality - introduce branch conditions Changes: https://git.openjdk.org/jdk/pull/12822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12822&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302328 Stats: 172 lines in 14 files changed: 50 ins; 79 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/12822.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12822/head:pull/12822 PR: https://git.openjdk.org/jdk/pull/12822 From sspitsyn at openjdk.org Thu Mar 2 08:53:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 2 Mar 2023 08:53:11 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2] In-Reply-To: References: Message-ID: On Wed, 23 Nov 2022 10:14:23 GMT, Serguei Spitsyn wrote: >> This problem has two sides. >> One is that `VirtualThread::run()` caches the value of the field `notifyJvmtiEvents`. >> As a result, the native method `notifyJvmtiUnmountBegin()` was not called after the field `notifyJvmtiEvents` >> had been set to `true` when an agent library was loaded into a running VM. >> The fix is to get rid of this caching. >> The other is that enabling `notifyJvmtiEvents` notifications needs synchronization. >> Otherwise, the start of a VTMS transition can be missed, which will cause some asserts to fire. >> The fix is to use a JvmtiVTMSTransitionDisabler helper for sync. >> >> Testing: >> The originally failing tests pass now: >> >> runtime/vthread/RedefineClass.java >> runtime/vthread/TestObjectAllocationSampleEvent.java >> >> In progress: >> Run tiers 1-6 to make sure there are no regressions.
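The first side of the problem described above, a flag that is read once and cached so that a later switch-on is missed, can be modeled in plain Java. This is a hypothetical sketch, not the JDK's `VirtualThread` code: `notifyEvents` stands in for `notifyJvmtiEvents`, and the `Runnable` stands in for the work done while the virtual thread is mounted (during which an agent may enable the events).

```java
// Hypothetical model, not the JDK's VirtualThread code. notifyEvents stands in
// for notifyJvmtiEvents; the Runnable stands in for the work done between
// mount and unmount, during which an agent may enable the events.
class FlagCachingDemo {
    static volatile boolean notifyEvents = false;

    // Buggy pattern: the flag is read once, before the work runs.
    // Returns whether the unmount notification would be sent.
    static boolean runCached(Runnable work) {
        boolean notify = notifyEvents;
        work.run();
        return notify;
    }

    // Fixed pattern: the flag is read again at the transition itself.
    static boolean runUncached(Runnable work) {
        work.run();
        return notifyEvents;
    }

    public static void main(String[] args) {
        notifyEvents = false;
        System.out.println(runCached(() -> notifyEvents = true));   // false: event missed
        notifyEvents = false;
        System.out.println(runUncached(() -> notifyEvents = true)); // true
    }
}
```

Re-reading the flag alone still leaves a window in which the flag flips between the check and the transition; that is the second side of the problem, which the fix addresses with synchronization (`JvmtiVTMSTransitionDisabler`) and which this sketch does not model.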
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > remove caching if notifyJvmtiEvents in yieldContinuation Need to keep this PR alive for a while. ------------- PR: https://git.openjdk.org/jdk/pull/11304 From duke at openjdk.org Thu Mar 2 08:54:06 2023 From: duke at openjdk.org (Amit Kumar) Date: Thu, 2 Mar 2023 08:54:06 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Thu, 2 Mar 2023 07:49:05 GMT, Amit Kumar wrote: > This PR cleans up some assert statements and specifies the branch condition at the call site itself. The remaining asm_assert methods are inlined as well. @backwaterred @RealLucy @TheRealMDoerr please review & provide your suggestions :) Thanks It's an s390x-specific PR, so linux-x86 failures are not related.
------------- PR: https://git.openjdk.org/jdk/pull/12822 From rcastanedalo at openjdk.org Thu Mar 2 08:56:17 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 Mar 2023 08:56:17 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: <3nz9PRPYzYVszMpwatVR3TbeFZsdT0jHpZ6lHqgJbSA=.12875265-8044-41c7-8e12-3b2851be8b08@github.com> References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> <44kCeVWFrSGQEcehVV5G_EowMs26zWcuv5KmDbYv9Mc=.f49cb445-de78-4932-aefc-1839fd0be611@github.com> <3nz9PRPYzYVszMpwatVR3TbeFZsdT0jHpZ6lHqgJbSA=.12875265-8044-41c7-8e12-3b2851be8b08@github.com> Message-ID: <4GE8QumAJzRXPGKFAAXqOiJpM2fUTxfbSUKnQDnDOB0=.62c4e6f6-2063-488e-90a9-f4708bbf032a@github.com> On Wed, 1 Mar 2023 10:51:53 GMT, Erik Österlund wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 1270: >> >>> 1268: Label copy4, copy8, copy16, copy32, copy80, copy_big, finish; >>> 1269: const Register t2 = r5, t3 = r6, t4 = r7, t5 = r11; >>> 1270: const Register t6 = r12, t7 = r13, t8 = r14, t9 = r15; >> >> I find the usage of r15 in `copy_memory` a bit confusing. If I get it right, it is used >> 1. as a temporary register (aliased as `t9`) for 16-bytes copying (L1407-1437), >> 2. as a temporary register (in its raw form, i.e. not via `t9`) passed explicitly to `copy_memory_small` (L1518-1548), and >> 3. as a "special" register passed implicitly to the generated `copy_longs` stubs (L1557-1574). >> >> I think it would be clearer if `t9` was used instead of `r15` for case 2) and then perhaps a comment was added before calling the generated `copy_longs` stubs mentioning that `t9` cannot be used from then on because it aliases with `r15`.
> > Since r15 is part of the ABI contract across different generated stubs, I'd prefer to not change it to t9 as when you scroll through the different functions, it isn't clear that t9 isn't just used locally as a temporary register, but is used across stubs, even though it isn't passed as a parameter with the name "t9" to said stubs. I did however add a comment explaining when we start using r15 as count and that we can't use t9 any more after that point. I hope that's okay. The comment helps, thanks! ------------- PR: https://git.openjdk.org/jdk/pull/12670 From rcastanedalo at openjdk.org Thu Mar 2 09:18:26 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 2 Mar 2023 09:18:26 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v8] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 10:56:31 GMT, Erik Österlund wrote: >> So far, the arraycopy stubs have performed some kind of bulk pre/post barriers for arraycopy, which have been good enough, and allowed the copying itself to be done with plain loads and stores. For generational ZGC, this approach is not good enough, and we need barriers for the actual copying, but instead don't need the pre/post barriers. To prepare the JVM for generational ZGC, we need to add an API for arraycopy barriers. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about r15 useage > All this extreme cut-and-paste manual unrolling is very hard to read, maintain, and review. I wasn't going to say anything, because it's Erik, and what do I know! But I must push back here, this is too much. Please consider these style changes. In my opinion, the (very much needed) changes you suggest are outside the scope of this PR, which is about lifting the memory accesses, in their existing form, to a barrier API-level.
Conflating this with your suggested changes would make it harder to review this PR, which is sufficiently complex in its current form. I totally agree that your suggestions would improve readability and maintainability, but couldn't we apply them in a follow-up RFE? ------------- PR: https://git.openjdk.org/jdk/pull/12670 From kbarrett at openjdk.org Thu Mar 2 09:34:12 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 2 Mar 2023 09:34:12 GMT Subject: RFR: 8303418: Improve parameter and variable names in BitMap [v2] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 07:42:16 GMT, Axel Boldt-Christmas wrote: > Not 100% sold on the use of `bit` for a single bit index. But at least consistency makes it clearer. In this change I was mostly trying to remove outliers in parameter naming. I'm not wedded to either `bit` or `word`, but that is the common usage. ------------- PR: https://git.openjdk.org/jdk/pull/12798 From kbarrett at openjdk.org Thu Mar 2 09:45:00 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 2 Mar 2023 09:45:00 GMT Subject: RFR: 8303418: Improve parameter and variable names in BitMap [v3] In-Reply-To: References: Message-ID: > Please review this change to names in BitMap. > > - Parameters that designate a bit in a BitMap are named "bit". > - Parameters that designate a word in the underlying BitMap storage are named "word". > - Parameters that designate a range are named "beg" and "end" resp. > > Added helper function `flipped_word` for use by `get_next_bit_impl`, replacing > the odd overload for `map`. > > In `get_next_bit_impl`, prefixed the variables "index" and "limit" with > "word_" to make clear the units. > > Testing: > mach5 tier1 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into improve-bitmap-varnames - copyrights - improve names in get_next_bit_impl - cleanup parameter names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12798/files - new: https://git.openjdk.org/jdk/pull/12798/files/59138b0f..cb665e54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12798&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12798&range=01-02 Stats: 3035 lines in 106 files changed: 2058 ins; 624 del; 353 mod Patch: https://git.openjdk.org/jdk/pull/12798.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12798/head:pull/12798 PR: https://git.openjdk.org/jdk/pull/12798 From kbarrett at openjdk.org Thu Mar 2 09:45:04 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 2 Mar 2023 09:45:04 GMT Subject: RFR: 8303418: Improve parameter and variable names in BitMap [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:34:53 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> copyrights > > Lgtm. > > It would be great to also fix the unusual spacing between method names and the parameter brackets (and the method body brackets) a little, but that can be a different issue. Thanks for reviews @tschatzl and @xmas92 ------------- PR: https://git.openjdk.org/jdk/pull/12798 From kbarrett at openjdk.org Thu Mar 2 09:49:15 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 2 Mar 2023 09:49:15 GMT Subject: Integrated: 8303418: Improve parameter and variable names in BitMap In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 04:39:19 GMT, Kim Barrett wrote: > Please review this change to names in BitMap. > > - Parameters that designate a bit in a BitMap are named "bit". > - Parameters that designate a word in the underlying BitMap storage are named "word". 
> - Parameters that designate a range are named "beg" and "end" resp. > > Added helper function `flipped_word` for use by `get_next_bit_impl`, replacing > the odd overload for `map`. > > In `get_next_bit_impl`, prefixed the variables "index" and "limit" with > "word_" to make clear the units. > > Testing: > mach5 tier1 This pull request has now been integrated. Changeset: 3091744f Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/3091744fff56ae08861f28b87c1de27738c4c62b Stats: 55 lines in 3 files changed: 5 ins; 1 del; 49 mod 8303418: Improve parameter and variable names in BitMap Reviewed-by: tschatzl, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/12798 From aph at openjdk.org Thu Mar 2 09:54:12 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 2 Mar 2023 09:54:12 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> <2g2QqIWkC2a7QyVzPkX-QW3LcMrti6-AMwuFPqe752o=.c63ae268-69be-4770-b642-544633c1cf8f@github.com> <1NdlCJpW9kP6N1prWI-pXRBFCooYppayETd83JdBEkE=.df3ed4f4-b685-4377-8284-40b4a3cedf5e@github.com> Message-ID: On Wed, 1 Mar 2023 09:51:22 GMT, Erik Österlund wrote: > Note also that the arraycopy stubs were already using r8 and r9 as temp registers. I just moved their use to GC barriers, so only they need to deal with their inherent scratchyness, while the temp registers used in the client code can use registers that are not scratchy, which was not the case before. The use of scratch registers in the AArch64 GC barriers was _already_ buggy and error prone, which resulted in serious and hard-to-find bugs. If you need to use the scratch registers in stubs, please, please don't alias them. Call them `rscratch1` and `rscratch2`. Using scratch registers is safe as long as the code follows a couple of rules.
Don't alias them, and if an inner macro is called from another macro, it's often safer to pass scratch registers to the inner macro explicitly. I know we're not consistent about this. ------------- PR: https://git.openjdk.org/jdk/pull/12670 From aph at openjdk.org Thu Mar 2 10:01:21 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 2 Mar 2023 10:01:21 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> Message-ID: On Wed, 1 Mar 2023 09:29:14 GMT, Erik Österlund wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 812: >> >>> 810: bs_asm->copy_load_at(_masm, decorators, type, 32, >>> 811: v2, v3, Address(__ pre(s, 8 * unit)), >>> 812: gct1, gct2, gcvt1); >> >> All this extreme cut-and-paste manual unrolling is very hard to read, maintain, and review. >> I wasn't going to say anything, because it's Erik, and what do I know! But I must push back here, this is too much. >> Please consider these style changes. > > I updated the PR with a helper stack object encapsulating the choice of GC temp registers, types, decorators, etc, so that each line removes all the noise and becomes more readable. Do you like it? If yes, do you also want your proposed loop constructions in the new more compact form? I'm okay either way. The more compact form is a _huge_ improvement. I believe that using loops rather than hand-unrolling the code would make it even better! ------------- PR: https://git.openjdk.org/jdk/pull/12670 From aph at openjdk.org Thu Mar 2 10:06:26 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 2 Mar 2023 10:06:26 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v8] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 09:15:34 GMT, Roberto Castañeda Lozano wrote: > > All this extreme cut-and-paste manual unrolling is very hard to read, maintain, and review.
> > I wasn't going to say anything, because it's Erik, and what do I know! But I must push back here, this is too much. > > Please consider these style changes. > > In my opinion, the (very much needed) changes you suggest are outside the scope of this PR, which is about lifting the memory accesses, in their existing form, to a barrier API-level. Conflating this with your suggested changes would make it harder to review this PR, which is sufficiently complex in its current form. I totally agree that your suggestions would improve readability and maintainability, but couldn't we apply them in a follow-up RFE? I disagree in every way. The added complexity, which is fixed so it no longer matters, made it near-impossible for me to reason about this PR. And, as John Rose put it, like any good carpenter we should clean up as we work. ------------- PR: https://git.openjdk.org/jdk/pull/12670 From duke at openjdk.org Thu Mar 2 10:14:00 2023 From: duke at openjdk.org (Amit Kumar) Date: Thu, 2 Mar 2023 10:14:00 GMT Subject: RFR: 8303147: [s390x] fast & slow debug builds are broken Message-ID: <_ZdU0STu6HC-2e2UuIXJPe7RslDoKVDPLYXSGJFAwkA=.7639982c-26d7-4a93-8e6f-2d0171b71477@github.com> This PR fixes the broken fast debug and slow debug builds for s390x. tier1 tests have completed, and the results are not affected by this patch.
------------- Commit messages: - build fix for s390x Changes: https://git.openjdk.org/jdk/pull/12825/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12825&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303147 Stats: 53 lines in 12 files changed: 35 ins; 13 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12825.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12825/head:pull/12825 PR: https://git.openjdk.org/jdk/pull/12825 From prappo at openjdk.org Thu Mar 2 12:13:14 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Thu, 2 Mar 2023 12:13:14 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments Message-ID: Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 @@ -17084,7 +17084,7 @@ throws IOException, ClassNotFoundException
readObject is called to restore the state of the - (@code BasicPermission} from a stream.
+ BasicPermission from a stream.
Parameters:
s - the ObjectInputStream from which data is read
Notes ----- * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. * I will update copyright years after (and if) the fix had been approved, as required. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/12826/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12826&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303480 Stats: 75 lines in 39 files changed: 0 ins; 0 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/12826.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12826/head:pull/12826 PR: https://git.openjdk.org/jdk/pull/12826 From duke at openjdk.org Thu Mar 2 13:17:05 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Thu, 2 Mar 2023 13:17:05 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 04:02:13 GMT, David Holmes wrote: > You can't move the _WIN64 workaround to the sharedRuntime_x86.cpp file as that code is also used by Windows-Aarch64. Whether it needs the workaround or not is another matter, but unless proven otherwise we have to assume it does. I hope/believe the bug affects only amd64 and not aarch64. I want to verify it but I have some difficulty getting remote access to such Windows boxes, I am working on the verification. 
------------- PR: https://git.openjdk.org/jdk/pull/12508 From mullan at openjdk.org Thu Mar 2 13:24:07 2023 From: mullan at openjdk.org (Sean Mullan) Date: Thu, 2 Mar 2023 13:24:07 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix had been approved, as required. security related changes look fine. ------------- Marked as reviewed by mullan (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From iwalulya at openjdk.org Thu Mar 2 13:56:04 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 2 Mar 2023 13:56:04 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects Message-ID: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Hi All, Please review this change to move humongous regions during the Last-Ditch full gc ( on `do_maximal_compaction`). This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces the region fragmentation and avoids running into OOM errors. We have enabled for G1 the Jtreg test that was previously used only for Shenandoah to test such workload. 
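The compaction step described above can be pictured with a toy model. In the following hypothetical Java sketch, regions are array slots, a humongous object is a run of consecutive slots holding the same id, and everything below `prefixEnd` is the already-compacted dense prefix. Objects are slid down in address order, so each one lands in the lowest free regions after the previous one. This is a simplification under stated assumptions, not G1's implementation; among other things, it ignores non-compacting and uncommitted regions.

```java
// Hypothetical toy model, not G1 code: regions are array slots, FREE marks an
// empty region, and any other value is the id of the humongous object that
// occupies that region. Regions below prefixEnd form the dense prefix.
class HumongousCompactionSketch {
    static final int FREE = -1;

    // Slide humongous objects found at or above prefixEnd down toward the
    // prefix, in address order, so they end up contiguous right after it.
    static void compact(int[] regions, int prefixEnd) {
        int target = prefixEnd; // lowest region not yet claimed by a moved object
        int i = prefixEnd;
        while (i < regions.length) {
            if (regions[i] == FREE) {
                i++;
                continue;
            }
            // Find the run of regions [start, i) belonging to this object.
            int id = regions[i];
            int start = i;
            while (i < regions.length && regions[i] == id) {
                i++;
            }
            int length = i - start;
            if (target < start) {
                // Every region in [target, start) is free at this point, and the
                // object's own regions already hold its id, so writing the id
                // into [target, target + length) is safe even if ranges overlap.
                for (int k = 0; k < length; k++) {
                    regions[target + k] = id;
                }
                // Free the old regions that the new placement no longer covers.
                for (int k = Math.max(start, target + length); k < start + length; k++) {
                    regions[k] = FREE;
                }
            }
            target += length;
        }
    }
}
```

For example, with regions `{0, 0, F, 7, 7, F, F, 9, F}` (where `F` is `FREE`) and `prefixEnd = 2`, the result is `{0, 0, 7, 7, 9, F, F, F, F}`. The free regions end up as one contiguous block at the top, which is what gives a later large humongous allocation a chance to succeed.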
Testing: Tier 1-3 ------------- Commit messages: - restructure - add test - breakout large functioons - working prototype Changes: https://git.openjdk.org/jdk/pull/12830/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12830&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8191565 Stats: 333 lines in 13 files changed: 318 ins; 9 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12830.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12830/head:pull/12830 PR: https://git.openjdk.org/jdk/pull/12830 From mbaesken at openjdk.org Thu Mar 2 16:27:32 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 2 Mar 2023 16:27:32 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v3] In-Reply-To: References: Message-ID: > The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. > However at some places INCLUDE_CDS is missing and should be added. > > One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros ? 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: adjust some cds related vars ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12691/files - new: https://git.openjdk.org/jdk/pull/12691/files/5dff9614..33eb5631 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12691&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12691&range=01-02 Stats: 18 lines in 2 files changed: 2 ins; 14 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12691.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12691/head:pull/12691 PR: https://git.openjdk.org/jdk/pull/12691 From mbaesken at openjdk.org Thu Mar 2 16:30:16 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 2 Mar 2023 16:30:16 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v2] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 12:27:29 GMT, Ioi Lam wrote: >> Hi Ioi, I set this already some lines above (in the non-INCLUDE_CDS case). > > You need to remove the `extern` keyword, and add the definition of `= false` in globalDefinitions.hpp. > > Then, the corresponding definition should be removed from globalDefinitions.cpp. Hi Ioi, I adjusted the *SharedSpaces in globalDefinitions.cpp / globalDefinitions.hpp . ------------- PR: https://git.openjdk.org/jdk/pull/12691 From rkennke at openjdk.org Thu Mar 2 16:34:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 2 Mar 2023 16:34:05 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v29] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. 
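The arithmetic behind the gap described above can be sketched as follows. The sketch assumes a 64-bit VM with an 8-byte mark word, an 8-byte klass pointer under `-XX:-UseCompressedClassPointers` (4 bytes with compressed class pointers), and the 4-byte length field directly after them. The class is hypothetical and only compares word-aligned element bases with bases aligned to the element size.

```java
// Hypothetical back-of-the-envelope model of array base offsets on a 64-bit VM.
// Assumed layout: 8-byte mark word, then a klass pointer of klassBytes bytes
// (8 with -XX:-UseCompressedClassPointers, 4 with compressed class pointers),
// then the 4-byte array length.
class ArrayBaseOffsetSketch {
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // First byte after the length field.
    static int lengthEnd(int klassBytes) {
        return 8 + klassBytes + 4;
    }

    // Elements always start at the next word (8-byte) boundary.
    static int wordAlignedBase(int klassBytes) {
        return alignUp(lengthEnd(klassBytes), 8);
    }

    // Elements only need to be aligned to their own size.
    static int elementAlignedBase(int klassBytes, int elementBytes) {
        return alignUp(lengthEnd(klassBytes), elementBytes);
    }

    public static void main(String[] args) {
        System.out.println(wordAlignedBase(8));        // 24
        System.out.println(elementAlignedBase(8, 4));  // 20 (e.g. int[])
        System.out.println(elementAlignedBase(8, 1));  // 20 (e.g. byte[])
        System.out.println(elementAlignedBase(8, 8));  // 24 (e.g. long[])
        System.out.println(wordAlignedBase(4));        // 16
    }
}
```

Under these assumptions, the length field ends at offset 20 without compressed class pointers, so an `int[]` or `byte[]` could start at 20 instead of 24, while a `long[]` still starts at 24. With compressed class pointers the length ends at 16 and both schemes agree. A header without a klass field (`klassBytes = 0`) ends the length at 12 and leaves a 4-byte gap under word alignment, matching the Lilliput case mentioned below.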
> > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Rename payload_start -> payload_offset ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/572de842..95b85b65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=27-28 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From stuefe at openjdk.org Thu Mar 2 16:37:47 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 Mar 2023 16:37:47 GMT Subject: RFR: JDK-8293114: GC should trim the native heap [v10] In-Reply-To: <23KpPM4oPV6F1nz3g5CvIqvuX-ANcsMH4GuVNXjR-Lw=.b8d0fa2d-bb85-4899-8e21-f68ea64b988d@github.com> References: <23KpPM4oPV6F1nz3g5CvIqvuX-ANcsMH4GuVNXjR-Lw=.b8d0fa2d-bb85-4899-8e21-f68ea64b988d@github.com> Message-ID: > (This PR is in draft and still getting worked on. The description below is obsolete and does not reflect its current state) > > ------------------------------------------- > > This RFE adds the option to auto-trim the Glibc heap as part of the GC cycle. If the VM process suffered high temporary malloc spikes (regardless whether from JVM- or user code), this could recover significant amounts of memory. 
> > We discussed this a year ago [1], but the item got pushed to the bottom of my work pile, so it took longer than I thought. > > ### Motivation > > The Glibc allocator is reluctant to return memory to the OS, much more so than other allocators. Temporary malloc spikes often carry over as permanent RSS increase. > > Note that C-heap retention is difficult to observe. Since it is freed memory, it won't show up in NMT, it is just a part of private RSS. > > Theoretically, retained memory is not lost since it will be reused by future mallocs. Retaining memory is therefore a bet on the future behavior of the app. The allocator bets on the application needing memory in the near future, and to satisfy that need via malloc. > > But an app's malloc load can fluctuate wildly, with temporary spikes and long idle periods. And if the app rolls its own custom allocators atop mmap, as hotspot does, a lot of that memory cannot be reused even though it counts toward its memory footprint. > > To help, Glibc exports an API to trim the C-heap: `malloc_trim(3)`. With JDK 18 [2], SAP contributed a new jcmd command to *manually* trim the C-heap on Linux. This RFE adds a complementary way to trim automatically. > > #### Is this even a problem? > > Do we have high malloc spikes in the JVM process? We assume that malloc load from hotspot is usually low since hotspot typically clusters allocations into custom areas - metaspace, code heap, arenas. > > But arenas are subject to Glibc mem retention too. I was surprised by that since I assumed 32k arena chunks were too big to be subject to Glibc retention. But I saw in experiments that high arena peaks often cause lasting RSS increase. > > And of course, both hotspot and JDK do a lot of finer-grained mallocs outside of custom allocators. > > But I have seen many cases of high memory retention in Glibc in third-party JNI code. Libraries allocate large buffers via malloc as temporary buffers.
In fact, since we introduced the jcmd "System.trim_native_heap", some of our customers have started to call this command periodically in scripts to counter these issues. > > Therefore I think that, while high malloc spikes are atypical for a JVM process, they can happen. Having a way to auto-trim the native heap makes sense. > > ### When should we trim? > > We want to trim when we know there is a lull in malloc activity coming. But we have no knowledge of the future. > > We could build a heuristic based on malloc frequency. But on closer inspection that is difficult. We cannot use NMT, since NMT has no complete picture (it only knows about hotspot) and is usually disabled in production anyway. The only way to get *all* mallocs would be to use Glibc malloc hooks. We have done so in desperate cases at SAP, but Glibc removed malloc hooks in 2.35. It would be a messy solution anyway; best to avoid it. > > The next best thing is synchronizing with the larger C-heap users in the VM: compiler and GC. But the compiler turns out not to be such a problem, since the compiler uses arenas, and arena chunks are buffered in a free pool with a five-second delay. That means compiler activity that happens in bursts, like at VM startup, will just shuffle arena chunks around from/to the arena free pool, never bothering to call malloc or free. > > That leaves the GC, which was also the experts' recommendation in last year's discussion [1]. Most GCs do uncommit, and trimming the native heap fits well into this. And we want to time the trim so that it does not get in the way of a GC. Plus, integrating trims into the GC cycle lets us reuse GC logging and timing, thereby making RSS changes caused by trim-native visible to the analyst. > > > ### How it works: > > The patch adds new options (experimental for now, and shared among all GCs): > > > -XX:+GCTrimNativeHeap > -XX:GCTrimNativeHeapInterval= (defaults to 60) > > > `GCTrimNativeHeap` is off by default.
If enabled, it will cause the VM to trim the native heap on full GCs as well as periodically. The period is defined by `GCTrimNativeHeapInterval`. Periodic trimming can be completely switched off with `GCTrimNativeHeapInterval=0`; in that case, we will only trim on full GCs. > > ### Examples: > > This is an artificial test that causes two high malloc spikes with long idle periods. Observe how RSS recovers with trim but stays up without trim. The trim interval was set to 15 seconds for the test, and no GC was invoked here; this is periodic trimming. > > ![alloc-test](http://cr.openjdk.java.net/~stuefe/other/autotrim/rss-all-collectors.png) > > (See here for parameters: [run script](http://cr.openjdk.java.net/~stuefe/other/autotrim/run-all.sh) ) > > Spring PetClinic boots up, then idles; it was run once with and once without trim, with the trim interval at its 60-second default. Of course, if it were actually doing something instead of idling, trim effects would be smaller. But the point of trimming is to recover memory in idle periods. > > ![petclinic bootup](http://cr.openjdk.java.net/~stuefe/other/autotrim/spring-petclinic-rss-with-and-without-trim.png) > > (See here for parameters: [run script](http://cr.openjdk.java.net/~stuefe/other/autotrim/run-petclinic-boot.sh) ) > > > > ### Implementation > > One problem I faced when implementing this was that trimming is non-interruptible. GCs usually split the uncommit work into smaller portions, which is impossible for `malloc_trim()`. > > So very slow trims could introduce longer GC pauses. I did not want this, so I implemented two ways to trim: > 1) GCs can opt to trim asynchronously. In that case, a `NativeTrimmer` thread runs on behalf of the GC and takes care of all trimming. The GC just nudges the `NativeTrimmer` at the end of its GC cycle, but the trim itself runs concurrently. > 2) GCs can do the trim inside their own thread, synchronously. The GC thread then has to wait until the trim is done.
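Option (1) might look roughly like the following: a separate thread trims when the GC nudges it, optionally also on a timer, and the GC can pause it for the duration of a cycle. This is a minimal sketch that uses only the C++ standard library. `AsyncTrimmer` and its methods are hypothetical and are not the patch's `NativeTrimmer`. The trim itself is injected as a callback, for example a function that calls glibc's `malloc_trim(0)`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch of an asynchronous trimmer thread; not the patch's NativeTrimmer.
class AsyncTrimmer {
 public:
  // An interval of 0 disables periodic trims; the thread then trims only when nudged.
  AsyncTrimmer(std::function<void()> trim, std::chrono::milliseconds interval)
      : trim_(std::move(trim)), interval_(interval) {
    assert(interval_.count() >= 0);
    worker_ = std::thread([this] { run(); });  // start only after all members exist
  }
  ~AsyncTrimmer() { stop(); }

  // GC, end of cycle: request a trim and return without waiting for it.
  void schedule_trim() {
    std::lock_guard<std::mutex> lock(mutex_);
    pending_ = true;
    cv_.notify_all();
  }

  // GC, start of cycle: block new trims and wait for a running trim to finish.
  void pause() {
    std::unique_lock<std::mutex> lock(mutex_);
    paused_ = true;
    cv_.wait(lock, [this] { return !trimming_; });
  }

  // GC, end of cycle: allow trimming again (a pending request is kept).
  void resume() {
    std::lock_guard<std::mutex> lock(mutex_);
    paused_ = false;
    cv_.notify_all();
  }

  // Shut down; a pending request is still honored unless trimming is paused.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
      cv_.notify_all();
    }
    if (worker_.joinable()) worker_.join();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    auto ready = [this] { return stopping_ || (pending_ && !paused_); };
    for (;;) {
      bool timer_expired = false;
      if (interval_.count() > 0) {
        // wait_for() returns false if the interval elapsed while ready() stayed false.
        timer_expired = !cv_.wait_for(lock, interval_, ready);
      } else {
        cv_.wait(lock, ready);
      }
      if ((pending_ || timer_expired) && !paused_) {
        pending_ = false;
        trimming_ = true;
        lock.unlock();  // run the (possibly slow) trim without holding the lock
        trim_();
        lock.lock();
        trimming_ = false;
        cv_.notify_all();  // wake a pause() that is waiting for this trim
      }
      if (stopping_) return;
    }
  }

  std::function<void()> trim_;
  std::chrono::milliseconds interval_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool pending_ = false;
  bool paused_ = false;
  bool trimming_ = false;
  bool stopping_ = false;
  std::thread worker_;
};
```

A GC would call `pause()` at the start of a cycle and `schedule_trim()` plus `resume()` at its end. The synchronous variant (2) amounts to calling the trim function directly on the GC thread.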
> > (1) has the advantage of giving us periodic trims even without GC activity (Shenandoah does this out of the box). > > #### Serial > > Serial does the trimming synchronously as part of a full GC, and only then. I did not want to spawn a separate thread for the SerialGC. Therefore Serial is the only GC that does not offer periodic trimming; it only trims on full GC. > > #### Parallel, G1, Z > > All of them do the trimming asynchronously via `NativeTrimmer`. They schedule the native trim at the end of a full collection. They also pause the trimming at the beginning of a cycle to avoid trimming during GCs. > > #### Shenandoah > > Shenandoah does the trimming synchronously in its service thread, similar to how it handles uncommits. Since the service thread already runs concurrently and continuously, it can do periodic trimming; there is no need to spin up a new thread. And this way we can reuse the Shenandoah timing classes. > > ### Patch details > > - adds three new functions to the `os` namespace: > - `os::trim_native_heap()` implementing the trim > - `os::can_trim_native_heap()` and `os::should_trim_native_heap()`, returning, respectively, whether the platform supports trimming and whether the platform considers trimming to be useful. > - replaces the implementation of the jcmd command "System.trim_native_heap" with the new `os::trim_native_heap` > - provides a new wrapper function wrapping the tedious `mallinfo()` vs `mallinfo2()` business: `os::Linux::get_mallinfo()` > - adds a GC-shared utility class, `GCTrimNative`, that takes care of trimming and GC-logging and houses the `NativeTrimmer` thread class. > - adds a regression test > > > ### Tests > > Tested with an older Glibc (2.31) and a newer Glibc (2.35) (`mallinfo()` vs `mallinfo2()`), on Linux x64. > > The rest of the tests will be done by GHA and in our SAP nightlies. > > > ### Remarks > > #### How about other allocators? > > I have seen this retention problem mainly with Glibc and the AIX libc. The musl libc returns memory to the OS more eagerly.
I also tested with jemalloc and found that it also reclaims more aggressively, so I don't think macOS or BSD are affected that much by retention either. > > #### Trim costs? > > Trim-native is a tradeoff between memory and performance. We pay two costs: > - The cost of the trim itself, which depends on how much is trimmed. On my machine it ranges from < 1ms for no-op trims to ~800ms for 32GB trims. > - The cost of re-acquiring the memory, should it be needed again. > > #### Predicting malloc_trim effects? > > `ShenandoahUncommit` avoids uncommits if they are not necessary, thus avoiding work and GC log spamming. I liked that and tried to follow that example. I tried to devise a way to predict the effect a trim could have based on allocator info from mallinfo(3). That was quite frustrating since the documentation was confusing and I had to do a lot of experimenting. In the end, I came up with a heuristic to prevent obviously pointless trim attempts; see `os::should_trim_native_heap()`. I am not completely happy with it. > > #### glibc.malloc.trim_threshold? > > glibc has a tunable that looks like it could influence the willingness of Glibc to return memory to the OS, the "trim_threshold". In practice, I could not get it to do anything useful. Regardless of the setting, it never seemed to influence the trimming behavior. Even if it did work, I'm not sure we'd want to use it, since by doing malloc_trim manually we can space out the trims as we see fit, instead of paying the trim price inside free(3). > > > - [1] https://mail.openjdk.org/pipermail/hotspot-dev/2021-August/054323.html > - [2] https://bugs.openjdk.org/browse/JDK-8269345 Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 34 commits: - wip - Merge branch 'master' into JDK-8293114-GC-trim-native - wip - merge master - wip - wip - rename GCTrimNative TrimNative - rename NativeTrimmer - rename - src/hotspot/share/gc/shared/gcTrimNativeHeap.cpp - ... and 24 more: https://git.openjdk.org/jdk/compare/99f5687e...5d41312e ------------- Changes: https://git.openjdk.org/jdk/pull/10085/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10085&range=09 Stats: 1246 lines in 21 files changed: 1241 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10085.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10085/head:pull/10085 PR: https://git.openjdk.org/jdk/pull/10085 From matsaave at openjdk.org Thu Mar 2 16:55:23 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 2 Mar 2023 16:55:23 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry Message-ID: The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
This change supports the following platforms: x86, aarch64, PPC, and RISCV ------------- Commit messages: - PPC and RISCV port - 8301995: Move invokedynamic resolution information out of the cpCache Changes: https://git.openjdk.org/jdk/pull/12778/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301995 Stats: 1418 lines in 54 files changed: 1036 ins; 168 del; 214 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From aturbanov at openjdk.org Thu Mar 2 18:49:05 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 2 Mar 2023 18:49:05 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 12:03:54 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright author. 
test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java line 177: > 175: double quiet_nan2 = Double.longBitsToDouble(0x7ffc000000000002l); > 176: double signaling_nan1 = Double.longBitsToDouble(0x7ffa000000000001l); > 177: double signaling_nan2 = Double.longBitsToDouble(0x7ffa000000000002l); Suggestion: double quiet_nan1 = Double.longBitsToDouble(0x7ffc000000000001L); double quiet_nan2 = Double.longBitsToDouble(0x7ffc000000000002L); double signaling_nan1 = Double.longBitsToDouble(0x7ffa000000000001L); double signaling_nan2 = Double.longBitsToDouble(0x7ffa000000000002L); ------------- PR: https://git.openjdk.org/jdk/pull/12508 From prr at openjdk.org Thu Mar 2 21:29:04 2023 From: prr at openjdk.org (Phil Race) Date: Thu, 2 Mar 2023 21:29:04 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely looking for misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix has been approved, as required. java.desktop changes are fine ------------- Marked as reviewed by prr (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From ayang at openjdk.org Thu Mar 2 21:57:06 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 2 Mar 2023 21:57:06 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace Message-ID: Simple refactoring that merges two types. Test: tier1-5 ------------- Commit messages: - merge-type Changes: https://git.openjdk.org/jdk/pull/12841/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12841&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303534 Stats: 191 lines in 14 files changed: 27 ins; 122 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/12841.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12841/head:pull/12841 PR: https://git.openjdk.org/jdk/pull/12841 From cjplummer at openjdk.org Thu Mar 2 22:14:04 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 22:14:04 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 21:49:39 GMT, Albert Mingkun Yang wrote: > Simple refactoring that merges two types. > > Test: tier1-5 Copyrights need updating in a few files.
------------- PR: https://git.openjdk.org/jdk/pull/12841 From dholmes at openjdk.org Thu Mar 2 22:14:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Mar 2023 22:14:15 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 13:13:56 GMT, Jan Kratochvil wrote: > I hope/believe the bug affects only amd64 and not aarch64. I want to verify it but I have some difficulty getting remote access to such Windows boxes; I am working on the verification. IIUC the workaround is needed for a bug in Visual Studio, so verification would require testing binaries built with all versions of VS that we allow for building the JDK. Otherwise we need a definitive statement about which version of VS the bug was fixed in and/or that the bug does not exist in the VS libraries used for Aarch64. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From cjplummer at openjdk.org Thu Mar 2 22:18:08 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 2 Mar 2023 22:18:08 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: <63UHTjtrUOVGBTwRt_M4QJ7aqBnuAGqekNTTTl3GM74=.ddedac04-ff87-40b9-9ea7-6b6d26d9d202@github.com> On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e.
the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely looking for misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix has been approved, as required. The SA changes (jdk.hotspot.agent) look fine. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From phh at openjdk.org Thu Mar 2 22:26:11 2023 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 2 Mar 2023 22:26:11 GMT Subject: RFR: 8302783: Improve CRC32C intrinsic with crypto pmull on AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Feb 2023 19:59:24 GMT, Yi-Fan Tsai wrote: > This change adds a pmull-based CRC32C intrinsic, and it is more performant than the existing crc32c-instruction-based intrinsic on Neoverse V1. The benchmark shows 10 - 99% improvement. The improvement comes from the execution throughput increase of pmull/pmull2 from 1 on Neoverse N1 to 4 on Neoverse V1 (the latency remains 2), while the throughput of CRC32C instructions did not change. > > The pmull-based CRC32C intrinsic is enabled by the existing option UseCryptoPmullForCRC32, which also enables the pmull-based CRC32 intrinsic. The option requires crc32c instructions, eor3 in SHA3, and 64-bit pmull/pmull2 in the Cryptographic Extension. > > With this change, there will be only two different CRC32C intrinsics, crc32c and pmull, while there are four CRC32 intrinsics. > > The following test has passed.
> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java
>
> The throughput reported by [the micro benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestCRC32C.java) is measured on an EC2 c7g instance. The optimization shows 10 - 99% improvement when the input is at least 384 bytes.
>
> | input | 64 | 128 | 256 | 384 | 511 | 512 | 1,024 |
> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
> | improvement | 1.60% | 0.00% | 0.00% | 15.24% | 10.76% | 34.32% | 72.39% |
>
> | input | 2,048 | 4,096 | 8,192 | 16,384 | 32,768 | 65,536 |
> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
> | improvement | 84.96% | 92.59% | 96.19% | 98.02% | 99.32% | 98.36% |
>
>
> Baseline
>
> Benchmark                   (count)   Mode  Cnt       Score      Error  Units
> TestCRC32C.testCRC32CUpdate      64  thrpt   12  196575.739 ± 1824.113  ops/ms
> TestCRC32C.testCRC32CUpdate     128  thrpt   12  123666.570 ±    2.730  ops/ms
> TestCRC32C.testCRC32CUpdate     256  thrpt   12   70188.989 ±    2.002  ops/ms
> TestCRC32C.testCRC32CUpdate     384  thrpt   12   49000.690 ±    1.421  ops/ms
> TestCRC32C.testCRC32CUpdate     511  thrpt   12   34106.279 ±   25.390  ops/ms
> TestCRC32C.testCRC32CUpdate     512  thrpt   12   37638.349 ±    1.039  ops/ms
> TestCRC32C.testCRC32CUpdate    1024  thrpt   12   19526.513 ±    0.439  ops/ms
> TestCRC32C.testCRC32CUpdate    2048  thrpt   12    9951.392 ±    4.803  ops/ms
> TestCRC32C.testCRC32CUpdate    4096  thrpt   12    5023.268 ±    0.240  ops/ms
> TestCRC32C.testCRC32CUpdate    8192  thrpt   12    2523.877 ±    0.062  ops/ms
> TestCRC32C.testCRC32CUpdate   16384  thrpt   12    1265.011 ±    0.047  ops/ms
> TestCRC32C.testCRC32CUpdate   32768  thrpt   12     632.291 ±    0.058  ops/ms
> TestCRC32C.testCRC32CUpdate   65536  thrpt   12     315.396 ±    0.160  ops/ms
>
>
> Crypto pmull
>
> Benchmark                   (count)   Mode  Cnt       Score      Error  Units
> TestCRC32C.testCRC32CUpdate      64  thrpt   12  199726.599 ±  166.477  ops/ms
> TestCRC32C.testCRC32CUpdate     128  thrpt   12  123669.385 ±    1.821  ops/ms
> TestCRC32C.testCRC32CUpdate     256  thrpt   12   70188.727 ±    1.313  ops/ms
> TestCRC32C.testCRC32CUpdate     384  thrpt   12   56468.837 ±   76.524  ops/ms
> TestCRC32C.testCRC32CUpdate     511  thrpt   12   37777.205 ±  406.431  ops/ms
> TestCRC32C.testCRC32CUpdate     512  thrpt   12   50554.555 ±   17.169  ops/ms
> TestCRC32C.testCRC32CUpdate    1024  thrpt   12   33661.006 ±  140.471  ops/ms
> TestCRC32C.testCRC32CUpdate    2048  thrpt   12   18406.482 ±  205.952  ops/ms
> TestCRC32C.testCRC32CUpdate    4096  thrpt   12    9674.159 ±   20.390  ops/ms
> TestCRC32C.testCRC32CUpdate    8192  thrpt   12    4951.562 ±    6.566  ops/ms
> TestCRC32C.testCRC32CUpdate   16384  thrpt   12    2504.970 ±    1.883  ops/ms
> TestCRC32C.testCRC32CUpdate   32768  thrpt   12    1260.278 ±    0.484  ops/ms
> TestCRC32C.testCRC32CUpdate   65536  thrpt   12     625.608 ±    0.300  ops/ms

Lgtm. The linux-x86 pre-submit test failure is caused by a test using -XX:+UseCompressedClassPointers, which is an invalid switch for 32-bit JVMs. The linux-cross-compile pre-submit test failure is a compile-time failure in src/hotspot/cpu/arm/interpreterRT_arm.cpp, which is not touched by this patch. ------------- Marked as reviewed by phh (Reviewer). PR: https://git.openjdk.org/jdk/pull/12624 From darcy at openjdk.org Thu Mar 2 22:31:14 2023 From: darcy at openjdk.org (Joe Darcy) Date: Thu, 2 Mar 2023 22:31:14 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 12:03:54 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright author.
test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java line 174: > 172: double inf_minus_inf = Double.POSITIVE_INFINITY - Double.POSITIVE_INFINITY; > 173: double inf_times_zero = Double.POSITIVE_INFINITY * 0.0f; > 174: double quiet_nan1 = Double.longBitsToDouble(0x7ffc000000000001l); Nit: whether or not a particular NaN bit pattern is quiet or signalling is an architecture-specific determination; it is not specified by the IEEE 754 standard. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Thu Mar 2 22:44:14 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Thu, 2 Mar 2023 22:44:14 GMT Subject: Integrated: 8302783: Improve CRC32C intrinsic with crypto pmull on AArch64 In-Reply-To: References: Message-ID: <7Yn_j1nryz0tjnGOXMjUy_Hp3JbgEIc63z8Z_uXm92Q=.2f6c9cc4-20d7-4d56-8d94-e32565540413@github.com> On Fri, 17 Feb 2023 19:59:24 GMT, Yi-Fan Tsai wrote: > This change adds a pmull-based CRC32C intrinsic, and it is more performant than the existing crc32c-instruction-based intrinsic on Neoverse V1. The benchmark shows 10 - 99% improvement. The improvement comes from the execution throughput increase of pmull/pmull2 from 1 on Neoverse N1 to 4 on Neoverse V1 (the latency remains 2), while the throughput of CRC32C instructions did not change. > > The pmull-based CRC32C intrinsic is enabled by the existing option UseCryptoPmullForCRC32, which also enables the pmull-based CRC32 intrinsic. The option requires crc32c instructions, eor3 in SHA3, and 64-bit pmull/pmull2 in the Cryptographic Extension. > > With this change, there will be only two different CRC32C intrinsics, crc32c and pmull, while there are four CRC32 intrinsics. > > The following test has passed. > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32C.java > > The throughput reported by [the micro benchmark](https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/java/util/TestCRC32C.java) is measured on an EC2 c7g instance.
The optimization shows 10 - 99% improvement when the input is at least 384 bytes.
>
> | input | 64 | 128 | 256 | 384 | 511 | 512 | 1,024 |
> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
> | improvement | 1.60% | 0.00% | 0.00% | 15.24% | 10.76% | 34.32% | 72.39% |
>
> | input | 2,048 | 4,096 | 8,192 | 16,384 | 32,768 | 65,536 |
> | ------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
> | improvement | 84.96% | 92.59% | 96.19% | 98.02% | 99.32% | 98.36% |
>
>
> Baseline
>
> Benchmark                   (count)   Mode  Cnt       Score      Error  Units
> TestCRC32C.testCRC32CUpdate      64  thrpt   12  196575.739 ± 1824.113  ops/ms
> TestCRC32C.testCRC32CUpdate     128  thrpt   12  123666.570 ±    2.730  ops/ms
> TestCRC32C.testCRC32CUpdate     256  thrpt   12   70188.989 ±    2.002  ops/ms
> TestCRC32C.testCRC32CUpdate     384  thrpt   12   49000.690 ±    1.421  ops/ms
> TestCRC32C.testCRC32CUpdate     511  thrpt   12   34106.279 ±   25.390  ops/ms
> TestCRC32C.testCRC32CUpdate     512  thrpt   12   37638.349 ±    1.039  ops/ms
> TestCRC32C.testCRC32CUpdate    1024  thrpt   12   19526.513 ±    0.439  ops/ms
> TestCRC32C.testCRC32CUpdate    2048  thrpt   12    9951.392 ±    4.803  ops/ms
> TestCRC32C.testCRC32CUpdate    4096  thrpt   12    5023.268 ±    0.240  ops/ms
> TestCRC32C.testCRC32CUpdate    8192  thrpt   12    2523.877 ±    0.062  ops/ms
> TestCRC32C.testCRC32CUpdate   16384  thrpt   12    1265.011 ±    0.047  ops/ms
> TestCRC32C.testCRC32CUpdate   32768  thrpt   12     632.291 ±    0.058  ops/ms
> TestCRC32C.testCRC32CUpdate   65536  thrpt   12     315.396 ±    0.160  ops/ms
>
>
> Crypto pmull
>
> Benchmark                   (count)   Mode  Cnt       Score      Error  Units
> TestCRC32C.testCRC32CUpdate      64  thrpt   12  199726.599 ±  166.477  ops/ms
> TestCRC32C.testCRC32CUpdate     128  thrpt   12  123669.385 ±    1.821  ops/ms
> TestCRC32C.testCRC32CUpdate     256  thrpt   12   70188.727 ±    1.313  ops/ms
> TestCRC32C.testCRC32CUpdate     384  thrpt   12   56468.837 ±   76.524  ops/ms
> TestCRC32C.testCRC32CUpdate     511  thrpt   12   37777.205 ±  406.431  ops/ms
> TestCRC32C.testCRC32CUpdate     512  thrpt   12   50554.555 ±   17.169  ops/ms
> TestCRC32C.testCRC32CUpdate    1024  thrpt   12   33661.006 ±  140.471  ops/ms
> TestCRC32C.testCRC32CUpdate    2048  thrpt   12   18406.482 ±  205.952  ops/ms
> TestCRC32C.testCRC32CUpdate    4096  thrpt   12    9674.159 ±   20.390  ops/ms
> TestCRC32C.testCRC32CUpdate    8192  thrpt   12    4951.562 ±    6.566  ops/ms
> TestCRC32C.testCRC32CUpdate   16384  thrpt   12    2504.970 ±    1.883  ops/ms
> TestCRC32C.testCRC32CUpdate   32768  thrpt   12    1260.278 ±    0.484  ops/ms
> TestCRC32C.testCRC32CUpdate   65536  thrpt   12     625.608 ±    0.300  ops/ms

This pull request has now been integrated. Changeset: f3abc406 Author: Yi-Fan Tsai Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/f3abc4063de658418283aee1f552c4b4976e5211 Stats: 78 lines in 3 files changed: 77 ins; 0 del; 1 mod 8302783: Improve CRC32C intrinsic with crypto pmull on AArch64 Reviewed-by: simonis, phh ------------- PR: https://git.openjdk.org/jdk/pull/12624 From dholmes at openjdk.org Thu Mar 2 22:47:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Mar 2023 22:47:05 GMT Subject: RFR: 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls Message-ID: We can replace `gethostbyname`, which is deprecated on Windows and Linux, with `getaddrinfo`. This API is available on all supported platforms and so can be placed in shared code. @djelinski pointed out that `getaddrinfo` can resolve both IP addresses and host names, so the two-step approach used in `networkStream::connect` is not necessary and we can do away with `os::get_host_by_name()` completely. The build is updated to enable winsock deprecation warnings, and now that we need to use `ws2_32.lib` we can drop `wsock32.lib` (as it is basically a subset - again thanks @djelinski ). Testing - all Oracle builds in tiers 1-5 - All GHA builds The actual code change has to be manually tested because the code is only used by Ideal Graph Printing to connect to the Ideal Graph Visualizer. I've manually tested on Windows and Linux and @tobiasholenstein tested macOS. Thanks.
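The single-step lookup described above can be sketched with POSIX `getaddrinfo`. One call accepts either a host name or a numeric address string, so no separate `inet_addr` step is needed. This is a minimal sketch that handles IPv4 only. `resolve_ipv4` is a hypothetical helper and is not the code in `networkStream::connect`. On Windows the same function is declared in `<ws2tcpip.h>` and provided by `ws2_32.lib`.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <arpa/inet.h>   // ntohs(), ntohl()
#include <netdb.h>       // getaddrinfo(), freeaddrinfo(), addrinfo
#include <netinet/in.h>  // sockaddr_in, INADDR_LOOPBACK
#include <sys/socket.h>  // AF_INET, SOCK_STREAM
#include <sys/types.h>

// Hypothetical helper: resolves `host`, which may be a name such as "localhost"
// or a literal such as "127.0.0.1", and copies the first IPv4 result into `out`
// with `port` already filled in.
bool resolve_ipv4(const char* host, int port, sockaddr_in* out) {
  addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;        // IPv4 only, like the old gethostbyname() path
  hints.ai_socktype = SOCK_STREAM;  // the stream is a TCP connection
  const std::string service = std::to_string(port);

  addrinfo* result = nullptr;
  const int rc = getaddrinfo(host, service.c_str(), &hints, &result);
  if (rc != 0) {
    return false;  // gai_strerror(rc) would describe the failure
  }
  // With ai_family == AF_INET, every entry's ai_addr points to a sockaddr_in.
  std::memcpy(out, result->ai_addr, sizeof(*out));
  freeaddrinfo(result);  // getaddrinfo() allocates the result list
  return true;
}
```

A real caller would typically walk the whole `ai_next` list and try `connect()` on each entry until one succeeds. The sketch keeps only the first entry for brevity.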
------------- Commit messages: - 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls Changes: https://git.openjdk.org/jdk/pull/12842/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12842&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8286781 Stats: 41 lines in 6 files changed: 13 ins; 17 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12842.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12842/head:pull/12842 PR: https://git.openjdk.org/jdk/pull/12842 From dholmes at openjdk.org Thu Mar 2 22:53:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Mar 2023 22:53:05 GMT Subject: RFR: 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 22:35:17 GMT, David Holmes wrote: > We can replace `gethostbyname`, which is deprecated on Windows and Linux, with `getaddrinfo`. This API is available on all supported platforms and so can be placed in shared code. @djelinski pointed out that `getaddrinfo` can resolve both IP addresses and host names, so the two-step approach used in `networkStream::connect` is not necessary and we can do away with `os::get_host_by_name()` completely. > > The build is updated to enable winsock deprecation warnings, and now that we need to use `ws2_32.lib` we can drop `wsock32.lib` (as it is basically a subset - again thanks @djelinski ). > > Testing > - all Oracle builds in tiers 1-5 > - All GHA builds > > The actual code change has to be manually tested because the code is only used by Ideal Graph Printing to connect to the Ideal Graph Visualizer. I've manually tested on Windows and Linux and @tobiasholenstein tested macOS. > > Thanks. I forgot to mention that I have no way to test on AIX, so if someone is able to do that, that would be good. Thanks. I know it builds okay.
------------- PR: https://git.openjdk.org/jdk/pull/12842 From dholmes at openjdk.org Fri Mar 3 00:34:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 00:34:22 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v3] In-Reply-To: <3ePedpVieHDjKofsLHTwEI2xiCfEZ047Ewb4hwxSjiQ=.368af0f3-935e-4be8-9bc7-dbda01decf5c@github.com> References: <3ePedpVieHDjKofsLHTwEI2xiCfEZ047Ewb4hwxSjiQ=.368af0f3-935e-4be8-9bc7-dbda01decf5c@github.com> Message-ID: On Thu, 2 Mar 2023 19:55:39 GMT, Joe Darcy wrote: >> While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd get the next phase of the FDLIBM port out for review: removing the FDLIBM C sources from the repo. >> >> A repo with the changes for JDK-8302027 and this PR successfully builds on the default set of platforms and successfully runs tier 1 tests, which include tests of the math library. >> >> There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork of FDLIBM (a tine of FDLIBM?) to use for intrinsics. The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. >> >> The intention of this change is to remove use of FDLIBM for the core libraries. > > Joe Darcy has updated the pull request incrementally with one additional commit since the last revision: > > Respond to review feedback. Hotspot changes are okay, but I'm a bit confused about what the hotspot code will now be used for?
------------- PR: https://git.openjdk.org/jdk/pull/12821 From dholmes at openjdk.org Fri Mar 3 00:41:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 00:41:06 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v3] In-Reply-To: <3ePedpVieHDjKofsLHTwEI2xiCfEZ047Ewb4hwxSjiQ=.368af0f3-935e-4be8-9bc7-dbda01decf5c@github.com> References: <3ePedpVieHDjKofsLHTwEI2xiCfEZ047Ewb4hwxSjiQ=.368af0f3-935e-4be8-9bc7-dbda01decf5c@github.com> Message-ID: On Thu, 2 Mar 2023 19:55:39 GMT, Joe Darcy wrote: >> While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd get the next phase of the FDLIBM port out for review: removing the FDLIBM C sources from the repo. >> >> A repo with the changes for JDK-8302027 and this PR successfully builds on the default set of platforms and successfully runs tier 1 tests, which include tests of the math library. >> >> There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork of FDLIBM (a tine of FDLIBM?) to use for intrinsics. The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. >> >> The intention of this change is to remove use of FDLIBM for the core libraries. > > Joe Darcy has updated the pull request incrementally with one additional commit since the last revision: > > Respond to review feedback. Actually this is really my lack of understanding about the current code: why do we intrinsify `Math` but not `StrictMath`?
------------- PR: https://git.openjdk.org/jdk/pull/12821 From dholmes at openjdk.org Fri Mar 3 01:03:16 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 01:03:16 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v3] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 16:27:32 GMT, Matthias Baesken wrote: >> The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. >> However at some places INCLUDE_CDS is missing and should be added. >> >> One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros ? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > adjust some cds related vars src/hotspot/share/runtime/arguments.cpp line 1460: > 1458: #if INCLUDE_CDS > 1459: UseSharedSpaces = false; > 1460: #endif This doesn't make sense - the entire `no_shared_spaces` method is only meaningful on a JVM with CDS. Otherwise `-Xshare:on` should immediately be rejected. ?? src/hotspot/share/runtime/arguments.cpp line 2675: > 2673: xshare_auto_cmd_line = true; > 2674: // -Xshare:off > 2675: } else if (match_option(option, "-Xshare:off")) { These flags should all immediately be rejected if used in a non-CDS build. Yes this is pre-existing but it now looks totally broken with the new changes. 
There should be something like: #ifndef INCLUDE_CDS } else if (match_option(option, "-Xshare")) { // or whatever we need to match the prefix only warning("Option %s is not supported in this VM", option); #else // process flags as usual #endif ------------- PR: https://git.openjdk.org/jdk/pull/12691 From kvn at openjdk.org Fri Mar 3 01:05:12 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 Mar 2023 01:05:12 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v3] In-Reply-To: References: <3ePedpVieHDjKofsLHTwEI2xiCfEZ047Ewb4hwxSjiQ=.368af0f3-935e-4be8-9bc7-dbda01decf5c@github.com> Message-ID: On Fri, 3 Mar 2023 00:31:12 GMT, David Holmes wrote: > Hotspot changes are okay but I'm a bit confused about what the hotspot code will now be used for? `SharedRuntime::*` runtime math functions are used on platforms where there are no HW instructions or intrinsics (Zero VM). JIT compiled code may also call them in such cases (or when intrinsics are disabled with a flag): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/library_call.cpp#L1815 Consider them as default intrinsics for all platforms (since they are written in C). They are faster than interpreting bytecode. They are also needed for consistency of results - the same code is used by Interpreter and JITed code. ------------- PR: https://git.openjdk.org/jdk/pull/12821 From iklam at openjdk.org Fri Mar 3 01:16:13 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 3 Mar 2023 01:16:13 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v2] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 16:27:32 GMT, Matthias Baesken wrote: >> You need to remove the `extern` keyword, and add the definition of `= false` in globalDefinitions.hpp. >> >> Then, the corresponding definition should be removed from globalDefinitions.cpp. > > Hi Ioi, I adjusted the *SharedSpaces in globalDefinitions.cpp / globalDefinitions.hpp . I am on vacation now so I won't be able to follow up with the review.
Please feel free to integrate after you get enough reviews. Just a general comment. For blocks that are enclosed with “if (UseSharedSpaces) {…}”, there's no need to put them inside #if as these blocks will be elided by the C++ compiler. That way the code will be less messy. ------------- PR: https://git.openjdk.org/jdk/pull/12691 From darcy at openjdk.org Fri Mar 3 02:06:06 2023 From: darcy at openjdk.org (Joe Darcy) Date: Fri, 3 Mar 2023 02:06:06 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v3] In-Reply-To: References: <3ePedpVieHDjKofsLHTwEI2xiCfEZ047Ewb4hwxSjiQ=.368af0f3-935e-4be8-9bc7-dbda01decf5c@github.com> Message-ID: On Fri, 3 Mar 2023 00:38:18 GMT, David Holmes wrote: > Actually this is really my lack of understanding about the current code: why do we intrinsify `Math` but not `StrictMath`? In brief, the Math methods are allowed implementation flexibility in terms of their algorithm but the StrictMath methods are not. The "interesting" StrictMath methods are required to use the FDLIBM algorithms. (One exception is StrictMath.sqrt. Since sqrt is required to be correctly rounded, there is only one correct answer for any given argument and StrictMath.sqrt can be intrinsified to a hardware sqrt instruction just like Math.sqrt can.) ------------- PR: https://git.openjdk.org/jdk/pull/12821 From darcy at openjdk.org Fri Mar 3 02:10:15 2023 From: darcy at openjdk.org (Joe Darcy) Date: Fri, 3 Mar 2023 02:10:15 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v3] In-Reply-To: References: <3ePedpVieHDjKofsLHTwEI2xiCfEZ047Ewb4hwxSjiQ=.368af0f3-935e-4be8-9bc7-dbda01decf5c@github.com> Message-ID: On Fri, 3 Mar 2023 01:02:40 GMT, Vladimir Kozlov wrote: > Hotspot changes are okay but I'm a bit confused about what the hotspot code will now be used for? I'm not 100% positive if the current __kernel_rem_pio2 code would be in use.
IIRC, back when we used the fsin/fcos instructions to intrinsify sin/cos on the x87 FPU, to meet Java semantics we needed to do argument reduction into the range supported by fsin/fcos. Perhaps __kernel_rem_pio2 is a hold-over from that time? I believe my recent sin/cos algorithms for intrinsics wouldn't need to use these pathways. ------------- PR: https://git.openjdk.org/jdk/pull/12821 From eliu at openjdk.org Fri Mar 3 03:02:13 2023 From: eliu at openjdk.org (Eric Liu) Date: Fri, 3 Mar 2023 03:02:13 GMT Subject: RFR: 8301012: [vectorapi]: Intrinsify CompressBitsV/ExpandBitsV and add the AArch64 SVE backend implementation In-Reply-To: References: Message-ID: <-sLl5-N5rwoXIV6jNaUOaXOybanJif2SBVTVque9L84=.3bce4551-05c1-4bec-849d-0869be75a9ca@github.com> On Mon, 6 Feb 2023 17:23:20 GMT, Bhavana Kilambi wrote: > This patch adds mid-end compiler vector IR nodes for the scalar CompressBits and ExpandBits nodes - CompressBitsV and ExpandBitsV and also adds aarch64 backend support for these nodes using SVE2 instructions (included in the svebitperm feature). As there are direct instructions in SVE2 that map to these operations, a huge speed up in performance can be observed and it might significantly benefit all those workloads that extensively run these operations on an SVE2 (with svebitperm feature) supporting machine. > > All the JTREG tests under "test/jdk/jdk/incubator/vector" pass successfully with this patch on an SVE2 machine. > The JMH tests - COMPRESS_BITS and EXPAND_BITS from [1] and [2] were run on a 128-bit vector length, SVE2 and svebitperm supporting aarch64 machine.
Following are the gains observed with this patch - > > > Benchmark (length) Mode Cnt Gain > IntMaxVector.COMPRESS_BITS 1024 thrpt 15 81.68x > IntMaxVector.EXPAND_BITS 1024 thrpt 15 85.65x > LongMaxVector.COMPRESS_BITS 1024 thrpt 15 70.78x > LongMaxVector.EXPAND_BITS 1024 thrpt 15 76.31x > > > The "Gain" column is the ratio between the throughput of benchmark runs with this patch and that of benchmark runs without this patch. This patch does not change the performance of these operations for all other machines that do not support these instructions or when run on a different architecture. > With this patch, vectorization of CompressBits and ExpandBits operations happens only through vectorapi for aarch64. Autovectorization does not take place as the current JDK source does not contain aarch64 backend implementation for scalar CompressBits and ExpandBits. However, this PR - https://github.com/openjdk/jdk/pull/10537 adds aarch64 backend implementaton for CompressBits and ExpandBits and may lead to autovectorization of these nodes as well eventually but this PR is a standalone one and not dependent on the scalar implementation. > > [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java > [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java Marked as reviewed by eliu (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/12446 From mdoerr at openjdk.org Fri Mar 3 04:04:06 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 3 Mar 2023 04:04:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v7] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' into PPC64_Panama - Add test for HFA corner cases. - Minor cleanup. - HFA: Add support for nested structures. See JDK-8300294. - Remove size restriction for structs. Add TODO for Big Endian. - Clean fix for NativeMemorySegmentImpl issue with byteSize 0. - Initial Panama implementation. ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=06 Stats: 2247 lines in 59 files changed: 2139 ins; 1 del; 107 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From kbarrett at openjdk.org Fri Mar 3 04:12:59 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 3 Mar 2023 04:12:59 GMT Subject: RFR: 8302189: Mark assertion failures noreturn Message-ID: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Also 8302799: Refactor Debugging variable usage for noreturn crash reporting ------------- Commit messages: - new implementation of Debugging - noreturn attributes Changes: https://git.openjdk.org/jdk/pull/12845/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12845&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302189 Stats: 168 lines in 8 files changed: 128 ins; 19 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/12845.diff Fetch: git fetch https://git.openjdk.org/jdk 
pull/12845/head:pull/12845 PR: https://git.openjdk.org/jdk/pull/12845 From kbarrett at openjdk.org Fri Mar 3 04:21:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 3 Mar 2023 04:21:06 GMT Subject: RFR: 8302189: Mark assertion failures noreturn In-Reply-To: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Fri, 3 Mar 2023 04:04:25 GMT, Kim Barrett wrote: > Also 8302799: Refactor Debugging variable usage for noreturn crash reporting make/hotspot/lib/CompileJvm.gmk line 103: > 101: DISABLED_WARNINGS_xlc := tautological-compare shift-negative-value > 102: > 103: DISABLED_WARNINGS_microsoft := 4624 4244 4291 4146 4127 4722 I forgot to mention this in the PR summary. New warning suppression for 4722: 'function' : destructor never returns, potential memory leak We have various destructors calling ShouldNotCallThis, ShouldNotReachHere, or the like, which triggers this warning. It might be possible to deal with some (but not all) of those by deleting the destructor. The warning doesn't seem all that useful to us, so I just suppressed it. ------------- PR: https://git.openjdk.org/jdk/pull/12845 From dholmes at openjdk.org Fri Mar 3 04:38:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 04:38:07 GMT Subject: RFR: 8302189: Mark assertion failures noreturn In-Reply-To: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Fri, 3 Mar 2023 04:04:25 GMT, Kim Barrett wrote: > Also 8302799: Refactor Debugging variable usage for noreturn crash reporting A few initial comments while I try to digest this. 
I don't use these particular debugging mechanisms, neither Debugger nor BREAKPOINT, but it seems potentially problematic to me that the removed BREAKPOINTs happened after the error was reported, and IIUC the new mechanism will activate before the error is reported. Thanks. make/hotspot/lib/CompileJvm.gmk line 103: > 101: DISABLED_WARNINGS_xlc := tautological-compare shift-negative-value > 102: > 103: DISABLED_WARNINGS_microsoft := 4624 4244 4291 4146 4127 4722 It is annoying that we don't document what these warnings are. :( src/hotspot/share/utilities/debug.cpp line 85: > 83: if (is_enabled()) { > 84: fatal("Multiple Debugging contexts"); > 85: } This seems too restrictive as you could hit different DebuggingContexts in different threads. ?? src/hotspot/share/utilities/debug.cpp line 290: > 288: private: > 289: ResourceMark _rm; > 290: DebuggingContext _debugging; Why a different initialization syntax here? ------------- PR: https://git.openjdk.org/jdk/pull/12845 From mdoerr at openjdk.org Fri Mar 3 04:43:40 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 3 Mar 2023 04:43:40 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v8] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java).
Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix merge bug.
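The Big Endian remark in the description ("Storing 8 Bytes and loading 4 Bytes yields different values") can be illustrated with a small C++ sketch that models byte order explicitly instead of depending on the host. It stores an 8-byte value and then loads 4 bytes from the same address in each byte order, and it shows sign extension of a 32-bit int to a 64-bit register image. `store64`, `load32`, and `extend_int_arg` are hypothetical helpers, not HotSpot or FFM code, and sign extension is assumed for the signed-int case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: write a 64-bit value to memory in the given byte order.
static void store64(unsigned char* mem, uint64_t v, bool big_endian) {
  assert(mem != nullptr);
  for (int i = 0; i < 8; i++) {
    // Big Endian puts the most significant byte at the lowest address.
    int shift = big_endian ? (7 - i) * 8 : i * 8;
    mem[i] = static_cast<unsigned char>(v >> shift);
  }
}

// Hypothetical helper: read 4 bytes from the same address in the given byte order.
static uint32_t load32(const unsigned char* mem, bool big_endian) {
  assert(mem != nullptr);
  uint32_t v = 0;
  for (int i = 0; i < 4; i++) {
    int shift = big_endian ? (3 - i) * 8 : i * 8;
    v |= static_cast<uint32_t>(mem[i]) << shift;
  }
  return v;
}

// Widen a 32-bit int to a full 64-bit register image by sign extension,
// as an ABI that requires ints to be passed as longs would.
static int64_t extend_int_arg(int32_t v) {
  return static_cast<int64_t>(v);
}
```

After `store64(mem, 42, false)`, `load32(mem, false)` returns 42 because the first 4 bytes hold the low half. After `store64(mem, 42, true)`, `load32(mem, true)` returns 0 because the first 4 bytes hold the high half. `extend_int_arg(-1)` yields the 64-bit value -1 rather than 0x00000000FFFFFFFF.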
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/d71f0f0a..f23edd8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=06-07 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From darcy at openjdk.org Fri Mar 3 06:01:17 2023 From: darcy at openjdk.org (Joe Darcy) Date: Fri, 3 Mar 2023 06:01:17 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 12:03:54 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright author. src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 129: > 127: // GCC had slow fmod(): > 128: // since https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4f2611b6e872c40e0bf4da38ff05df8c8fe0ee64 > 129: // until https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=8020c9c42349f51f75239b9d35a2be41848a97bd We typically list such URLs in the bug database rather than the source code. 
------------- PR: https://git.openjdk.org/jdk/pull/12508 From kbarrett at openjdk.org Fri Mar 3 07:33:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 3 Mar 2023 07:33:06 GMT Subject: RFR: 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 22:35:17 GMT, David Holmes wrote: > We can replace `gethostbyname`, which is deprecated on Windows and Linux, with `getaddrinfo`. This API is available on all supported platforms and so can be placed in shared code. @djelinski pointed out that `getaddrinfo` can resolve both IP addresses and host names so the two step approach used in `networkStream::connect` is not necessary and we can do away with `os::get_host_by_name()` completely. > > The build is updated to enable winsock deprecation warnings, and now we need to use `ws2_32.lib` we can drop `wsock32.lib` (as it is basically a subset - again thanks @djelinski ). > > Testing > - all Oracle builds in tiers 1-5 > - All GHA builds > > The actual code change has to be manually tested because the code is only used by Ideal Graph Printing to connect to the Ideal Graph Visualizer. I've manually tested on Windows and Linux and @tobiasholenstein tested macOS. > > Thanks. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/12842 From mbaesken at openjdk.org Fri Mar 3 08:07:20 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 Mar 2023 08:07:20 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v2] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 01:13:18 GMT, Ioi Lam wrote: >> Hi Ioi, I adjusted the *SharedSpaces in globalDefinitions.cpp / globalDefinitions.hpp . > > I am on vacation now so I won't be able to follow up with the review. Please feel free to integrate after you get enough reviews. > > Just a general comment.
For blocks that are enclosed with “if (UseSharedSpaces) {…}”, there's no need to put them inside #if as these blocks will be elided by the C++ compiler. That way the code will be less messy. Hi Ioi, thanks for the feedback. Mostly you are correct, the `#if` blocks are not needed any more at most places. Should I remove them at least from my patch now or keep them? But I think there are a few exceptions - at some places the coding attempts to set the *SharedSpaces flags and this does not compile any more after switching to const. So there the `#if` blocks have to stay. ------------- PR: https://git.openjdk.org/jdk/pull/12691 From aivanov at openjdk.org Fri Mar 3 08:28:19 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 3 Mar 2023 08:28:19 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix had been approved, as required. Looks good to me. I looked through all the changes, paying more attention to the client area. src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java line 257: > 255: > 256: /** > 257: * @return true iff the BSM method type exactly matches I assume “iff” should be “if”? src/jdk.compiler/share/classes/com/sun/tools/javac/code/Types.java line 2866: > 2864: * Merge multiple abstract methods. The preferred method is a method that is a subsignature > 2865: * of all the other signatures and whose return type is more specific {@link MostSpecificReturnCheck}. > 2866: * The resulting preferred method has a thrown clause that is the intersection of the merged Is it “…has a {@code throws} clause…”? ------------- Marked as reviewed by aivanov (Reviewer). PR: https://git.openjdk.org/jdk/pull/12826 From mbaesken at openjdk.org Fri Mar 3 08:37:25 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 Mar 2023 08:37:25 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v3] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 00:50:09 GMT, David Holmes wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> adjust some cds related vars > > src/hotspot/share/runtime/arguments.cpp line 1460: > >> 1458: #if INCLUDE_CDS >> 1459: UseSharedSpaces = false; >> 1460: #endif > > This doesn't make sense - the entire `no_shared_spaces` method is only meaningful on a JVM with CDS.
Otherwise `-Xshare:on` should immediately be rejected. ?? Hi David, unfortunately this has to stay for now; otherwise we would be assigning to a const in the CDS-disabled build, which does not compile. Maybe it would make sense to guard the whole function `no_shared_spaces` and adjust the calls/usages but I would prefer a separate issue for this. ------------- PR: https://git.openjdk.org/jdk/pull/12691 From eosterlund at openjdk.org Fri Mar 3 08:51:37 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 3 Mar 2023 08:51:37 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v9] In-Reply-To: References: Message-ID: > So far, the arraycopy stubs have performed some kind of bulk pre/post barriers for arraycopy, which have been good enough, and allowed the copying itself to be done with plain loads and stores. For generational ZGC, this approach is not good enough, and we need barriers for the actual copying, but instead don't need the pre/post barriers. To prepare the JVM for generational ZGC, we need to add an API for arraycopy barriers.
Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Remove rscratch aliasing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12670/files - new: https://git.openjdk.org/jdk/pull/12670/files/d03a5c29..28efe5d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12670&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12670&range=07-08 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12670.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12670/head:pull/12670 PR: https://git.openjdk.org/jdk/pull/12670 From eosterlund at openjdk.org Fri Mar 3 08:51:38 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 3 Mar 2023 08:51:38 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> <2g2QqIWkC2a7QyVzPkX-QW3LcMrti6-AMwuFPqe752o=.c63ae268-69be-4770-b642-544633c1cf8f@github.com> <1NdlCJpW9kP6N1prWI-pXRBFCooYppayETd83JdBEkE=.df3ed4f4-b685-4377-8284-40b4a3cedf5e@github.com> Message-ID: On Thu, 2 Mar 2023 09:51:03 GMT, Andrew Haley wrote: >> Note also that the arraycopy stubs were already using r8 and r9 as temp registers. I just moved their use to GC barriers, so only they need to deal with their inherent scratchyness, while the temp registers used in the client code can use registers that are not scratchy, which was not the case before. > >> Note also that the arraycopy stubs were already using r8 and r9 as temp registers. I just moved their use to GC barriers, so only they need to deal with their inherent scratchyness, while the temp registers used in the client code can use registers that are not scratchy, which was not the case before.
> > The use of scratch registers in the AArch64 GC barriers was _already_ buggy and error prone, which resulted in serious and hard-to-find bugs. If you need to use the scratch registers in stubs, please, please don't alias them. Call them `rscratch1` and `rscratch2`. > > Using scratch registers is safe as long as the code follows a couple of rules. Don't alias them, and if an inner macro is called from another macro, it's often safer to pass scratch registers to the inner macro explicitly. I know we're not consistent about this. I see. Yeah I'd love to go more and more towards the more explicit approach. I removed the aliasing so we pass in rscratch1 and rscratch2 instead of r8 and r9. ------------- PR: https://git.openjdk.org/jdk/pull/12670 From mbaesken at openjdk.org Fri Mar 3 08:55:07 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 Mar 2023 08:55:07 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: > The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. > However at some places INCLUDE_CDS is missing and should be added. > > One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros ? 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust arguments handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12691/files - new: https://git.openjdk.org/jdk/pull/12691/files/33eb5631..937ca5bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12691&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12691&range=02-03 Stats: 14 lines in 1 file changed: 4 ins; 10 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12691.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12691/head:pull/12691 PR: https://git.openjdk.org/jdk/pull/12691 From mbaesken at openjdk.org Fri Mar 3 08:55:12 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 Mar 2023 08:55:12 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v3] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 00:58:58 GMT, David Holmes wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> adjust some cds related vars > > src/hotspot/share/runtime/arguments.cpp line 2675: > >> 2673: xshare_auto_cmd_line = true; >> 2674: // -Xshare:off >> 2675: } else if (match_option(option, "-Xshare:off")) { > > These flags should all immediately be rejected if used in a non-CDS build. Yes this is pre-existing but it now looks totally broken with the new changes. There should be something like: > > #ifndef INCLUDE_CDS > } else if (match_option(option, "-Xshare")) { // or whatever we need to match the prefix only > warning("Option %s is not supported in this VM", option); > #else > // process flags as usual > #endif Hi David, I adjusted the coding following your request (however we need to use `#if` not `#ifndef` here). 
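The two preprocessor points in this thread can be checked with a minimal, self-contained C++ sketch: why a `#ifndef INCLUDE_CDS` branch is never taken, and why `if (UseSharedSpaces)` blocks need no guard once the flag is `const` while assignments to it still do. It assumes INCLUDE_CDS is always defined to 0 or 1, as HotSpot's INCLUDE_* feature macros conventionally are; here it defaults to 0 to model a CDS-disabled build. `mapped_region_count` and `disable_sharing` are hypothetical names, not HotSpot functions.

```cpp
#include <cassert>

// Assumption for this sketch: default to 0 to model a CDS-disabled build.
// A real build would pass -DINCLUDE_CDS=0 or 1.
#ifndef INCLUDE_CDS
#define INCLUDE_CDS 0
#endif

// Pitfall: #ifndef only asks whether the macro is *defined*. INCLUDE_CDS is
// defined even when its value is 0, so this branch is never taken.
#ifndef INCLUDE_CDS
constexpr bool ifndef_branch_taken = true;
#else
constexpr bool ifndef_branch_taken = false;
#endif

// Correct: test the macro's value.
#if !INCLUDE_CDS
constexpr bool cds_disabled = true;
#else
constexpr bool cds_disabled = false;
#endif

// The flag becomes a compile-time constant when CDS is excluded.
#if INCLUDE_CDS
bool UseSharedSpaces = true;
#else
const bool UseSharedSpaces = false;
#endif

// No #if needed here: with a const false flag the body is dead code.
static int mapped_region_count() {
  if (UseSharedSpaces) {
    return 2;  // stand-in for CDS-only work (hypothetical)
  }
  return 0;
}

// Writes to the flag still need a guard: assigning to a const does not compile.
static void disable_sharing() {
#if INCLUDE_CDS
  UseSharedSpaces = false;
#endif
  assert(!UseSharedSpaces);
}
```

Reads of the flag can stay unguarded, while the assignment in `disable_sharing` must stay inside `#if INCLUDE_CDS`. That is the same split Matthias describes for `no_shared_spaces`.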
------------- PR: https://git.openjdk.org/jdk/pull/12691 From prappo at openjdk.org Fri Mar 3 09:41:06 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Fri, 3 Mar 2023 09:41:06 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 16:23:17 GMT, Alexey Ivanov wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix has been approved, as required. > > src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java line 257: > >> 255: >> 256: /** >> 257: * @return true iff the BSM method type exactly matches > > I assume "iff" should be "if"? Here and elsewhere in this file "iff" might mean [if and only if](https://en.wikipedia.org/wiki/If_and_only_if), which would make sense. (FWIW, there are a few hundred occurrences of the word "iff" in src.) @cl4es (Claes Redestad), as the author of those lines, would you like to chime in? Since Claes might read this, I note that when I changed unsupported `{@see}` to `{@link}` throughout this file, my IDE could not resolve one of the links: `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,Class,MethodType,MethodHandle,MethodType)`. While there's a similarly named method with slightly different parameters, I refrained from using it: `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,MethodType,MethodType,MethodHandle,MethodType)`.
------------- PR: https://git.openjdk.org/jdk/pull/12826 From prappo at openjdk.org Fri Mar 3 09:44:13 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Fri, 3 Mar 2023 09:44:13 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: <5TgKeBVz0u1hCa1qOiC7Y46DJvUtDIsDa1wv2I4tAX8=.8575f968-0685-450d-8d77-16523cd7531a@github.com> On Fri, 3 Mar 2023 08:15:49 GMT, Alexey Ivanov wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix has been approved, as required. > > src/jdk.compiler/share/classes/com/sun/tools/javac/code/Types.java line 2866: > >> 2864: * Merge multiple abstract methods. The preferred method is a method that is a subsignature >> 2865: * of all the other signatures and whose return type is more specific {@link MostSpecificReturnCheck}. >> 2866: * The resulting preferred method has a thrown clause that is the intersection of the merged > > Is it "...has a {@code throws} clause..."? Thanks! I'll add this to a separate PR. ------------- PR: https://git.openjdk.org/jdk/pull/12826 From aph at openjdk.org Fri Mar 3 09:45:22 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 Mar 2023 09:45:22 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v9] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:51:37 GMT, Erik Österlund wrote: >> So far, the arraycopy stubs have performed some kind of bulk pre/post barriers for arraycopy, which have been good enough, and allowed the copying itself to be done with plain loads and stores. For generational ZGC, this approach is not good enough, and we need barriers for the actual copying, but instead don't need the pre/post barriers. To prepare the JVM for generational ZGC, we need to add an API for arraycopy barriers. > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > Remove rscratch aliasing That's good enough for me. Thanks. ------------- Marked as reviewed by aph (Reviewer).
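A toy C++ model of the scratch-register hazard discussed in this thread. The register file, indices, and functions are invented for illustration; only the names `rscratch1`/`rscratch2` come from the discussion. When an inner macro silently uses `rscratch1`, a caller that keeps a value there under an alias such as `tmp` loses it with no visible clue, whereas passing the scratch register explicitly lets the caller see, and choose, which one is free:

```cpp
#include <array>
#include <cstdint>

// Toy register file; slots 8 and 9 play the roles of r8/r9 (rscratch1/rscratch2).
struct Regs {
  std::array<std::uint64_t, 32> r{};
};

constexpr int rscratch1 = 8;
constexpr int rscratch2 = 9;

// Inner "macro" that silently uses rscratch1 as its own temporary.
void inner_implicit(Regs& regs, int dst) {
  regs.r[rscratch1] = 0xdead;  // clobbers rscratch1 behind the caller's back
  regs.r[dst] = regs.r[rscratch1] + 1;
}

// Outer "macro" that hides rscratch1 behind an alias.
std::uint64_t outer_with_alias(Regs& regs) {
  const int tmp = rscratch1;   // the alias obscures which register this is
  regs.r[tmp] = 42;            // value the outer macro expects to keep
  inner_implicit(regs, /*dst=*/0);
  return regs.r[tmp];          // now 0xdead, not 42
}

// Same inner macro, but the scratch register is an explicit parameter.
void inner_explicit(Regs& regs, int dst, int scratch) {
  regs.r[scratch] = 0xdead;
  regs.r[dst] = regs.r[scratch] + 1;
}

std::uint64_t outer_explicit(Regs& regs) {
  regs.r[rscratch1] = 42;
  inner_explicit(regs, /*dst=*/0, /*scratch=*/rscratch2);  // caller picks the free one
  return regs.r[rscratch1];    // still 42
}
```

The explicit form costs one extra parameter but makes every use of a scratch register visible at the call site, which is the property the reviewers were asking for.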
PR: https://git.openjdk.org/jdk/pull/12670 From eosterlund at openjdk.org Fri Mar 3 09:45:23 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 3 Mar 2023 09:45:23 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v6] In-Reply-To: References: <47e5gQJkZn7ldGv3n2cyQoZSAT9YpWWvIwXnbxhdGuQ=.643a4652-204b-4689-85a4-9378e8a587b3@github.com> Message-ID: On Thu, 2 Mar 2023 09:58:04 GMT, Andrew Haley wrote: >> I updated the PR with a helper stack object encapsulating the choice of GC temp registers, types, decorators, etc., so that each line removes all the noise and becomes more readable. Do you like it? If yes, do you also want your proposed loop constructions in the new more compact form? I'm okay either way. > > The more compact form is a _huge_ improvement. I believe that using loops rather than hand-unrolling the code would make it even better! I think you are right that the loop form is easier to understand and maintain. But I think it makes it harder to convince myself that I didn't accidentally change the logic with this patch. I'm up for doing it as a follow-up RFE. Does that sound okay? ------------- PR: https://git.openjdk.org/jdk/pull/12670 From dholmes at openjdk.org Fri Mar 3 09:56:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 09:56:35 GMT Subject: RFR: 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls [v2] In-Reply-To: References: Message-ID: <3jFBC6xN7K5rRI3NUQODmTV3Fo03iRajvhNpd5Tolek=.6a2058a2-dc3d-4341-95e7-4eae498d3e1b@github.com> > We can replace `gethostbyname`, which is deprecated on Windows and Linux, with `getaddrinfo`. This API is available on all supported platforms and so can be placed in shared code. @djelinski pointed out that `getaddrinfo` can resolve both IP addresses and host names so the two-step approach used in `networkStream::connect` is not necessary and we can do away with `os::get_host_by_name()` completely.
> > The build is updated to enable winsock deprecation warnings, and now that we need to use `ws2_32.lib`, we can drop `wsock32.lib` (as it is basically a subset - again thanks @djelinski). > > Testing > - all Oracle builds in tiers 1-5 > - All GHA builds > > The actual code change has to be manually tested because the code is only used by Ideal Graph Printing to connect to the Ideal Graph Visualizer. I've manually tested on Windows and Linux and @tobiasholenstein tested macOS. > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Restrict getaddrinfo to IPv4 only as per the rest of the code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12842/files - new: https://git.openjdk.org/jdk/pull/12842/files/81f15e05..8b6f6317 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12842&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12842&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12842.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12842/head:pull/12842 PR: https://git.openjdk.org/jdk/pull/12842 From redestad at openjdk.org Fri Mar 3 10:12:13 2023 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 3 Mar 2023 10:12:13 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> On Fri, 3 Mar 2023 09:38:13 GMT, Pavel Rappo wrote: >> src/java.base/share/classes/java/lang/invoke/BootstrapMethodInvoker.java line 257: >> >>> 255: >>> 256: /** >>> 257: * @return true iff the BSM method type exactly matches >> >> I assume "iff" should be "if"? > > Here and elsewhere in this file "iff" might mean [if and only if](https://en.wikipedia.org/wiki/If_and_only_if), which would make sense. (FWIW, there are a few hundred occurrences of the word "iff" in src.)
> > @cl4es (Claes Redestad), as the author of those lines, would you like to chime in? > > Since Claes might read this, I note that when I changed unsupported `{@see}` to `{@link}` throughout this file, my IDE could not resolve one of the links: `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,Class,MethodType,MethodHandle,MethodType)`. > > While there's a similarly named method with slightly different parameters, I refrained from using it: > `java.lang.invoke.LambdaMetafactory#metafactory(MethodHandles.Lookup,String,MethodType,MethodType,MethodHandle,MethodType)`. Yes, iff means if-and-only-if and is used for extra precision in formal logic and mathematics. As @pavelrappo points out, it's a relatively common occurrence in the OpenJDK sources, though perhaps not in the public javadocs. Perhaps a bit pretentious, but mostly a terse way to say "return true if the BSM method type exactly matches X, otherwise false". The broken link stems from the fact that the method I was targeting (a way to use condy for lambda proxy singletons rather than a `MethodHandle.constant`) was never integrated. We'll look at either getting that done (@briangoetz suggested the time might be ready for it) or removing this currently pointless static bootstrap specialization test. ------------- PR: https://git.openjdk.org/jdk/pull/12826 From eosterlund at openjdk.org Fri Mar 3 10:45:17 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 3 Mar 2023 10:45:17 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v8] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 10:02:59 GMT, Andrew Haley wrote: >>> All this extreme cut-and-paste manual unrolling is very hard to read, maintain, and review. >> I wasn't going to say anything, because it's Erik, and what do I know! But I must push back here, this is too much. >> Please consider these style changes.
>> >> In my opinion, the (very much needed) changes you suggest are outside the scope of this PR, which is about lifting the memory accesses, in their existing form, to a barrier API-level. Conflating this with your suggested changes would make it harder to review this PR, which is sufficiently complex in its current form. I totally agree that your suggestions would improve readability and maintainability, but couldn't we apply them in a follow-up RFE? > >> > All this extreme cut-and-paste manual unrolling is very hard to read, maintain, and review. >> > I wasn't going to say anything, because it's Erik, and what do I know! But I must push back here, this is too much. >> > Please consider these style changes. >> >> In my opinion, the (very much needed) changes you suggest are outside the scope of this PR, which is about lifting the memory accesses, in their existing form, to a barrier API-level. Conflating this with your suggested changes would make it harder to review this PR, which is sufficiently complex in its current form. I totally agree that your suggestions would improve readability and maintainability, but couldn't we apply them in a follow-up RFE? > > I disagree in every way. The added complexity, which is fixed so it no longer matters, made it near-impossible for me to reason about this PR. And, as John Rose put it, like any good carpenter we should clean up as we work. Thanks for the review @theRealAph! And thanks also to @robcasloz, @albertnetymk and @RealFYang. ------------- PR: https://git.openjdk.org/jdk/pull/12670 From mdoerr at openjdk.org Fri Mar 3 10:55:57 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 3 Mar 2023 10:55:57 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v9] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Handle HFA corner cases with overlapping registers in Java. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/f23edd8e..98e242c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=07-08 Stats: 53 lines in 4 files changed: 16 ins; 20 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Fri Mar 3 11:02:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 3 Mar 2023 11:02:18 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: On Wed, 1 Mar 2023 06:26:20 GMT, Martin Doerr wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 293: >> >>> 291: } else { >>> 292: overlappingReg = new VMStorage(StorageType.STACK_AND_FLOAT, >>> 293: (short) STACK_SLOT_SIZE, (int) stackOffset - 4); >> >> I think you could remove the mixed VMStorage types here relatively easily by returning a `VMStorage[][]`, where each element is a single element array, but then for the `needOverlapping` case add another element to the array for the extra store (instead of replacing 
the existing one). >> >> Then when unboxing a `STRUCT_HFA`, `dup` the result of the `bufferLoad` and then do 2 `vmStore`s (one for each element). >> >> For boxing, you could just ignore the extra storage, and just `vmLoad` the first one (or, whichever one you like :)) > > Thanks! I need to find extra time for this. Sounds like a good idea and I may be able to get rid of some nasty code. Done by https://github.com/openjdk/jdk/pull/12708/commits/98e242c24c07ea977b7709b9f8d0c10ce87e84c0 (using a record instead of a `VMStorage[][]` because I think this is more readable). Note that it's a bit more complicated. I couldn't use your `dup` trick, because I need to put the value into a GP reg and one half of it into an FP reg. The Panama code doesn't support that (IllegalArgumentException: Invalid operand type: interface java.lang.foreign.MemorySegment. float expected). ------------- PR: https://git.openjdk.org/jdk/pull/12708 From djelinski at openjdk.org Fri Mar 3 11:23:14 2023 From: djelinski at openjdk.org (Daniel Jeliński) Date: Fri, 3 Mar 2023 11:23:14 GMT Subject: RFR: 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls [v2] In-Reply-To: <3jFBC6xN7K5rRI3NUQODmTV3Fo03iRajvhNpd5Tolek=.6a2058a2-dc3d-4341-95e7-4eae498d3e1b@github.com> References: <3jFBC6xN7K5rRI3NUQODmTV3Fo03iRajvhNpd5Tolek=.6a2058a2-dc3d-4341-95e7-4eae498d3e1b@github.com> Message-ID: <_OvnSTx31PuBeUlYl9mJzN7AspzmRzF6cumv2HH5xfw=.dc4b5910-74e4-4e0d-b886-c4213bc32f23@github.com> On Fri, 3 Mar 2023 09:56:35 GMT, David Holmes wrote: >> We can replace `gethostbyname`, which is deprecated on Windows and Linux, with `getaddrinfo`. This API is available on all supported platforms and so can be placed in shared code. @djelinski pointed out that `getaddrinfo` can resolve both IP addresses and host names so the two-step approach used in `networkStream::connect` is not necessary and we can do away with `os::get_host_by_name()` completely.
>> >> The build is updated to enable winsock deprecation warnings, and now we need to use `ws2_32.lib` we can drop `wsock32.lib` (as it is basically a subset - again thanks @djelinski ). >> >> Testing >> - all Oracle builds in tiers 1-5 >> - All GHA builds >> >> The actual code change has to be manually tested because the code is only used by Ideal Graph Printing to connect to the Ideal Graph Visualizer. I've manually tested on Windows and Linux and @tobiasholenstein tested macOS. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Restrict getaddrinfo to IPv4 only as per the rest of the code Marked as reviewed by djelinski (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/12842 From duke at openjdk.org Fri Mar 3 11:31:56 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 11:31:56 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v6] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Uppercase L - a review by turbanoff. 
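For the 8286781 change reviewed above, here is a minimal POSIX sketch of resolving either a host name or a dotted-quad literal with a single `getaddrinfo()` call restricted to IPv4 (`AF_INET`), which is what makes a separate `inet_addr`/`gethostbyname` step unnecessary. It is not the HotSpot `networkStream::connect` code: the function name and the `numeric_only` flag are illustrative, the latter added only so the sketch can be exercised without DNS. On Windows the same calls come from `<ws2tcpip.h>` and `ws2_32.lib`, after `WSAStartup`.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstring>
#include <string>

// Resolve `host` (a name such as "localhost" or a literal such as "127.0.0.1")
// to a dotted-quad IPv4 string. Returns an empty string on failure.
// numeric_only = true sets AI_NUMERICHOST, which skips DNS entirely.
std::string resolve_ipv4(const char* host, bool numeric_only = false) {
  addrinfo hints;
  std::memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET;        // IPv4 only, matching the rest of the code
  hints.ai_socktype = SOCK_STREAM;  // the caller intends to connect() over TCP
  if (numeric_only) {
    hints.ai_flags = AI_NUMERICHOST;
  }

  addrinfo* result = nullptr;
  if (getaddrinfo(host, nullptr, &hints, &result) != 0 || result == nullptr) {
    return std::string();
  }

  // The first entry is enough here; a connecting client would instead walk
  // result->ai_next and try connect() on each ai_addr until one succeeds.
  char buf[INET_ADDRSTRLEN] = {0};
  const sockaddr_in* sin = reinterpret_cast<const sockaddr_in*>(result->ai_addr);
  const char* text = inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf);
  freeaddrinfo(result);  // always release the list getaddrinfo allocated
  return text != nullptr ? std::string(text) : std::string();
}
```

Because the same call accepts both numeric literals and names, one code path replaces the old "try `inet_addr`, then fall back to `gethostbyname`" sequence.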
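------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/a04ee993..390d06c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From aivanov at openjdk.org Fri Mar 3 11:34:16 2023 From: aivanov at openjdk.org (Alexey Ivanov) Date: Fri, 3 Mar 2023 11:34:16 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> References: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> Message-ID: On Fri, 3 Mar 2023 10:09:27 GMT, Claes Redestad wrote: > Yes, iff means if-and-only-if and is used for extra precision in formal logic and mathematics. I've never come across it before. With your explanations, it makes perfect sense. ------------- PR: https://git.openjdk.org/jdk/pull/12826 From duke at openjdk.org Fri Mar 3 11:46:16 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 11:46:16 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 22:27:45 GMT, Joe Darcy wrote: >> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright author. > > test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java line 174: > >> 172: double inf_minus_inf = Double.POSITIVE_INFINITY - Double.POSITIVE_INFINITY; >> 173: double inf_times_zero = Double.POSITIVE_INFINITY * 0.0f; >> 174: double quiet_nan1 = Double.longBitsToDouble(0x7ffc000000000001l); > > Nit: whether or not a particular NaN bit pattern is quiet or signalling is an architecture-specific determination; it is not specified by the IEEE 754 standard. First, I disagree: - [it makes formal recommendations for the encoding of the signaling/quiet NaN state.](https://en.wikipedia.org/wiki/IEEE_754-2008_revision#Clause_6:_Infinity,_NaNs,_and_sign_bit) - [A signaling NaN bit string should be encoded with the first bit of the trailing significand field being 0](https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf#page=47) Second, the testcase only tests compatibility, as AFAIK Java does not support signaling NaNs. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Fri Mar 3 11:51:47 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 11:51:47 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v7] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Remove comments to be moved to JBS (Bug System) - a review by jddarcy.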
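The quiet/signaling distinction debated above can be made concrete. Under the encoding that IEEE 754-2008 recommends (followed by x86 and AArch64; some older designs such as legacy MIPS used the opposite convention), a binary64 NaN is quiet when the first bit of the trailing significand, bit 51, is set. A small C++ sketch classifying raw bit patterns; the helper names are made up for illustration. Under this convention the benchmark's `quiet_nan1` pattern, `0x7ffc000000000001`, is indeed quiet.

```cpp
#include <cstdint>
#include <cstring>

// Bit-level view of a double without undefined behaviour.
std::uint64_t bits_of(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

constexpr std::uint64_t kExponentMask = 0x7ff0000000000000ULL;
constexpr std::uint64_t kFractionMask = 0x000fffffffffffffULL;
constexpr std::uint64_t kQuietBit     = 0x0008000000000000ULL;  // bit 51

// NaN: all exponent bits set and a non-zero trailing significand.
bool is_nan_bits(std::uint64_t u) {
  return (u & kExponentMask) == kExponentMask && (u & kFractionMask) != 0;
}

// Quiet under the IEEE 754-2008 recommended encoding (an assumption about the
// target, not a guarantee of the standard): bit 51 set.
bool is_quiet_nan_bits(std::uint64_t u) {
  return is_nan_bits(u) && (u & kQuietBit) != 0;
}

// Signaling under the same convention: a NaN with bit 51 clear.
bool is_signaling_nan_bits(std::uint64_t u) {
  return is_nan_bits(u) && (u & kQuietBit) == 0;
}
```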
> > test/micro/org/openjdk/bench/vm/floatingpoint/DremFrem.java line 174: > >> 172: double inf_minus_inf = Double.POSITIVE_INFINITY - Double.POSITIVE_INFINITY; >> 173: double inf_times_zero = Double.POSITIVE_INFINITY * 0.0f; >> 174: double quiet_nan1 = Double.longBitsToDouble(0x7ffc000000000001l); > > Nit: whether or not a particular NaN bit pattern is quiet or signalling is an architecture-specific determination, it is not specified by the IEEE 754 standard. First I disagree: - [it makes formal recommendations for the encoding of the signaling/quiet NaN state.](https://en.wikipedia.org/wiki/IEEE_754-2008_revision#Clause_6:_Infinity,_NaNs,_and_sign_bit) - [A signaling NaN bit string should be encoded with the first bit of the trailing significand field being 0](https://irem.univ-reunion.fr/IMG/pdf/ieee-754-2008.pdf#page=47) Second the testcase tests just compatibility as AFAIK Java does not support the signaling. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Fri Mar 3 11:51:47 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 11:51:47 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v7] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Remove comments to be moved to JBS (Bug System) - a review by jddarcy. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/390d06c2..ef4cd57a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=05-06 Stats: 9 lines in 2 files changed: 0 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Fri Mar 3 11:51:52 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 11:51:52 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: <5IxJHzZ9p0cnxA7WQ6U23XCVEw7e2Yp9ye47uu76Y4E=.761cf840-9dca-40df-92e1-26b3d2a0786e@github.com> On Fri, 3 Mar 2023 05:58:37 GMT, Joe Darcy wrote: >> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix copyright author. > > src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 129: > >> 127: // GCC had slow fmod(): >> 128: // since https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4f2611b6e872c40e0bf4da38ff05df8c8fe0ee64 >> 129: // until https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=8020c9c42349f51f75239b9d35a2be41848a97bd > > We typically list such URLs in the bug database rather than the source code. I have removed it (but I do not like it much). Also removed it from the testcase. I will add it to JBS after I can login to my new account. 
------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Fri Mar 3 11:53:48 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 3 Mar 2023 11:53:48 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: References: Message-ID: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> > The inline and not-inline versions of the method is stress tested to compare the performance difference. The statistics are drawn in the following charts. The vertical axis is in milliseconds. > > ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) > > ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8292059: Do not inline InstanceKlass::allocate_instance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12782/files - new: https://git.openjdk.org/jdk/pull/12782/files/48d5a430..30a5734c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=00-01 Stats: 38 lines in 8 files changed: 13 ins; 17 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12782.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12782/head:pull/12782 PR: https://git.openjdk.org/jdk/pull/12782 From duke at openjdk.org Fri Mar 3 11:58:17 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 3 Mar 2023 11:58:17 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: <9ZXi9uNa5ETIhldKLCDAYojtXTGEg-5EexLwHNE2zhI=.026815b3-3117-4184-be4c-5fdf42c2655f@github.com> References: <9ZXi9uNa5ETIhldKLCDAYojtXTGEg-5EexLwHNE2zhI=.026815b3-3117-4184-be4c-5fdf42c2655f@github.com> Message-ID: 
<1W22QvJKaoJVRI5Wrx6DZEizGOxHsiSAEoLkHsCauQY=.6e934949-d854-49f8-9e1e-5786c0d04c8f@github.com> On Wed, 1 Mar 2023 12:38:59 GMT, Coleen Phillimore wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8292059: Do not inline InstanceKlass::allocate_instance() > > src/hotspot/share/oops/instanceKlass.inline.hpp line 188: > >> 186: } >> 187: >> 188: inline instanceOop InstanceKlass::allocate_instance(oop java_class, TRAPS) { > > In moving this, can you eliminate any #includes at the top? And migrate them to .cpp files that might need them. All the #include headers are removed except the `utilities/devirtualizer.inline.hpp`. The .cpp files that need it should include it before `oops/instanceKlass.inline.hpp` which violates the _sorted header filename_ in the coding style rules. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From duke at openjdk.org Fri Mar 3 12:07:54 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 12:07:54 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v8] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Always include the _WIN64 workaround - a review by dholmes-ora. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/ef4cd57a..873562c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=06-07 Stats: 71 lines in 2 files changed: 34 ins; 34 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Fri Mar 3 12:07:58 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 12:07:58 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v2] In-Reply-To: <50HcTSZYkWhmyRTTSpQsJJ9Dv28R_RnY1XflczBBxeg=.7c6c1e71-7129-4bc2-9ab5-f5dde541f0a9@github.com> References: <50HcTSZYkWhmyRTTSpQsJJ9Dv28R_RnY1XflczBBxeg=.7c6c1e71-7129-4bc2-9ab5-f5dde541f0a9@github.com> Message-ID: On Wed, 1 Mar 2023 08:48:58 GMT, Jan Kratochvil wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 236: >> >>> 234: const julong double_infinity = CONST64(0x7FF0000000000000); >>> 235: >>> 236: #ifndef X86 >> >> I wonder if the WIN64 workaround is actually needed/valid for non-X86 windows? > > It is true the comment says: >> 64-bit Windows on amd64 returns the wrong values for infinity operands. > > I have left the workaround really just for AMD64 Windows now. I am going to get it regression tested on aarch64 if that is enough. So the _WIN64 workaround will be always there. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From ayang at openjdk.org Fri Mar 3 12:30:38 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 3 Mar 2023 12:30:38 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: > Simple refactoring of merging two types. 
> > Test: tier1-5 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: copyright-year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12841/files - new: https://git.openjdk.org/jdk/pull/12841/files/dc9f901f..da1931a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12841&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12841&range=00-01 Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12841.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12841/head:pull/12841 PR: https://git.openjdk.org/jdk/pull/12841 From mbaesken at openjdk.org Fri Mar 3 13:03:03 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 Mar 2023 13:03:03 GMT Subject: RFR: JDK-8303575: adjust Xen handling on Linux aarch64 Message-ID: After [JDK-8301050](https://bugs.openjdk.org/browse/JDK-8301050) the Xen handling on aarch64 should be slightly adjusted. The output in VM_Version::print_platform_virtualization_info was missed and needs to be added for Xen. Additionally a new XenPVHVM virtualization type could be introduced because this describes the Xen on aarch64 better. See also https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/virtualization-on-arm-with-xen where the naming is used. 
------------- Commit messages: - JDK-8303575 Changes: https://git.openjdk.org/jdk/pull/12853/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12853&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303575 Stats: 13 lines in 3 files changed: 5 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12853.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12853/head:pull/12853 PR: https://git.openjdk.org/jdk/pull/12853 From duke at openjdk.org Fri Mar 3 13:45:11 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Fri, 3 Mar 2023 13:45:11 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v9] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into modulo - Always include the _WIN64 workaround - a review by dholmes-ora. - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. - Uppercase L - a review by turbanoff. - Fix copyright author. - Fix WIN32 vs. WIN64. - Update according to the upstream review by David Holmes. 
- 8302191: Performance degradation for float/double modulo on Linux ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/873562c8..cdcf89d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=07-08 Stats: 18454 lines in 676 files changed: 11810 ins; 3539 del; 3105 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From tschatzl at openjdk.org Fri Mar 3 14:58:12 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 Mar 2023 14:58:12 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects In-Reply-To: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: On Thu, 2 Mar 2023 13:48:10 GMT, Ivan Walulya wrote: > Hi All, > > Please review this change to move humongous regions during the Last-Ditch full gc ( on `do_maximal_compaction`). This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. > > Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces the region fragmentation and avoids running into OOM errors. 
> > We have enabled for G1 the Jtreg test that was previously used only for Shenandoah to test such workload. > > Testing: Tier 1-3 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/g1/g1FullCollector.hpp line 87: > 85: G1FullGCCompactionPoint _serial_compaction_point; > 86: G1FullGCCompactionPoint _humongous_compaction_point; > 87: GrowableArray* _humongous_compaction_regions; This could be a `GrowableArrayCHeap _humongous_compaction_region` to avoid the necessity for explicit new/delete. Afaics there is no reassignment of that variable ever. src/hotspot/share/gc/g1/g1FullCollector.hpp line 150: > 148: > 149: inline void add_humongous_region(HeapRegion* hr) { _humongous_compaction_regions->append(hr); } > 150: GrowableArray* humongous_compaction_regions() { return _humongous_compaction_regions; } Please move to the `.inline.hpp` file. src/hotspot/share/gc/g1/g1FullGCCompactTask.cpp line 59: > 57: > 58: size_t size = obj->size(); > 59: // copy object and reinit its mark Suggestion: // Copy object and reinit its mark. src/hotspot/share/gc/g1/g1FullGCCompactTask.cpp line 117: > 115: } > 116: > 117: void G1FullGCCompactTask::reset_humongous_metadata(HeapRegion* start_hr, uint num_regions, size_t word_size) { Maybe this could somehow be refactored with `G1CollectedHeap::humongous_obj_allocate_initialize_regions`; probably not easily. src/hotspot/share/gc/g1/g1FullGCCompactTask.cpp line 174: > 172: size_t word_size = obj->size(); > 173: > 174: uint num_regions = (uint) G1CollectedHeap::humongous_obj_size_in_regions(word_size); Suggestion: uint num_regions = (uint)G1CollectedHeap::humongous_obj_size_in_regions(word_size); src/hotspot/share/gc/g1/g1FullGCCompactionPoint.cpp line 153: > 151: oop obj = cast_to_oop(hr->bottom()); > 152: size_t obj_size = obj->size(); > 153: int num_regions = (int) G1CollectedHeap::humongous_obj_size_in_regions(obj_size); Please use signed int (or why not `size_t`?) 
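The contiguous-target search described in the 8191565 PR (walk the compaction-target regions in index order and count how many consecutive `hrm_index()` values have been seen) can be sketched outside HotSpot as follows. This is a hypothetical, simplified stand-in, not the patch's `find_contiguous_before`: plain `unsigned` region indices replace `HeapRegion*`, the result is a half-open [begin, end) range of positions where an empty range means "not found" (mirroring the patch's `range_begin == range_end` check), and any restriction to regions below the object's current location is omitted. Using unsigned types throughout also sidesteps the int-to-uint mismatch raised in the review.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: `regions` holds compaction-target region indices in
// ascending order. Returns positions [begin, end) of the first window of
// `num_regions` consecutive indices, or an empty range {0, 0} if none exists.
static std::pair<std::size_t, std::size_t>
find_contiguous(const std::vector<unsigned>& regions, std::size_t num_regions) {
  if (num_regions == 0 || regions.size() < num_regions) {
    return {0, 0};
  }
  if (num_regions == 1) {
    return {0, 1};  // any single region is trivially contiguous
  }
  std::size_t run = 1;  // length of the contiguous run ending at position i
  for (std::size_t i = 1; i < regions.size(); ++i) {
    bool contiguous = regions[i] - regions[i - 1] == 1;
    run = contiguous ? run + 1 : 1;  // same update as the reviewed line 205
    if (run == num_regions) {
      return {i + 1 - num_regions, i + 1};
    }
  }
  return {0, 0};
}
```

For targets `{3, 4, 7, 8, 9, 12}` and a three-region object, the window found is positions [2, 5), i.e. regions 7, 8 and 9.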
src/hotspot/share/gc/g1/g1FullGCCompactionPoint.cpp line 159: > 157: } > 158: > 159: // Find contiguous compaction target regions for the humongous object Suggestion: // Find contiguous compaction target regions for the humongous object. src/hotspot/share/gc/g1/g1FullGCCompactionPoint.cpp line 160: > 158: > 159: // Find contiguous compaction target regions for the humongous object > 160: Pair range = find_contiguous_before(hr, num_regions); (I'm surprised that MSVC does not complain here about passing an int to the uint parameter.) src/hotspot/share/gc/g1/g1FullGCCompactionPoint.cpp line 165: > 163: > 164: if (range_begin == range_end) { > 165: // No contiguous compaction target regions found, so the object cannot be moved Suggestion: // No contiguous compaction target regions found, so the object cannot be moved. src/hotspot/share/gc/g1/g1FullGCCompactionPoint.cpp line 172: > 170: _collector->marker(0)->preserved_stack()->push_if_necessary(obj, obj->mark()); > 171: > 172: HeapRegion* destn_hr = _compaction_regions->at(range_begin); I would prefer `dest_hr` instead of `destn_hr` as the abbreviation for "destination"; `destn` seems unusual. src/hotspot/share/gc/g1/g1FullGCCompactionPoint.cpp line 176: > 174: assert(obj->is_forwarded(), "Object must be forwarded!"); > 175: > 176: // Add the humongous object regions to the compaction point Suggestion: // Add the humongous object regions to the compaction point. src/hotspot/share/gc/g1/g1FullGCCompactionPoint.cpp line 205: > 203: // Check if the current region and the previous region are contiguous. > 204: bool regions_are_contiguous = (_compaction_regions->at(range_end)->hrm_index() - _compaction_regions->at(range_end - 1)->hrm_index()) == 1; > 205: contiguous_region_count = regions_are_contiguous ? contiguous_region_count + 1 : 1; Suggestion: contiguous_region_count = regions_are_contiguous ?
contiguous_region_count + 1 : 1; src/hotspot/share/gc/g1/heapRegion.inline.hpp line 192: > 190: } > 191: > 192: inline void HeapRegion::reset_compacted_humongous_after_full_gc(HeapWord* new_top) { This method could probably just call `reset_compacted_after_full_gc` as they are identical. (I am not suggesting removing this method, although I'm not completely sure it's useful.) src/hotspot/share/utilities/growableArray.hpp line 261: > 259: > 260: // Remove all elements in the range [start - end). The order is preserved. > 261: void erase(int start, int end) { I'd probably name this new method `remove_range` instead of using something completely unrelated. ------------- PR: https://git.openjdk.org/jdk/pull/12830 From kvn at openjdk.org Fri Mar 3 16:24:11 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 Mar 2023 16:24:11 GMT Subject: RFR: 8303415: Add VM_Version::is_intrinsic_supported(id) Message-ID: Currently we check VM flags, directives and JIT compiler support when we generate intrinsics. We have *product* VM flags for most intrinsics and set them in VM based on HW support. But not all intrinsics have such flags and it is not scalable to add a new *product* flag for each new intrinsic. Also we have `-XX:DisableIntrinsic=` and `-XX:ControlIntrinsic=` flags to control intrinsics from the command line. We don't need specific flags for that. I propose to add a new `VM_Version::is_intrinsic_supported(id)` method to check platform support for an intrinsic without adding a new flag. I used it for `_floatToFloat16` intrinsic for my work on [JDK-8302976](https://bugs.openjdk.org/browse/JDK-8302976). Additional fixes: Fixed Interpreter to skip intrinsics if they are disabled with flag. Moved Interpreter's `InlineIntrinsics` flag check into one place in shared code. Added separate interpreter id for `_dsqrt_strict` so it could be disabled separately from regular `_dsqrt`. Added missing `native` mark to `_currentThread`.
Removed unused `AbstractInterpreter::in_native_entry()`. Cleanup C2 intrinsic checks code. Tested tier1-4,xcomp,stress. Also ran tier1-3,xcomp with `-XX:-InlineIntrinsics`. ------------- Commit messages: - 8303415: Add VM_Version::is_intrinsic_supported(id) Changes: https://git.openjdk.org/jdk/pull/12858/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12858&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303415 Stats: 335 lines in 25 files changed: 214 ins; 82 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/12858.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12858/head:pull/12858 PR: https://git.openjdk.org/jdk/pull/12858 From duke at openjdk.org Fri Mar 3 16:49:00 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 3 Mar 2023 16:49:00 GMT Subject: Integrated: 8301117: Remove old_size param from ResizeableResourceHashtable::resize() In-Reply-To: References: Message-ID: On Sat, 18 Feb 2023 15:31:46 GMT, Afshin Zafari wrote: > `old_size` is removed from the function parameters in definitions and calls. > `table_size` is used as old size. > ### Test > mach5 tiers 1-5. This pull request has now been integrated. 
Changeset: ae797c64 Author: Afshin Zafari Committer: Calvin Cheung URL: https://git.openjdk.org/jdk/commit/ae797c64afc61a0b1c8fbc48f56b2c41f54a7301 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod 8301117: Remove old_size param from ResizeableResourceHashtable::resize() Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/12635 From coleenp at openjdk.org Fri Mar 3 16:57:20 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 3 Mar 2023 16:57:20 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> References: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> Message-ID: On Fri, 3 Mar 2023 11:53:48 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method is stress tested to compare the performance difference. The statistics are drawn in the following charts. The vertical axis is in milliseconds. >> >> ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) >> >> ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() I have one suggested change but you should point out that the parameter change is an optimization that replaces some of the inlined benefits. src/hotspot/share/oops/instanceKlass.cpp line 1390: > 1388: } > 1389: > 1390: instanceOop InstanceKlass::allocate_instance(InstanceKlass* ik, TRAPS) { Should this have a null check on ik, like in the inlined function that it replaced? 
src/hotspot/share/prims/jvmtiEnvBase.cpp line 28: > 26: #include "classfile/classLoaderDataGraph.hpp" > 27: #include "classfile/javaClasses.inline.hpp" > 28: #include "classfile/vmSymbols.hpp" I assume this is a transitive include, so good to put it in the file where it's used. ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12782 From coleenp at openjdk.org Fri Mar 3 16:57:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 3 Mar 2023 16:57:23 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: <1W22QvJKaoJVRI5Wrx6DZEizGOxHsiSAEoLkHsCauQY=.6e934949-d854-49f8-9e1e-5786c0d04c8f@github.com> References: <9ZXi9uNa5ETIhldKLCDAYojtXTGEg-5EexLwHNE2zhI=.026815b3-3117-4184-be4c-5fdf42c2655f@github.com> <1W22QvJKaoJVRI5Wrx6DZEizGOxHsiSAEoLkHsCauQY=.6e934949-d854-49f8-9e1e-5786c0d04c8f@github.com> Message-ID: On Fri, 3 Mar 2023 11:55:14 GMT, Afshin Zafari wrote: >> src/hotspot/share/oops/instanceKlass.inline.hpp line 188: >> >>> 186: } >>> 187: >>> 188: inline instanceOop InstanceKlass::allocate_instance(oop java_class, TRAPS) { >> >> In moving this, can you eliminate any #includes at the top? And migrate them to .cpp files that might need them. > > All the #include headers are removed except the `utilities/devirtualizer.inline.hpp`. The .cpp files that need it should include it before `oops/instanceKlass.inline.hpp` which violates the _sorted header filename_ in the coding style rules. Good. I think devirtualizer.inline.hpp is used by the GC inlined functions so should remain here. 
------------- PR: https://git.openjdk.org/jdk/pull/12782 From cjplummer at openjdk.org Fri Mar 3 17:41:14 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 3 Mar 2023 17:41:14 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: <8gVdjyz3i-jfbjjlZ9Cw84YNqZzqDLmn9gmF68VyLbs=.074e5e4a-1564-4244-8df2-cb97974ce68d@github.com> On Fri, 3 Mar 2023 12:30:38 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of merging two types. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright-year SA changes look good. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12841 From dnsimon at openjdk.org Fri Mar 3 17:55:42 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 Mar 2023 17:55:42 GMT Subject: RFR: 8303588: [JVMCI] make JVMCI source directories conform with standard layout Message-ID: <1uJ1B_Mo_V1nye9Mvp59QFMvaW5-0wZeg_irFdc3MEA=.5f50a9ee-7b9b-4afd-b8d9-1e883cb37db8@github.com> The layout of the sources in `jdk.internal.vm.ci` stems from their initial development outside the JDK where they adopted a layout influenced by Eclipse. There's no good reason for maintaining this layout any more. Moving to a standard layout also means IDEs will be able to make sense of the JVMCI sources in `src.zip`. 
------------- Commit messages: - made JVMCI source directories conform with standard layout Changes: https://git.openjdk.org/jdk/pull/12860/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12860&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303588 Stats: 20 lines in 209 files changed: 0 ins; 20 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12860.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12860/head:pull/12860 PR: https://git.openjdk.org/jdk/pull/12860 From dnsimon at openjdk.org Fri Mar 3 18:00:12 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 3 Mar 2023 18:00:12 GMT Subject: RFR: 8303577: [JVMCI] OOME causes crash while translating exceptions Message-ID: JDK-8297431 added code for special handling of OutOfMemoryError when translating an exception between libjvmci and HotSpot[1]. Unfortunately, this code was deleted by JDK-8298099 when moving the exception translation mechanism to VMSupport[2]. This causes the VM to crash when an OOME occurs while translating an exception from HotSpot to libjvmci. This PR revives the deleted code. This bug was found by running `test/jdk/java/util/concurrent/locks/Lock/OOMEInAQS.java` on libgraal. The fix now makes the test pass. 
[1] https://github.com/openjdk/jdk/commit/952e10055135613e8ea2b818a4f35842936f5633#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R222-R234 [2] https://github.com/openjdk/jdk/commit/8b69a2e434ad2fa3369079622b57afb973d5bd9a#diff-7292551772c27b7152a3333f03cbbad90a897c5e37c6a97d4026be835e6d8fe1R121-R125 ------------- Commit messages: - decodeAndThrowThrowable needs to handle error codes Changes: https://git.openjdk.org/jdk/pull/12857/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12857&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303577 Stats: 31 lines in 5 files changed: 22 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/12857.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12857/head:pull/12857 PR: https://git.openjdk.org/jdk/pull/12857 From kvn at openjdk.org Fri Mar 3 18:06:15 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 Mar 2023 18:06:15 GMT Subject: RFR: 8303588: [JVMCI] make JVMCI source directories conform with standard layout In-Reply-To: <1uJ1B_Mo_V1nye9Mvp59QFMvaW5-0wZeg_irFdc3MEA=.5f50a9ee-7b9b-4afd-b8d9-1e883cb37db8@github.com> References: <1uJ1B_Mo_V1nye9Mvp59QFMvaW5-0wZeg_irFdc3MEA=.5f50a9ee-7b9b-4afd-b8d9-1e883cb37db8@github.com> Message-ID: On Fri, 3 Mar 2023 17:47:20 GMT, Doug Simon wrote: > The layout of the sources in `jdk.internal.vm.ci` stems from their initial development outside the JDK where they adopted a layout influenced by Eclipse. > > There's no good reason for maintaining this layout any more. Moving to a standard layout also means IDEs will be able to make sense of the JVMCI sources in `src.zip`. Good. Please, test it in mach5. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/12860 From kvn at openjdk.org Fri Mar 3 18:08:37 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 3 Mar 2023 18:08:37 GMT Subject: RFR: 8303577: [JVMCI] OOME causes crash while translating exceptions In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 15:40:01 GMT, Doug Simon wrote: > JDK-8297431 added code for special handling of OutOfMemoryError when translating an exception between libjvmci and HotSpot[1]. > Unfortunately, this code was deleted by JDK-8298099 when moving the exception translation mechanism to VMSupport[2]. > This causes the VM to crash when an OOME occurs while translating an exception from HotSpot to libjvmci. > This PR revives the deleted code. > > This bug was found by running `test/jdk/java/util/concurrent/locks/Lock/OOMEInAQS.java` on libgraal. The fix now makes the test pass. > > [1] https://github.com/openjdk/jdk/commit/952e10055135613e8ea2b818a4f35842936f5633#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R222-R234 > [2] https://github.com/openjdk/jdk/commit/8b69a2e434ad2fa3369079622b57afb973d5bd9a#diff-7292551772c27b7152a3333f03cbbad90a897c5e37c6a97d4026be835e6d8fe1R121-R125 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12857 From dholmes at openjdk.org Fri Mar 3 21:25:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Mar 2023 21:25:06 GMT Subject: RFR: 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls [v2] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 07:30:39 GMT, Kim Barrett wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Restrict getaddrinfo to IPv4 only as per the rest of the code > > Looks good. Thanks for the reviews @kimbarrett and @djelinski ! I'll leave this till early next week to integrate. 
------------- PR: https://git.openjdk.org/jdk/pull/12842 From sviswanathan at openjdk.org Sat Mar 4 01:15:19 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 4 Mar 2023 01:15:19 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v9] In-Reply-To: References: Message-ID: <_z7GiZvqh8NpSWzxLMKrcQB_0g_xz8lneN9LOI6DedI=.6453aa68-1f70-4662-897c-04537214cd8a@github.com> On Fri, 3 Mar 2023 13:45:11 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into modulo > - Always include the _WIN64 workaround - a review by dholmes-ora. > - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. > - Uppercase L - a review by turbanoff. > - Fix copyright author. > - Fix WIN32 vs. WIN64. > - Update according to the upstream review by David Holmes. > - 8302191: Performance degradation for float/double modulo on Linux Very nice performance increase. The only concern I have is that the x87 fpu control (using fldcw instruction) is not set in 64 bit builds by the JVM anymore explicitly. It is only set in the 32bit builds. Maybe @iwanowww or @shipilev have some thoughts on this. 
------------- PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Sat Mar 4 02:28:43 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Sat, 4 Mar 2023 02:28:43 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v10] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: Fix win32 broken build. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/cdcf89d7..3e1c05d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=08-09 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From stuefe at openjdk.org Sat Mar 4 07:05:45 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 4 Mar 2023 07:05:45 GMT Subject: RFR: JDK-8296995: ostream should handle snprintf(3) errors in release builds [v5] In-Reply-To: References: Message-ID: <87tTzvYaPryNHP6mcel-CEqUU6u8JoTo4VeQ0itFkN8=.a7fe6454-605d-4f64-925e-e5a51c844c78@github.com> > Small fix for a very unlikely problem. > > All streams in ostream.hpp end up using `os::snprintf()`, which uses `::vsnprintf()`. `vsnprintf(3)`can fail and return -1. > > The chance for this to happen is small. 
snprintf errors are usually encoding errors, though not always (see third example at https://stackoverflow.com/questions/65334245/what-is-an-encoding-error-for-sprintf-that-should-return-1). I found "%ls" in one place in Windows code, so I am not sure we can always exclude the possibility of wide strings being used in our code base, or that of printing with outside-provided format strings. > > In case of an error, we assert in debug builds but don't handle it in release. There, this situation gets misdiagnosed later as a buffer overflow because we cast the signedness of the result away (see `outputStream::do_vsnprintf()`). > > --- > > The patch is trivial. The most exciting thing is the gtest, I guess. > > In release builds, we now treat this condition as an empty string write. I considered printing a clear marker into the stream instead, e.g. "ENCODING ERROR", but ultimately did not do it. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
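The failure mode behind JDK-8296995 is that `vsnprintf` returns an `int`. If that `-1` is cast straight to `size_t`, it becomes `SIZE_MAX` and looks like output far larger than the buffer. The wrapper below is a hypothetical sketch (it is not `outputStream::do_vsnprintf()`) of the release-build policy described in the PR: a negative result is treated as an empty write, and truncation is reported as the number of characters that actually fit.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: returns the number of characters actually stored in buf
// (excluding the terminating NUL), never a value derived from a negative result.
static std::size_t checked_vsnprintf(char* buf, std::size_t buflen,
                                     const char* fmt, va_list ap) {
  int result = std::vsnprintf(buf, buflen, fmt, ap);
  if (result < 0) {
    // Encoding or other error: treat as an empty string write.
    if (buflen > 0) {
      buf[0] = '\0';
    }
    return 0;
  }
  std::size_t written = static_cast<std::size_t>(result);
  if (written >= buflen) {
    // Output was truncated: only buflen - 1 characters plus NUL were stored.
    written = buflen > 0 ? buflen - 1 : 0;
  }
  return written;
}

static std::size_t checked_snprintf(char* buf, std::size_t buflen, const char* fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  std::size_t n = checked_vsnprintf(buf, buflen, fmt, ap);
  va_end(ap);
  return n;
}
```

With an 8-byte buffer, formatting a 10-character string stores 7 characters and reports 7, not 10.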
The pull request contains five additional commits since the last revision: - copyrights - erge branch 'master' into JDK-8296995-ostream-handle-sprintf-errors - fix copyright - feedback martin - JDK-8296995-ostream-handle-sprintf-errors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11160/files - new: https://git.openjdk.org/jdk/pull/11160/files/b41f13d4..407345ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11160&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11160&range=03-04 Stats: 384575 lines in 5749 files changed: 193196 ins; 138159 del; 53220 mod Patch: https://git.openjdk.org/jdk/pull/11160.diff Fetch: git fetch https://git.openjdk.org/jdk pull/11160/head:pull/11160 PR: https://git.openjdk.org/jdk/pull/11160 From kbarrett at openjdk.org Sat Mar 4 11:19:12 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 4 Mar 2023 11:19:12 GMT Subject: RFR: 8302189: Mark assertion failures noreturn In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Fri, 3 Mar 2023 04:35:35 GMT, David Holmes wrote: > I don't use these particular debugging mechanism, neither Debugger nor BREAKPOINT, but it seems potentially problematic to me that the removed BREAKPOINTs happened after the error was reported, and IIUC the new mechanism will activate before the error is reported. The removed BREAKPOINTs were not happening after the error was reported. Normally the report functions would report and die, without reaching the BREAKPOINT at all. They were only reachable when "Debugging" was enabled by being in one of the manually invoked "commands", which disabled the report function and made it return immediately to the BREAKPOINT, which isn't really very interesting because in one of these commands we want to just "ignore" errors entirely. 
> make/hotspot/lib/CompileJvm.gmk line 103: > >> 101: DISABLED_WARNINGS_xlc := tautological-compare shift-negative-value >> 102: >> 103: DISABLED_WARNINGS_microsoft := 4624 4244 4291 4146 4127 4722 > > It is annoying that we don't document what these warnings are. :( I've made the same comment in the past. This time I decided to do something about it: https://bugs.openjdk.org/browse/JDK-8303618 Document disabled C/C++ warnings on Windows > src/hotspot/share/utilities/debug.cpp line 85: > >> 83: if (is_enabled()) { >> 84: fatal("Multiple Debugging contexts"); >> 85: } > > This seems too restrictive as you could hit different DebuggingContexts in different threads. ?? This facility is only intended for use by manually invoked commands while the program is stopped in a debugger. Multi-threaded use is not an issue (and was not supported previously either). I don't think there are any nested uses either, but I've now run across a couple of places where nesting could be useful. So I'm changing the state from a simple bool to a nesting counter. > src/hotspot/share/utilities/debug.cpp line 290: > >> 288: private: >> 289: ResourceMark _rm; >> 290: DebuggingContext _debugging; > > Why a different initialization syntax here? I don't understand the question? These are member variable declarations. 
------------- PR: https://git.openjdk.org/jdk/pull/12845 From kbarrett at openjdk.org Sat Mar 4 11:23:58 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 4 Mar 2023 11:23:58 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: > Also 8302799: Refactor Debugging variable usage for noreturn crash reporting Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: make Debugging::_enabled a nesting counter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12845/files - new: https://git.openjdk.org/jdk/pull/12845/files/ca68e500..f296ab62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12845&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12845&range=00-01 Stats: 10 lines in 2 files changed: 1 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12845.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12845/head:pull/12845 PR: https://git.openjdk.org/jdk/pull/12845 From jwaters at openjdk.org Sat Mar 4 19:01:15 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 4 Mar 2023 19:01:15 GMT Subject: RFR: 8250269: Replace ATTRIBUTE_ALIGNED with alignas [v15] In-Reply-To: References: <9QKV9cYFTo_1D8R-mI80lnewNkA0ceJNKFPbrvICxl4=.d6736b76-8324-4084-bede-6e144b4f6c04@github.com> Message-ID: On Sat, 4 Feb 2023 15:05:06 GMT, Julian Waters wrote: >> C++11 added the alignas attribute, for the purpose of specifying alignment on types, much like compiler specific syntax such as gcc's __attribute__((aligned(x))) or Visual C++'s __declspec(align(x)). >> >> We can phase out the use of the macro in favor of the standard attribute. In the meantime, we can replace the compiler specific definitions of ATTRIBUTE_ALIGNED with a portable definition. 
We might deprecate the use of the macro but changing its implementation quickly and cleanly applies the feature where the macro is being used. >> >> Note: With certain parts of HotSpot using ATTRIBUTE_ALIGNED so indiscriminately, this commit will likely take some time to get right >> >> This will require adding the alignas attribute to the list of language features approved for use in HotSpot code. (Completed with [8297912](https://github.com/openjdk/jdk/pull/11446)) > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Merge branch 'openjdk:master' into alignas > - alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - Merge branch 'openjdk:master' into alignas > - ... and 5 more: https://git.openjdk.org/jdk/compare/9971522f...a621bb62 :( ------------- PR: https://git.openjdk.org/jdk/pull/11431 From dnsimon at openjdk.org Sat Mar 4 21:49:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 4 Mar 2023 21:49:02 GMT Subject: RFR: 8303588: [JVMCI] make JVMCI source directories conform with standard layout In-Reply-To: References: <1uJ1B_Mo_V1nye9Mvp59QFMvaW5-0wZeg_irFdc3MEA=.5f50a9ee-7b9b-4afd-b8d9-1e883cb37db8@github.com> Message-ID: <77AUCWTeYZOzVv-uRmie21wQ6Csaz2_FCVseAwFXtrs=.9941a87f-4a8e-4089-aff6-db8297a15c9d@github.com> On Fri, 3 Mar 2023 18:02:22 GMT, Vladimir Kozlov wrote: > Please, test it in mach5 Done - please see link in issue. All 6 failures are not related to this change. 
------------- PR: https://git.openjdk.org/jdk/pull/12860 From dnsimon at openjdk.org Sat Mar 4 21:55:12 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 4 Mar 2023 21:55:12 GMT Subject: RFR: 8303588: [JVMCI] make JVMCI source directories conform with standard layout In-Reply-To: References: <1uJ1B_Mo_V1nye9Mvp59QFMvaW5-0wZeg_irFdc3MEA=.5f50a9ee-7b9b-4afd-b8d9-1e883cb37db8@github.com> Message-ID: On Fri, 3 Mar 2023 18:02:22 GMT, Vladimir Kozlov wrote: >> The layout of the sources in `jdk.internal.vm.ci` stems from their initial development outside the JDK where they adopted a layout influenced by Eclipse. >> >> There's no good reason for maintaining this layout any more. Moving to a standard layout also means IDEs will be able to make sense of the JVMCI sources in `src.zip`. > > Good. > Please, test it in mach5. Thanks for the review @vnkozlov . Since @tkrodriguez approved an internal review for these changes, I'm going to integrate. ------------- PR: https://git.openjdk.org/jdk/pull/12860 From dnsimon at openjdk.org Sat Mar 4 21:55:15 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 4 Mar 2023 21:55:15 GMT Subject: Integrated: 8303588: [JVMCI] make JVMCI source directories conform with standard layout In-Reply-To: <1uJ1B_Mo_V1nye9Mvp59QFMvaW5-0wZeg_irFdc3MEA=.5f50a9ee-7b9b-4afd-b8d9-1e883cb37db8@github.com> References: <1uJ1B_Mo_V1nye9Mvp59QFMvaW5-0wZeg_irFdc3MEA=.5f50a9ee-7b9b-4afd-b8d9-1e883cb37db8@github.com> Message-ID: On Fri, 3 Mar 2023 17:47:20 GMT, Doug Simon wrote: > The layout of the sources in `jdk.internal.vm.ci` stems from their initial development outside the JDK where they adopted a layout influenced by Eclipse. > > There's no good reason for maintaining this layout any more. Moving to a standard layout also means IDEs will be able to make sense of the JVMCI sources in `src.zip`. This pull request has now been integrated. 
Changeset: 9fdbf3cf Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/9fdbf3cfc4bf58daa93807b47e403536e4681e90 Stats: 20 lines in 209 files changed: 0 ins; 20 del; 0 mod 8303588: [JVMCI] make JVMCI source directories conform with standard layout Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/12860 From darcy at openjdk.org Sun Mar 5 06:19:06 2023 From: darcy at openjdk.org (Joe Darcy) Date: Sun, 5 Mar 2023 06:19:06 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v4] In-Reply-To: References: Message-ID: > While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd get out for the review the next phase of the FDLIBM port: removing the FDLIBM C sources from the repo. > > A repo with the changes for JDK-8302027 and this PR successful build on the default set of platform and successfully run tier 1 tests, which includes tests of the math library. > > There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork for FDLIBM (a tine of FDLIBM?) to use for intrinsics. The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. > > The intention of this change is to remove use of FDLIBM for the core libraries. Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8302801-expr - Respond to review feedback. - Respond to review feedback and add description of transliteration process. 
- JDK-8302801: Remove fdlibm C sources - Update src/java.base/share/classes/java/lang/FdLibm.java Co-authored-by: Andrey Turbanov - Update src/java.base/share/classes/java/lang/FdLibm.java Co-authored-by: Andrey Turbanov - Update src/java.base/share/classes/java/lang/FdLibm.java Co-authored-by: Andrey Turbanov - Update src/java.base/share/classes/java/lang/FdLibm.java Co-authored-by: Andrey Turbanov - Update src/java.base/share/classes/java/lang/FdLibm.java Co-authored-by: Andrey Turbanov - Update src/java.base/share/classes/java/lang/FdLibm.java Co-authored-by: Andrey Turbanov - ... and 10 more: https://git.openjdk.org/jdk/compare/1bb39a95...437a8fce ------------- Changes: https://git.openjdk.org/jdk/pull/12821/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12821&range=03 Stats: 6643 lines in 65 files changed: 20 ins; 6613 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/12821.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12821/head:pull/12821 PR: https://git.openjdk.org/jdk/pull/12821 From jwaters at openjdk.org Sun Mar 5 15:39:14 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 5 Mar 2023 15:39:14 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Sat, 4 Mar 2023 11:16:03 GMT, Kim Barrett wrote: >> make/hotspot/lib/CompileJvm.gmk line 103: >> >>> 101: DISABLED_WARNINGS_xlc := tautological-compare shift-negative-value >>> 102: >>> 103: DISABLED_WARNINGS_microsoft := 4624 4244 4291 4146 4127 4722 >> >> It is annoying that we don't document what these warnings are. :( > > I've made the same comment in the past. 
This time I decided to do something about it: > https://bugs.openjdk.org/browse/JDK-8303618 > Document disabled C/C++ warnings on Windows Looks like something I can help with, will assign the enhancement to myself >> src/hotspot/share/utilities/debug.cpp line 290: >> >>> 288: private: >>> 289: ResourceMark _rm; >>> 290: DebuggingContext _debugging; >> >> Why a different initialization syntax here? > > I don't understand the question? These are member variable declarations. I believe David is asking about why these particular member names were changed ------------- PR: https://git.openjdk.org/jdk/pull/12845 From darcy at openjdk.org Sun Mar 5 17:13:09 2023 From: darcy at openjdk.org (Joe Darcy) Date: Sun, 5 Mar 2023 17:13:09 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v4] In-Reply-To: <3JvuLUDJO3_dzKHOsMocC6kGDEmnIQo_7uobd-VTzHg=.22565440-42ae-4d26-9d74-2cbb7c63f9ea@github.com> References: <3JvuLUDJO3_dzKHOsMocC6kGDEmnIQo_7uobd-VTzHg=.22565440-42ae-4d26-9d74-2cbb7c63f9ea@github.com> Message-ID: On Thu, 2 Mar 2023 18:27:09 GMT, Joe Darcy wrote: >> make/autoconf/buildjdk-spec.gmk.in line 85: >> >>> 83: JVM_LIBS := @OPENJDK_BUILD_JVM_LIBS@ >>> 84: >>> 85: FDLIBM_CFLAGS := @OPENJDK_BUILD_FDLIBM_CFLAGS@ >> >> If the hotspot build still needs `FDLIBM_CFLAGS`, then this line needs to stay. > > Okay; added back. PS Successful Mach 5 job of default builds and tier 1 tests with this make line present. ------------- PR: https://git.openjdk.org/jdk/pull/12821 From alanb at openjdk.org Sun Mar 5 19:21:09 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 5 Mar 2023 19:21:09 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v4] In-Reply-To: References: Message-ID: On Sun, 5 Mar 2023 06:19:06 GMT, Joe Darcy wrote: >> While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd get out for the review the next phase of the FDLIBM port: removing the FDLIBM C sources from the repo. 
>> >> A repo with the changes for JDK-8302027 and this PR successful build on the default set of platform and successfully run tier 1 tests, which includes tests of the math library. >> >> There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork for FDLIBM (a tine of FDLIBM?) to use for intrinsics. The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. >> >> The intention of this change is to remove use of FDLIBM for the core libraries. > > Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8302801-expr > - Respond to review feedback. > - Respond to review feedback and add description of transliteration process. > - JDK-8302801: Remove fdlibm C sources > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - ... and 10 more: https://git.openjdk.org/jdk/compare/1bb39a95...437a8fce I don't have any other comments on this, looks good. ------------- Marked as reviewed by alanb (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/12821 From clanger at openjdk.org Sun Mar 5 20:46:24 2023 From: clanger at openjdk.org (Christoph Langer) Date: Sun, 5 Mar 2023 20:46:24 GMT Subject: RFR: JDK-8302320: AsyncGetCallTrace obtains too few frames in sanity test [v6] In-Reply-To: References: Message-ID: On Tue, 21 Feb 2023 08:58:50 GMT, Johannes Bechberger wrote: >> Extends the existing AsyncGetCallTrace test case and fixes the issue. > > Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision: > > Update full name > [db483a3](https://github.com/openjdk/jdk/commit/db483a38a815f85bd9668749674b5f0f6e4b27b4). You need to do that on the [commit](https://github.com/openjdk/jdk/commit/db483a38a815f85bd9668749674b5f0f6e4b27b4), not on the PR. However, not sure whether your author role is sufficient and everything Github is set up correctly... ------------- PR: https://git.openjdk.org/jdk/pull/12535 From dnsimon at openjdk.org Sun Mar 5 22:08:07 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 5 Mar 2023 22:08:07 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v2] In-Reply-To: References: Message-ID: > This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: > * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. > * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. 
> > To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): > > ResolvedJavaMethod method = ...; > ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); > return switch (a.kind()) { > case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; > case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The same code using the new API: > > > ResolvedJavaMethod method = ...; > ResolvedJavaType explodeLoopType = ...; > AnnotationData a = method.getAnnotationDataFor(explodeLoopType); > return switch (a.getEnum("kind").getName()) { > case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; > case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. Doug Simon has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - added support for inherited annotations - Merge branch 'master' into JDK-8303431 - made AnnotationDataDecoder package-private - add annotation API to JVMCI ------------- Changes: https://git.openjdk.org/jdk/pull/12810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=01 Stats: 2752 lines in 33 files changed: 2700 ins; 24 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From dholmes at openjdk.org Sun Mar 5 22:28:11 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 5 Mar 2023 22:28:11 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v4] In-Reply-To: References: Message-ID: On Sun, 5 Mar 2023 06:19:06 GMT, Joe Darcy wrote: >> While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd get out for the review the next phase of the FDLIBM port: removing the FDLIBM C sources from the repo. >> >> A repo with the changes for JDK-8302027 and this PR successful build on the default set of platform and successfully run tier 1 tests, which includes tests of the math library. >> >> There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork for FDLIBM (a tine of FDLIBM?) to use for intrinsics. The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. >> >> The intention of this change is to remove use of FDLIBM for the core libraries. > > Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8302801-expr > - Respond to review feedback. > - Respond to review feedback and add description of transliteration process. 
> - JDK-8302801: Remove fdlibm C sources > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - ... and 10 more: https://git.openjdk.org/jdk/compare/1bb39a95...437a8fce Marked as reviewed by dholmes (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12821 From dnsimon at openjdk.org Sun Mar 5 22:37:38 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sun, 5 Mar 2023 22:37:38 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v3] In-Reply-To: References: Message-ID: > This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: > * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. > * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. 
> > To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): > > ResolvedJavaMethod method = ...; > ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); > return switch (a.kind()) { > case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; > case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The same code using the new API: > > > ResolvedJavaMethod method = ...; > ResolvedJavaType explodeLoopType = ...; > AnnotationData a = method.getAnnotationDataFor(explodeLoopType); > return switch (a.getEnum("kind").getName()) { > case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; > case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12810/files - new: https://git.openjdk.org/jdk/pull/12810/files/8743e8b9..3dd5ef9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From kvn at openjdk.org Mon Mar 6 03:08:17 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 Mar 2023 03:08:17 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v4] In-Reply-To: References: Message-ID: <-GBkdRKUf-hykdFTOo44ZVnuJSyhGzfCid3xeUHuMb0=.9225c4b6-65f8-4e54-8dac-c300872f660b@github.com> On Sun, 5 Mar 2023 06:19:06 GMT, Joe Darcy wrote: >> While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd get out for the review the next phase of the FDLIBM port: removing the FDLIBM C sources from the repo. >> >> A repo with the changes for JDK-8302027 and this PR successful build on the default set of platform and successfully run tier 1 tests, which includes tests of the math library. >> >> There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork for FDLIBM (a tine of FDLIBM?) to use for intrinsics. The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. >> >> The intention of this change is to remove use of FDLIBM for the core libraries. > > Joe Darcy has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 20 commits: > > - Merge branch 'master' of https://git.openjdk.org/jdk into JDK-8302801-expr > - Respond to review feedback. > - Respond to review feedback and add description of transliteration process. > - JDK-8302801: Remove fdlibm C sources > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - Update src/java.base/share/classes/java/lang/FdLibm.java > > Co-authored-by: Andrey Turbanov > - ... and 10 more: https://git.openjdk.org/jdk/compare/1bb39a95...437a8fce HotSpot changes look good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12821 From aboldtch at openjdk.org Mon Mar 6 07:08:19 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 6 Mar 2023 07:08:19 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v6] In-Reply-To: References: Message-ID: On Thu, 23 Feb 2023 20:14:32 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. >> All method will already be marked, but the actual deoptimizing, not entrant, patching PC on stacks and patching post call nops, was not done yet. Which meant thread 2 could 'pass' thread 1. >> Most places did deopt under Compile_lock, thus this is not an issue, but WB and clearCallSiteContext do not. >> >> Since a handshakes may take long before completion and Compile_lock is used for so much more than deopts. 
>> The fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake even when everything was already marked. (instead of adding Compile_lock to all places) >> >> This turnout to be problematic in the startup, for example the number of deopt handshakes in jetty dry run startup went from 5 to 39 handshakes. >> >> This fix first adds a barrier for which you do not pass until the requested deopts have happened and it coalesces the handshakes. >> Secondly it moves handshakes part out of the Compile_lock where it is possible. >> >> Which means we fix the performance bug and we reduce the contention on Compile_lock, meaning higher throughput in compiler and things such as class-loading. >> >> It passes t1-t7 with flying colours! t8 still not completed and I'm redoing some testing due to last minute simplifications. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Comment fixes > - Include/fwd fixes src/hotspot/share/prims/whitebox.cpp line 798: > 796: result += mh->method_holder()->mark_osr_nmethods(&deopt_scope, mh()); > 797: } else if (mh->code() != nullptr) { > 798: deopt_scope.mark(mh->code()); I was working on a patch based on top of this one and had a crash here. The second call to `mh->code()` returned a `nullptr`. It looks racy to me and seems like the only thing protecting `CompileMethod::code()` is the `CompiledMethod_lock` which is not held here. Regardless `mh->code()` should probably only be loaded once. 
(Or at least not re-loaded after the nullptr check) ------------- PR: https://git.openjdk.org/jdk/pull/12585 From kbarrett at openjdk.org Mon Mar 6 07:35:00 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 6 Mar 2023 07:35:00 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects Message-ID: Please review this enhancement of BitMap::iterate to support lambdas and other function objects as the operation being applied to the set bit indices. Some for-loops using BitMap::get_next_one_offset have been changed to use BitMap::iterate with a lambda. (One reason for changing the for-loops is that I'm considering a change to the get_next_one_offset API, and reducing the number of direct uses will simplify that.) For convenience, the function can either return void (always iterate over the whole range) or bool (stop iteration if returns false). Iteration using closure objects invoked via a do_bit member function are now implemented by being wrapped in a lambda, so get the same convenience. (Though, of course, if the closure is derived from BitMapClosure then do_bit returns bool.) The unit tests are written as "early" tests, not requiring an initialized VM. They also avoid any C heap allocation (even though C heap allocation has very early support). This is done to minimize the requirements for running the tests, since BitMap is used in a lot of places. This attempts to run these tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed someday.) (Some existing BitMap gtests should be modified to do similarly; see JDK-8303636.) 
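For readers following the BitMap::iterate change, here is a minimal C++17 sketch of one way an iterate function can accept a function object that returns either void or bool. It also shows a get_next_one_offset-style for-loop being folded into iterate. `SimpleBitMap` and all of its members are hypothetical stand-ins, not HotSpot's BitMap API, and the actual patch may dispatch on the return type differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical, simplified stand-in for HotSpot's BitMap (not the real class).
// Storage is a fixed-size array, so no heap allocation is needed.
template <std::size_t N>
class SimpleBitMap {
 public:
  static constexpr std::size_t WordBits = 64;

  void set_bit(std::size_t i) {
    assert(i < N);
    _words[i / WordBits] |= std::uint64_t(1) << (i % WordBits);
  }

  bool at(std::size_t i) const {
    return ((_words[i / WordBits] >> (i % WordBits)) & 1) != 0;
  }

  // Index of the first set bit in [beg, end), or end if there is none.
  // A linear scan keeps the sketch short; a real implementation would
  // skip whole zero words.
  std::size_t get_next_one_offset(std::size_t beg, std::size_t end) const {
    for (std::size_t i = beg; i < end; ++i) {
      if (at(i)) return i;
    }
    return end;
  }

  // Applies f to each set bit index in [beg, end), in increasing order.
  // - If f returns void, every set bit is visited.
  // - Otherwise f's result must be convertible to bool; a false result
  //   stops the iteration early.
  // Returns false if and only if the iteration was stopped early.
  template <typename Function>
  bool iterate(Function f, std::size_t beg, std::size_t end) const {
    using Result = std::invoke_result_t<Function&, std::size_t>;
    for (std::size_t i = get_next_one_offset(beg, end); i < end;
         i = get_next_one_offset(i + 1, end)) {
      if constexpr (std::is_void_v<Result>) {
        f(i);
      } else if (!static_cast<bool>(f(i))) {
        return false;
      }
    }
    return true;
  }

  // Convenience overload covering the whole map.
  template <typename Function>
  bool iterate(Function f) const { return iterate(f, 0, N); }

 private:
  std::array<std::uint64_t, (N + WordBits - 1) / WordBits> _words{};
};
```

With bits 3, 64 and 100 set, `bm.iterate([&](std::size_t i) { sum += i; })` visits all three indices and returns true, while a lambda that returns false stops after the first index (3) and makes iterate return false. Dispatching with `if constexpr` keeps both forms in one loop without requiring callers to write `return true;` in lambdas that never stop early.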
Testing: mach5 tier1-7 ------------- Commit messages: - use iterate with lambda - bitmap iteration supports lambda Changes: https://git.openjdk.org/jdk/pull/12876/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303621 Stats: 302 lines in 5 files changed: 260 ins; 8 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/12876.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12876/head:pull/12876 PR: https://git.openjdk.org/jdk/pull/12876 From eosterlund at openjdk.org Mon Mar 6 08:01:28 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 6 Mar 2023 08:01:28 GMT Subject: Integrated: 8302780: Add support for vectorized arraycopy GC barriers In-Reply-To: References: Message-ID: On Mon, 20 Feb 2023 15:23:04 GMT, Erik Österlund wrote: > So far, the arraycopy stubs have performed some kind of bulk pre/post barriers for arraycopy, which have been good enough, and allowed the copying itself to be done with plain loads and stores. For generational ZGC, this approach is not good enough, and we need barriers for the actual copying, but instead don't need the pre/post barriers. To prepare the JVM for generational ZGC, we need to add an API for arraycopy barriers. This pull request has now been integrated.
Changeset: 5f153e05 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/5f153e056b1929a306b0907f4528bbd2766699c2 Stats: 970 lines in 11 files changed: 663 ins; 30 del; 277 mod 8302780: Add support for vectorized arraycopy GC barriers Co-authored-by: Yadong Wang Reviewed-by: ayang, fyang, rcastanedalo, aph ------------- PR: https://git.openjdk.org/jdk/pull/12670 From rcastanedalo at openjdk.org Mon Mar 6 08:42:31 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 6 Mar 2023 08:42:31 GMT Subject: RFR: 8302780: Add support for vectorized arraycopy GC barriers [v8] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 10:02:59 GMT, Andrew Haley wrote: > > In my opinion, the (very much needed) changes you suggest are outside the scope of this PR, which is about lifting the memory accesses, in their existing form, to a barrier API-level. Conflating this with your suggested changes would make it harder to review this PR, which is sufficiently complex in its current form. I totally agree that your suggestions would improve readability and maintainability, but couldn't we apply them in a follow-up RFE? > > I disagree in every way. The added complexity, which is fixed so it no longer matters, made it near-impossible for me to reason about this PR. And, as John Rose put it, like any good carpenter we should clean up as we work. Just to be clear, my comment was specifically about the loop re-rolling part of your suggestions. I see now that this was not clear in my comment, sorry about that.
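To make the distinction in this thread concrete, here is a conceptual sketch in plain C++ (not the assembler arraycopy stubs) of the two barrier shapes described in the integration message. The first wraps a plain copy in bulk pre/post barriers; the second routes every copied element through load and store barriers. All types and functions here are hypothetical, and real GC barriers do considerably more work than counting calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical: a heap reference is modeled as a plain integer slot.
using Ref = std::uintptr_t;

// Hypothetical barrier set with bulk pre/post hooks over the destination range.
struct BulkBarriers {
  std::size_t pre_calls = 0;
  std::size_t post_calls = 0;
  void pre(Ref* /*dst*/, std::size_t /*count*/) { ++pre_calls; }
  void post(Ref* /*dst*/, std::size_t /*count*/) { ++post_calls; }
};

// Hypothetical barrier set that must see every element being copied.
// A real load barrier might transform the loaded value; this one only counts.
struct PerElementBarriers {
  std::size_t loads = 0;
  std::size_t stores = 0;
  Ref load(const Ref* src) { ++loads; return *src; }
  void store(Ref* dst, Ref value) { ++stores; *dst = value; }
};

// Old shape: barriers around a copy done with plain loads and stores.
// Assumes src and dst do not overlap.
void arraycopy_bulk(BulkBarriers& bs, const Ref* src, Ref* dst, std::size_t n) {
  assert(n == 0 || (src != nullptr && dst != nullptr));
  bs.pre(dst, n);
  if (n != 0) {
    std::memcpy(dst, src, n * sizeof(Ref));
  }
  bs.post(dst, n);
}

// New shape: the copy loop itself goes through the barrier API, and no
// pre/post barriers are needed. Assumes src and dst do not overlap.
void arraycopy_per_element(PerElementBarriers& bs, const Ref* src, Ref* dst, std::size_t n) {
  assert(n == 0 || (src != nullptr && dst != nullptr));
  for (std::size_t i = 0; i < n; ++i) {
    bs.store(dst + i, bs.load(src + i));
  }
}
```

Copying four elements calls the bulk hooks once each, but the per-element variant runs four loads and four stores through the barrier set. That per-element cost is what makes the shape of the copy loop itself (unrolled or re-rolled) matter for review.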
------------- PR: https://git.openjdk.org/jdk/pull/12670 From aboldtch at openjdk.org Mon Mar 6 09:00:06 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 6 Mar 2023 09:00:06 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects In-Reply-To: References: Message-ID: <-3SOW9HL0EDTdD3UdOXl1w2z1do67Y7CltyoXYYL-aY=.0d69ad12-d9b3-4819-81cb-fb34ff2eee63@github.com> On Mon, 6 Mar 2023 07:27:23 GMT, Kim Barrett wrote: > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda. > > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) > > The unit tests are written as "early" tests, not requiring an initialized VM. > They also avoid any C heap allocation (even though C heap allocation has very > early support). This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) > > Testing: > mach5 tier1-7 lgtm. Nice cleanup, makes code easier to read. ------------- Marked as reviewed by aboldtch (Committer). 
PR: https://git.openjdk.org/jdk/pull/12876 From tschatzl at openjdk.org Mon Mar 6 09:39:03 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 6 Mar 2023 09:39:03 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 07:27:23 GMT, Kim Barrett wrote: > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda. > > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) > > The unit tests are written as "early" tests, not requiring an initialized VM. > They also avoid any C heap allocation (even though C heap allocation has very > early support). This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) > > Testing: > mach5 tier1-7 Marked as reviewed by tschatzl (Reviewer). 
------------- PR: https://git.openjdk.org/jdk/pull/12876 From tschatzl at openjdk.org Mon Mar 6 09:41:12 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 6 Mar 2023 09:41:12 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 12:30:38 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of merging two types. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright-year Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12841 From kbarrett at openjdk.org Mon Mar 6 10:15:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 6 Mar 2023 10:15:36 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 07:27:23 GMT, Kim Barrett wrote: > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda. > > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) > > The unit tests are written as "early" tests, not requiring an initialized VM. > They also avoid any C heap allocation (even though C heap allocation has very > early support). 
This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) > > Testing: > mach5 tier1-7 src/hotspot/share/utilities/bitMap.hpp line 273: > 271: // terminates early. Otherwise, the result must be convertible to bool. > 272: // > 273: // - cl->do_bit(i) is a valid expression whose result is convertible to bool. This description was an intermediate that I wasn't finished with, and isn't correct since it doesn't describe `cl->do_bit(i)` as potentially returning void, which is supported by the implementation. I seem to have lost the final version, and will need to rewrite it. ------------- PR: https://git.openjdk.org/jdk/pull/12876 From iwalulya at openjdk.org Mon Mar 6 10:22:27 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 6 Mar 2023 10:22:27 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects [v2] In-Reply-To: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: > Hi All, > > Please review this change to move humongous regions during the Last-Ditch full gc ( on `do_maximal_compaction`). This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. > > Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. 
The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces the region fragmentation and avoids running into OOM errors. > > We have enabled for G1 the Jtreg test that was previously used only for Shenandoah to test such workload. > > Testing: Tier 1-3 Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision: - Refactor resetting humongous metadata - Thomas review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12830/files - new: https://git.openjdk.org/jdk/pull/12830/files/f4951d37..6f8dd514 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12830&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12830&range=00-01 Stats: 219 lines in 12 files changed: 71 ins; 103 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/12830.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12830/head:pull/12830 PR: https://git.openjdk.org/jdk/pull/12830 From kbarrett at openjdk.org Mon Mar 6 10:33:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 6 Mar 2023 10:33:36 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v2] In-Reply-To: References: Message-ID: > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda.
> > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) > > The unit tests are written as "early" tests, not requiring an initialized VM. > They also avoid any C heap allocation (even though C heap allocation has very > early support). This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) 
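The iteration contract described above (a function returning `void` always visits every set bit, one returning `bool` stops at the first `false`, and `do_bit` closures are adapted by wrapping them in a lambda) can be sketched as follows. This is a minimal sketch built on a hypothetical `SimpleBitMap` over 64-bit words. It is not HotSpot's `BitMap` class or Kim's actual implementation, and it uses C++11/14 tag dispatch on the function's return type.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical, simplified bitmap over 64-bit words; NOT HotSpot's BitMap.
class SimpleBitMap {
  static constexpr size_t BitsPerWord = 64;
  const uint64_t* _words;
  size_t _size_in_bits;

  // Adapter for functions returning void: never requests early termination.
  template <typename F>
  static bool invoke(F& f, size_t index, std::true_type /* void result */) {
    f(index);
    return true;
  }

  // Adapter for functions whose result converts to bool: false stops iteration.
  template <typename F>
  static bool invoke(F& f, size_t index, std::false_type /* non-void result */) {
    return static_cast<bool>(f(index));
  }

 public:
  SimpleBitMap(const uint64_t* words, size_t size_in_bits)
    : _words(words), _size_in_bits(size_in_bits) {}

  size_t size() const { return _size_in_bits; }

  bool at(size_t index) const {
    return ((_words[index / BitsPerWord] >> (index % BitsPerWord)) & 1) != 0;
  }

  // Applies f to each set bit index in [beg, end).
  // Returns false if f returned false (stopping early), true otherwise.
  template <typename F>
  bool iterate(F f, size_t beg, size_t end) const {
    using Result = decltype(f(beg));
    for (size_t i = beg; i < end; ++i) {
      // For brevity this tests every bit; a real implementation
      // would skip over all-zero words.
      if (at(i) && !invoke(f, i, std::is_void<Result>())) {
        return false;
      }
    }
    return true;
  }

  // Closure objects with a do_bit(size_t) member are wrapped in a lambda,
  // so do_bit may return either void or bool.
  template <typename Closure>
  bool iterate(Closure* cl, size_t beg, size_t end) const {
    return iterate([cl](size_t i) { return cl->do_bit(i); }, beg, end);
  }
};
```

Tag dispatch on `std::is_void` keeps a single loop for both kinds of callable. The pointer overload is chosen for closure pointers because partial ordering prefers `Closure*` over a plain `F`.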
> > Testing: > mach5 tier1-7 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: improve description of iterate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12876/files - new: https://git.openjdk.org/jdk/pull/12876/files/a4f46bdc..a1befdf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=00-01 Stats: 21 lines in 1 file changed: 9 ins; 9 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12876.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12876/head:pull/12876 PR: https://git.openjdk.org/jdk/pull/12876 From tschatzl at openjdk.org Mon Mar 6 10:50:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 6 Mar 2023 10:50:20 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 10:33:36 GMT, Kim Barrett wrote: >> Please review this enhancement of BitMap::iterate to support lambdas and other >> function objects as the operation being applied to the set bit indices. Some >> for-loops using BitMap::get_next_one_offset have been changed to use >> BitMap::iterate with a lambda. >> >> (One reason for changing the for-loops is that I'm considering a change to the >> get_next_one_offset API, and reducing the number of direct uses will simplify >> that.) >> >> For convenience, the function can either return void (always iterate over the >> whole range) or bool (stop iteration if returns false). Iteration using >> closure objects invoked via a do_bit member function are now implemented by >> being wrapped in a lambda, so get the same convenience. (Though, of course, >> if the closure is derived from BitMapClosure then do_bit returns bool.) >> >> The unit tests are written as "early" tests, not requiring an initialized VM. 
>> They also avoid any C heap allocation (even though C heap allocation has very >> early support). This is done to minimize the requirements for running the >> tests, since BitMap is used in a lot of places. This attempts to run these >> tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed >> someday.) (Some existing BitMap gtests should be modified to do similarly; see >> JDK-8303636.) >> >> Testing: >> mach5 tier1-7 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > improve description of iterate Still good. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/12876 From aboldtch at openjdk.org Mon Mar 6 11:02:10 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 6 Mar 2023 11:02:10 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 10:33:36 GMT, Kim Barrett wrote: >> Please review this enhancement of BitMap::iterate to support lambdas and other >> function objects as the operation being applied to the set bit indices. Some >> for-loops using BitMap::get_next_one_offset have been changed to use >> BitMap::iterate with a lambda. >> >> (One reason for changing the for-loops is that I'm considering a change to the >> get_next_one_offset API, and reducing the number of direct uses will simplify >> that.) >> >> For convenience, the function can either return void (always iterate over the >> whole range) or bool (stop iteration if returns false). Iteration using >> closure objects invoked via a do_bit member function are now implemented by >> being wrapped in a lambda, so get the same convenience. (Though, of course, >> if the closure is derived from BitMapClosure then do_bit returns bool.) >> >> The unit tests are written as "early" tests, not requiring an initialized VM. 
>> They also avoid any C heap allocation (even though C heap allocation has very >> early support). This is done to minimize the requirements for running the >> tests, since BitMap is used in a lot of places. This attempts to run these >> tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed >> someday.) (Some existing BitMap gtests should be modified to do similarly; see >> JDK-8303636.) >> >> Testing: >> mach5 tier1-7 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > improve description of iterate Marked as reviewed by aboldtch (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/12876 From stefank at openjdk.org Mon Mar 6 13:35:22 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Mar 2023 13:35:22 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> References: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> Message-ID: On Fri, 3 Mar 2023 11:53:48 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method is stress tested to compare the performance difference. The statistics are drawn in the following charts. The vertical axis is in milliseconds. >> >> ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) >> >> ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() There are a lot of changes to includes that need to be restored / cleaned.
I've marked them below: src/hotspot/share/memory/iterator.inline.hpp line 35: > 33: #include "oops/compressedOops.inline.hpp" > 34: #include "oops/klass.hpp" > 35: #include "utilities/devirtualizer.inline.hpp" Sort order is wrong. src/hotspot/share/oops/instanceKlass.inline.hpp line 28: > 26: #define SHARE_OOPS_INSTANCEKLASS_INLINE_HPP > 27: > 28: #include "oops/instanceKlass.hpp" All these includes should *not* be removed. src/hotspot/share/oops/oop.inline.hpp line 36: > 34: #include "oops/arrayOop.hpp" > 35: #include "oops/compressedOops.inline.hpp" > 36: #include "oops/klass.inline.hpp" Sort order. src/hotspot/share/prims/jvmtiEnvBase.cpp line 28: > 26: #include "classfile/classLoaderDataGraph.hpp" > 27: #include "classfile/javaClasses.inline.hpp" > 28: #include "classfile/vmSymbols.hpp" Sort order. ------------- Changes requested by stefank (Reviewer). PR: https://git.openjdk.org/jdk/pull/12782 From stefank at openjdk.org Mon Mar 6 13:39:19 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Mar 2023 13:39:19 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> References: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> Message-ID: On Fri, 3 Mar 2023 11:53:48 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method is stress tested to compare the performance difference. The statistics are drawn in the following charts. The vertical axis is in milliseconds. 
>> >> ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) >> >> ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() What is the motivation to change the parameter from `oop java_class` to `InstanceKlass*`? The call sites are now much noisier and harder to read. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From thomas.stuefe at gmail.com Mon Mar 6 14:18:56 2023 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 6 Mar 2023 15:18:56 +0100 Subject: alignas and clang Message-ID: Hi, I'm not sure if this has been discussed before. With alignas, on clang, if the natural type of the data has a larger alignment than what I requested with alignas, I get a compile error if the specified alignment is smaller than what would be the natural alignment of the type. Example: struct alignas(2) XX { void* p; }; gives: error: requested alignment is less than minimum alignment of 8 for type 'XX' Happens on both MacOS and Linux clang build. Does not happen with GCC. Does not happen with ATTRIBUTE_ALIGN(2). Is this a clang bug? The standard [1] says: "The object or the type declared by such a declaration will have its alignment requirement equal to the strictest (largest) non-zero expression of all alignas specifiers used in the declaration, unless it would weaken the natural alignment of the type. " That reads to me like gcc is correct. This is a bit of a drawback compared to ATTRIBUTE_ALIGN, especially if coding for different 32-bit and 64-bit platforms. [1] https://en.cppreference.com/w/cpp/language/alignas
From eosterlund at openjdk.org Mon Mar 6 15:04:37 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 6 Mar 2023 15:04:37 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: References: Message-ID: <5La1Nl4tQg06WqJWG4Tou6QoEXdnS8l_YTddxJbYHyE=.89395bb9-1570-4bc0-98e9-355da8412c89@github.com> On Mon, 20 Feb 2023 07:15:23 GMT, Axel Boldt-Christmas wrote: >> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. >> >> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. >> >> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. >> >> Enables the following >> ```C++ >> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) >> os::print_register_info_header(st, _context); >> >> REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) >> // decode register contents if possible >> ResourceMark rm(_thread); >> os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); >> REENTRANT_LOOP_END >> >> st->cr(); >> >> >> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 15 commits: > > - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant > - Add test > - Fix and strengthen print_stack_location > - Missed variable rename > - Copyright > - Rework logic and use continuation state for reattempts > - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant > - Restructure os::print_register_info interface > - Code syle and line length > - Merge Fix > - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 Just a naming nit, otherwise it looks good. src/hotspot/share/utilities/vmError.cpp line 175: > 173: static bool check_stack_headroom(Thread* thread, > 174: size_t headroom) { > 175: static const address stack_top = thread != nullptr We typically call "stack_top" "stack_base", and "stack_bottom" we call "stack_end". ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/11017 From tanksherman27 at gmail.com Mon Mar 6 15:22:17 2023 From: tanksherman27 at gmail.com (Julian Waters) Date: Mon, 6 Mar 2023 23:22:17 +0800 Subject: alignas and clang In-Reply-To: References: Message-ID: Hi Thomas, From what I know gcc is actually the one that's not correct here, a bigger alignment is considered stricter in C++ (though the Standard really should make this much clearer) since a lower alignment value results in more possible addresses, hence making the type's alignment weaker, and alignas is very explicitly forbidden from weakening alignment, especially the regular (unmodified) minimum alignment of said type. I don't know if there are places where we purposely weaken type alignment in HotSpot though best regards, Julian On Mon, Mar 6, 2023 at 10:19 PM Thomas Stüfe wrote: > Hi, > > I'm not sure if this has been discussed before.
With alignas, on clang, if > the natural type of the data has a larger alignment than what I requested > with alignas, I get a compile error if the specified alignment is smaller > than what would be the natural alignment of the type. > > Example: > > struct alignas(2) XX { > void* p; > }; > > gives: error: requested alignment is less than minimum alignment of 8 for > type 'XX' > > Happens on both MacOS and Linux clang build. Does not happen with GCC. > Does not happen with ATTRIBUTE_ALIGN(2). > > Is this a clang bug? The standard [1] says: > > "The object or the type declared by such a declaration will have its > alignment requirement equal to the strictest (largest) non-zero expression > of all alignas specifiers used in the declaration, unless it would weaken > the natural alignment of the type. " > > That reads to me like gcc is correct. > > This is a bit of a drawback compared to ATTRIBUTE_ALIGN, especially if > coding for different 32-bit and 64-bit platforms. > > [1] https://en.cppreference.com/w/cpp/language/alignas From thomas.stuefe at gmail.com Mon Mar 6 15:52:39 2023 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 6 Mar 2023 16:52:39 +0100 Subject: alignas and clang In-Reply-To: References: Message-ID: Hi Julian, On Mon, Mar 6, 2023 at 4:22 PM Julian Waters wrote: > Hi Thomas, > > From what I know gcc is actually the one that's not correct here, a bigger > alignment is considered stricter in C++ (though the Standard really should > make this much clearer) since a lower alignment value results in more > possible addresses, hence making the type's alignment weaker, and alignas > is very explicitly forbidden from weakening alignment, especially the > regular (unmodified) minimum alignment of said type. > gcc does not weaken the alignment, though. It just chooses whatever is larger: natural or explicitly given alignment.
So it behaves to spec, provided I understand the spec. I don't know if there are places where we purposely weaken type alignment > in HotSpot though > Not weaken, and not purposefully. But the specified alignment may be smaller than the natural one, and the natural (hah) thing to do would be to choose whatever's larger. For example, from https://github.com/openjdk/jdk/pull/12879: template <typename T> struct MyContainer { union alignas(alignof(T)) DataHolder { char bytes[sizeof(T)]; void* p; }; DataHolder d; }; So, I want the compiler to allocate storage aligned for T, but postpone the actual construction of T. I also want to keep the thing in a freelist, hence the next pointer (here it's a union, but that does not matter for the problem). For T=int, on 64-bit, I would have expected the compiler to generate DataHolder with an alignment requirement of 8 since T needs 4 and void* needs 8. But the compiler does not accept this. However, clang accepts ATTRIBUTE_ALIGNED and then does what gcc does. Cheers, Thomas > best regards, > Julian > > On Mon, Mar 6, 2023 at 10:19 PM Thomas Stüfe > wrote: > >> Hi, >> >> I'm not sure if this has been discussed before. With alignas, on clang, >> if the natural type of the data has a larger alignment than what I >> requested with alignas, I get a compile error if the specified alignment is >> smaller than what would be the natural alignment of the type. >> >> Example: >> >> struct alignas(2) XX { >> void* p; >> }; >> >> gives: error: requested alignment is less than minimum alignment of 8 for >> type 'XX' >> >> Happens on both MacOS and Linux clang build. Does not happen with GCC. >> Does not happen with ATTRIBUTE_ALIGN(2). >> >> Is this a clang bug? The standard [1] says: >> >> "The object or the type declared by such a declaration will have its >> alignment requirement equal to the strictest (largest) non-zero expression >> of all alignas specifiers used in the declaration, unless it would weaken >> the natural alignment of the type. " >> >> That reads to me like gcc is correct. >> >> This is a bit of a drawback compared to ATTRIBUTE_ALIGN, especially if >> coding for different 32-bit and 64-bit platforms. >> >> [1] https://en.cppreference.com/w/cpp/language/alignas From never at openjdk.org Mon Mar 6 16:13:47 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 6 Mar 2023 16:13:47 GMT Subject: RFR: 8303577: [JVMCI] OOME causes crash while translating exceptions In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 15:40:01 GMT, Doug Simon wrote: > JDK-8297431 added code for special handling of OutOfMemoryError when translating an exception between libjvmci and HotSpot[1]. > Unfortunately, this code was deleted by JDK-8298099 when moving the exception translation mechanism to VMSupport[2]. > This causes the VM to crash when an OOME occurs while translating an exception from HotSpot to libjvmci. > This PR revives the deleted code. > > This bug was found by running `test/jdk/java/util/concurrent/locks/Lock/OOMEInAQS.java` on libgraal. The fix now makes the test pass. > > Note that the code modified in `src/java.base/share/classes/jdk/internal/vm/VMSupport.java` is JVMCI specific and was originally added by [JDK-8298099](https://git.openjdk.org/jdk/pull/11513). > > [1] https://github.com/openjdk/jdk/commit/952e10055135613e8ea2b818a4f35842936f5633#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R222-R234 > [2] https://github.com/openjdk/jdk/commit/8b69a2e434ad2fa3369079622b57afb973d5bd9a#diff-7292551772c27b7152a3333f03cbbad90a897c5e37c6a97d4026be835e6d8fe1R121-R125 Marked as reviewed by never (Reviewer).
------------- PR: https://git.openjdk.org/jdk/pull/12857 From dnsimon at openjdk.org Mon Mar 6 16:14:01 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 6 Mar 2023 16:14:01 GMT Subject: RFR: 8303577: [JVMCI] OOME causes crash while translating exceptions In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 18:05:51 GMT, Vladimir Kozlov wrote: >> JDK-8297431 added code for special handling of OutOfMemoryError when translating an exception between libjvmci and HotSpot[1]. >> Unfortunately, this code was deleted by JDK-8298099 when moving the exception translation mechanism to VMSupport[2]. >> This causes the VM to crash when an OOME occurs while translating an exception from HotSpot to libjvmci. >> This PR revives the deleted code. >> >> This bug was found by running `test/jdk/java/util/concurrent/locks/Lock/OOMEInAQS.java` on libgraal. The fix now makes the test pass. >> >> Note that the code modified in `src/java.base/share/classes/jdk/internal/vm/VMSupport.java` is JVMCI specific and was originally added by [JDK-8298099](https://git.openjdk.org/jdk/pull/11513). >> >> [1] https://github.com/openjdk/jdk/commit/952e10055135613e8ea2b818a4f35842936f5633#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R222-R234 >> [2] https://github.com/openjdk/jdk/commit/8b69a2e434ad2fa3369079622b57afb973d5bd9a#diff-7292551772c27b7152a3333f03cbbad90a897c5e37c6a97d4026be835e6d8fe1R121-R125 > > Good. Thanks for the reviews @vnkozlov and @tkrodriguez. ------------- PR: https://git.openjdk.org/jdk/pull/12857 From dnsimon at openjdk.org Mon Mar 6 16:14:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 6 Mar 2023 16:14:02 GMT Subject: Integrated: 8303577: [JVMCI] OOME causes crash while translating exceptions In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 15:40:01 GMT, Doug Simon wrote: > JDK-8297431 added code for special handling of OutOfMemoryError when translating an exception between libjvmci and HotSpot[1].
> Unfortunately, this code was deleted by JDK-8298099 when moving the exception translation mechanism to VMSupport[2]. > This causes the VM to crash when an OOME occurs while translating an exception from HotSpot to libjvmci. > This PR revives the deleted code. > > This bug was found by running `test/jdk/java/util/concurrent/locks/Lock/OOMEInAQS.java` on libgraal. The fix now makes the test pass. > > Note that the code modified in `src/java.base/share/classes/jdk/internal/vm/VMSupport.java` is JVMCI specific and was originally added by [JDK-8298099](https://git.openjdk.org/jdk/pull/11513). > > [1] https://github.com/openjdk/jdk/commit/952e10055135613e8ea2b818a4f35842936f5633#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R222-R234 > [2] https://github.com/openjdk/jdk/commit/8b69a2e434ad2fa3369079622b57afb973d5bd9a#diff-7292551772c27b7152a3333f03cbbad90a897c5e37c6a97d4026be835e6d8fe1R121-R125 This pull request has now been integrated. Changeset: cac81ddc Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/cac81ddc9259168a5b12c290ae2ce7db25a729fc Stats: 31 lines in 5 files changed: 22 ins; 0 del; 9 mod 8303577: [JVMCI] OOME causes crash while translating exceptions Reviewed-by: kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/12857 From tanksherman27 at gmail.com Mon Mar 6 16:22:29 2023 From: tanksherman27 at gmail.com (Julian Waters) Date: Tue, 7 Mar 2023 00:22:29 +0800 Subject: alignas and clang In-Reply-To: References: Message-ID: Ahhh, I see what you mean. Seems to me that this may actually be compiler-dependent, since C++14 simply states such situations are ill-formed and makes no promises of instead choosing the larger alignment when this happens.
Does seem to be pretty annoying in that case :( Love that pun though, on another note :P best regards, Julian On Mon, Mar 6, 2023 at 11:52 PM Thomas Stüfe wrote: > Hi Julian, > > On Mon, Mar 6, 2023 at 4:22 PM Julian Waters > wrote: > >> Hi Thomas, >> >> From what I know gcc is actually the one that's not correct here, a >> bigger alignment is considered stricter in C++ (though the Standard really >> should make this much clearer) since a lower alignment value results in >> more possible addresses, hence making the type's alignment weaker, and >> alignas is very explicitly forbidden from weakening alignment, especially >> the regular (unmodified) minimum alignment of said type. >> > > gcc does not weaken the alignment, though. It just chooses whatever is > larger: natural or explicitly given alignment. So it behaves to spec, > provided I understand the spec. > > I don't know if there are places where we purposely weaken type alignment >> in HotSpot though >> > > Not weaken, and not purposefully. But the specified alignment may be > smaller than the natural one, and the natural (hah) thing to do would be to > choose whatever's larger. > > For example, from https://github.com/openjdk/jdk/pull/12879: > > template <typename T> struct MyContainer { > union alignas(alignof(T)) DataHolder { > char bytes[sizeof(T)]; > void* p; > }; > DataHolder d; > }; > > So, I want the compiler to allocate storage aligned for T, but postpone > the actual construction of T. I also want to keep the thing in a freelist, > hence the next pointer (here it's a union, but that does not matter for the > problem). > > For T=int, on 64-bit, I would have expected the compiler to generate > DataHolder with an alignment requirement of 8 since T needs 4 and void* > needs 8. But the compiler does not accept this. However, clang accepts > ATTRIBUTE_ALIGNED and then does what gcc does.
> > Cheers, Thomas > > >> best regards, >> Julian >> >> On Mon, Mar 6, 2023 at 10:19 PM Thomas Stüfe >> wrote: >> >>> Hi, >>> >>> I'm not sure if this has been discussed before. With alignas, on clang, >>> if the natural type of the data has a larger alignment than what I >>> requested with alignas, I get a compile error if the specified alignment is >>> smaller than what would be the natural alignment of the type. >>> >>> Example: >>> >>> struct alignas(2) XX { >>> void* p; >>> }; >>> >>> gives: error: requested alignment is less than minimum alignment of 8 >>> for type 'XX' >>> >>> Happens on both MacOS and Linux clang build. Does not happen with GCC. >>> Does not happen with ATTRIBUTE_ALIGN(2). >>> >>> Is this a clang bug? The standard [1] says: >>> >>> "The object or the type declared by such a declaration will have its >>> alignment requirement equal to the strictest (largest) non-zero expression >>> of all alignas specifiers used in the declaration, unless it would weaken >>> the natural alignment of the type. " >>> >>> That reads to me like gcc is correct. >>> >>> This is a bit of a drawback compared to ATTRIBUTE_ALIGN, especially if >>> coding for different 32-bit and 64-bit platforms. >>> >>> [1] https://en.cppreference.com/w/cpp/language/alignas
From jcking at openjdk.org Mon Mar 6 16:30:33 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 6 Mar 2023 16:30:33 GMT Subject: RFR: JDK-8300783: Consolidate byteswap implementations [v8] In-Reply-To: <4T6ba7HVAkPmaau2WD3FRRyOlmEz7MDX5nz2UM-rfms=.58f59fc2-6030-4d9f-914f-5f37df4fb95e@github.com> References: <4T6ba7HVAkPmaau2WD3FRRyOlmEz7MDX5nz2UM-rfms=.58f59fc2-6030-4d9f-914f-5f37df4fb95e@github.com> Message-ID: On Wed, 25 Jan 2023 12:31:11 GMT, David Holmes wrote: >> Justin King has updated the pull request incrementally with one additional commit since the last revision: >> >> Update comment >> >> Signed-off-by: Justin King > > CI run was fine wrt these changes. @dholmes-ora Poke. :) ------------- PR: https://git.openjdk.org/jdk/pull/12114 From jvernee at openjdk.org Mon Mar 6 16:42:10 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Mar 2023 16:42:10 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: On Wed, 1 Mar 2023 06:37:45 GMT, Martin Doerr wrote: >>> * Uploaded my simple reproducer to [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> Thanks! >> >>> * Using oversized load / stores is problematic. Don't forget that OpenJDK still supports Big Endian platforms (AIX, s390x). >> >> You're right. I realized that it's also problematic for heap segments, for which we can't do oversized accesses. I am working on another solution that splits up the loads/stores into power-of-two sized chunks: https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:panama-foreign:OOB That patch is just a POC at this point though. Also, I don't think it works for BE at the moment (need to flip the offset for BE, I think. Just like we do in Unsafe).
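The following sketches the idea of splitting an access whose size is not a power of two into power-of-two sized chunks, so that no load or store touches bytes past the end of the value. `split_into_chunks` and `Chunk` are hypothetical names; this is not code from the linked POC. The sketch greedily emits the largest chunk (8, 4, 2 or 1 bytes) that still fits, so a 7-byte struct becomes accesses of 4, 2 and 1 bytes at offsets 0, 4 and 6. It only computes memory offsets. How the chunks are then combined into a register value depends on endianness and is not shown.

```cpp
#include <cstddef>
#include <vector>

// One load/store: `size` bytes starting at byte `offset` of the value.
struct Chunk {
  size_t offset;
  size_t size;
};

// Hypothetical helper: decompose an access of `total` bytes into
// power-of-two sized chunks (8, 4, 2, 1), largest first, so that no
// access reads or writes beyond the last byte of the value.
std::vector<Chunk> split_into_chunks(size_t total) {
  std::vector<Chunk> chunks;
  size_t offset = 0;
  for (size_t chunk = 8; chunk >= 1; chunk /= 2) {
    while (total - offset >= chunk) {
      chunks.push_back({offset, chunk});
      offset += chunk;
    }
  }
  return chunks;
}
```

For example, `split_into_chunks(12)` yields an 8-byte access at offset 0 followed by a 4-byte access at offset 8.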
>> >>> * The result of `NativeCallingConvention::calling_convention` is interpreted as size, but it returns the max offset. That's off by one slot. Should I file a bug for that? (PPC64 is not affected because it doesn't use the result.) >> >> I'm not sure there's an issue there. Note that the 'max offset' is computed as `reg.offset() + reg.stack_size()`, so that should get us the size we need to allocate for the stack arguments. (e.g. 2 ints being passed at offset 0 and 4, would make max offset 4 + 4 = 8, which gives the size needed for the 2 ints). Computing the max offset instead of just summing the sizes of the stack arguments is needed since stack arguments can be sparsely placed in some cases on Mac/AArch64. >> >>> * Since the membar on the return path was mentioned: I think it would be good to enable UseSystemMemoryBarrier by default on operating systems which support it. Maybe we should discuss this with @robehn. >> >> ~I don't think we've done that much testing with UseSystemMemoryBarrier since it was added~. I'm a bit nervous about turning it on by default since it's currently also used for JNI. Let's see what Robbin thinks. > > @JornVernee: Thanks a lot for your detailed review! I have quite a few TODOs which include: > - Include my tests for the HFA corner cases. > - Try to improve handling of the overlapping registers as you suggested. > - Check nesting of HFA. > > There will surely be more when looking into Big Endian support after merging with your recent work on https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:panama-foreign:OOB > We should get rid of oversized accesses on PPC64, too. > Thanks for sharing your plans to intrinsify `linkToNative` in C2 later. I guess we should do more preparation work on all platforms when that gets addressed. @TheRealMDoerr I've moved the support for structs/unions that are not a power of 2 in size to this repo, so you should be able to merge the master branch to get it now.
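The point about sparse placement (the stack area size is the maximum of `offset + size`, not the sum of argument sizes) can be illustrated with a small sketch. `StackArg` and `stack_area_size` are invented for illustration and are not HotSpot's `NativeCallingConvention` API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one stack-passed argument.
struct StackArg {
  size_t offset;  // byte offset from the start of the outgoing argument area
  size_t size;    // size of the argument in bytes
};

// Size of the area needed to hold every stack argument: the largest
// end offset (offset + size), not the sum of sizes, because arguments
// may be placed sparsely with gaps between them.
size_t stack_area_size(const std::vector<StackArg>& args) {
  size_t max_end = 0;
  for (const StackArg& a : args) {
    max_end = std::max(max_end, a.offset + a.size);
  }
  return max_end;
}
```

For two ints at offsets 0 and 4 this gives 4 + 4 = 8, matching the example in the thread. With a gap, such as a 1-byte argument at offset 0 and an 8-byte argument at offset 8, the result is 16 even though the sizes only sum to 9.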
------------- PR: https://git.openjdk.org/jdk/pull/12708 From fparain at openjdk.org Mon Mar 6 16:47:39 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 6 Mar 2023 16:47:39 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams Message-ID: Please review this change re-implementing the FieldInfo data structure. The FieldInfo array is an old data structure storing field metadata. It has poor extensibility, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. The new implementation uses a compressed stream to store this metadata, achieving better memory density and flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 These changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have also been updated to handle the new FieldInfo format. Tested with mach5, tier 1 to 7. Thank you.
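The following minimal C++ sketch shows why a variable-length stream is denser than fixed-width slots. It stores a hypothetical four-value field record (`FieldRecord`) as varints, so small indices, offsets and flags take a single byte each. It deliberately uses a plain LEB128-style varint (7 payload bits per byte, high bit as the continuation marker). HotSpot's UNSIGNED5 encoding uses a different byte layout, and the real FieldInfo stream stores different data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `v` as a LEB128-style varint: 7 payload bits per byte,
// high bit set on every byte except the last.
void write_varint(std::vector<std::uint8_t>& out, std::uint32_t v) {
  while (v >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(v));
}

// Decode one varint starting at `pos`, advancing `pos` past it.
std::uint32_t read_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  std::uint32_t result = 0;
  for (int shift = 0;; shift += 7) {
    assert(shift <= 28 && "varint too long for 32 bits");
    assert(pos < in.size() && "truncated stream");
    std::uint8_t b = in[pos++];
    result |= static_cast<std::uint32_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) {
      return result;
    }
  }
}

// Hypothetical per-field record; the real FieldInfo contents differ.
struct FieldRecord {
  std::uint32_t name_index;
  std::uint32_t signature_index;
  std::uint32_t offset;
  std::uint32_t flags;
};

void write_field(std::vector<std::uint8_t>& out, const FieldRecord& f) {
  write_varint(out, f.name_index);
  write_varint(out, f.signature_index);
  write_varint(out, f.offset);
  write_varint(out, f.flags);
}

FieldRecord read_field(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  FieldRecord f;
  f.name_index = read_varint(in, pos);
  f.signature_index = read_varint(in, pos);
  f.offset = read_varint(in, pos);
  f.flags = read_varint(in, pos);
  return f;
}
```

With small values, a record encodes in 4 bytes in this sketch, compared with the 96-bit (12-byte) fixed-width representation named in the subject line. A 32-bit value grows to at most 5 bytes.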
------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 - Reimplementation of FieldInfo as an unsigned5 stream Changes: https://git.openjdk.org/jdk/pull/12855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8292818 Stats: 1699 lines in 52 files changed: 897 ins; 446 del; 356 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From thomas.stuefe at gmail.com Mon Mar 6 17:58:55 2023 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 6 Mar 2023 18:58:55 +0100 Subject: alignas and clang In-Reply-To: References: Message-ID: The message from this sender included one or more files which could not be scanned for virus detection; do not open these files unless you are certain of the sender's intent. ---------------------------------------------------------------------- On Mon, Mar 6, 2023 at 5:23 PM Julian Waters wrote: > Ahhh, I see what you mean. Seems to me that this may actually be compiler > dependent, since C++14 simply states such situations are ill-formed and > makes no promises of instead choosing the larger alignment when > this happens. > Oh, you are right, I stopped reading before it got interesting. "If the strictest (largest) alignas on a declaration is weaker than the alignment it would have without any alignas specifiers (that is, weaker than its natural alignment or weaker than alignas on another declaration of the same object or type), the program is ill-formed." Never mind then, I'll find a different solution. Thanks for thinking this through with me!
Cheers, Thomas Does seem to be pretty annoying in that case :( > > Love that pun though, on another note :P > > best regards, > Julian > > On Mon, Mar 6, 2023 at 11:52 PM Thomas Stüfe > wrote: > >> Hi Julian, >> >> On Mon, Mar 6, 2023 at 4:22 PM Julian Waters >> wrote: >> >>> Hi Thomas, >>> >>> From what I know gcc is actually the one that's not correct here, a >>> bigger alignment is considered stricter in C++ (though the Standard really >>> should make this much clearer) since a lower alignment value results in >>> more possible addresses, hence making the type's alignment weaker, and >>> alignas is very explicitly forbidden from weakening alignment, especially >>> the regular (unmodified) minimum alignment of said type. >>> >> >> gcc does not weaken the alignment, though. It just chooses whatever is >> larger: natural or explicitly given alignment. So it behaves to spec, >> provided I understand the spec. >> >> I don't know if there are places where we purposely weaken type alignment >>> in HotSpot though >>> >> >> Not weaken, and not purposefully. But the specified alignment may be >> smaller than the natural one, and the natural (hah) thing to do would be to >> choose whatever's larger. >> >> For example, from https://github.com/openjdk/jdk/pull/12879: >> >> template struct MyContainer { >> union alignas(alignof(T)) DataHolder { >> char bytes[sizeof(T)]; >> void* p; >> }; >> DataHolder d; >> }; >> >> So, I want the compiler to allocate storage aligned for T, but postpone >> the actual construction of T. I also want to keep the thing in a freelist, >> hence the next pointer (here it's a union, but that does not matter for the >> problem). >> >> For T=int, on 64-bit, I would have expected the compiler to generate >> DataHolder with an alignment requirement of 8 since T needs 4 and void* >> needs 8. But the compiler does not accept this. However, clang accepts >> ATTRIBUTE_ALIGNED and then does what gcc does.
>> >> Cheers, Thomas >> >> >>> best regards, >>> Julian >>> >>> On Mon, Mar 6, 2023 at 10:19 PM Thomas Stüfe >>> wrote: >>> >>>> Hi, >>>> >>>> I'm not sure if this has been discussed before. With alignas, on clang, >>>> if the natural type of the data has a larger alignment than what I >>>> requested with alignas, I get a compile error if the specified alignment is >>>> smaller than what would be the natural alignment of the type. >>>> >>>> Example: >>>> >>>> struct alignas(2) XX { >>>> void* p; >>>> }; >>>> >>>> gives: error: requested alignment is less than minimum alignment of 8 >>>> for type 'XX' >>>> >>>> Happens on both MacOS and Linux clang build. Does not happen with GCC. >>>> Does not happen with ATTRIBUTE_ALIGN(2). >>>> >>>> Is this a clang bug? The standard [1] says: >>>> >>>> "The object or the type declared by such a declaration will have its >>>> alignment requirement equal to the strictest (largest) non-zero expression >>>> of all alignas specifiers used in the declaration, unless it would weaken >>>> the natural alignment of the type. " >>>> >>>> That reads to me like gcc is correct. >>>> >>>> This is a bit of a drawback compared to ATTRIBUTE_ALIGN, especially if >>>> coding for different 32-bit and 64-bit platforms. >>>> >>>> [1] https://en.cppreference.com/w/cpp/language/alignas >>>> >>> From prappo at openjdk.org Mon Mar 6 20:22:48 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Mon, 6 Mar 2023 20:22:48 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e.
the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix had been approved, as required. Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8303480 - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12826/files - new: https://git.openjdk.org/jdk/pull/12826/files/d2f4a553..87166408 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12826&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12826&range=00-01 Stats: 13433 lines in 415 files changed: 9003 ins; 2610 del; 1820 mod Patch: https://git.openjdk.org/jdk/pull/12826.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12826/head:pull/12826 PR: https://git.openjdk.org/jdk/pull/12826 From inakonechnyy at openjdk.org Mon Mar 6 20:28:40 2023 From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy) Date: Mon, 6 Mar 2023 20:28:40 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v5] In-Reply-To: References: Message-ID: <0UEtpYm8TvljAdj3FQI5LYfjHxXPBWBfFD0lDsLojJA=.d259d5df-24b8-4b93-8fb3-feafc696deab@github.com> > The proposed approach adds a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, which is called within `InstanceKlass::add_initialization_error` if the existing `java_lang_Throwable::get_cause_with_stack_trace` did not succeed because of an exception during
the VM call. The simple function doesn't call into the VM to get a stack trace, but fills in any other information about the exception. > > In addition, code for discovering information about an exception was added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. > > A jtreg test reproducing the issue was also added to the commit. > The commit was tested with tier1 tests. Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision: 1/ create_initialization_error(): return empty exception, if EIIE creation failed. 2/ remove testcase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12566/files - new: https://git.openjdk.org/jdk/pull/12566/files/5c9c1ffe..adf139fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=03-04 Stats: 95 lines in 2 files changed: 6 ins; 87 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12566.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12566/head:pull/12566 PR: https://git.openjdk.org/jdk/pull/12566 From jjg at openjdk.org Mon Mar 6 20:31:18 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Mon, 6 Mar 2023 20:31:18 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: <-U8YFFuXm_hMf-bY1AVCRauRrE-fRYRxrx_yf38ZL1A=.d50884c5-cc4b-489a-b817-828faf876c76@github.com> Message-ID: On Fri, 3 Mar 2023 11:31:04 GMT, Alexey Ivanov wrote: >> Yes, iff means if-and-only-if and is used for extra precision in formal logic, mathematics. As @pavelrappo points out it's a relatively common occurrence in the OpenJDK sources, though perhaps not in the public javadocs. Perhaps a bit pretentious, but mostly a terse way to say "return true if the BSM method type exactly matches X, otherwise false".
>> >> The broken link stems from the fact that the method I was targeting (a way to use condy for lambda proxy singletons rather than a `MethodHandle.constant`) was never integrated. We'll look at either getting that done (@briangoetz suggested the time might be ready for it) or remove this currently pointless static bootstrap specialization test. > >> Yes, iff means if-and-only-if and is used for extra precision in formal logic, mathematics. > > I've never come across it before. With your explanations, it makes perfect sense. I would recommend (separately) changing `iff` to the expanded form `if and only if` ------------- PR: https://git.openjdk.org/jdk/pull/12826 From jjg at openjdk.org Mon Mar 6 20:36:13 2023 From: jjg at openjdk.org (Jonathan Gibbons) Date: Mon, 6 Mar 2023 20:36:13 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by jjg (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From lancea at openjdk.org Mon Mar 6 20:39:17 2023 From: lancea at openjdk.org (Lance Andersen) Date: Mon, 6 Mar 2023 20:39:17 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by lancea (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From xliu at openjdk.org Mon Mar 6 21:24:16 2023 From: xliu at openjdk.org (Xin Liu) Date: Mon, 6 Mar 2023 21:24:16 GMT Subject: RFR: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase [v5] In-Reply-To: <9GAFUt_y36yObC0oOhzxNC05Y8Ja_fkUPxvhZCuFSPY=.255df6aa-d417-4277-9799-a9208e758158@github.com> References: <9GAFUt_y36yObC0oOhzxNC05Y8Ja_fkUPxvhZCuFSPY=.255df6aa-d417-4277-9799-a9208e758158@github.com> Message-ID: On Sun, 29 Jan 2023 06:41:10 GMT, Ioi Lam wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add lambda API unlink(Function f) per reviewers' request. > > I updated [JDK-8301296](https://bugs.openjdk.org/browse/JDK-8301296) to clarify the problems with the current ResourceHashtable design and my proposal for fixing them. > > I think the above proposed fixes shouldn't block the progress of this PR, which is just an optimization that maintains the current behavior. So let's continue the discussions here. 
> > There are two parts of this PR: > > [1] For the optimization (counting the number of entries and exit the loop early), do you have any performance data that shows this is beneficial? > > For this optimization to be beneficial, we must have two conditions -- (a) the table is too large so it's likely to have many unused entries, and (b) the hash function is bad so most of the unused entries are at the end of the table. > > For (a), maybe it's better to change the table to ResizeableResourceHashtable? > For (b), maybe you can also print out the occupancy of the table in your traces like this one (in your earlier PR https://github.com/openjdk/jdk/pull/12016) > > > [12.824s][debug][hashtables] table_size = 109, break at 68 > > > If we have many entries (e.g., 40) but they all pack to the front end of the table, that means we have a bad hash function. > > [2] As I suggested earlier, we should consolidate the code to use a single unlink loop, so you can apply this counting optimization in a single place. > > I am not quite sure why you would need the following in your "wrapper" functions: > > > if (clean) { > *ptr = node->_next; > } else { > ptr = &(node->_next); > } > > > and > > > if (node->_next == nullptr) { > // nullify the bucket when reach the end of linked list. > *ptr = nullptr; > } > > > I wrote a version of the consolidated loop that's much simpler. It also aligns with the old code so the diff is more readable: > > https://github.com/openjdk/jdk/compare/master...iklam:jdk:8301136-resource-hash-unlink-all-suggestion > > Note that I have both `unlink_all()` and `unlink_all(Function function)`, that's because the current API allows the user function to do two things (a) check if the entry should be freed, (b) perform special clean up on the entry. So if you want to free all entries but need to perform special clean up, you'd call `unlink_all(Function function)`. 
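The following self-contained C++ sketch illustrates the two ideas discussed here. It is written against a hypothetical fixed-size chained table (`SimpleHashtable`), not HotSpot's `ResourceHashtable`. It has one predicate-driven unlink loop that stops scanning buckets once every live entry has been visited, and an `unlink_all()` that returns immediately for an empty table. As noted above, the early exit only saves work when the remaining live entries sit in low-numbered buckets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size chained hashtable used only to illustrate the
// counting early exit; it is not HotSpot's ResourceHashtable.
template <typename K, typename V, std::size_t SIZE>
class SimpleHashtable {
  struct Node {
    K key;
    V value;
    Node* next;
  };

  Node* _table[SIZE] = {};
  std::size_t _number_of_entries = 0;

  // Toy hash: assumes an integral key type.
  static std::size_t index_for(const K& key) {
    return static_cast<std::size_t>(key) % SIZE;
  }

 public:
  SimpleHashtable() = default;
  SimpleHashtable(const SimpleHashtable&) = delete;
  SimpleHashtable& operator=(const SimpleHashtable&) = delete;
  ~SimpleHashtable() { unlink_all(); }

  // Insert or update; returns true if a new entry was created.
  bool put(const K& key, const V& value) {
    Node** head = &_table[index_for(key)];
    for (Node* n = *head; n != nullptr; n = n->next) {
      if (n->key == key) {
        n->value = value;
        return false;
      }
    }
    *head = new Node{key, value, *head};
    _number_of_entries++;
    return true;
  }

  std::size_t number_of_entries() const { return _number_of_entries; }

  // Single unlink loop: removes every entry for which f(key, value) returns
  // true. Stops scanning buckets as soon as every live entry has been
  // visited; optionally reports how many buckets were scanned.
  template <typename Function>
  std::size_t unlink(Function f, std::size_t* buckets_scanned = nullptr) {
    std::size_t unvisited = _number_of_entries;
    std::size_t removed = 0;
    std::size_t i = 0;
    for (; i < SIZE && unvisited > 0; i++) {
      Node** ptr = &_table[i];
      while (*ptr != nullptr) {
        Node* node = *ptr;
        unvisited--;
        if (f(node->key, node->value)) {
          *ptr = node->next;  // splice the node out of the chain
          delete node;
          removed++;
        } else {
          ptr = &node->next;
        }
      }
    }
    _number_of_entries -= removed;
    if (buckets_scanned != nullptr) {
      *buckets_scanned = i;
    }
    return removed;
  }

  // Remove everything; an empty table returns without touching any bucket.
  std::size_t unlink_all() {
    if (_number_of_entries == 0) {
      return 0;
    }
    return unlink([](const K&, const V&) { return true; });
  }
};
```

For instance, with keys 1, 2 and 3 in a 101-bucket table, removing key 2 scans only buckets 0 to 3 before the loop stops. Destroying an empty table does no bucket scan at all.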
Hi @iklam, I found that the dominant way of removing elements from a ResourceHashtable is its destructor. I ran some Renaissance benchmarks and observed the same thing. In particular, it's very common to call the destructor on an empty ResourceHashtable. I think we should apply the shortcut optimization from JDK-8300184 in the destructor. ➜ jdk git:(JDK-8301136) ./build/linux-x86_64-server-release/images/jdk/bin/java -Xlog:hashtables=debug --version [0.052s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.052s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.052s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.052s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.063s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.063s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.063s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.063s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.063s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.063s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.063s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.063s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.063s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.063s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.072s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.072s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 openjdk 21-internal 2023-09-19 OpenJDK Runtime Environment (build 21-internal-adhoc.xxinliu.jdk) OpenJDK 64-Bit Server VM (build 21-internal-adhoc.xxinliu.jdk, mixed mode, sharing) [0.073s][debug][hashtables]
ResourceHashtableBase destructor: entries = 0 [0.073s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.073s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.073s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.073s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.073s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.073s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.073s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 [0.073s][debug][hashtables] ResourceHashtableBase destructor: entries = 0 [0.073s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed = 0 ------------- PR: https://git.openjdk.org/jdk/pull/12213 From rriggs at openjdk.org Mon Mar 6 21:29:08 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 6 Mar 2023 21:29:08 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by rriggs (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From dholmes at openjdk.org Tue Mar 7 01:19:29 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 01:19:29 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: <5La1Nl4tQg06WqJWG4Tou6QoEXdnS8l_YTddxJbYHyE=.89395bb9-1570-4bc0-98e9-355da8412c89@github.com> References: <5La1Nl4tQg06WqJWG4Tou6QoEXdnS8l_YTddxJbYHyE=.89395bb9-1570-4bc0-98e9-355da8412c89@github.com> Message-ID: <2DFTvxr-u0lXBoxjEjR5PPuJr69dl_r-IVsm67cqDV0=.07382883-e874-418e-9eab-bfe826d7d499@github.com> On Mon, 6 Mar 2023 15:00:31 GMT, Erik Österlund wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 15 commits: >> >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Add test >> - Fix and strengthen print_stack_location >> - Missed variable rename >> - Copyright >> - Rework logic and use continuation state for reattempts >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Restructure os::print_register_info interface >> - Code syle and line length >> - Merge Fix >> - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 > > src/hotspot/share/utilities/vmError.cpp line 175: > >> 173: static bool check_stack_headroom(Thread* thread, >> 174: size_t headroom) { >> 175: static const address stack_top = thread != nullptr > > We typically call "stack_top" "stack_base", and "stack_bottom" we call "stack_end". There is existing code in this file that uses this naming: STEP_IF("printing stack bounds", _verbose) st->print("Stack: "); address stack_top; size_t stack_size; if (_thread) { stack_top = _thread->stack_base(); stack_size = _thread->stack_size(); } else { stack_top = os::current_stack_base(); stack_size = os::current_stack_size(); } address stack_bottom = stack_top - stack_size; st->print("[" PTR_FORMAT "," PTR_FORMAT "]", p2i(stack_bottom), p2i(stack_top)); ------------- PR: https://git.openjdk.org/jdk/pull/11017 From dholmes at openjdk.org Tue Mar 7 01:29:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 01:29:13 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Sun, 5 Mar 2023 15:36:30 GMT, Julian Waters wrote: >> I don't understand the question? These are member variable declarations.
> > I believe David is asking about why these particular member names were changed No I'm comparing the syntax used elsewhere: DebuggingContext debugging{}; with that used here: DebuggingContext debugging; why the braces in most cases but not here? ------------- PR: https://git.openjdk.org/jdk/pull/12845 From dholmes at openjdk.org Tue Mar 7 01:32:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 01:32:14 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Sat, 4 Mar 2023 11:16:10 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/debug.cpp line 85: >> >>> 83: if (is_enabled()) { >>> 84: fatal("Multiple Debugging contexts"); >>> 85: } >> >> This seems too restrictive as you could hit different DebuggingContexts in different threads. ?? > > This facility is only intended for use by manually invoked commands while the program is stopped in a debugger. Multi-threaded use is not an issue (and was not supported previously either). I don't think there are any nested uses either, but I've now run across a couple of places where nesting could be useful. So I'm changing the state from a simple bool to a nesting counter. Okay I hadn't realized the process was basically "suspended" when this was activated - as I said I never use this stuff. ------------- PR: https://git.openjdk.org/jdk/pull/12845 From kim.barrett at oracle.com Tue Mar 7 01:38:39 2023 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 7 Mar 2023 01:38:39 +0000 Subject: alignas and clang In-Reply-To: References: Message-ID: <33A01455-A0AB-48AD-9968-9AAA3EB66E4D@oracle.com> > On Mar 6, 2023, at 12:58 PM, Thomas Stüfe wrote: > On Mon, Mar 6, 2023 at 5:23 PM Julian Waters wrote: > Ahhh, I see what you mean.
Seems to me that this may actually be compiler dependent, since C++14 simply states such situations are ill-formed and makes no promises of instead choosing the larger alignment when this happens. > > Oh, you are right, I stopped reading before it got interesting. > > "If the strictest (largest) alignas on a declaration is weaker than the alignment it would have without any alignas specifiers (that is, weaker than its natural alignment or weaker than alignas on another declaration of the same object or type), the program is ill-formed." > > Never mind then, I'll find a different solution. It looks like you found the explanation for the error. In case you forgot or hadn't noticed, a solution is to use multiple `alignas` attributes for the declaration, and include one which provides the "natural" alignment. There's even an example of doing this in the standard - C++14 7.6.2/7. So you could add `alignas(void*)` to the alignment specifiers in the example. From darcy at openjdk.org Tue Mar 7 01:39:14 2023 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 7 Mar 2023 01:39:14 GMT Subject: RFR: JDK-8302801: Remove fdlibm C sources [v4] In-Reply-To: References: <3JvuLUDJO3_dzKHOsMocC6kGDEmnIQo_7uobd-VTzHg=.22565440-42ae-4d26-9d74-2cbb7c63f9ea@github.com> Message-ID: On Sun, 5 Mar 2023 17:10:08 GMT, Joe Darcy wrote: > PS Successful Mach 5 job of default builds and tier 1 tests with this make line present.
------------- PR: https://git.openjdk.org/jdk/pull/12821 From dholmes at openjdk.org Tue Mar 7 01:40:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 01:40:14 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Sat, 4 Mar 2023 11:23:58 GMT, Kim Barrett wrote: >> Also 8302799: Refactor Debugging variable usage for noreturn crash reporting > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > make Debugging::_enabled a nesting counter Okay other than the one outstanding minor syntax query, I have nothing further to add. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12845 From cslucas at openjdk.org Tue Mar 7 01:51:13 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 7 Mar 2023 01:51:13 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges Message-ID: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Can I please get reviews for this PR to add support for the rematerialization of scalar-replaced objects that participate in allocation merges? The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times, the counter is incremented by 1: ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) This PR adds support for scalar replacing allocations participating in merges that are used *only* as debug information in SafePointNode and its subclasses. Although there is a performance benefit in doing scalar replacement in this scenario alone, the main goal of this PR is to add infrastructure to support the rematerialization of SR objects participating in merges. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (primarily CmpP and Load+AddP). The approach I used is pretty straightforward. It basically consists of: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new class to support rematerialization of SR objects that are part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in allocation merges used only as debug information. I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions that might be related. I also tested with several applications and didn't see any failures.
------------- Commit messages: - Add support for rematerializing scalar replaced objects participating in allocation merges Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8287061 Stats: 1803 lines in 18 files changed: 1653 ins; 9 del; 141 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From dholmes at openjdk.org Tue Mar 7 01:51:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 01:51:20 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v10] In-Reply-To: References: Message-ID: On Sat, 4 Mar 2023 02:28:43 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: > > Fix win32 broken build. Changes requested by dholmes (Reviewer). src/hotspot/share/runtime/sharedRuntime.cpp line 238: > 236: #endif > 237: > 238: #if !defined(TARGET_COMPILER_gcc) || defined(_WIN64) That is the wrong condition - it would activate for any non-Windows, non-gcc compiler. This code is only for _WIN64 is it not? regardless of the (theoretical) compiler used on win64? src/hotspot/share/runtime/sharedRuntime.cpp line 272: > 270: #endif > 271: JRT_END > 272: #endif // !X86 || _WIN64 !X86 ??? 
------------- PR: https://git.openjdk.org/jdk/pull/12508 From dholmes at openjdk.org Tue Mar 7 02:07:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 02:07:31 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v5] In-Reply-To: <0UEtpYm8TvljAdj3FQI5LYfjHxXPBWBfFD0lDsLojJA=.d259d5df-24b8-4b93-8fb3-feafc696deab@github.com> References: <0UEtpYm8TvljAdj3FQI5LYfjHxXPBWBfFD0lDsLojJA=.d259d5df-24b8-4b93-8fb3-feafc696deab@github.com> Message-ID: On Mon, 6 Mar 2023 20:28:40 GMT, Ilarion Nakonechnyy wrote: >> The proposed approach adds a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, which is called within `InstanceKlass::add_initialization_error` if the existing `java_lang_Throwable::get_cause_with_stack_trace` did not succeed because of an exception during the VM call. The simple function doesn't call into the VM to get a stack trace, but fills in all other information about the exception. >> >> Besides that, collecting information about the exception was also added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. >> >> A jtreg test reproducing the issue was also added to the commit. >> The commit was tested with tier1 tests. > > Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision: > > 1/ create_initialization_error(): return empty exception, if > EIIE creation failed. > 2/ remove testcase Thanks for the updates. Mostly just nits with comments and names. src/hotspot/share/classfile/javaClasses.cpp line 2741: > 2739: > 2740: // Throw ExceptionInInitializerError as the cause with this exception in > 2741: // the message and stack trace. I thought I'd flagged this earlier - this comment is not correct. Suggestion for a method-level comment (which allows removing a comment further down).
// Creates an ExceptionInInitializerError to be recorded as the initialization error when class initialization // failed due to the passed in `throwable`. We cannot save `throwable` directly due to issues with keeping alive // all objects referenced via its stacktrace. So instead we save a new EIIE instance, with the same message and // symbolic stacktrace of `throwable`. src/hotspot/share/classfile/javaClasses.cpp line 2756: > 2754: > 2755: Symbol* exception_name = vmSymbols::java_lang_ExceptionInInitializerError(); > 2756: Handle h_cause = Exceptions::new_exception(current, exception_name, st.as_string()); Existing: h_cause is an inappropriate name as this is not the "cause" of anything. Suggest simply h_eiie src/hotspot/share/classfile/javaClasses.cpp line 2758: > 2756: Handle h_cause = Exceptions::new_exception(current, exception_name, st.as_string()); > 2757: // If new_exception returns a different exception while creating the exception, > 2758: // abandon the attempts to save initialization error and return null. Nit: save _the_ initialization error ... src/hotspot/share/classfile/javaClasses.cpp line 2760: > 2758: // abandon the attempts to save initialization error and return null. > 2759: // We can't just return an original throwable (that is get passed as a parameter), > 2760: // because it would keep all the caller classes alive. Delete this. src/hotspot/share/oops/instanceKlass.cpp line 983: > 981: // this would be still be helpful. > 982: JavaThread* THREAD = current; > 983: Handle cause = java_lang_Throwable::create_initialization_error(current, exception); Nit: s/cause/init_error/ src/hotspot/share/oops/instanceKlass.cpp line 987: > 985: CLEAR_PENDING_EXCEPTION; > 986: return; > 987: } We still need to return if we got null. ------------- Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12566 From dholmes at openjdk.org Tue Mar 7 02:42:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 02:42:22 GMT Subject: RFR: JDK-8300783: Consolidate byteswap implementations [v8] In-Reply-To: References: <4T6ba7HVAkPmaau2WD3FRRyOlmEz7MDX5nz2UM-rfms=.58f59fc2-6030-4d9f-914f-5f37df4fb95e@github.com> Message-ID: <7SrHCI-9DW8abOV4cIMPkNt6NF9T7Czx_brOJAZtizQ=.f80ef739-4103-44c2-a7cf-5fb47500864b@github.com> On Mon, 6 Mar 2023 16:27:26 GMT, Justin King wrote: >> CI run was fine wrt these changes. > > @dholmes-ora Poke. :) Sorry @jcking but as I previously indicated I'm not qualified to review the details of this. But I will see if I can get someone else to. ------------- PR: https://git.openjdk.org/jdk/pull/12114 From dholmes at openjdk.org Tue Mar 7 02:55:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 02:55:19 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v3] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:34:21 GMT, Matthias Baesken wrote: >> src/hotspot/share/runtime/arguments.cpp line 1460: >> >>> 1458: #if INCLUDE_CDS >>> 1459: UseSharedSpaces = false; >>> 1460: #endif >> >> This doesn't make sense - the entire `no_shared_spaces` method is only meaningful on a JVM with CDS. Otherwise `-Xshare:on` should immediately be rejected. ?? > > Hi David, unfortunately this has to stay for now, otherwise we would try to set a const in the CDS-disabled build, this does not compile. > Maybe it would make sense to guard the whole function `no_shared_spaces` and adjust the calls/usages but I would prefer a separate issue for this. AFAICS all calls to `no_shared_spaces` are already within INCLUDE_CDS regions so the function itself can also be INCLUDE_CDS only. 
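David's point (guard the whole of `no_shared_spaces` with INCLUDE_CDS), combined with the idea of making the CDS flags compile-time constants when CDS is disabled, can be sketched as follows. This is a simplified, hypothetical stand-in. The real `UseSharedSpaces` is a JNIEXPORT `jboolean` that the SA looks up, and anything else the real `no_shared_spaces` does is omitted. `INCLUDE_CDS` is defaulted locally so the snippet compiles on its own.

```cpp
// Hypothetical, simplified sketch. INCLUDE_CDS normally comes from the build;
// it is defaulted here so the snippet compiles on its own.
#ifndef INCLUDE_CDS
#define INCLUDE_CDS 1
#endif

#if INCLUDE_CDS
// CDS builds: argument processing may switch sharing off at runtime.
bool UseSharedSpaces = true;

// Compiled only in CDS builds, because every caller is already inside an
// INCLUDE_CDS region, so the body needs no #if around the assignment.
void no_shared_spaces(const char* message) {
  (void)message;  // anything else the real function does is omitted here
  UseSharedSpaces = false;
}
#else
// CDS-disabled builds: a compile-time constant, so a stray assignment is a
// compile error instead of a silent no-op.
const bool UseSharedSpaces = false;
#endif
```

With the function itself inside the guard, the `#if INCLUDE_CDS` around the assignment at line 1458 in arguments.cpp would no longer be needed.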
------------- PR: https://git.openjdk.org/jdk/pull/12691 From xliu at openjdk.org Tue Mar 7 04:34:37 2023 From: xliu at openjdk.org (Xin Liu) Date: Tue, 7 Mar 2023 04:34:37 GMT Subject: RFR: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase [v6] In-Reply-To: References: Message-ID: > 1. Apply the same idea as JDK-8300184 to unlink(). > 2. Because ResourceHashtableBase doesn't support copy assignment, its clients have to purge all elements first when they need to assign it. We would like to provide a specialized version called 'unlink_all()'. We don't need to update each node's _next in this case; we only nullify all buckets. > 3. This patch also provides a specialized version of unlink_all() for the destructor. We don't even update the buckets; the table is dead anyway. Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Add log to dump information in removal APIs. - Merge branch 'master' into JDK-8301136 - Add lambda API unlink(Function f) per reviewers' request. - Use unlink_all() in JvmtiTagMapTable::clear. - Add a template for unlink(), unlink_all() and dtor. - Quit early if cnt is zero.
- 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12213/files - new: https://git.openjdk.org/jdk/pull/12213/files/086750e4..8b3bb2df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12213&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12213&range=04-05 Stats: 104740 lines in 2885 files changed: 47853 ins; 25265 del; 31622 mod Patch: https://git.openjdk.org/jdk/pull/12213.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12213/head:pull/12213 PR: https://git.openjdk.org/jdk/pull/12213 From xliu at openjdk.org Tue Mar 7 05:04:09 2023 From: xliu at openjdk.org (Xin Liu) Date: Tue, 7 Mar 2023 05:04:09 GMT Subject: RFR: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase [v6] In-Reply-To: References: Message-ID: <1Z3kA8O-iyd5aVsc8-Jqzpe8uSz5-HIeYsltKlMGrtI=.879e0dae-5e1c-46d2-aeed-f4a397558f37@github.com> On Tue, 7 Mar 2023 04:34:37 GMT, Xin Liu wrote: >> 1. Apply the same idea as JDK-8300184 to unlink(). >> 2. Because ResourceHashtableBase doesn't support copy assignment, its clients have to purge all elements first when they need to assign it. We would like to provide a specialized version called 'unlink_all()'. We don't need to update each node's _next in this case; we only nullify all buckets. >> 3. This patch also provides a specialized version of unlink_all() for the destructor. We don't even update the buckets; the table is dead anyway. > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: > > - Add log to dump information in removal APIs. > - Merge branch 'master' into JDK-8301136 > - Add lambda API unlink(Function f) per reviewers' request. > - Use unlink_all() in JvmtiTagMapTable::clear. > - Add a template for unlink(), unlink_all() and dtor. > - Quit early if cnt is zero. > - 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase I uploaded the hashtable logs of Renaissance/finagle-chirper. I looked into this benchmark because it produces the most logging at runtime: it yields 14,422 lines of log for the tag 'hashtables'. One interesting thing is that 9,166 (63.5%) of them are removals from an empty ResourceHashtable. $ awk -F '[ ,]' '$15 == 0 {print $0}' finagle-chirper-hashtables.log | wc -l 9166 "break at 0" below indicates that we effectively skip all iteration. [2.151s][debug][hashtables] ResourceHashtableBase table_size = 1031, break at 0, removed 0 out of 0 Of the remaining 5,256 non-trivial entries, 5,254 remove all elements. They all come from the destructor: the current patch only invokes `unlink_all()` in JVMTI, and we don't use JVMTI in this test. awk -F '[ ,]' '$12 == $15 {print $0}' finagle-chirper-hashtables-nontrivial.log|wc -l 5254 Overall, 14,063 (97.5%) of the log entries break early and save some useless iterations. awk -F '[ ,]' '$5 != $9 {print $0}' finagle-chirper-hashtables.log | wc -l 14063 Conclusion: 1. We need to optimize the destructor of ResourceHashtable; in particular, we should apply the optimization of JDK-8300184 to it. 2. We should consider specializing "unlink_all".
------------- PR: https://git.openjdk.org/jdk/pull/12213 From xliu at openjdk.org Tue Mar 7 05:14:24 2023 From: xliu at openjdk.org (Xin Liu) Date: Tue, 7 Mar 2023 05:14:24 GMT Subject: RFR: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase [v5] In-Reply-To: <9GAFUt_y36yObC0oOhzxNC05Y8Ja_fkUPxvhZCuFSPY=.255df6aa-d417-4277-9799-a9208e758158@github.com> References: <9GAFUt_y36yObC0oOhzxNC05Y8Ja_fkUPxvhZCuFSPY=.255df6aa-d417-4277-9799-a9208e758158@github.com> Message-ID: On Sun, 29 Jan 2023 06:41:10 GMT, Ioi Lam wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add lambda API unlink(Function f) per reviewers' request. > > I updated [JDK-8301296](https://bugs.openjdk.org/browse/JDK-8301296) to clarify the problems with the current ResourceHashtable design and my proposal for fixing them. > > I think the above proposed fixes shouldn't block the progress of this PR, which is just an optimization that maintains the current behavior. So let's continue the discussions here. > > There are two parts of this PR: > > [1] For the optimization (counting the number of entries and exit the loop early), do you have any performance data that shows this is beneficial? > > For this optimization to be beneficial, we must have two conditions -- (a) the table is too large so it's likely to have many unused entries, and (b) the hash function is bad so most of the unused entries are at the end of the table. > > For (a), maybe it's better to change the table to ResizeableResourceHashtable? > For (b), maybe you can also print out the occupancy of the table in your traces like this one (in your earlier PR https://github.com/openjdk/jdk/pull/12016) > > > [12.824s][debug][hashtables] table_size = 109, break at 68 > > > If we have many entries (e.g., 40) but they all pack to the front end of the table, that means we have a bad hash function. 
> > [2] As I suggested earlier, we should consolidate the code to use a single unlink loop, so you can apply this counting optimization in a single place. > > I am not quite sure why you would need the following in your "wrapper" functions: > > > if (clean) { > *ptr = node->_next; > } else { > ptr = &(node->_next); > } > > > and > > > if (node->_next == nullptr) { > // nullify the bucket when reach the end of linked list. > *ptr = nullptr; > } > > > I wrote a version of the consolidated loop that's much simpler. It also aligns with the old code so the diff is more readable: > > https://github.com/openjdk/jdk/compare/master...iklam:jdk:8301136-resource-hash-unlink-all-suggestion > > Note that I have both `unlink_all()` and `unlink_all(Function function)`, that's because the current API allows the user function to do two things (a) check if the entry should be freed, (b) perform special clean up on the entry. So if you want to free all entries but need to perform special clean up, you'd call `unlink_all(Function function)`. @iklam As you said, I mixed up two things. My real intention is to introduce a new API, `unlink_all()`, because I use it in my project. I understand that you want me to refactor unlink(Iterator) using a lambda. To have an efficient unlink_all() and destructor, I have to factor out unlink_impl(). It's private and acts as the algorithmic template. What should I do now? Should I split this JBS issue into two, one for the optimization and the other for "unlink_all()"? That would make things easier to review.
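The two ideas discussed for 8301136 can be combined in one small sketch. The first is counting entries so the bucket scan stops once every entry has been visited. The second is routing `unlink`, `unlink_all` and the destructor through a single private loop. `IntTable` is a hypothetical fixed-size chained table with `int` keys, not HotSpot's `ResourceHashtable`. The returned stop index is only analogous to the "break at N" value in the logs quoted earlier.

```cpp
#include <cstddef>

// Hypothetical fixed-size chained hash table with int keys; not HotSpot's
// ResourceHashtable. One private loop (unlink_impl) backs unlink(pred),
// unlink_all(cleanup), unlink_all() and the destructor.
class IntTable {
 public:
  static const std::size_t kSize = 8;

  IntTable() = default;
  IntTable(const IntTable&) = delete;             // no copying, mirroring
  IntTable& operator=(const IntTable&) = delete;  // ResourceHashtableBase
  ~IntTable() { unlink_all(); }

  void put(int key) {
    std::size_t b = static_cast<std::size_t>(key) % kSize;
    _buckets[b] = new Node{key, _buckets[b]};
    ++_count;
  }

  std::size_t size() const { return _count; }

  // Removes entries whose key satisfies pred. Returns the bucket index at
  // which the scan stopped; kSize means every bucket was visited.
  template <typename Pred>
  std::size_t unlink(Pred pred) { return unlink_impl(pred); }

  // Removes every entry after passing its key to cleanup.
  template <typename Cleanup>
  std::size_t unlink_all(Cleanup cleanup) {
    return unlink_impl([&](int key) -> bool { cleanup(key); return true; });
  }

  std::size_t unlink_all() {
    return unlink_impl([](int) { return true; });
  }

 private:
  struct Node {
    int key;
    Node* next;
  };

  Node* _buckets[kSize] = {};
  std::size_t _count = 0;

  template <typename Pred>
  std::size_t unlink_impl(Pred pred) {
    std::size_t remaining = _count;  // entries not yet visited
    std::size_t b = 0;
    // Stop as soon as every entry has been seen; an empty table touches no bucket.
    for (; b < kSize && remaining > 0; ++b) {
      Node** ptr = &_buckets[b];
      while (*ptr != nullptr) {
        Node* node = *ptr;
        --remaining;
        if (pred(node->key)) {
          *ptr = node->next;  // splice out; ptr now refers to the successor
          delete node;
          --_count;
        } else {
          ptr = &node->next;
        }
      }
    }
    return b;
  }
};
```

Because `remaining` counts entries still to be visited, a table whose entries all sit in the first few buckets stops early. As Ioi notes, that clustering is also a hint that the hash function may be poor.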
thanks, --lx ------------- PR: https://git.openjdk.org/jdk/pull/12213 From kbarrett at openjdk.org Tue Mar 7 06:02:14 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 06:02:14 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Tue, 7 Mar 2023 01:25:47 GMT, David Holmes wrote: >> I believe David is asking about why these particular member names were changed > > No I'm comparing the syntax used elsewhere: > > DebuggingContext debugging{}; > > with that used here: > > DebuggingContext debugging; > > why the braces in most cases but not here? "syntax used elsewhere" are declarations of local variables initialized with _value initialization_ (which is _default initialization_ for this type). "that used here" is a member variable declaration in the Command class, where the initialization occurs elsewhere (in the constructor of the Command class [*]). They are entirely different syntactic entities. Does that help? Or am I still not understanding your question? [*] With C++11, data members can have default initializers, but that's not an approved feature for HotSpot. ------------- PR: https://git.openjdk.org/jdk/pull/12845 From dholmes at openjdk.org Tue Mar 7 06:28:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 06:28:15 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Tue, 7 Mar 2023 05:58:57 GMT, Kim Barrett wrote: >> No I'm comparing the syntax used elsewhere: >> >> DebuggingContext debugging{}; >> >> with that used here: >> >> DebuggingContext debugging; >> >> why the braces in most cases but not here? 
> > "syntax used elsewhere" are declarations of local variables initialized with _value initialization_ (which is _default initialization_ for this type). "that used here" is a member variable declaration in the Command class, where the initialization occurs elsewhere (in the constructor of the Command class [*]). They are entirely different syntactic entities. Does that help? Or am I still not understanding your question? > > [*] With C++11, data members can have default initializers, but that's not an approved feature for HotSpot. Sorry, I was fixated on the weirdness (to me) of the {} initialization. I missed the fact that these were different kinds of declarations. ------------- PR: https://git.openjdk.org/jdk/pull/12845 From dholmes at openjdk.org Tue Mar 7 08:15:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 08:15:20 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 14:50:34 GMT, Frederic Parain wrote: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code due to the lack of strong typing and to semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
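As an illustration of the compressed field-metadata stream described above, the sketch below uses a plain LEB128-style unsigned varint. This is deliberately *not* the UNSIGNED5 encoding the change actually uses, and the `Field` record and stream layout are hypothetical reductions of FieldInfo. The point is only that small indices, offsets and flags shrink to a byte each, while a reader still hands back strongly typed values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in encoding: LEB128-style unsigned varints, NOT HotSpot's UNSIGNED5.
// Each byte carries 7 value bits; the high bit means "more bytes follow".
class VarintWriter {
 public:
  void write_uint(std::uint32_t value) {
    while (value >= 0x80) {
      _bytes.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
      value >>= 7;
    }
    _bytes.push_back(static_cast<std::uint8_t>(value));  // last byte, high bit clear
  }
  const std::vector<std::uint8_t>& bytes() const { return _bytes; }

 private:
  std::vector<std::uint8_t> _bytes;
};

class VarintReader {
 public:
  explicit VarintReader(const std::vector<std::uint8_t>& bytes) : _bytes(bytes) {}
  bool has_next() const { return _pos < _bytes.size(); }
  std::uint32_t next_uint() {
    std::uint32_t result = 0;
    int shift = 0;
    std::uint8_t b;
    do {
      b = _bytes[_pos++];
      result |= static_cast<std::uint32_t>(b & 0x7F) << shift;
      shift += 7;
    } while (b & 0x80);
    return result;
  }

 private:
  const std::vector<std::uint8_t>& _bytes;
  std::size_t _pos = 0;
};

// Hypothetical, much-reduced field record.
struct Field {
  std::uint32_t name_index;
  std::uint32_t signature_index;
  std::uint32_t offset;
  std::uint32_t access_flags;
};

// Assumed layout for this sketch: field count, then four varints per field.
std::vector<std::uint8_t> encode_fields(const std::vector<Field>& fields) {
  VarintWriter w;
  w.write_uint(static_cast<std::uint32_t>(fields.size()));
  for (const Field& f : fields) {
    w.write_uint(f.name_index);
    w.write_uint(f.signature_index);
    w.write_uint(f.offset);
    w.write_uint(f.access_flags);
  }
  return w.bytes();
}

std::vector<Field> decode_fields(const std::vector<std::uint8_t>& bytes) {
  VarintReader r(bytes);
  std::vector<Field> fields(r.next_uint());
  for (Field& f : fields) {
    f.name_index = r.next_uint();
    f.signature_index = r.next_uint();
    f.offset = r.next_uint();
    f.access_flags = r.next_uint();
  }
  return fields;
}
```

In this sketch a field whose values are all below 128 takes four bytes, and a name index of 300 adds one more. Real savings depend on the actual data and on the encoding used.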
> > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of field flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have also been updated to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Hi Fred, I've taken one pass through this but it is a huge set of changes to try and digest. At this stage just a bunch of style nits. Thanks. src/hotspot/share/classfile/classFileParser.cpp line 1632: > 1630: { > 1631: debug_only(NoSafepointVerifier nsv;) > 1632: for(int i = 0; i < _temp_field_info->length(); i++) { Nit: space after 'for' please src/hotspot/share/classfile/classFileParser.cpp line 2012: > 2010: void ClassFileParser::FieldAnnotationCollector::apply_to(FieldInfo* f) { > 2011: if (is_contended()) > 2012: // Setting the contended group also set the contended bit in field flags Nit: s/set/sets/ src/hotspot/share/oops/fieldInfo.cpp line 30: > 28: > 29: void FieldInfo::print(outputStream* os, ConstantPool* cp) { > 30: os->print_cr("index=%d name_index=%d name=%s signature_index=%d signature=%s offset=%d AccessFlags=%d FieldFlags=%d initval_index=%d gen_signature_index=%d, gen_signature=%s contended_group=%d", Nit: please break up this line src/hotspot/share/oops/fieldInfo.cpp line 120: > 118: *java_fields_count = r.next_uint(); > 119: *injected_fields_count = r.next_uint(); > 120: while(r.has_next()) { Nit: space before ( src/hotspot/share/oops/fieldInfo.cpp line 135: > 133: int java_field_count = r.next_uint(); > 134: int injected_fields_count = r.next_uint(); > 135: while(r.has_next()) { Nit: space before ( src/hotspot/share/oops/fieldInfo.cpp line 140: > 138: fi.print(os, cp); > 139: } > 140: } Newline
needed at EOF src/hotspot/share/oops/fieldInfo.inline.hpp line 135: > 133: new_flags = old_flags | mask; > 134: witness = Atomic::cmpxchg(&flags, old_flags, new_flags); > 135: } while(witness != old_flags); Nit: space before ( src/hotspot/share/oops/fieldInfo.inline.hpp line 155: > 153: inline void FieldStatus::update_access_watched(bool z) { update_flag(_fs_access_watched, z); } > 154: inline void FieldStatus::update_modification_watched(bool z) { update_flag(_fs_modification_watched, z); } > 155: inline void FieldStatus::update_initialized_final_update(bool z) {update_flag(_initialized_final_update, z); } Nit: space after { src/hotspot/share/oops/fieldStreams.hpp line 43: > 41: protected: > 42: const Array<u1>* _fieldinfo_stream; > 43: FieldInfoReader _reader; Nit: variable name alignment seems off ------------- PR: https://git.openjdk.org/jdk/pull/12855 From ayang at openjdk.org Tue Mar 7 08:16:36 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 7 Mar 2023 08:16:36 GMT Subject: RFR: 8303534: Merge CompactibleSpace into ContiguousSpace [v2] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 12:30:38 GMT, Albert Mingkun Yang wrote: >> Simple refactoring of merging two types. >> >> Test: tier1-5 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > copyright-year Thanks for the review. ------------- PR: https://git.openjdk.org/jdk/pull/12841 From ayang at openjdk.org Tue Mar 7 08:16:39 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 7 Mar 2023 08:16:39 GMT Subject: Integrated: 8303534: Merge CompactibleSpace into ContiguousSpace In-Reply-To: References: Message-ID: <6U7ZF7mVsSZrtPD76uzRubWX5vFJTzeteT9nCjOu-Ko=.092d53b2-6aad-4930-b4a2-ffce14080ef6@github.com> On Thu, 2 Mar 2023 21:49:39 GMT, Albert Mingkun Yang wrote: > Simple refactoring of merging two types. > > Test: tier1-5 This pull request has now been integrated.
Changeset: 7fbfc884 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/7fbfc884f0980169e8c08167d59147222728b66b Stats: 197 lines in 14 files changed: 27 ins; 122 del; 48 mod 8303534: Merge CompactibleSpace into ContiguousSpace Reviewed-by: cjplummer, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/12841 From rehn at openjdk.org Tue Mar 7 08:25:06 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Mar 2023 08:25:06 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v7] In-Reply-To: References: Message-ID: > Hi all, please consider. > > The original issue was that thread 1 asked to deopt nmethod set X and thread 2 asked for the same set or a subset of X. > All methods would already be marked, but the actual deoptimization (making them not entrant, patching PCs on stacks and patching post-call nops) was not done yet, which meant thread 2 could 'pass' thread 1. > Most places did deopt under Compile_lock, so this is not an issue there, but WB and clearCallSiteContext do not. > > Handshakes may take a long time to complete, and Compile_lock is used for much more than deopts, so the fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake, even when everything was already marked (instead of adding Compile_lock to all places). > > This turned out to be problematic during startup; for example, the number of deopt handshakes in a jetty dry-run startup went from 5 to 39. > > This fix first adds a barrier that you cannot pass until the requested deopts have happened, and it coalesces the handshakes. > Secondly, it moves the handshake part out of the Compile_lock where possible. > > This means we fix the performance bug and reduce contention on Compile_lock, giving higher throughput in the compiler and in things such as class loading. > > It passes t1-t7 with flying colours! t8 is still not completed and I'm redoing some testing due to last-minute simplifications.
> > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - Merge branch 'master' into 8300926 - Comment fixes - Include/fwd fixes - More review fixes - Coleen fix - Review fixes 2 - Review fixes - Fixed WS - Deopt scopes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12585/files - new: https://git.openjdk.org/jdk/pull/12585/files/e1162979..16732f1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12585&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12585&range=05-06 Stats: 36623 lines in 1278 files changed: 23503 ins; 7075 del; 6045 mod Patch: https://git.openjdk.org/jdk/pull/12585.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12585/head:pull/12585 PR: https://git.openjdk.org/jdk/pull/12585 From stefank at openjdk.org Tue Mar 7 08:32:45 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 7 Mar 2023 08:32:45 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 10:33:36 GMT, Kim Barrett wrote: >> Please review this enhancement of BitMap::iterate to support lambdas and other >> function objects as the operation being applied to the set bit indices. Some >> for-loops using BitMap::get_next_one_offset have been changed to use >> BitMap::iterate with a lambda. >> >> (One reason for changing the for-loops is that I'm considering a change to the >> get_next_one_offset API, and reducing the number of direct uses will simplify >> that.) >> >> For convenience, the function can either return void (always iterate over the >> whole range) or bool (stop iteration if it returns false).
Iteration using >> closure objects invoked via a do_bit member function is now implemented by >> wrapping them in a lambda, so it gets the same convenience. (Though, of course, >> if the closure is derived from BitMapClosure then do_bit returns bool.) >> >> The unit tests are written as "early" tests, not requiring an initialized VM. >> They also avoid any C heap allocation (even though C heap allocation has very >> early support). This is done to minimize the requirements for running the >> tests, since BitMap is used in a lot of places. This attempts to run these >> tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed >> someday.) (Some existing BitMap gtests should be modified to do similarly; see >> JDK-8303636.) >> >> Testing: >> mach5 tier1-7 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > improve description of iterate Thanks for doing this change. This aligns well with what we currently have in the Generational ZGC repository. I have one style comment that I think could help the readability of the code. src/hotspot/share/utilities/bitMap.inline.hpp line 273: > 271: } > 272: } > 273: } IMHO, this is an eyesore that I'd prefer to see hidden away a bit. Could we create an `invoke` function that calls this, and change the code to: template<typename Function> inline bool BitMap::iterate(Function function, idx_t beg, idx_t end) const { for (idx_t index = beg; true; ++index) { index = get_next_one_offset(index, end); if (index >= end) { return true; } else if (!invoke(function, index)) { return false; } } } And also add a comment explaining why we need IterateInvoker? ------------- Changes requested by stefank (Reviewer).
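The void-or-bool convenience Kim describes, and the `invoke` helper Stefan asks for, can be sketched with a helper that dispatches on the function's return type. Everything here is hypothetical. `TinyBitMap` is a 64-bit toy rather than HotSpot's `BitMap`. The helper is named `iterate_invoke` instead of `invoke` only to rule out any clash with `std::invoke`. The sketch stays within C++14, so it uses a class template specialization rather than `if constexpr`.

```cpp
#include <cstddef>
#include <cstdint>

using idx_t = std::size_t;

// Hypothetical dispatcher: a function returning bool decides whether to
// continue; a function returning void always continues.
template <typename ReturnType>
struct IterateInvoker {
  template <typename Function>
  static bool invoke(Function& function, idx_t index) {
    return function(index);
  }
};

template <>
struct IterateInvoker<void> {
  template <typename Function>
  static bool invoke(Function& function, idx_t index) {
    function(index);
    return true;  // void functions never stop the iteration
  }
};

// Hides the dispatch so that iterate() reads as a plain loop.
template <typename Function>
bool iterate_invoke(Function& function, idx_t index) {
  using ReturnType = decltype(function(index));
  return IterateInvoker<ReturnType>::invoke(function, index);
}

// Toy 64-bit bitmap, used only to drive the loop shape quoted above.
class TinyBitMap {
 public:
  explicit TinyBitMap(std::uint64_t bits) : _bits(bits) {}

  idx_t size() const { return 64; }

  // Index of the first set bit in [index, end), or end if there is none.
  // end must not exceed size().
  idx_t get_next_one_offset(idx_t index, idx_t end) const {
    for (; index < end; ++index) {
      if ((_bits >> index) & 1) {
        return index;
      }
    }
    return end;
  }

  // Returns false if the function stopped the iteration early.
  template <typename Function>
  bool iterate(Function function, idx_t beg, idx_t end) const {
    for (idx_t index = beg; true; ++index) {
      index = get_next_one_offset(index, end);
      if (index >= end) {
        return true;
      } else if (!iterate_invoke(function, index)) {
        return false;
      }
    }
  }

 private:
  std::uint64_t _bits;
};
```

With the return-type dispatch isolated in one helper, the loop in `iterate` has the same shape as the one Stefan proposes, and the reason for the helper can be documented in a single place.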
PR: https://git.openjdk.org/jdk/pull/12876 From duke at openjdk.org Tue Mar 7 08:51:53 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Tue, 7 Mar 2023 08:51:53 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v11] In-Reply-To: References: Message-ID: > I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). > I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. > The patch (and former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Fix #endif comment - found by dholmes-ora. - Merge branch 'master' into modulo - Fix win32 broken build. - Merge remote-tracking branch 'origin/master' into modulo - Always include the _WIN64 workaround - a review by dholmes-ora. - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. - Uppercase L - a review by turbanoff. - Fix copyright author. - Fix WIN32 vs. WIN64. - Update according to the upstream review by David Holmes. - ... 
and 1 more: https://git.openjdk.org/jdk/compare/b7d54b25...e4ff04dc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/3e1c05d0..e4ff04dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=09-10 Stats: 9028 lines in 352 files changed: 6228 ins; 1738 del; 1062 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Tue Mar 7 08:52:01 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Tue, 7 Mar 2023 08:52:01 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v10] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 01:45:50 GMT, David Holmes wrote: >> Jan Kratochvil has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix win32 broken build. > > src/hotspot/share/runtime/sharedRuntime.cpp line 238: > >> 236: #endif >> 237: >> 238: #if !defined(TARGET_COMPILER_gcc) || defined(_WIN64) > > That is the wrong condition - it would activate for any non-Windows, non-gcc compiler. This code is only for _WIN64 is it not? regardless of the (theoretical) compiler used on win64? For non-Windows non-gcc compiler it will use: > return ((jfloat)fmod((double)x,(double)y)); Which I find correct. > src/hotspot/share/runtime/sharedRuntime.cpp line 272: > >> 270: #endif >> 271: JRT_END >> 272: #endif // !X86 || _WIN64 > > !X86 ??? True, thanks for finding it. 
------------- PR: https://git.openjdk.org/jdk/pull/12508 From rehn at openjdk.org Tue Mar 7 09:20:18 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Mar 2023 09:20:18 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v6] In-Reply-To: References: Message-ID: <8-cfqT24ZTPwdTUlHSKMueLwMz2g3y9Qc8rGMPPttvU=.5e6c3c63-75fe-42fd-ab72-1f2f766a1a4e@github.com> On Mon, 27 Feb 2023 00:29:26 GMT, David Holmes wrote: > In `ciEnv::register_method` we take the `Compile_lock` to ensure `add_to_hierarchy` can't run: > > ``` > // Prevent SystemDictionary::add_to_hierarchy from running > // and invalidating our dependencies until we install this method. > // No safepoints are allowed. Otherwise, class redefinition can occur in between. > MutexLocker ml(Compile_lock); > NoSafepointVerifier nsv; > ``` > > does moving the deopt outside of the `Compile_lock` affect that? The class hierarchy is still handled inside the Compile_lock, so that comment is still true. The deopt is still done inside _init_monitor during classloading, which means no-one can create such an object until deopt is completed. ------------- PR: https://git.openjdk.org/jdk/pull/12585 From rehn at openjdk.org Tue Mar 7 09:26:36 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Mar 2023 09:26:36 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v7] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 08:25:06 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. >> All method will already be marked, but the actual deoptimizing, not entrant, patching PC on stacks and patching post call nops, was not done yet. Which meant thread 2 could 'pass' thread 1. >> Most places did deopt under Compile_lock, thus this is not an issue, but WB and clearCallSiteContext do not.
>> >> Since a handshakes may take long before completion and Compile_lock is used for so much more than deopts. >> The fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake even when everything was already marked. (instead of adding Compile_lock to all places) >> >> This turnout to be problematic in the startup, for example the number of deopt handshakes in jetty dry run startup went from 5 to 39 handshakes. >> >> This fix first adds a barrier for which you do not pass until the requested deopts have happened and it coalesces the handshakes. >> Secondly it moves handshakes part out of the Compile_lock where it is possible. >> >> Which means we fix the performance bug and we reduce the contention on Compile_lock, meaning higher throughput in compiler and things such as class-loading. >> >> It passes t1-t7 with flying colours! t8 still not completed and I'm redoing some testing due to last minute simplifications. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Merge branch 'master' into 8300926 > - Comment fixes > - Include/fwd fixes > - More review fixes > - Coleen fix > - Review fixes 2 > - Review fixes > - Fixed WS > - Deopt scopes `vmClasses::resolve_shared_class` It's only a micro-optimization to remove the lock, since we are single-threaded when this code runs. The calls above, `dictionary->add_klass` and `InstanceKlass::restore_unshareable_info`, can also grab locks. Since we pass all tests and other paths may take locks, I'm not concerned.
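The barrier described in the quoted PR text has two properties: no caller proceeds until the deopts it requested have actually happened, and concurrent requests share one handshake. Both can be illustrated with a generic ticket scheme. The code below is a hypothetical simplification built on standard C++ threading primitives and is not the PR's deopt-scope implementation. `CoalescingBarrier` and `request_and_wait` are invented names, and the expensive operation passed to the constructor merely stands in for the deopt handshake.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical sketch: a caller marks its work first, then calls
// request_and_wait(). It returns only after an operation that STARTED after
// its request has finished, so a caller whose items were already marked by
// someone else cannot "pass" the thread that marked them. Requests that are
// pending when an operation starts all share that one operation.
class CoalescingBarrier {
 public:
  explicit CoalescingBarrier(std::function<void()> op) : _op(std::move(op)) {}

  void request_and_wait() {
    std::unique_lock<std::mutex> lock(_mutex);
    const uint64_t ticket = ++_requested;
    while (_completed < ticket) {
      if (_in_progress) {
        // A running operation may have started before our request; wait and re-check.
        _cv.wait(lock);
        continue;
      }
      // Become the leader and cover every request made so far.
      const uint64_t covered = _requested;
      assert(_completed < covered);
      _in_progress = true;
      lock.unlock();
      _op();  // stands in for the deopt handshake; runs without the mutex held
      lock.lock();
      _in_progress = false;
      _completed = covered;
      ++_operations;
      _cv.notify_all();
    }
  }

  uint64_t operations() const {
    std::lock_guard<std::mutex> guard(_mutex);
    return _operations;
  }

 private:
  std::function<void()> _op;
  mutable std::mutex _mutex;
  std::condition_variable _cv;
  uint64_t _requested = 0;
  uint64_t _completed = 0;
  uint64_t _operations = 0;
  bool _in_progress = false;
};
```

A request that arrives while an operation is already running waits for the next one, because that operation may have begun before the request's marks were made. With many concurrent callers, the number of operations can therefore be much smaller than the number of requests, which is the kind of reduction in handshakes the PR is after.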
------------- PR: https://git.openjdk.org/jdk/pull/12585 From kbarrett at openjdk.org Tue Mar 7 09:46:09 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 09:46:09 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v3] In-Reply-To: References: Message-ID: <324vpJnVkjaUEzKkmC0NSy2PbjrOFfTkH5WhiKf9-b4=.0fb689d0-875b-4761-9743-db3dbf2f483f@github.com> > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda. > > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) > > The unit tests are written as "early" tests, not requiring an initialized VM. > They also avoid any C heap allocation (even though C heap allocation has very > early support). This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) > > Testing: > mach5 tier1-7 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - stefank review - Merge branch 'master' into lambda-iterate - improve description of iterate - use iterate with lambda - bitmap iteration supports lambda ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12876/files - new: https://git.openjdk.org/jdk/pull/12876/files/a1befdf3..4c03ae1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=01-02 Stats: 6931 lines in 112 files changed: 4764 ins; 1389 del; 778 mod Patch: https://git.openjdk.org/jdk/pull/12876.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12876/head:pull/12876 PR: https://git.openjdk.org/jdk/pull/12876 From tschatzl at openjdk.org Tue Mar 7 09:57:19 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 7 Mar 2023 09:57:19 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects [v2] In-Reply-To: References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: On Mon, 6 Mar 2023 10:22:27 GMT, Ivan Walulya wrote: >> Hi All, >> >> Please review this change to move humongous regions during the Last-Ditch full gc ( on `do_maximal_compaction`). This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. >> >> Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. 
By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces the region fragmentation and avoids running into OOM errors. >> >> We have enabled for G1 the Jtreg test that was previously used only for Shenandoah to test such workload. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor resetting humongous metadata > - Thomas review Looks good. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.org/jdk/pull/12830 From kbarrett at openjdk.org Tue Mar 7 09:57:48 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 09:57:48 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v4] In-Reply-To: References: Message-ID: > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda. > > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) > > The unit tests are written as "early" tests, not requiring an initialized VM. > They also avoid any C heap allocation (even though C heap allocation has very > early support). This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. 
This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) > > Testing: > mach5 tier1-7 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: convert to bool rather than calling RT::operator! ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12876/files - new: https://git.openjdk.org/jdk/pull/12876/files/4c03ae1b..83de109a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12876.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12876/head:pull/12876 PR: https://git.openjdk.org/jdk/pull/12876 From kbarrett at openjdk.org Tue Mar 7 10:01:47 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 10:01:47 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v2] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 07:39:50 GMT, Stefan Karlsson wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> improve description of iterate > > src/hotspot/share/utilities/bitMap.inline.hpp line 273: > >> 271: } >> 272: } >> 273: } > > IMHO, this is an eyesore that I prefer if it could be hidden away a bit. Could we create an `invoke` function that that calls this, and change the code to: > > template > inline bool BitMap::iterate(Function function, idx_t beg, idx_t end) const { > for (idx_t index = beg; true; ++index) { > index = get_next_one_offset(index, end); > if (index >= end) { > return true; > } else if (!invoke(function, index)) { > return false; > } > } > } > > > And also add a comment why we need IterateInvoker? 
I added a local function object so that template clutter wasn't all right in the middle of the function. Making it a member function instead seemed to just push around the deck chairs. Added comment, and also fixed a lurking, probably never to be hit, bug that we were stopping early if ReturnType::operator! returned true, rather than if the converted to bool result was false. ------------- PR: https://git.openjdk.org/jdk/pull/12876 From ihse at openjdk.org Tue Mar 7 11:19:08 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 7 Mar 2023 11:19:08 GMT Subject: RFR: 8303480: Miscellaneous fixes to mostly invisible doc comments [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 20:22:48 GMT, Pavel Rappo wrote: >> Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. >> >> The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: >> >> >> diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html >> --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 >> +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 >> @@ -17084,7 +17084,7 @@ >> throws IOException, >> ClassNotFoundException >>
readObject is called to restore the state of the >> - (@code BasicPermission} from a stream.
>> + BasicPermission from a stream. >>
>>
Parameters:
>>
s - the ObjectInputStream from which data is read
>> >> Notes >> ----- >> >> * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. >> * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. >> * I will update copyright years after (and if) the fix had been approved, as required. > > Pavel Rappo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8303480 > - Initial commit Marked as reviewed by ihse (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12826 From kbarrett at openjdk.org Tue Mar 7 12:08:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 12:08:36 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v2] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 09:58:24 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/bitMap.inline.hpp line 273: >> >>> 271: } >>> 272: } >>> 273: } >> >> IMHO, this is an eyesore that I prefer if it could be hidden away a bit. Could we create an `invoke` function that that calls this, and change the code to: >> >> template >> inline bool BitMap::iterate(Function function, idx_t beg, idx_t end) const { >> for (idx_t index = beg; true; ++index) { >> index = get_next_one_offset(index, end); >> if (index >= end) { >> return true; >> } else if (!invoke(function, index)) { >> return false; >> } >> } >> } >> >> >> And also add a comment why we need IterateInvoker? > > I added a local function object so that template clutter wasn't all right in the middle of the function. 
> Making it a member function instead seemed to just push around the deck chairs. > Added comment, and also fixed a lurking, probably never to be hit, bug that we were stopping early if > ReturnType::operator! returned true, rather than if the converted to bool result was false. Here's an alternative, using member function templates instead of the partially specialized `IterateInvoker` function object. This doesn't seem obviously better to me. Other ideas I looked at involved messy SFINAE, some with trailing return types (a not yet approved for HotSpot C++11 feature).

  template<typename Function, typename Dispatch>
  static bool iterate_invoke_aux(Function function, idx_t index, Dispatch*) {
    return function(index);
  }

  template<typename Function>
  static bool iterate_invoke_aux(Function function, idx_t index, void*) {
    function(index);
    return true;
  }

  template<typename Function>
  static bool iterate_invoke(Function function, idx_t index) {
    using ReturnType = decltype(function(index));
    return iterate_invoke_aux(function, index, static_cast<ReturnType*>(nullptr));
  }

------------- PR: https://git.openjdk.org/jdk/pull/12876 From dholmes at openjdk.org Tue Mar 7 12:18:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 12:18:06 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v10] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 08:47:21 GMT, Jan Kratochvil wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 238: >> >>> 236: #endif >>> 237: >>> 238: #if !defined(TARGET_COMPILER_gcc) || defined(_WIN64) >> >> That is the wrong condition - it would activate for any non-Windows, non-gcc compiler. This code is only for _WIN64 is it not? regardless of the (theoretical) compiler used on win64? > > For non-Windows non-gcc compiler it will use: >> return ((jfloat)fmod((double)x,(double)y)); > > Which I find correct. Sorry my mistake.
------------- PR: https://git.openjdk.org/jdk/pull/12508 From stefank at openjdk.org Tue Mar 7 12:25:51 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 7 Mar 2023 12:25:51 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v4] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 09:57:48 GMT, Kim Barrett wrote: >> Please review this enhancement of BitMap::iterate to support lambdas and other >> function objects as the operation being applied to the set bit indices. Some >> for-loops using BitMap::get_next_one_offset have been changed to use >> BitMap::iterate with a lambda. >> >> (One reason for changing the for-loops is that I'm considering a change to the >> get_next_one_offset API, and reducing the number of direct uses will simplify >> that.) >> >> For convenience, the function can either return void (always iterate over the >> whole range) or bool (stop iteration if returns false). Iteration using >> closure objects invoked via a do_bit member function are now implemented by >> being wrapped in a lambda, so get the same convenience. (Though, of course, >> if the closure is derived from BitMapClosure then do_bit returns bool.) >> >> The unit tests are written as "early" tests, not requiring an initialized VM. >> They also avoid any C heap allocation (even though C heap allocation has very >> early support). This is done to minimize the requirements for running the >> tests, since BitMap is used in a lot of places. This attempts to run these >> tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed >> someday.) (Some existing BitMap gtests should be modified to do similarly; see >> JDK-8303636.) >> >> Testing: >> mach5 tier1-7 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > convert to bool rather than calling RT::operator! Thanks. That's a bit easier to read, for me at least. 
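The iteration contract discussed in this thread is as follows. A void-returning function visits every set bit. Any other return type is converted to `bool`, and `false` stops the iteration. The sketch below shows that contract on a hypothetical one-word `TinyBitMap`, which is not HotSpot's `BitMap`. It selects the invoker overload with `std::is_void` tag dispatch, a variant of the pointer-overload alternative posted earlier, and it converts non-void results with `static_cast<bool>` instead of applying `operator!` to the raw return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical one-word bitmap, used only to illustrate the iteration protocol.
class TinyBitMap {
 public:
  using idx_t = std::size_t;
  static const idx_t size_in_bits = 64;

  explicit TinyBitMap(uint64_t bits) : _bits(bits) {}

  // Index of the first set bit in [beg, end), or end if there is none.
  idx_t get_next_one_offset(idx_t beg, idx_t end) const {
    for (idx_t i = beg; i < end; ++i) {
      if ((_bits >> i) & 1) return i;
    }
    return end;
  }

  // Applies function to each set bit in [beg, end). Returns false only if
  // the function returned something that converts to false.
  template<typename Function>
  bool iterate(Function function, idx_t beg, idx_t end) const {
    assert(beg <= end && end <= size_in_bits);
    for (idx_t index = beg; true; ++index) {
      index = get_next_one_offset(index, end);
      if (index >= end) {
        return true;
      } else if (!invoke(function, index)) {
        return false;
      }
    }
  }

 private:
  template<typename Function>
  static bool invoke(Function& function, idx_t index) {
    using ReturnType = decltype(function(index));
    return invoke_aux(function, index, std::is_void<ReturnType>());
  }

  // void result: always continue.
  template<typename Function>
  static bool invoke_aux(Function& function, idx_t index, std::true_type) {
    function(index);
    return true;
  }

  // Any other result: explicit conversion to bool decides whether to continue.
  template<typename Function>
  static bool invoke_aux(Function& function, idx_t index, std::false_type) {
    return static_cast<bool>(function(index));
  }

  uint64_t _bits;
};
```

For example, with bits 1, 4 and 40 set, a lambda returning `void` sees all three indices. A lambda that returns `false` on its first call stops at index 1, and `iterate` then returns `false`.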
------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.org/jdk/pull/12876 From kbarrett at openjdk.org Tue Mar 7 13:04:04 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 13:04:04 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v2] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 10:59:42 GMT, Axel Boldt-Christmas wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> improve description of iterate > > Marked as reviewed by aboldtch (Committer). Thanks for reviews @xmas92 , @tschatzl , and @stefank . ------------- PR: https://git.openjdk.org/jdk/pull/12876 From kbarrett at openjdk.org Tue Mar 7 13:04:07 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 13:04:07 GMT Subject: Integrated: 8303621: BitMap::iterate should support lambdas and other function objects In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 07:27:23 GMT, Kim Barrett wrote: > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda. > > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) > > The unit tests are written as "early" tests, not requiring an initialized VM. 
> They also avoid any C heap allocation (even though C heap allocation has very > early support). This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) > > Testing: > mach5 tier1-7 This pull request has now been integrated. Changeset: 008c5eb4 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/008c5eb4dd40f93e9c7849bfc681d615ab29baad Stats: 310 lines in 5 files changed: 268 ins; 8 del; 34 mod 8303621: BitMap::iterate should support lambdas and other function objects Reviewed-by: aboldtch, tschatzl, stefank ------------- PR: https://git.openjdk.org/jdk/pull/12876 From kbarrett at openjdk.org Tue Mar 7 13:04:03 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Mar 2023 13:04:03 GMT Subject: RFR: 8303621: BitMap::iterate should support lambdas and other function objects [v5] In-Reply-To: References: Message-ID: > Please review this enhancement of BitMap::iterate to support lambdas and other > function objects as the operation being applied to the set bit indices. Some > for-loops using BitMap::get_next_one_offset have been changed to use > BitMap::iterate with a lambda. > > (One reason for changing the for-loops is that I'm considering a change to the > get_next_one_offset API, and reducing the number of direct uses will simplify > that.) > > For convenience, the function can either return void (always iterate over the > whole range) or bool (stop iteration if returns false). Iteration using > closure objects invoked via a do_bit member function are now implemented by > being wrapped in a lambda, so get the same convenience. (Though, of course, > if the closure is derived from BitMapClosure then do_bit returns bool.) 
> > The unit tests are written as "early" tests, not requiring an initialized VM. > They also avoid any C heap allocation (even though C heap allocation has very > early support). This is done to minimize the requirements for running the > tests, since BitMap is used in a lot of places. This attempts to run these > tests before uses. (Yes, I know about JDK-8257226; maybe that will be fixed > someday.) (Some existing BitMap gtests should be modified to do similarly; see > JDK-8303636.) > > Testing: > mach5 tier1-7 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into lambda-iterate - convert to bool rather than calling RT::operator! - stefank review - Merge branch 'master' into lambda-iterate - improve description of iterate - use iterate with lambda - bitmap iteration supports lambda ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12876/files - new: https://git.openjdk.org/jdk/pull/12876/files/83de109a..83f8c567 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12876&range=03-04 Stats: 85 lines in 3 files changed: 83 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12876.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12876/head:pull/12876 PR: https://git.openjdk.org/jdk/pull/12876 From coleenp at openjdk.org Tue Mar 7 13:15:26 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 7 Mar 2023 13:15:26 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Sat, 4 Mar 2023 11:23:58 GMT, Kim Barrett wrote: >> Also 8302799: Refactor Debugging variable usage for noreturn crash reporting > > 
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > make Debugging::_enabled a nesting counter Seems fine. I'll test drive with the debug commands next time I need to use them, but this is similar to the changes I have in my private repos. Thanks for keeping this functionality. src/hotspot/share/utilities/debug.hpp line 108: > 106: // because we need a fallback when we don't have any mechanism for detecting > 107: // constant evaluation. > 108: #if defined(TARGET_COMPILER_gcc) || defined(TARGET_COMPILER_xlc) All this seems like it should go in COMPILER_HEADER(globalDefinitions.hpp) but since globalDefinitions.hpp includes debug.hpp, you can't do this. Can we file an RFE to clean this up (if possible)? ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12845 From rehn at openjdk.org Tue Mar 7 13:47:55 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Mar 2023 13:47:55 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: > Hi all, please consider. > > The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. > All method will already be marked, but the actual deoptimizing, not entrant, patching PC on stacks and patching post call nops, was not done yet. Which meant thread 2 could 'pass' thread 1. > Most places did deopt under Compile_lock, thus this is not an issue, but WB and clearCallSiteContext do not. >
(instead of adding Compile_lock to all places) > > This turnout to be problematic in the startup, for example the number of deopt handshakes in jetty dry run startup went from 5 to 39 handshakes. > > This fix first adds a barrier for which you do not pass until the requested deopts have happened and it coalesces the handshakes. > Secondly it moves handshakes part out of the Compile_lock where it is possible. > > Which means we fix the performance bug and we reduce the contention on Compile_lock, meaning higher throughput in compiler and things such as class-loading. > > It passes t1-t7 with flying colours! t8 still not completed and I'm redoing some testing due to last minute simplifications. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: - Grab lock so code() is stable - Non CHA based vtables fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12585/files - new: https://git.openjdk.org/jdk/pull/12585/files/16732f1d..be2d3b5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12585&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12585&range=06-07 Stats: 19 lines in 2 files changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12585.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12585/head:pull/12585 PR: https://git.openjdk.org/jdk/pull/12585 From rehn at openjdk.org Tue Mar 7 13:47:58 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Mar 2023 13:47:58 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v6] In-Reply-To: References: Message-ID: On Mon, 6 Mar 2023 07:05:00 GMT, Axel Boldt-Christmas wrote: >> Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Comment fixes >> - Include/fwd fixes > > src/hotspot/share/prims/whitebox.cpp line 798: > >> 796: result += mh->method_holder()->mark_osr_nmethods(&deopt_scope, mh()); >> 797: } 
else if (mh->code() != nullptr) { >> 798: deopt_scope.mark(mh->code()); > > I was working on a patch based on top of this one and had a crash here. > > The second call to `mh->code()` returned a `nullptr`. It looks racy to me and seems like the only thing protecting `CompileMethod::code()` is the `CompiledMethod_lock` which is not held here. > > Regardless `mh->code()` should probably only be loaded once. (Or at least not re-loaded after the nullptr check) Fixed, thanks! ------------- PR: https://git.openjdk.org/jdk/pull/12585 From coleenp at openjdk.org Tue Mar 7 14:14:39 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 7 Mar 2023 14:14:39 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 21:37:34 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV This looks really good. I noted a few changes and questions. 
src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 53: > 51: > 52: #undef __ > 53: #define __ Disassembler::hook(__FILE__, __LINE__, _masm)-> What is this? Is this something useful for debugging the template interpreter? Probably doesn't belong with this change but might be nice to have (?) @reinrich src/hotspot/cpu/x86/templateTable_x86.cpp line 2801: > 2799: bool is_invokevirtual, > 2800: bool is_invokevfinal, /*unused*/ > 2801: bool is_invokedynamic /*unused*/) { I assume you have to keep this parameter for the platform that doesn't still have this change (s390)? src/hotspot/share/cds/classListParser.cpp line 590: > 588: // resolve it > 589: Handle recv; > 590: LinkResolver::resolve_invoke(info, recv, pool, ConstantPool::encode_invokedynamic_index(indy_index), Bytecodes::_invokedynamic, CHECK); nit: can you reformat so the line isn't so long? src/hotspot/share/ci/ciReplay.cpp line 419: > 417: be used to avoid multiple blocks of similar code. When CPCE is obsoleted > 418: these can be removed > 419: */ I don't know if you really need this comment. If so, use // style instead. src/hotspot/share/ci/ciReplay.cpp line 453: > 451: if (!parse_terminator()) { > 452: report_error("no dynamic invoke found"); > 453: return NULL; nullptr not NULL. src/hotspot/share/interpreter/rewriter.hpp line 143: > 141: assert(ref_index >= _resolved_reference_limit, ""); > 142: if (_pool->tag_at(cp_index).value() != JVM_CONSTANT_InvokeDynamic) { > 143: _invokedynamic_references_map.at_put_grow(ref_index, cache_index, -1); I think you might need to rename _invokedynamic_references_map variable name to _invokehandle_references_map with this change also. This will be confusng. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 639: > 637: int indy_index = -1; > 638: for (int i = 0; i < cp->resolved_indy_entries_length(); i++) { > 639: tty->print_cr("Index: %d", cp->resolved_indy_entry_at(i)->constant_pool_index()); Looks like some debugging left in. 
src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 1529: > 1527: if (cp_cache_entry->is_resolved(Bytecodes::_invokedynamic)) { > 1528: return Bytecodes::_invokedynamic; > 1529: } This seems like it should be removed? src/hotspot/share/oops/cpCache.cpp line 727: > 725: set_reference_map(nullptr); > 726: #if INCLUDE_CDS > 727: if (_initial_entries != nullptr) { @iklam with moving invokedynamic entries out, do you still need to save initialized entries ? Does invokehandle need this? (Should have separate RFE if more cleanup is possible) src/hotspot/share/oops/resolvedIndyEntry.hpp line 26: > 24: > 25: #ifndef SHARE_OOPS_ResolvedIndyEntry_HPP > 26: #define SHARE_OOPS_ResolvedIndyEntry_HPP Make this all capital letters. src/hotspot/share/oops/resolvedIndyEntry.hpp line 71: > 69: > 70: // Bit shift to get flags > 71: // Note: Only one flag exists at the moment but more could be added Actually two flags - resolution_failed too. src/hotspot/share/oops/resolvedIndyEntry.hpp line 87: > 85: bool is_vfinal() const { return false; } > 86: bool is_final() const { return false; } > 87: bool has_local_signature() const { return true; } The closing braces don't need to be aligned. src/hotspot/share/oops/resolvedIndyEntry.hpp line 111: > 109: _return_type = return_type; > 110: set_flags(has_appendix); > 111: Atomic::release_store(&_method, m); Add a comment like // set this last. The method is read lock-free from the entry and, if set, indicates the rest of the resolution information is valid. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyArray.java line 35: > 33: import sun.jvm.hotspot.types.WrongTypeException; > 34: import sun.jvm.hotspot.utilities.GenericArray; > 35: import sun.jvm.hotspot.utilities.Observable; Do you need all of these imports?
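The ordering that the suggested `// set this last` comment documents is the usual release/acquire publication pattern: fill in the plain fields first, publish the method pointer last with a release store, and let lock-free readers treat a non-null pointer read with acquire semantics as "fully resolved". A sketch using `std::atomic` in place of HotSpot's `Atomic::release_store`; the `ResolvedEntry` layout and field names below are illustrative assumptions, not the actual `ResolvedIndyEntry`:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified entry; not HotSpot's ResolvedIndyEntry.
struct Method { int id; };

struct ResolvedEntry {
  int  number_of_parameters = 0;
  char return_type = 0;
  bool has_appendix = false;
  std::atomic<Method*> method{nullptr};  // published last

  // Writer: fill in every plain field first, then publish with release.
  void fill_in(Method* m, int params, char rtype, bool appendix) {
    assert(m != nullptr);  // publishing nullptr would look unresolved
    number_of_parameters = params;
    return_type = rtype;
    has_appendix = appendix;
    method.store(m, std::memory_order_release);  // set this last
  }

  // Reader: an acquire load that sees non-null also sees every plain
  // store that happened before the matching release store.
  bool is_resolved() const {
    return method.load(std::memory_order_acquire) != nullptr;
  }
};
```

If the pointer were stored first, a reader could observe a non-null method while the other fields still held their defaults.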
src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 935: > 933: /*if (isInvokedynamicIndex(cpi)) { > 934: compilerToVM().resolveInvokeDynamicInPool(this, cpi); > 935: }*/ Is there something to fix here? ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12778 From mbaesken at openjdk.org Tue Mar 7 14:29:47 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 7 Mar 2023 14:29:47 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:55:07 GMT, Matthias Baesken wrote: >> The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. >> However at some places INCLUDE_CDS is missing and should be added. >> >> One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros ? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust arguments handling Unfortunately with the latest revision of the patch, quite a few tests fail when the build is configured with `--disable-cds` . Seems those tests set various -Xshare options for some reason, especially `-Xshare:off` . 
Failing tests (there might be a few more where the reason is not fully clear from the log):

jdk/jfr/event/runtime/TestMetaspaceAllocationFailure.java
  Unrecognized option: -Xshare:off
sun/security/provider/X509Factory/BigCRL.java
  Unrecognized option: -Xshare:off
java/lang/module/ModuleDescriptorHashCodeTest.java
  Unrecognized option: -Xshare:off
java/math/BigInteger/largeMemory/DivisionOverflow.java
  Unrecognized option: -Xshare:off
java/math/BigInteger/largeMemory/StringConstructorOverflow.java
  Unrecognized option: -Xshare:off
runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java#id7
  stderr content[Unrecognized option: -Xshare:dump
runtime/7158988/FieldMonitor.java
  JavaTest Message: Test threw exception: java.lang.Error: Target VM failed to initialize: VM initialization failed for:

------------- PR: https://git.openjdk.org/jdk/pull/12691 From mdoerr at openjdk.org Tue Mar 7 14:33:22 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 7 Mar 2023 14:33:22 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v10] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4.
(An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: This will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation of single-precision values in FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge remote-tracking branch 'origin' into PPC64_Panama - Handle HFA corner cases with overlapping registers in Java. - Fix merge bug.
- Merge branch 'master' into PPC64_Panama - Add test for HFA corner cases. - Minor cleanup. - HFA: Add support for nested structures. See JDK-8300294. - Remove size restriction for structs. Add TODO for Big Endian. - Clean fix for NativeMemorySegmentImpl issue with byteSize 0. - Initial Panama implementation. ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=09 Stats: 2287 lines in 59 files changed: 2157 ins; 16 del; 114 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From lucy at openjdk.org Tue Mar 7 14:49:31 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 7 Mar 2023 14:49:31 GMT Subject: RFR: JDK-8303575: adjust Xen handling on Linux aarch64 In-Reply-To: References: Message-ID: <7Xly8bGJIWby8mvweZ_Qk8GE3LDpFZp2MwROEsDR6Yk=.119cdb2f-38bb-4782-957b-2d649f97a194@github.com> On Fri, 3 Mar 2023 12:56:55 GMT, Matthias Baesken wrote: > After [JDK-8301050](https://bugs.openjdk.org/browse/JDK-8301050) the Xen handling on aarch64 should be slightly adjusted. > The output in VM_Version::print_platform_virtualization_info was missed and needs to be added for Xen. > Additionally a new XenPVHVM virtualization type could be introduced because this describes the Xen on aarch64 better. > See also https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/virtualization-on-arm-with-xen where the naming is used. Looks good to me. ------------- Marked as reviewed by lucy (Reviewer). 
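Two points from the PPC64 Panama description above can be checked with a small, host-independent model: Big Endian needs the precise size because storing 8 bytes and then loading 4 from the same slot gives different results depending on byte order, and ints must be widened to 64 bits. A sketch, assuming an explicit byte-array model of one 8-byte argument slot rather than real PPC64 memory; `Slot`, `store64`, `load32` and `widen_int_arg` are hypothetical helpers:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Slot = std::array<uint8_t, 8>;  // one 8-byte (doubleword) slot

// Write a 64-bit value into the slot in the chosen byte order.
Slot store64(uint64_t v, bool big_endian) {
  Slot s{};
  for (int i = 0; i < 8; i++) {
    int shift = big_endian ? 8 * (7 - i) : 8 * i;
    s[i] = static_cast<uint8_t>(v >> shift);
  }
  return s;
}

// Read the 4 bytes at the lowest addresses of the slot as a 32-bit value,
// using the same byte order.
uint32_t load32(const Slot& s, bool big_endian) {
  uint32_t v = 0;
  for (int i = 0; i < 4; i++) {
    int shift = big_endian ? 8 * (3 - i) : 8 * i;
    v |= static_cast<uint32_t>(s[i]) << shift;
  }
  return v;
}

// Sign-extend a 32-bit int to the 64-bit register width, as an ABI that
// expects ints to be passed as longs would require.
int64_t widen_int_arg(int32_t x) {
  return static_cast<int64_t>(x);
}
```

With value 42, the little-endian slot holds the low word at the lowest address, so the 4-byte load returns 42; the big-endian slot holds the high word there, so the same load returns 0.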
PR: https://git.openjdk.org/jdk/pull/12853 From rrich at openjdk.org Tue Mar 7 15:08:12 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 7 Mar 2023 15:08:12 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:30:50 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 53: > >> 51: >> 52: #undef __ >> 53: #define __ Disassembler::hook(__FILE__, __LINE__, _masm)-> > > What is this? Is this something useful for debugging the template interpreter? Probably doesn't belong with this change but might be nice to have (?) @reinrich Yes, this is really useful when debugging the template interpreter. It annotates the disassembly with the generator source code. It helped track down a bug in the PPC part of this PR. Other platforms have it too.
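The `__` macro quoted above routes every assembler call through a hook that first sees the generator's `__FILE__` and `__LINE__`, which is what allows the disassembly to be annotated with generator source lines. A toy model of that macro idiom, assuming a hypothetical `ToyAssembler` and `hook` function; it is not HotSpot's `Disassembler::hook` and makes no claim about how HotSpot stores the annotations:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy assembler: each emitted "instruction" is a string, and
// source annotations are stored in the same listing.
struct ToyAssembler {
  std::vector<std::string> listing;
  void emit(const std::string& insn) { listing.push_back(insn); }
};

// Hypothetical hook: record where the generator call came from, then hand
// back the assembler so the macro below can chain "->emit(...)".
ToyAssembler* hook(const char* file, int line, ToyAssembler* masm) {
  assert(masm != nullptr);
  masm->listing.push_back(";; " + std::string(file) + ":" + std::to_string(line));
  return masm;
}

// Same shape as the quoted macro: "__ emit(x);" expands to
// "hook(__FILE__, __LINE__, _masm)->emit(x);" at the call site.
// (HotSpot uses "__" this way; strictly, such names are reserved in C++.)
#define __ hook(__FILE__, __LINE__, _masm)->

void generate_example(ToyAssembler* _masm) {
  __ emit("li r21,1");
  __ emit("cmpdi r22,0");
}

#undef __
```

Because `__FILE__` and `__LINE__` expand where `__` is used, each annotation points at the generator line that emitted the instruction, not at the hook itself.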
Example:

invokedynamic 186 invokedynamic [0x00003fff80075a00, 0x00003fff80075dc8] 968 bytes

--------------------------------------------------------------------------------
  0x00003fff80075a00: std r17,0(r15)          ;;@FILE: src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp
                                              ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L);
  0x00003fff80075a04: addi r15,r15,-8
  0x00003fff80075a08: b 0x00003fff80075a40    ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L);
  0x00003fff80075a0c: stfs f15,0(r15)         ;; 2186: fep = __ pc(); __ push_f(); __ b(L);
  0x00003fff80075a10: addi r15,r15,-8
  0x00003fff80075a14: b 0x00003fff80075a40    ;; 2186: fep = __ pc(); __ push_f(); __ b(L);
  0x00003fff80075a18: stfd f15,-8(r15)        ;; 2187: dep = __ pc(); __ push_d(); __ b(L);
  0x00003fff80075a1c: addi r15,r15,-16
  0x00003fff80075a20: b 0x00003fff80075a40    ;; 2187: dep = __ pc(); __ push_d(); __ b(L);
  0x00003fff80075a24: li r0,0                 ;; 2188: lep = __ pc(); __ push_l(); __ b(L);
  0x00003fff80075a28: std r0,0(r15)
  0x00003fff80075a2c: std r17,-8(r15)
  0x00003fff80075a30: addi r15,r15,-16
  0x00003fff80075a34: b 0x00003fff80075a40    ;; 2188: lep = __ pc(); __ push_l(); __ b(L);
  0x00003fff80075a38: stw r17,0(r15)          ;; 2189: __ align(32, 12, 24); // align L
                                              ;; 2191: iep = __ pc(); __ push_i();
  0x00003fff80075a3c: addi r15,r15,-8
  0x00003fff80075a40: li r21,1                ;; 2192: vep = __ pc();
                                              ;; 2193: __ bind(L);
                                              ;;@FILE: src/hotspot/share/interpreter/templateInterpreterGenerator.cpp
                                              ;; 366: __ verify_FPU(1, t->tos_in());
                                              ;;@FILE: src/hotspot/cpu/ppc/templateTable_ppc_64.cpp
                                              ;; 2293: __ load_resolved_indy_entry(cache, index);
  0x00003fff80075a44: lwax r21,r14,r21
  0x00003fff80075a48: nand r21,r21,r21
  0x00003fff80075a4c: ld r31,40(r27)
  0x00003fff80075a50: rldicr r21,r21,4,59
  0x00003fff80075a54: addi r21,r21,8
  0x00003fff80075a58: add r31,r31,r21
  0x00003fff80075a5c: ld r22,0(r31)           ;; 2294: __ ld_ptr(method, in_bytes(ResolvedIndyEntry::method_offset()), cache);
  0x00003fff80075a60: cmpdi r22,0             ;; 2297: __ cmpdi(CCR0, method, 0);
  0x00003fff80075a64: bne- 0x00003fff80075b94 ;; 2298: __ bne(CCR0, resolved);,bo=0b00100[no_hint]
  0x00003fff80075a68: li r4,186               ;; 2304: __ li(R4_ARG2, code);
  0x00003fff80075a6c: ld r11,0(r1)            ;; 2305: __ call_VM(noreg, entry, R4_ARG2, true);

------------- PR: https://git.openjdk.org/jdk/pull/12778 From prappo at openjdk.org Tue Mar 7 15:35:51 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Tue, 7 Mar 2023 15:35:51 GMT Subject: Integrated: 8303480: Miscellaneous fixes to mostly invisible doc comments In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 12:03:44 GMT, Pavel Rappo wrote: > Please review this superficial documentation cleanup that was triggered by unrelated analysis of doc comments in JDK API. > > The only effect that this multi-area PR has on the JDK API Documentation (i.e. the observable effect on the generated HTML pages) can be summarized as follows: > > > diff -ur build/macosx-aarch64/images/docs-before/api/serialized-form.html build/macosx-aarch64/images/docs-after/api/serialized-form.html > --- build/macosx-aarch64/images/docs-before/api/serialized-form.html 2023-03-02 11:47:44 > +++ build/macosx-aarch64/images/docs-after/api/serialized-form.html 2023-03-02 11:48:45 > @@ -17084,7 +17084,7 @@ > throws IOException, > ClassNotFoundException >
readObject is called to restore the state of the > - (@code BasicPermission} from a stream.
> + BasicPermission from a stream. >
>
Parameters:
>
s - the ObjectInputStream from which data is read
> > Notes > ----- > > * I'm not an expert in any of the affected areas, except for jdk.javadoc, and I was merely after misused tags. Because of that, I would appreciate reviews from experts in other areas. > * I discovered many more issues than I included in this PR. The excluded issues seem to occur in infrequently updated third-party code (e.g. javax.xml), which I assume we shouldn't touch unless necessary. > * I will update copyright years after (and if) the fix has been approved, as required. This pull request has now been integrated. Changeset: 45a616a8 Author: Pavel Rappo URL: https://git.openjdk.org/jdk/commit/45a616a891e4a4b0e77b1f2fa040522f4a99d172 Stats: 75 lines in 39 files changed: 0 ins; 0 del; 75 mod 8303480: Miscellaneous fixes to mostly invisible doc comments Reviewed-by: mullan, prr, cjplummer, aivanov, jjg, lancea, rriggs, ihse ------------- PR: https://git.openjdk.org/jdk/pull/12826 From rehn at openjdk.org Tue Mar 7 16:42:47 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Mar 2023 16:42:47 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:47:55 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. >> All methods would already be marked, but the actual deoptimization work (making them not entrant, patching PCs on stacks and patching post-call nops) was not done yet, which meant thread 2 could 'pass' thread 1. >> Most places did deopt under Compile_lock, so this is not an issue there, but WB and clearCallSiteContext do not. >> >> Since handshakes may take a long time to complete, and Compile_lock is used for so much more than deopts, the fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake even when everything was already marked.
(instead of adding Compile_lock to all places) >> >> This turned out to be problematic at startup; for example, the number of deopt handshakes in the jetty dry-run startup went from 5 to 39. >> >> This fix first adds a barrier that you cannot pass until the requested deopts have happened, and it coalesces the handshakes. >> Secondly, it moves the handshake part out of the Compile_lock where possible. >> >> This means we fix the performance bug and reduce the contention on Compile_lock, meaning higher throughput in the compiler and in things such as class loading. >> >> It passes t1-t7 with flying colours! t8 has not completed yet and I'm redoing some testing due to last-minute simplifications. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Grab lock so code() is stable > - Non CHA based vtables fix Running t1-t7, green so far. I think I have addressed everything? ------------- PR: https://git.openjdk.org/jdk/pull/12585 From never at openjdk.org Tue Mar 7 17:32:13 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 7 Mar 2023 17:32:13 GMT Subject: RFR: 8302452: [JVMCI] Export _poly1305_processBlocks, JfrThreadLocal fields to JVMCI compiler. In-Reply-To: References: Message-ID: On Tue, 14 Feb 2023 15:13:05 GMT, Yudi Zheng wrote: > This PR allows JVMCI compiler intrinsics to reuse the _poly1305_processBlocks stub and to update JfrThreadLocal fields on `Thread.setCurrentThread` events. Marked as reviewed by never (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12560 From qamai at openjdk.org Tue Mar 7 18:34:01 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 7 Mar 2023 18:34:01 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice Message-ID: `Vector::slice` is a method of the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result.
It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. Please take a look and have some reviews. Thank you very much. ------------- Commit messages: - sse2, increase warmup - aesthetic - optimise 64B - add jmh - vector slice intrinsics Changes: https://git.openjdk.org/jdk/pull/12909/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303762 Stats: 1699 lines in 58 files changed: 1376 ins; 257 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/12909.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12909/head:pull/12909 PR: https://git.openjdk.org/jdk/pull/12909 From qamai at openjdk.org Tue Mar 7 18:34:01 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 7 Mar 2023 18:34:01 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 18:23:42 GMT, Quan Anh Mai wrote: > `Vector::slice` is a method of the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result.
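The slice semantics described in the 8303762 RFR (concatenate the two inputs, then take a window of one vector length starting at `origin`) and the rearrange-plus-blend formulation that the fallback currently uses can be modeled on plain arrays and checked against each other. A scalar sketch, assuming `std::array` stands in for a vector and that `N`, `Vec`, `slice_direct` and `slice_via_rearrange_blend` are hypothetical names; it says nothing about how C2 or `palignr` implement the operation:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lane count; real vector shapes vary by element type and size.
constexpr std::size_t N = 8;
using Vec = std::array<int, N>;

// Direct definition: lanes [origin, origin + N) of the 2N-lane
// concatenation "v1 followed by v2".
Vec slice_direct(const Vec& v1, std::size_t origin, const Vec& v2) {
  assert(origin <= N);
  Vec r{};
  for (std::size_t i = 0; i < N; i++) {
    std::size_t j = i + origin;
    r[i] = (j < N) ? v1[j] : v2[j - N];
  }
  return r;
}

// Model of the rearrange + blend formulation: rotate both inputs left by
// `origin` lanes (a wrapping iota shuffle), then take lane i from v1 when
// i < N - origin (the blend mask) and from v2 otherwise.
Vec slice_via_rearrange_blend(const Vec& v1, std::size_t origin, const Vec& v2) {
  assert(origin <= N);
  Vec r{};
  for (std::size_t i = 0; i < N; i++) {
    std::size_t src = (i + origin) % N;  // shuffle index
    bool from_v1 = i < N - origin;       // blend mask lane
    r[i] = from_v1 ? v1[src] : v2[src];
  }
  return r;
}
```

The direct form is the same concatenate-and-extract shape that SSSE3 `palignr` provides for two 16-byte registers with an immediate byte offset, while the second form needs a shuffle index vector and a mask, which is the preparation cost the RFR mentions.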
It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. > > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. > > Please take a look and have some reviews. Thank you very much. Benchmark results:

                                                           Before               After
Benchmark                         (size)   Mode Cnt     Score      Error     Score      Error   Units  Change
Byte128Vector.sliceBinaryConstant   1024  thrpt   5  5058.760 ± 2214.115  8315.263 ±  102.169  ops/ms  +64.37%
Byte256Vector.sliceBinaryConstant   1024  thrpt   5  6986.299 ± 1028.257  8440.387 ±   30.163  ops/ms  +20.81%
Byte64Vector.sliceBinaryConstant    1024  thrpt   5  2944.869 ±  849.548  5926.054 ±  493.146  ops/ms  +101.23%
ByteMaxVector.sliceBinaryConstant   1024  thrpt   5  7269.226 ±  366.246  8201.184 ±  309.539  ops/ms  +12.82%
Double128Vector.sliceBinaryConstant 1024  thrpt   5    10.204 ±    0.508   979.287 ±   19.991  ops/ms  x95.97
Double256Vector.sliceBinaryConstant 1024  thrpt   5   868.085 ±   26.378   967.799 ±   10.224  ops/ms  +11.49%
DoubleMaxVector.sliceBinaryConstant 1024  thrpt   5   813.646 ±   74.468   978.150 ±   14.316  ops/ms  +20.22%
Float128Vector.sliceBinaryConstant  1024  thrpt   5  1297.281 ±   23.650  1850.995 ±   29.741  ops/ms  +42.68%
Float256Vector.sliceBinaryConstant  1024  thrpt   5  1796.121 ±   26.662  2011.362 ±   38.418  ops/ms  +11.98%
Float64Vector.sliceBinaryConstant   1024  thrpt   5    10.381 ±    0.194  1628.510 ±    8.752  ops/ms  x156.87
FloatMaxVector.sliceBinaryConstant  1024  thrpt   5  1820.161 ±   26.802  1988.085 ±   41.835  ops/ms  +9.23%
Int128Vector.sliceBinaryConstant    1024  thrpt   5  1394.911 ±   40.815  1864.818 ±   33.792  ops/ms  +33.69%
Int256Vector.sliceBinaryConstant    1024  thrpt   5  1874.496 ±   60.541  1864.818 ±   33.792  ops/ms  -0.52%
Int64Vector.sliceBinaryConstant     1024  thrpt   5    10.942 ±    0.377  1621.849 ±   56.538  ops/ms  x148.22
IntMaxVector.sliceBinaryConstant    1024  thrpt   5  1870.746 ±   40.665  2027.041 ±   25.880  ops/ms  +8.35%
Long128Vector.sliceBinaryConstant   1024  thrpt   5    10.595 ±    0.306   991.969 ±   15.033  ops/ms  x93.63
Long256Vector.sliceBinaryConstant   1024  thrpt   5   815.689 ±   12.243   989.365 ±   25.969  ops/ms  +21.29%
LongMaxVector.sliceBinaryConstant   1024  thrpt   5   822.060 ±   12.337   977.061 ±   31.968  ops/ms  +18.86%
Short128Vector.sliceBinaryConstant  1024  thrpt   5  3062.676 ±  124.796  3890.796 ±  326.767  ops/ms  +27.04%
Short256Vector.sliceBinaryConstant  1024  thrpt   5  3747.778 ±  119.356  4125.463 ±   33.602  ops/ms  +10.08%
Short64Vector.sliceBinaryConstant   1024  thrpt   5  1879.203 ±   69.160  2899.515 ±   57.870  ops/ms  +54.29%
ShortMaxVector.sliceBinaryConstant  1024  thrpt   5  3717.217 ±   48.876  4035.455 ±  102.725  ops/ms  +8.56%

------------- PR: https://git.openjdk.org/jdk/pull/12909 From matsaave at openjdk.org Tue Mar 7 18:46:43 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 18:46:43 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: <6-fIr9UsLHUJXqqwQNVvQrL-Q6MP_UoZAL0W7ZLDHb8=.da0c0246-cd98-4309-9247-b792212f6021@github.com> References: <6-fIr9UsLHUJXqqwQNVvQrL-Q6MP_UoZAL0W7ZLDHb8=.da0c0246-cd98-4309-9247-b792212f6021@github.com> On Tue, 7 Mar 2023 14:10:33 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 935: > >> 933: /*if (isInvokedynamicIndex(cpi)) { >> 934: compilerToVM().resolveInvokeDynamicInPool(this, cpi); >> 935: }*/ > > Is there something to fix here? That's a vestige of old code that I don't believe is necessary anymore. Invokedynamic is resolved further up so that can be removed. I think it makes sense to leave the invokedynamic case for completeness, but it will be left blank. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From yzheng at openjdk.org Tue Mar 7 18:47:56 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 7 Mar 2023 18:47:56 GMT Subject: Integrated: 8302452: [JVMCI] Export _poly1305_processBlocks, JfrThreadLocal fields to JVMCI compiler. In-Reply-To: References: Message-ID: On Tue, 14 Feb 2023 15:13:05 GMT, Yudi Zheng wrote: > This PR allows JVMCI compiler intrinsics to reuse the _poly1305_processBlocks stub and to update JfrThreadLocal fields on `Thread.setCurrentThread` events. This pull request has now been integrated. 
Changeset: 4d4eadea Author: Yudi Zheng Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/4d4eadeae320722191feaf8022a04461232ae95b Stats: 12 lines in 3 files changed: 12 ins; 0 del; 0 mod 8302452: [JVMCI] Export _poly1305_processBlocks, JfrThreadLocal fields to JVMCI compiler. Reviewed-by: dnsimon, never ------------- PR: https://git.openjdk.org/jdk/pull/12560 From matsaave at openjdk.org Tue Mar 7 19:16:35 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 19:16:35 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:35:18 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry.
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/cpu/x86/templateTable_x86.cpp line 2801: > >> 2799: bool is_invokevirtual, >> 2800: bool is_invokevfinal, /*unused*/ >> 2801: bool is_invokedynamic /*unused*/) { > > I assume you have to keep this parameter for the platform that doesn't still have this change (s390)? That's correct, this method is declared inside the hpp used by all platforms, so the parameters can't be changed until all the ports are complete. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Tue Mar 7 19:21:23 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 19:21:23 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:39:50 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/share/ci/ciReplay.cpp line 419: > >> 417: be used to avoid multiple blocks of similar code. When CPCE is obsoleted >> 418: these can be removed >> 419: */ > > I don't know if you really need this comment. If so, use // style instead. I think it's worth keeping around as a reminder for cleanup down the line since it's easy to overlook. I will change it to // style. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Tue Mar 7 19:29:01 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 7 Mar 2023 19:29:01 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 14:00:19 GMT, Coleen Phillimore wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > src/hotspot/share/oops/cpCache.cpp line 727: > >> 725: set_reference_map(nullptr); >> 726: #if INCLUDE_CDS >> 727: if (_initial_entries != nullptr) { > > @iklam with moving invokedynamic entries out, do you still need to save initialized entries ? Does invokehandle need this? (Should have separate RFE if more cleanup is possible) This along with the previous comment about `_invokedynamic_references_map` would probably be better suited for their own RFE. I think the scope of this PR should be limited to the indy structure and its implementation, so any changes related to invokehandle can be traced more easily. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rkennke at openjdk.org Tue Mar 7 20:34:57 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 7 Mar 2023 20:34:57 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v13] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). 
The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is answered by a quick scan of the array. More complex queries like 'which thread owns X?' are not performed on performance-critical paths (they usually come from code like JVMTI or deadlock detection), where it is OK to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor.
Normally, we need to know the current owning thread and record it in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is record a special marker, ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix interpreter asymmetric fast-locking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/ed611b0b..9d4ca05f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=11-12 Stats: 3 lines in 2 files changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From dnsimon at openjdk.org Tue Mar 7 20:52:16 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 7 Mar 2023 20:52:16 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v3] In-Reply-To: References: Message-ID: On Sun, 5 Mar 2023 22:37:38 GMT, Doug Simon wrote: >> This PR extends JVMCI with a new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations.
The main differences from `java.lang.reflect.AnnotatedElement` are:
>> * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`.
>> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal.
>>
>> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements):
>>
>>     ResolvedJavaMethod method = ...;
>>     ExplodeLoop a = method.getAnnotation(ExplodeLoop.class);
>>     return switch (a.kind()) {
>>         case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL;
>>         case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>>         ...
>>     };
>>
>>
>> The same code using the new API:
>>
>>
>>     ResolvedJavaMethod method = ...;
>>     ResolvedJavaType explodeLoopType = ...;
>>     AnnotationData a = method.getAnnotationDataFor(explodeLoopType);
>>     return switch (a.getEnum("kind").getName()) {
>>         case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL;
>>         case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>>         ...
>>     };
>>
>>
>> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>
> fixed whitespace

This PR still needs work. I'll re-open it when ready.
------------- PR: https://git.openjdk.org/jdk/pull/12810 From dnsimon at openjdk.org Tue Mar 7 20:52:17 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 7 Mar 2023 20:52:17 GMT Subject: Withdrawn: 8303431: [JVMCI] libgraal annotation API In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 18:07:34 GMT, Doug Simon wrote:
> This PR extends JVMCI with a new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are:
> * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`.
> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal.
>
> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements):
>
>     ResolvedJavaMethod method = ...;
>     ExplodeLoop a = method.getAnnotation(ExplodeLoop.class);
>     return switch (a.kind()) {
>         case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL;
>         case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     };
>
>
> The same code using the new API:
>
>
>     ResolvedJavaMethod method = ...;
>     ResolvedJavaType explodeLoopType = ...;
>     AnnotationData a = method.getAnnotationDataFor(explodeLoopType);
>     return switch (a.getEnum("kind").getName()) {
>         case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL;
>         case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     };
>
>
> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12810 From pchilanomate at openjdk.org Tue Mar 7 22:31:11 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 7 Mar 2023 22:31:11 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV Message-ID: Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(): in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we find a return barrier pc (as in Continuation::is_return_barrier_entry()) and we want to keep walking the physical stack, then we know the sender will be the enterSpecial frame, so we create it by calling ContinuationEntry::to_frame(). This method assumes there can only be one nmethod associated with enterSpecial(), so we hit an assert later on. See the bug for more details of the crash. As I mentioned in the bug, we don't need to rely on this assumption since we can re-read the updated value from _enter_special. But reading both _enter_special and _return_pc means we would need some kind of synchronization, since to_frame() could be called concurrently with set_enter_code(). To avoid that, we could just read _return_pc and calculate the blob from it each time, but I'm assuming that overhead is undesired and that's why the static variable was introduced. Alternatively, _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). But if we go this route, I think we would need to do a small fix on thaw too.
After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). I'm not sure about the lifecycle of the old CodeBlob, but it seems it could already have been removed if enterSpecial was not found while traversing every thread's stack. Maybe there are more issues. The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another, less restrictive, option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired. I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/12911/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12911&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302779 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12911.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12911/head:pull/12911 PR: https://git.openjdk.org/jdk/pull/12911 From darcy at openjdk.org Tue Mar 7 22:31:42 2023 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 7 Mar 2023 22:31:42 GMT Subject: Integrated: JDK-8302801: Remove fdlibm C sources In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 05:54:52 GMT, Joe Darcy wrote: > While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd send out for review the next phase of the FDLIBM port: removing the FDLIBM C sources from the repo. > > A repo with the changes for JDK-8302027 and this PR successfully builds on the default set of platforms and successfully runs tier 1 tests, which includes tests of the math library.
> > There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork for FDLIBM (a tine of FDLIBM?) to use for intrinsics. The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. > > The intention of this change is to remove use of FDLIBM for the core libraries. This pull request has now been integrated. Changeset: b5b5cba7 Author: Joe Darcy URL: https://git.openjdk.org/jdk/commit/b5b5cba7feb0e7ef957fd6bef1e591fdb6fdaa9f Stats: 6643 lines in 65 files changed: 20 ins; 6613 del; 10 mod 8302801: Remove fdlibm C sources Reviewed-by: bpb, dholmes, alanb, kvn ------------- PR: https://git.openjdk.org/jdk/pull/12821 From dholmes at openjdk.org Tue Mar 7 23:00:25 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 7 Mar 2023 23:00:25 GMT Subject: Integrated: 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 22:35:17 GMT, David Holmes wrote: > We can replace `gethostbyname`, which is deprecated on Windows and Linux, with `getaddrinfo`. This API is available on all supported platforms and so can be placed in shared code. @djelinski pointed out that `getaddrinfo` can resolve both IP addresses and host names, so the two-step approach used in `networkStream::connect` is not necessary and we can do away with `os::get_host_by_name()` completely. > > The build is updated to enable winsock deprecation warnings, and now that we need to use `ws2_32.lib` we can drop `wsock32.lib` (as it is basically a subset - again thanks @djelinski).
> > Testing > - all Oracle builds in tiers 1-5 > - All GHA builds > > The actual code change has to be manually tested because the code is only used by Ideal Graph Printing to connect to the Ideal Graph Visualizer. I've manually tested on Windows and Linux and @tobiasholenstein tested macOS. > > Thanks. This pull request has now been integrated. Changeset: d7298245 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/d7298245d6759f62e253b5cf0df975db17fdbf82 Stats: 41 lines in 6 files changed: 13 ins; 17 del; 11 mod 8286781: Replace the deprecated/obsolete gethostbyname and inet_addr calls Co-authored-by: Daniel Jeliński Reviewed-by: kbarrett, djelinski ------------- PR: https://git.openjdk.org/jdk/pull/12842 From inakonechnyy at openjdk.org Tue Mar 7 23:54:07 2023 From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy) Date: Tue, 7 Mar 2023 23:54:07 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v6] In-Reply-To: References: Message-ID: > The proposed approach adds a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, which is called within `InstanceKlass::add_initialization_error` if the existing `java_lang_Throwable::get_cause_with_stack_trace` did not succeed because of an exception during the VM call. The simple function doesn't call into the VM to get a stack trace but fills in all other information about the exception. > > Besides that, logic for discovering information about an exception was added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. > > A jtreg test reproducing the issue was also added to the commit. > The commit was tested with tier1 tests.
Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision: Address a review notes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12566/files - new: https://git.openjdk.org/jdk/pull/12566/files/adf139fa..bd4df11d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=04-05 Stats: 21 lines in 2 files changed: 6 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/12566.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12566/head:pull/12566 PR: https://git.openjdk.org/jdk/pull/12566 From inakonechnyy at openjdk.org Tue Mar 7 23:54:08 2023 From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy) Date: Tue, 7 Mar 2023 23:54:08 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v5] In-Reply-To: <0UEtpYm8TvljAdj3FQI5LYfjHxXPBWBfFD0lDsLojJA=.d259d5df-24b8-4b93-8fb3-feafc696deab@github.com> References: <0UEtpYm8TvljAdj3FQI5LYfjHxXPBWBfFD0lDsLojJA=.d259d5df-24b8-4b93-8fb3-feafc696deab@github.com> Message-ID: On Mon, 6 Mar 2023 20:28:40 GMT, Ilarion Nakonechnyy wrote: >> The proposed approach adds a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, which is called within `InstanceKlass::add_initialization_error` if the existing `java_lang_Throwable::get_cause_with_stack_trace` did not succeed because of an exception during the VM call. The simple function doesn't call into the VM to get a stack trace but fills in all other information about the exception. >> >> Besides that, logic for discovering information about an exception was added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. >> >> A jtreg test reproducing the issue was also added to the commit. >> The commit was tested with tier1 tests.
> > Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision: > > 1/ create_initialization_error(): return empty exception, if > EIIE creation failed. > 2/ remove testcase Thank you for the review. ------------- PR: https://git.openjdk.org/jdk/pull/12566 From inakonechnyy at openjdk.org Tue Mar 7 23:54:13 2023 From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy) Date: Tue, 7 Mar 2023 23:54:13 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v4] In-Reply-To: References: <0UEtpYm8TvljAdj3FQI5LYfjHxXPBWBfFD0lDsLojJA=.d259d5df-24b8-4b93-8fb3-feafc696deab@github.com> Message-ID: On Tue, 7 Mar 2023 02:03:54 GMT, David Holmes wrote: >> Ilarion Nakonechnyy has updated the pull request incrementally with four additional commits since the last revision: >> >> - reverting changes in cpCache.cpp >> - Redesigned get_cause as create_initialization_error. >> Corrected naming in testcase >> - jcheck corrections >> - Removed VM.compMode from test > > src/hotspot/share/oops/instanceKlass.cpp line 987: > >> 985: MutexLocker ml(THREAD, ClassInitError_lock); >> 986: OopHandle elem = OopHandle(Universe::vm_global(), cause()); >> 987: bool created; > > We still need to return if we got null. 
Thanks, I was mistaken about my corrections in calling `JavaCalls::call_virtual(`, and thought that create_initialization_error always returns a non-null handle() ------------- PR: https://git.openjdk.org/jdk/pull/12566 From psandoz at openjdk.org Wed Mar 8 00:32:17 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 8 Mar 2023 00:32:17 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice In-Reply-To: References: Message-ID: <_PA9oL9dVd3Yrg0sXw3m0uwfGjP6TuqXGBm5M090GHM=.a09a8733-e59e-4b6d-a6a6-e518a8518450@github.com> On Tue, 7 Mar 2023 18:23:42 GMT, Quan Anh Mai wrote: > `Vector::slice` is a method of the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as UTF-8 and UTF-16 validation. Starting with SSSE3, x86 has `palignr`, which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify the `Vector::slice` method. > > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)`, which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some show tremendous increases in throughput because a mask of length 2 cannot currently be intrinsified, which forces a fallback to the Java implementations. > > Please take a look and review. Thank you very much. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2289: > 2287: getClass(), byte.class, length(), > 2288: this, that, origin, > 2289: new VectorSliceOp() { Change from inner class to lambda expression?
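To make the slice semantics described above concrete, here is a minimal scalar model in C++. It is not the Vector API and not the proposed C2 node; it computes a slice directly from the concatenation of two inputs, and again via the rotate-and-blend formulation quoted in the description, so the two can be checked against each other. The lane count `kLanes = 8`, the `Vec` alias, and the helpers `slice_direct`, `rotate_lanes` and `slice_blend` are hypothetical names chosen for illustration. `rotate_lanes` stands in for `rearrange(iota)` with an iota shuffle that starts at `origin` and wraps around; that this matches the Java fallback's shuffle exactly is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count for illustration; real species lengths depend on the vector shape.
constexpr std::size_t kLanes = 8;
using Vec = std::array<std::int8_t, kLanes>;

// Direct definition: view v1 followed by v2 as one 2*kLanes composite
// and copy the kLanes-wide window that starts at `origin`.
Vec slice_direct(const Vec& v1, std::size_t origin, const Vec& v2) {
    assert(origin <= kLanes);
    Vec out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        std::size_t src = origin + i;  // index into the composite
        out[i] = src < kLanes ? v1[src] : v2[src - kLanes];
    }
    return out;
}

// Models rearrange(iota) with an iota shuffle that starts at `origin` and wraps.
Vec rotate_lanes(const Vec& v, std::size_t origin) {
    Vec out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        out[i] = v[(i + origin) % kLanes];
    }
    return out;
}

// Rotate-and-blend formulation: lanes below kLanes - origin come from the
// rotated v1, the remaining lanes from the rotated v2 (the blend mask).
Vec slice_blend(const Vec& v1, std::size_t origin, const Vec& v2) {
    assert(origin <= kLanes);
    const Vec r1 = rotate_lanes(v1, origin);
    const Vec r2 = rotate_lanes(v2, origin);
    Vec out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        const bool from_v1 = i < kLanes - origin;  // blend mask lane
        out[i] = from_v1 ? r1[i] : r2[i];
    }
    return out;
}
```

For example, with `v1` holding `0..7` and `v2` holding `10..17`, `slice_direct(v1, 3, v2)` yields `3 4 5 6 7 10 11 12`: the first five lanes come from `v1` and the last three from `v2`. For byte lanes in a 128-bit register, this is conceptually the same concatenate-and-shift that `palignr` does in one instruction, whereas the blend form needs two rotations plus a mask.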
------------- PR: https://git.openjdk.org/jdk/pull/12909 From psandoz at openjdk.org Wed Mar 8 00:49:16 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 8 Mar 2023 00:49:16 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice In-Reply-To: References: Message-ID: <_btCmeotboVIVWcIbHksJAaRcJO5aFl0CPVRnqpkuj0=.e3405352-fa81-4707-babe-25061abd99c5@github.com> On Tue, 7 Mar 2023 18:23:42 GMT, Quan Anh Mai wrote: > `Vector::slice` is a method of the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as UTF-8 and UTF-16 validation. Starting with SSSE3, x86 has `palignr`, which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify the `Vector::slice` method. > > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)`, which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some show tremendous increases in throughput because a mask of length 2 cannot currently be intrinsified, which forces a fallback to the Java implementations. > > Please take a look and review. Thank you very much.
test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 65: > 63: Asserts.assertEquals(expected, dst[i][j]); > 64: } > 65: } It should be possible to factor out this code into something like this: assertOffsets(length, (expected, i, j) -> Asserts.assertEquals((byte)expected, dst[i][j])); test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 68: > 66: > 67: length = 16; > 68: testB128(dst, src1, src2); Should `dst` be zeroed before the next call? Or maybe it is easier to just reallocate it. test/jdk/jdk/incubator/vector/templates/Kernel-Slice-bop-const.template line 1: > 1: $type$[] a = fa.apply(SPECIES.length()); Forgot to commit the updated unit tests? ------------- PR: https://git.openjdk.org/jdk/pull/12909 From david.holmes at oracle.com Wed Mar 8 00:56:31 2023 From: david.holmes at oracle.com (David Holmes) Date: Wed, 8 Mar 2023 10:56:31 +1000 Subject: Integrated: JDK-8302801: Remove fdlibm C sources In-Reply-To: References: Message-ID: Please note this fix has a problem with a missing definition for IEEEremainder that is causing UnsatisfiedLinkError. You can expect some significant noise in testing above tier 1 until this is fixed, which will hopefully be in the next 30 minutes or so. Otherwise a backout will be performed. David On 8/03/2023 8:31 am, Joe Darcy wrote: > On Thu, 2 Mar 2023 05:54:52 GMT, Joe Darcy wrote: > >> While the review of https://github.com/openjdk/jdk/pull/12800 finishes up, I thought I'd send out for review the next phase of the FDLIBM port: removing the FDLIBM C sources from the repo. >> >> A repo with the changes for JDK-8302027 and this PR successfully builds on the default set of platforms and successfully runs tier 1 tests, which includes tests of the math library. >> >> There are a few remaining references to the case-independent string "fdlibm" in the make directory and HotSpot sources. HotSpot contains a partial fork for FDLIBM (a tine of FDLIBM?) to use for intrinsics.
The remaining make machinery contains logic to determine what set of gcc options can be used for the compile. >> >> The intention of this change is to remove use of FDLIBM for the core libraries. > > This pull request has now been integrated. > > Changeset: b5b5cba7 > Author: Joe Darcy > URL: https://git.openjdk.org/jdk/commit/b5b5cba7feb0e7ef957fd6bef1e591fdb6fdaa9f > Stats: 6643 lines in 65 files changed: 20 ins; 6613 del; 10 mod > > 8302801: Remove fdlibm C sources > > Reviewed-by: bpb, dholmes, alanb, kvn > > ------------- > > PR: https://git.openjdk.org/jdk/pull/12821 From kbarrett at openjdk.org Wed Mar 8 02:16:14 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 8 Mar 2023 02:16:14 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Tue, 7 Mar 2023 13:08:22 GMT, Coleen Phillimore wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> make Debugging::_enabled a nesting counter > > src/hotspot/share/utilities/debug.hpp line 108: > >> 106: // because we need a fallback when we don't have any mechanism for detecting >> 107: // constant evaluation. >> 108: #if defined(TARGET_COMPILER_gcc) || defined(TARGET_COMPILER_xlc) > > All this seems like it should go in COMPILER_HEADER(globalDefinitions.hpp) but since globalDefinitions.hpp includes debug.hpp, you can't do this. Can we file an RFE to clean this up (if possible)? I considered that, and had two reasons for rejecting it. The include ordering problem was one. There are uses of assert and the like in globalDefinitions. Some of those might someday be moved elsewhere, but for now the order is a problem. The other is that this is really not "general purpose", but rather very specifically tailored to the use here. 
And it can't be general purpose, because there isn't a general fallback for testing for constexpr evaluation. I could have added `HAS_IS_CONSTANT_EVALUATED` and `::is_constant_evaluated()`, with all uses of the latter requiring protection by the former. But I couldn't think of any places where I would want to use that. Using it in, for example, count_trailing_zeros to allow it to be (sometimes) constexpr would make the question of whether it's constexpr be platform-dependent, which means it can't be used in shared required constexpr contexts. And it's even worse since the platform dependency is also currently compiler version dependent. I think that path leads nowhere good. However, I've filed this cleanup RFE: https://bugs.openjdk.org/browse/JDK-8303797 ------------- PR: https://git.openjdk.org/jdk/pull/12845 From kbarrett at openjdk.org Wed Mar 8 02:37:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 8 Mar 2023 02:37:36 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v3] In-Reply-To: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: > Also 8302799: Refactor Debugging variable usage for noreturn crash reporting Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into noreturn2 - make Debugging::_enabled a nesting counter - new implementation of Debugging - noreturn attributes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12845/files - new: https://git.openjdk.org/jdk/pull/12845/files/f296ab62..59295614 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12845&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12845&range=01-02 Stats: 18174 lines in 538 files changed: 7268 ins; 8749 del; 2157 mod Patch: https://git.openjdk.org/jdk/pull/12845.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12845/head:pull/12845 PR: https://git.openjdk.org/jdk/pull/12845 From kbarrett at openjdk.org Wed Mar 8 02:37:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 8 Mar 2023 02:37:36 GMT Subject: RFR: 8302189: Mark assertion failures noreturn [v2] In-Reply-To: References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Tue, 7 Mar 2023 01:36:56 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> make Debugging::_enabled a nesting counter > > Okay other than the one outstanding minor syntax query, I have nothing further to add. > > Thanks. Thanks for reviews @dholmes-ora and @coleenp ------------- PR: https://git.openjdk.org/jdk/pull/12845 From dholmes at openjdk.org Wed Mar 8 02:39:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Mar 2023 02:39:32 GMT Subject: RFR: 8303799: [BACKOUT] JDK-8302801 Remove fdlibm C sources Message-ID: This reverts commit b5b5cba7feb0e7ef957fd6bef1e591fdb6fdaa9f. Thanks. 
------------- Commit messages: - Revert "8302801: Remove fdlibm C sources" Changes: https://git.openjdk.org/jdk/pull/12916/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12916&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303799 Stats: 6643 lines in 65 files changed: 6613 ins; 20 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/12916.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12916/head:pull/12916 PR: https://git.openjdk.org/jdk/pull/12916 From darcy at openjdk.org Wed Mar 8 02:39:32 2023 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 8 Mar 2023 02:39:32 GMT Subject: RFR: 8303799: [BACKOUT] JDK-8302801 Remove fdlibm C sources In-Reply-To: References: Message-ID: <5FoTd9vTbjbqt5elKqfE5JL8rN_HDV_2t1BzMGRGiCE=.4b7253e9-647e-43b9-8853-bfe2f66f0c4e@github.com> On Wed, 8 Mar 2023 02:27:21 GMT, David Holmes wrote: > This reverts commit b5b5cba7feb0e7ef957fd6bef1e591fdb6fdaa9f. > > Thanks. Sorry for the noise (and missing IEEERemainer)! Thanks @dholmes-ora . ------------- Marked as reviewed by darcy (Reviewer). PR: https://git.openjdk.org/jdk/pull/12916 From bpb at openjdk.org Wed Mar 8 02:39:32 2023 From: bpb at openjdk.org (Brian Burkhalter) Date: Wed, 8 Mar 2023 02:39:32 GMT Subject: RFR: 8303799: [BACKOUT] JDK-8302801 Remove fdlibm C sources In-Reply-To: References: Message-ID: <4oeA0sOeKyTZtgAkE6avxboBskYMbaOE-oaV3fRNtHE=.2d5eda40-4689-46d1-90ed-022ce53f69fe@github.com> On Wed, 8 Mar 2023 02:27:21 GMT, David Holmes wrote: > This reverts commit b5b5cba7feb0e7ef957fd6bef1e591fdb6fdaa9f. > > Thanks. Marked as reviewed by bpb (Reviewer).
------------- PR: https://git.openjdk.org/jdk/pull/12916 From kbarrett at openjdk.org Wed Mar 8 02:40:30 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 8 Mar 2023 02:40:30 GMT Subject: Integrated: 8302189: Mark assertion failures noreturn In-Reply-To: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> References: <4gLeUT6_5s6WpG-b-0146d490ZRgBq_JON1-5P3Wgtk=.e0bc3fd3-fce0-47b8-9478-1de796128731@github.com> Message-ID: On Fri, 3 Mar 2023 04:04:25 GMT, Kim Barrett wrote: > Also 8302799: Refactor Debugging variable usage for noreturn crash reporting This pull request has now been integrated. Changeset: 5fa9bd45 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/5fa9bd458232a0b5f31b1e7e5a4a2b1f4047da35 Stats: 165 lines in 8 files changed: 125 ins; 19 del; 21 mod 8302189: Mark assertion failures noreturn 8302799: Refactor Debugging variable usage for noreturn crash reporting Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/12845 From dholmes at openjdk.org Wed Mar 8 02:40:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Mar 2023 02:40:04 GMT Subject: RFR: 8303799: [BACKOUT] JDK-8302801 Remove fdlibm C sources In-Reply-To: <5FoTd9vTbjbqt5elKqfE5JL8rN_HDV_2t1BzMGRGiCE=.4b7253e9-647e-43b9-8853-bfe2f66f0c4e@github.com> References: <5FoTd9vTbjbqt5elKqfE5JL8rN_HDV_2t1BzMGRGiCE=.4b7253e9-647e-43b9-8853-bfe2f66f0c4e@github.com> Message-ID: On Wed, 8 Mar 2023 02:31:09 GMT, Joe Darcy wrote: >> This reverts commit b5b5cba7feb0e7ef957fd6bef1e591fdb6fdaa9f. >> >> Thanks. > > Sorry for the noise (and missing IEEERemainer)! Thanks @dholmes-ora . Thanks for the reviews @jddarcy and @bplb ! 
------------- PR: https://git.openjdk.org/jdk/pull/12916 From dholmes at openjdk.org Wed Mar 8 02:42:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Mar 2023 02:42:18 GMT Subject: Integrated: 8303799: [BACKOUT] JDK-8302801 Remove fdlibm C sources In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 02:27:21 GMT, David Holmes wrote: > This reverts commit b5b5cba7feb0e7ef957fd6bef1e591fdb6fdaa9f. > > Thanks. This pull request has now been integrated. Changeset: 21a6ab1e Author: David Holmes URL: https://git.openjdk.org/jdk/commit/21a6ab1e3ea5228a31955d58fe75e5ae66d1c6cd Stats: 6643 lines in 65 files changed: 6613 ins; 20 del; 10 mod 8303799: [BACKOUT] JDK-8302801 Remove fdlibm C sources Reviewed-by: darcy, bpb ------------- PR: https://git.openjdk.org/jdk/pull/12916 From iklam at openjdk.org Wed Mar 8 04:14:17 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 8 Mar 2023 04:14:17 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:55:07 GMT, Matthias Baesken wrote: >> The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. >> However at some places INCLUDE_CDS is missing and should be added. >> >> One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros ? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust arguments handling Maybe we can keep the -Xshare flags and make their behavior consistent with builds that have CDS enabled: -Xshare:off - do nothing -Xshare:auto - do nothing (CDS cannot be used so it's "automatically" disabled) -Xshare:on - fail and report that CDS archive cannot be loaded.
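Ioi's proposed -Xshare behavior for a build configured without CDS amounts to a small decision table. The following is a minimal, hypothetical C++ sketch of that table only; `handle_xshare_without_cds` and `XshareResult` are invented names for illustration and are not HotSpot's argument-parsing code.

```cpp
#include <cassert>
#include <string>

// Hypothetical names: models only the behavior proposed above for a VM
// built without CDS (INCLUDE_CDS == 0). This is not HotSpot's Arguments code.
enum class XshareResult { Accepted, Error };

XshareResult handle_xshare_without_cds(const std::string& option, std::string& message) {
  if (option == "-Xshare:off") {
    return XshareResult::Accepted;  // sharing is off anyway: nothing to do
  }
  if (option == "-Xshare:auto") {
    return XshareResult::Accepted;  // CDS cannot be used, so it is "automatically" disabled
  }
  if (option == "-Xshare:on") {
    // Sharing was explicitly required but this build cannot provide it.
    message = "CDS archive cannot be loaded: this VM was built without CDS";
    return XshareResult::Error;
  }
  // Other -Xshare variants are outside the scope of the proposal.
  message = "Unsupported option in a build without CDS: " + option;
  return XshareResult::Error;
}
```

The sketch captures only the proposal itself; whether such flags should be accepted at all in a CDS-less build is exactly what the follow-up discussion questions.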
------------- PR: https://git.openjdk.org/jdk/pull/12691 From dholmes at openjdk.org Wed Mar 8 09:30:25 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Mar 2023 09:30:25 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 14:26:49 GMT, Matthias Baesken wrote: > Unfortunately with the latest revision of the patch, quite a few tests fail when the build is configured with --disable-cds . Yes, that is to be expected. The tests have not been written to allow them to be run for arbitrary selection of the INCLUDE_XXX features. You will also have to exclude running all the CDS tests on such a build. Arguably any tests that set the -Xshare flag depend on CDS (positively or negatively) and so should use appropriate @requires or runtime checks before using the flag. > Maybe we can keep the -Xshare flags and make their behavior consistent with builds that have CDS enabled This might help the tests but semantically this just seems wrong to me. It would be like accepting G1 flags when you only have SerialGC available. I think we are just demonstrating that fully conditionalizing CDS code requires some higher-level semantic issues to be dealt with and that it is probably not worth trying to do that. So that brings us back to whether it is worth even trying to make a small step from where we are towards more complete conditionalization? Seems an arbitrary line to set to me when the benefit is minimal. But if you want to revert to an earlier version of the patch that "works" then I won't object further.
------------- PR: https://git.openjdk.org/jdk/pull/12691 From mdoerr at openjdk.org Wed Mar 8 10:57:14 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 8 Mar 2023 10:57:14 GMT Subject: RFR: JDK-8303575: adjust Xen handling on Linux aarch64 In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 12:56:55 GMT, Matthias Baesken wrote: > After [JDK-8301050](https://bugs.openjdk.org/browse/JDK-8301050) the Xen handling on aarch64 should be slightly adjusted. > The output in VM_Version::print_platform_virtualization_info was missed and needs to be added for Xen. > Additionally a new XenPVHVM virtualization type could be introduced because this describes the Xen on aarch64 better. > See also https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/virtualization-on-arm-with-xen where the naming is used. LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/12853 From rkennke at openjdk.org Wed Mar 8 11:24:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 Mar 2023 11:24:14 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v14] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
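The lock-stack described above can be sketched as a per-thread, growable array that is scanned linearly to answer "does the current thread own this object?". This is a simplified, hypothetical model: objects are opaque pointers, the header CAS, GC-root scanning and inflation to a full monitor are left out, and the class and method names are illustrative rather than those used in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, simplified model of a per-thread lock-stack. Objects are
// opaque pointers; header CAS, GC-root scanning and inflation are omitted.
class LockStack {
  void** _base;
  std::size_t _top;
  std::size_t _capacity;

 public:
  explicit LockStack(std::size_t initial_capacity = 4)
      : _base(static_cast<void**>(std::malloc(initial_capacity * sizeof(void*)))),
        _top(0),
        _capacity(initial_capacity) {
    assert(initial_capacity > 0 && _base != nullptr);
  }
  ~LockStack() { std::free(_base); }
  LockStack(const LockStack&) = delete;
  LockStack& operator=(const LockStack&) = delete;

  // "Does the current thread own obj?" is answered by a linear scan;
  // the array is expected to stay small.
  bool contains(const void* obj) const {
    for (std::size_t i = 0; i < _top; i++) {
      if (_base[i] == obj) return true;
    }
    return false;
  }

  // Fast-lock obj. Fast-locking is not recursive: if obj is already on the
  // stack, return false so the caller can inflate to a full monitor instead.
  bool try_push(void* obj) {
    if (contains(obj)) return false;
    if (_top == _capacity) {  // grow on overflow (via realloc in this sketch)
      std::size_t new_capacity = _capacity * 2;
      void** grown = static_cast<void**>(std::realloc(_base, new_capacity * sizeof(void*)));
      assert(grown != nullptr);
      _base = grown;
      _capacity = new_capacity;
    }
    _base[_top++] = obj;
    return true;
  }

  // Unlock obj: remove it, searching from the most recently locked entry.
  void remove(const void* obj) {
    for (std::size_t i = _top; i > 0; i--) {
      if (_base[i - 1] == obj) {
        for (std::size_t j = i - 1; j + 1 < _top; j++) _base[j] = _base[j + 1];
        _top--;
        return;
      }
    }
    assert(false && "object is not on this lock-stack");
  }

  std::size_t size() const { return _top; }
};
```

Returning `false` from `try_push` for an object that is already present stands in for the "recursive lock, inflate instead" path; the real implementation would then take the monitor route.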
> > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header.
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use realloc instead of malloc+copy when growing the lock-stack ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/9d4ca05f..12c2b8c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=12-13 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Wed Mar 8 11:40:01 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 Mar 2023 11:40:01 GMT Subject: RFR: JDK-8303817: Add constexpr for natural malloc alignment Message-ID: I miss having an easy way to get the alignment the libc guarantees for malloc. Let's add this, and we can use this right away with NMT's MallocHeader. Tests: manually built and tested x64 linux (clang + GCC) and x86 linux. ------------- Commit messages: - JDK-8303817-constexpr-for-natural-malloc-alignment Changes: https://git.openjdk.org/jdk/pull/12921/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12921&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303817 Stats: 19 lines in 3 files changed: 17 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12921.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12921/head:pull/12921 PR: https://git.openjdk.org/jdk/pull/12921 From mbaesken at openjdk.org Wed Mar 8 11:42:17 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 8 Mar 2023 11:42:17 GMT Subject: RFR: JDK-8303575: adjust Xen handling on Linux aarch64 In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 12:56:55 GMT, Matthias Baesken wrote: > After [JDK-8301050](https://bugs.openjdk.org/browse/JDK-8301050) the Xen handling on aarch64 should be slightly adjusted. > The output in VM_Version::print_platform_virtualization_info was missed and needs to be added for Xen. > Additionally a new XenPVHVM virtualization type could be introduced because this describes the Xen on aarch64 better. > See also https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/virtualization-on-arm-with-xen where the naming is used. Hi Martin and Lutz, thanks for the reviews !
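Regarding the JDK-8303817 proposal above (a constexpr for the alignment libc guarantees for malloc): in standard C++, a portable approximation is `alignof(std::max_align_t)`, since the C and C++ standards require a successful `malloc` of a large enough size to be aligned for any fundamental type. The sketch below illustrates that assumption only; it is not the PR's implementation. The constant `natural_malloc_alignment`, the struct `ExampleHeader` and the helpers are hypothetical, and the value may not match a particular libc's actual guarantee on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical constant name. Portable approximation of malloc's natural
// alignment: memory from malloc (for requests at least this large) must be
// suitably aligned for any fundamental type, i.e. alignof(std::max_align_t).
constexpr std::size_t natural_malloc_alignment = alignof(std::max_align_t);

static_assert((natural_malloc_alignment & (natural_malloc_alignment - 1)) == 0,
              "alignment must be a power of two");

constexpr std::size_t align_up(std::size_t n, std::size_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

// Illustrative header placed in front of a payload (not NMT's MallocHeader).
struct ExampleHeader {
  std::size_t size;
  std::uint32_t tag;
};

// Padding the header to the natural alignment keeps the payload that follows
// it as well aligned as a plain malloc() result.
constexpr std::size_t padded_header_size =
    align_up(sizeof(ExampleHeader), natural_malloc_alignment);

inline bool is_naturally_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % natural_malloc_alignment == 0;
}
```

Because the constant is `constexpr`, it can be used in `static_assert`s and in other compile-time size computations such as `padded_header_size`.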
------------- PR: https://git.openjdk.org/jdk/pull/12853 From mbaesken at openjdk.org Wed Mar 8 11:42:19 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 8 Mar 2023 11:42:19 GMT Subject: Integrated: JDK-8303575: adjust Xen handling on Linux aarch64 In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 12:56:55 GMT, Matthias Baesken wrote: > After [JDK-8301050](https://bugs.openjdk.org/browse/JDK-8301050) the Xen handling on aarch64 should be slightly adjusted. > The output in VM_Version::print_platform_virtualization_info was missed and needs to be added for Xen. > Additionally a new XenPVHVM virtualization type could be introduced because this describes the Xen on aarch64 better. > See also https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/virtualization-on-arm-with-xen where the naming is used. This pull request has now been integrated. Changeset: 8eaf84f0 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/8eaf84f09476b08ed421efe74d7554e2b29bc5a7 Stats: 13 lines in 3 files changed: 5 ins; 0 del; 8 mod 8303575: adjust Xen handling on Linux aarch64 Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/12853 From mgronlun at openjdk.org Wed Mar 8 12:54:13 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 8 Mar 2023 12:54:13 GMT Subject: RFR: 8257967: JFR: Event for loaded agents Message-ID: Greetings, We are adding support to let JFR report on Agents. #### Design An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. 
A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples:

    // Command line
    jdk.JavaAgent {
      startTime = 12:31:19.789 (2023-03-08)
      name = "JavaAgent.jar"
      options = "foo=bar"
      initialization = 12:31:15.574 (2023-03-08)
      initializationTime = 172 ms
      initializationMethod = "premain"
    }

    // Dynamic load
    jdk.JavaAgent {
      startTime = 12:31:31.158 (2023-03-08)
      name = "JavaAgent.jar"
      options = "bar=baz"
      initialization = 12:31:31.037 (2023-03-08)
      initializationTime = 64,1 ms
      initializationMethod = "agentmain"
    }

The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line:

    -javaagent:JavaAgent.jar=foo=bar

The event will also detail which initialization method was invoked by the JVM: "premain" for command line agents, and "agentmain" for agents loaded dynamically. "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". An agent can also be written in a native programming language, using either the JVM Tools Interface (JVMTI) or JVM Profiling Interface (JVMPI).
This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. The JVMTI standard specification is [here](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). JVMPI is an older, non-standard interface and is considered superseded by JVMTI, but the JVM still supports it for agents started on the command line:

    -XrunMyAgent

To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:

    jdk.NativeAgent {
      startTime = 12:31:40.398 (2023-03-08)
      name = "jdwp"
      options = "transport=dt_socket,server=y,address=any,onjcmd=y"
      path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
      dynamic = false
      initialization = 12:31:36.142 (2023-03-08)
      initializationTime = 0,00184 ms
    }

The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and the event also indicates whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). The initialization of a native agent is performed by invoking an agent-specified callback routine, which is not detailed in the event (for now). "initialization" is the time the JVM sent, or would have sent, the JVMTI VMInit event to a specified callback. "initializationTime" is the duration of executing that specific callback. If no callback is specified for the JVMTI VMInit event, "initializationTime" will be 0.

#### Implementation

There has been no direct reification of a JavaAgent in the JVM, as JavaAgents are built on top of the JDK native library "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnvs, which agents request from the JVM during startup and initialization. As such, mapping which JvmtiEnv belongs to which JavaAgent was not possible before.
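The timing fields described above for jdk.NativeAgent can be modeled in a few lines: record when the VMInit callback was (or would have been) invoked, time the callback, and leave the duration at 0 when no callback is registered. This is a hypothetical sketch; `AgentTiming` and `time_agent_init` are invented names, and it uses `std::chrono::steady_clock` rather than JFR's own clock.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>

// Hypothetical model of the timing fields described for jdk.NativeAgent.
// Names are illustrative; neither HotSpot's agent code nor JFR's clock is used.
struct AgentTiming {
  std::string name;
  std::chrono::steady_clock::time_point initialization;  // when VMInit was (or would have been) delivered
  std::chrono::nanoseconds initialization_time{0};       // callback duration, 0 if no callback
};

AgentTiming time_agent_init(const std::string& name,
                            const std::function<void()>& vm_init_callback) {
  AgentTiming t;
  t.name = name;
  t.initialization = std::chrono::steady_clock::now();
  if (vm_init_callback) {
    vm_init_callback();
    t.initialization_time = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - t.initialization);
  }
  return t;
}
```

A periodic event emitted later would carry a `startTime` that is after `initialization`, which matches the ordering seen in the example events.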
Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping and use it to track which JvmtiEnvs "belong" to which JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. In order to implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Moving this code out of its old locations will also help reduce the size of the oversized arguments.cpp and threads.cpp. The previous lists used to maintain the agents (JVMTI) and libraries (JVMPI) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. Testing: jdk_jfr, tier 1 - 6 Thanks Markus ------------- Commit messages: - event_names - adjustment - 8257967 Changes: https://git.openjdk.org/jdk/pull/12923/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8257967 Stats: 1862 lines in 22 files changed: 1353 ins; 485 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mdoerr at openjdk.org Wed Mar 8 13:22:50 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 8 Mar 2023 13:22:50 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v11] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Remove STRUCT_REFERENCE which was incorrectly taken from aarch64. Pass size to bufferLoad/Store. Enable TestNested.java. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/3a541d00..f75a240d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=09-10 Stats: 25 lines in 3 files changed: 0 ins; 21 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From qamai at openjdk.org Wed Mar 8 13:46:03 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Mar 2023 13:46:03 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v2] In-Reply-To: References: Message-ID: > `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. 
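The `Vector::slice` semantics described above (concatenate the two inputs into a double-width composite, then extract an input-sized window starting at `origin`) can be modeled in scalar C++. This is an illustrative sketch only: `std::array` stands in for a vector register and the function name `slice` is hypothetical; it is neither the proposed C2 node nor the Java implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of slice(origin, v1): conceptually concatenate `a` (this) and
// `b` (v1) into a 2N-lane composite and take the N-lane window starting at
// `origin`. std::array stands in for a vector register.
template <typename T, std::size_t N>
std::array<T, N> slice(const std::array<T, N>& a, std::size_t origin,
                       const std::array<T, N>& b) {
  assert(origin <= N);
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; i++) {
    std::size_t j = origin + i;              // index into the virtual concat(a, b)
    result[i] = (j < N) ? a[j] : b[j - N];
  }
  return result;
}
```

For 16 byte lanes, this window corresponds to what the SSSE3 intrinsic `_mm_alignr_epi8(b, a, origin)` (the `palignr` instruction) computes, with `b` as the high half and `a` as the low half of the 32-byte composite, which is why a single instruction can replace the rearrange-and-blend sequence.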
> > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, which causes a fallback to the Java implementations. > > Please take a look and have some reviews. Thank you very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12909/files - new: https://git.openjdk.org/jdk/pull/12909/files/e992d4c6..65409f13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=00-01 Stats: 333 lines in 2 files changed: 61 ins; 182 del; 90 mod Patch: https://git.openjdk.org/jdk/pull/12909.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12909/head:pull/12909 PR: https://git.openjdk.org/jdk/pull/12909 From qamai at openjdk.org Wed Mar 8 13:52:18 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Mar 2023 13:52:18 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v2] In-Reply-To: <_PA9oL9dVd3Yrg0sXw3m0uwfGjP6TuqXGBm5M090GHM=.a09a8733-e59e-4b6d-a6a6-e518a8518450@github.com> References: <_PA9oL9dVd3Yrg0sXw3m0uwfGjP6TuqXGBm5M090GHM=.a09a8733-e59e-4b6d-a6a6-e518a8518450@github.com> Message-ID: On Wed, 8 Mar 2023 00:29:05 GMT, Paul Sandoz wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> address reviews > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2289: > >> 2287: getClass(), byte.class, length(), >> 2288: this, that, origin, >> 2289: new VectorSliceOp() { > > Change from
inner class to lambda expression? We still need this method to be inlined, and I don't know if there is a way to annotate the lambda function. > test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 65: > >> 63: Asserts.assertEquals(expected, dst[i][j]); >> 64: } >> 65: } > > It should be possible to factor out this code into something like this: > > assertOffsets(length, (expected, i, j) -> Assert.assertEquals((byte)expected, dst[i][j])); Fixed. > test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 68: > >> 66: >> 67: length = 16; >> 68: testB128(dst, src1, src2); > > Should `dst` be zeroed before the next call? Or maybe it is easier to just reallocate. Fixed, I just allocate another array. > test/jdk/jdk/incubator/vector/templates/Kernel-Slice-bop-const.template line 1: > >> 1: $type$[] a = fa.apply(SPECIES.length()); > > Forgot to commit the updated unit tests? This is for the microbenchmarks generated in the panama-vector repo only. Thanks a lot. ------------- PR: https://git.openjdk.org/jdk/pull/12909 From duke at openjdk.org Wed Mar 8 14:14:34 2023 From: duke at openjdk.org (Alexey Pavlyutkin) Date: Wed, 8 Mar 2023 14:14:34 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() Message-ID: The patch fixes error reporting to check the timeout when a user specifies an OnError handler. Previously, VMError::check_timeout() ignored the timeout in this case, and so didn't break the malloc() deadlock.
Verification (amd64/20.04LTS): 16:52:17 at alex@alex-VirtualBox>( echo " public class C { public static void main(String[] args) throws Throwable { > while (true) Thread.sleep(1000); > } > } > " >> C.java ) 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & [2] 179574 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 17:00:27 at alex@alex-VirtualBox># # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 # # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) # # An error report file with more information is saved as: # /home/alex/jdk/hs_err_pid179574.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # # # -XX:OnError="sleep 1;sleep 10;sleep 60" # Executing /bin/sh -c "sleep 1" ... # Executing /bin/sh -c "sleep 10" ... # Executing /bin/sh -c "sleep 60" ... 
[2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & [2] 179602 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 17:02:41 at alex@alex-VirtualBox># # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 # # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 # # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) # # An error report file with more information is saved as: # /home/alex/jdk/hs_err_pid179602.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # # # -XX:OnError="sleep 1;sleep 10;sleep 60" # Executing /bin/sh -c "sleep 1" ... # Executing /bin/sh -c "sleep 10" ... ------ Timeout during error reporting after 11 s. 
------ 17:02:54 at alex@alex-VirtualBox> Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` ------------- Commit messages: - report_and_die() should check timeout not under ForkAndExecCheckPoint - 8302073: reporting routine shall check timeout expiration after each call of fork_and_exec() - 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() Changes: https://git.openjdk.org/jdk/pull/12925/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12925&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302073 Stats: 28 lines in 1 file changed: 26 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12925.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12925/head:pull/12925 PR: https://git.openjdk.org/jdk/pull/12925 From qamai at openjdk.org Wed Mar 8 14:21:54 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Mar 2023 14:21:54 GMT Subject: RFR: JDK-8303817: Add constexpr for natural malloc alignment In-Reply-To: References: Message-ID: <9coR7ybmW69cjuVW4gPVNLYq9DPteSGZmcmyJm0E-OE=.618159df-f840-4428-b818-d054e119ee3e@github.com> On Wed, 8 Mar 2023 11:23:53 GMT, Thomas Stuefe wrote: > I miss having an easy way to get the alignment the libc guarantees for malloc. Let's add this, and we can use this right away with NMT's MallocHeader. > > Tests: manually built and tested x64 linux (clang + GCC) and x86 linux. Can we use `alignof(std::max_align_t)` instead? Thanks. 
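For reference, here is a minimal sketch of the suggested approach, assuming a C++11-or-later toolchain. `alignof(std::max_align_t)` is a compile-time constant, and the standard guarantees that `malloc` results are suitably aligned for any type with fundamental alignment, which is exactly what that constant expresses. Whether every platform's libc honors it in practice is worth checking (32-bit x86 is the usual suspect). The names `natural_malloc_alignment`, `align_up`, `ExampleMallocHeader` and `example_malloc` are hypothetical and are not HotSpot's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Alignment that malloc() results satisfy for any fundamental type (C++11).
constexpr std::size_t natural_malloc_alignment = alignof(std::max_align_t);

static_assert((natural_malloc_alignment & (natural_malloc_alignment - 1)) == 0,
              "alignment must be a power of two");

// Round `size` up to a multiple of the power-of-two `alignment`.
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical header stored in front of each user allocation.
struct ExampleMallocHeader {
  std::size_t size;
  std::uint32_t tag;
};

// Padding the header to the natural alignment keeps the user pointer that
// follows it as well aligned as a plain malloc() result would be.
constexpr std::size_t header_size =
    align_up(sizeof(ExampleMallocHeader), natural_malloc_alignment);

void* example_malloc(std::size_t size) {
  void* raw = std::malloc(header_size + size);
  if (raw == nullptr) return nullptr;
  new (raw) ExampleMallocHeader{size, 0xABCDu};
  return static_cast<char*>(raw) + header_size;
}

void example_free(void* p) {
  if (p == nullptr) return;
  std::free(static_cast<char*>(p) - header_size);
}
```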
------------- PR: https://git.openjdk.org/jdk/pull/12921 From mbaesken at openjdk.org Wed Mar 8 14:59:25 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 8 Mar 2023 14:59:25 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:55:07 GMT, Matthias Baesken wrote: >> The cds only coding in hotspot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time in case the correct configure flags are set. >> However at some places INCLUDE_CDS is missing and should be added. >> >> One question - should (additionally to the UseSharedSpaces code section) the DumpSharedSpaces code sections be guarded as well with INCLUDE_CDS macros ? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust arguments handling The disabling/enabling of JVM features is briefly mentioned in the build doc ( https://openjdk.org/groups/build/doc/building.html ), so I thought this is something that is supported and should work (maybe not perfectly, because of limited testing). If it is really not so interesting to others to adjust tests with `@requires vm.cds` where needed and to remove unneeded code when the feature is disabled, then we can leave it as it is. ------------- PR: https://git.openjdk.org/jdk/pull/12691 From mgronlun at openjdk.org Wed Mar 8 15:16:28 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 8 Mar 2023 15:16:28 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v2] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime.
Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > initializationMethod = "premain" > } > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > initializationMethod = "agentmain" > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar > > The event will also detail which initialization method was invoked by the JVM: "premain" for command line agents, and "agentmain" for agents loaded dynamically. > > "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method.
> > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language, using either the JVM Tools Interface (JVMTI) or JVM Profiling Interface (JVMPI). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > The JVMTI specification is [here](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html) > > JVMPI is an older interface, not a standard, and is considered superseded by JVMTI, but support for it is still in the JVM for agents started on the command line: -XrunMyAgent > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also an indication of whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now). The "initialization" is the time the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0.
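To make the "initializationTime will be 0 when no callback is registered" rule concrete, here is a minimal sketch of timing an optional initialization callback. `InitCallback`, `InitRecord` and `run_timed_init` are hypothetical names; this is not how JFR or JVMTI are implemented, and a real JVMTI VMInit callback takes `jvmtiEnv*`, `JNIEnv*` and `jthread` arguments. The sketch takes the wall-clock timestamp from `system_clock` and measures the duration with the monotonic `steady_clock`.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for an agent's VM-init callback.
using InitCallback = void (*)();

struct InitRecord {
  std::chrono::system_clock::time_point initialization;  // when init was (or would have been) invoked
  std::chrono::nanoseconds initialization_time{0};       // how long the callback ran
};

// Records the invocation timestamp and measures the callback's duration.
// If no callback is registered, the duration stays zero.
InitRecord run_timed_init(InitCallback callback) {
  InitRecord record;
  record.initialization = std::chrono::system_clock::now();
  if (callback != nullptr) {
    const auto start = std::chrono::steady_clock::now();
    callback();
    record.initialization_time = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
  }
  return record;
}
```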
> > #### Implementation > > Until now, there has been no reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > To implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking this code out of its older places will help reduce the sizes of the oversized arguments.cpp and threads.cpp. > > The previous lists that maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
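One common way to build a list that can be iterated concurrently while entries are still being added is an append-only singly linked list whose head is published with a compare-and-swap; readers walk from an acquire snapshot of the head without taking a lock. The sketch below is a generic illustration under those assumptions (hypothetical names, nodes never unlinked or freed while readers may exist). It is not the code in prims/agentList.hpp.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical agent record; its fields are not modified after publication.
struct AgentNode {
  explicit AgentNode(std::string n) : name(std::move(n)) {}
  std::string name;
  AgentNode* next = nullptr;
};

class ConcurrentAgentList {
  std::atomic<AgentNode*> head_{nullptr};

 public:
  // Lock-free prepend: link the node to the current head, then try to publish
  // it. On CAS failure `expected` is refreshed with the new head and we retry.
  void add(AgentNode* node) {
    AgentNode* expected = head_.load(std::memory_order_relaxed);
    do {
      node->next = expected;
    } while (!head_.compare_exchange_weak(expected, node,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Readers take an acquire snapshot of the head and walk it without locks.
  // Nodes are never unlinked, so the walk is safe while add() runs concurrently;
  // nodes added after the snapshot are simply not visited.
  template <typename F>
  void for_each(F f) const {
    for (AgentNode* n = head_.load(std::memory_order_acquire); n != nullptr; n = n->next) {
      f(*n);
    }
  }
};
```

Prepending keeps insertion to a single CAS; if registration order matters for reporting, a reader can collect and reverse the snapshot.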
> > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: razor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/c50cca53..ed1ea797 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=00-01 Stats: 114 lines in 3 files changed: 33 ins; 74 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From coleenp at openjdk.org Wed Mar 8 15:40:33 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 8 Mar 2023 15:40:33 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 22:14:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(), in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we found a return barrier pc (as in Continuation::is_return_barrier_entry()) and we want to keep walking the physical stack then we know the sender will be the enterSpecial frame so we create it by calling ContinuationEntry::to_frame(). This method assumes there can only be one nmethod associated with enterSpecial() so we hit an assert later on. See the bug for more details of the crash.
> > As I mentioned in the bug, we don't need to rely on this assumption since we can re-read the updated value from _enter_special. But reading both _enter_special and _return_pc means we would need some kind of synchronization since to_frame() could be called concurrently with set_enter_code(). To avoid that we could just read _return_pc and calculate the blob from it each time, but I'm also assuming that overhead is undesired and that's why the static variable was introduced. Alternatively _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). But if we go this route I think we would need to do a small fix on thaw too. After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). I'm not sure about the lifecycle of the old CodeBlob but it seems it could have been already removed if enterSpecial was not found while traversing everybody's stack. Maybe there are more issues. > > The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another less restrictive option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired. > > I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test. > > Thanks, > Patricio That's a good fix and a good place for it. Thank you for figuring this out. ------------- Marked as reviewed by coleenp (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12911 From jcking at openjdk.org Wed Mar 8 15:42:49 2023 From: jcking at openjdk.org (Justin King) Date: Wed, 8 Mar 2023 15:42:49 GMT Subject: RFR: JDK-8303817: Add constexpr for natural malloc alignment In-Reply-To: <9coR7ybmW69cjuVW4gPVNLYq9DPteSGZmcmyJm0E-OE=.618159df-f840-4428-b818-d054e119ee3e@github.com> References: <9coR7ybmW69cjuVW4gPVNLYq9DPteSGZmcmyJm0E-OE=.618159df-f840-4428-b818-d054e119ee3e@github.com> Message-ID: On Wed, 8 Mar 2023 14:19:12 GMT, Quan Anh Mai wrote: > Can we use `alignof(std::max_align_t)` instead? Thanks. Same request. ------------- PR: https://git.openjdk.org/jdk/pull/12921 From psandoz at openjdk.org Wed Mar 8 16:22:13 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 8 Mar 2023 16:22:13 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v2] In-Reply-To: References: <_PA9oL9dVd3Yrg0sXw3m0uwfGjP6TuqXGBm5M090GHM=.a09a8733-e59e-4b6d-a6a6-e518a8518450@github.com> Message-ID: On Wed, 8 Mar 2023 13:48:16 GMT, Quan Anh Mai wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2289: >> >>> 2287: getClass(), byte.class, length(), >>> 2288: this, that, origin, >>> 2289: new VectorSliceOp() { >> >> Change from inner class to lambda expression? > > We still need this method to be inlined and I don't know if there is a way to annotate the lambda function. Yes, I wondered about the inline and how important it might be. You want the fallback to inline so as not to perturb platforms without the intrinsic. Can you add a comment on the anon class?
------------- PR: https://git.openjdk.org/jdk/pull/12909 From psandoz at openjdk.org Wed Mar 8 16:26:05 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 8 Mar 2023 16:26:05 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v2] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 13:46:03 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method of the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as UTF-8 and UTF-16 validation. x86 starting from SSSE3 has `palignr`, which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as to intrinsify the `Vector::slice` method. >> >> A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)`, which requires preparation of the index vector and the blending mask. Even with the preparations hoisted out of the loops, microbenchmarks show improvement using the slice intrinsics. Some show tremendous increases in throughput, because a mask of length 2 cannot currently be intrinsified and those cases fall back to the Java implementation. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > address reviews Java changes look good. The HotSpot code looks well structured, but I will let others comment on the specifics. ------------- Marked as reviewed by psandoz (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12909 From coleenp at openjdk.org Wed Mar 8 16:37:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 8 Mar 2023 16:37:23 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 14:50:34 GMT, Frederic Parain wrote: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. I was able to do a first pass through the VM code except for JVMCI. I didn't look at tests or SA in this pass.
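The quoted description relies on the unsigned5 encoding inherited from Pack200. As a rough illustration of how such a variable-length coding works, the sketch below implements the classic Pack200-style (B=5, H=64) scheme: byte values below L = 256 - H = 192 end a value, so small numbers take a single byte and any 32-bit value takes at most five. HotSpot's own UNSIGNED5 implementation may differ in details, so the constants and function names here are assumptions, not its API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack200-style (B=5, H=64) variable-length coding of 32-bit values.
constexpr int kMaxBytes = 5;                 // B: at most five bytes per value
constexpr std::uint32_t kH = 64;             // H: radix of the continuation bytes
constexpr std::uint32_t kL = 256 - kH;       // L = 192: bytes below L end a value

void write_u5(std::vector<std::uint8_t>& out, std::uint32_t value) {
  std::uint32_t v = value;
  for (int i = 0; i < kMaxBytes; ++i) {
    if (v < kL || i == kMaxBytes - 1) {
      out.push_back(static_cast<std::uint8_t>(v));  // final byte (v <= 255 here for 32-bit input)
      return;
    }
    // Continuation byte in [L, 255] carrying (v - L) mod H; the rest moves to the next byte.
    out.push_back(static_cast<std::uint8_t>(kL + (v - kL) % kH));
    v = (v - kL) / kH;
  }
}

// Decodes one value starting at `pos` and advances `pos` past it:
// value = b0 + b1*H + b2*H^2 + ..., stopping at the first byte below L.
std::uint32_t read_u5(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  std::uint64_t sum = 0;
  std::uint64_t weight = 1;
  for (int i = 0; i < kMaxBytes; ++i) {
    std::uint64_t b = in.at(pos++);
    sum += b * weight;
    if (b < kL) break;
    weight *= kH;
  }
  return static_cast<std::uint32_t>(sum);
}
```

Since most field-metadata values (constant pool indices, small offsets, flags) are small, a coding like this tends to spend one or two bytes per value instead of fixed-width slots.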
src/hotspot/share/ci/ciFlags.hpp line 47: > 45: > 46: ciFlags() { _flags = 0; _stable = false; _intialized_final_update = false; } > 47: ciFlags(AccessFlags flags, bool is_stable= false, bool is_initialized_final_update = false) { This should use constructor initializer syntax. src/hotspot/share/classfile/classFileParser.cpp line 1491: > 1489: _temp_field_info = new GrowableArray(total_fields); > 1490: > 1491: ResourceMark rm(THREAD); Is the ResourceMark ok here or should it go before allocating _temp_field_info ? src/hotspot/share/classfile/classFileParser.cpp line 1608: > 1606: fflags.update_injected(true); > 1607: AccessFlags aflags; > 1608: FieldInfo fi(aflags, (u2)(injected[n].name_index), (u2)(injected[n].signature_index), 0, fflags); I don't know why there's a cast here until I read more. If the FieldInfo name_index and signature_index fields are only u2 sized, could you pass this as an int and then in the constructor assert that the value doesn't overflow u2 instead? src/hotspot/share/classfile/classFileParser.cpp line 1634: > 1632: for(int i = 0; i < _temp_field_info->length(); i++) { > 1633: name = _temp_field_info->adr_at(i)->name(_cp); > 1634: sig = _temp_field_info->adr_at(i)->signature(_cp); This checking for duplicates looks like a good candidate for a separate function because parse_fields is so long. I'm adding this comment to remember to file an RFE to look into making this function shorter and factor out this code. src/hotspot/share/classfile/classFileParser.cpp line 6024: > 6022: int injected_fields_count = _temp_field_info->length() - _java_fields_count; > 6023: _fieldinfo_stream = FieldInfoStream::create_FieldInfoStream(_temp_field_info, _java_fields_count, injected_fields_count, loader_data(), CHECK); > 6024: _fields_status = MetadataFactory::new_array(_loader_data, _temp_field_info->length(), FieldStatus(0), CHECK); These lines seem long, could you reformat? 
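Two of the comments above map to a common C++ shape: initialize members in the constructor's member-initializer list instead of assigning in the body, and accept a wider integer and check once, in the constructor, that it fits in a `u2` instead of casting at every call site. A minimal sketch follows, with hypothetical class names and a plain `assert` standing in for HotSpot's own assertion macros.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using u2 = std::uint16_t;

// Member-initializer-list form of a small flags holder (hypothetical).
class ExampleFlags {
  int  _flags;
  bool _stable;
  bool _initialized_final_update;

 public:
  explicit ExampleFlags(int flags = 0, bool is_stable = false,
                        bool is_initialized_final_update = false)
      : _flags(flags),
        _stable(is_stable),
        _initialized_final_update(is_initialized_final_update) {}

  int flags() const { return _flags; }
  bool is_stable() const { return _stable; }
  bool is_initialized_final_update() const { return _initialized_final_update; }
};

// Narrows an int to u2, asserting that it fits instead of silently truncating.
inline u2 checked_u2(int value) {
  assert(value >= 0 && value <= std::numeric_limits<u2>::max() && "value does not fit in u2");
  return static_cast<u2>(value);
}

// Hypothetical field descriptor: callers pass plain ints, the check happens once here.
class ExampleFieldInfo {
  u2 _name_index;
  u2 _signature_index;

 public:
  ExampleFieldInfo(int name_index, int signature_index)
      : _name_index(checked_u2(name_index)),
        _signature_index(checked_u2(signature_index)) {}

  u2 name_index() const { return _name_index; }
  u2 signature_index() const { return _signature_index; }
};
```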
src/hotspot/share/classfile/fieldLayoutBuilder.cpp line 554: > 552: FieldInfo ctrl = _field_info->at(0); > 553: FieldGroup* group = nullptr; > 554: FieldInfo tfi = *it; What's the 't' in tfi? Maybe a longer variable name would be helpful here. src/hotspot/share/classfile/javaClasses.cpp line 871: > 869: // a new UNSIGNED5 stream, and substitute it to the old FieldInfo stream. > 870: > 871: int java_fields; Can you put InstanceKlass* ik = InstanceKlass::cast(k); here and use that so there's only one cast? src/hotspot/share/classfile/javaClasses.cpp line 873: > 871: int java_fields; > 872: int injected_fields; > 873: GrowableArray* fields = FieldInfoStream::create_FieldInfoArray(InstanceKlass::cast(k)->fieldinfo_stream(), &java_fields, &injected_fields); This line looks too long too. src/hotspot/share/oops/fieldInfo.hpp line 31: > 29: #include "memory/metadataFactory.hpp" > 30: #include "oops/constantPool.hpp" > 31: #include "oops/symbol.hpp" Since you added an inline.hpp function can you move the functions that rely on including constantPool.hpp, symbol.hpp and metadataFactory.hpp into the inline.hpp file? src/hotspot/share/oops/fieldInfo.hpp line 180: > 178: u2 generic_signature_index() const { return _generic_signature_index; } > 179: void set_generic_signature_index(u2 index) { _generic_signature_index = index; } > 180: u2 contention_group() const { return _contention_group; } Can you align the { in these one line functions? src/hotspot/share/oops/fieldStreams.hpp line 28: > 26: #define SHARE_OOPS_FIELDSTREAMS_HPP > 27: > 28: #include "oops/instanceKlass.inline.hpp" including .inline.hpp from .hpp is against the guidelines. You should move things and include instanceKlass.inline.hpp in fieldStreams.inline.hpp instead. src/hotspot/share/oops/fieldStreams.hpp line 104: > 102: AccessFlags flags; > 103: flags.set_flags(field()->access_flags()); > 104: return flags; Did this used to do this for a reason? 
src/hotspot/share/oops/fieldStreams.inline.hpp line 28: > 26: #define SHARE_OOPS_FIELDSTREAMS_INLINE_HPP > 27: > 28: #include "oops/fieldInfo.inline.hpp" I don't know if you have to include oops/fieldInfo.inline.hpp, but the include line for fieldStreams.hpp should be by itself, and then this new include should be below with runtime/javaThread.hpp. src/hotspot/share/oops/instanceKlass.hpp line 275: > 273: // Fields information is stored in an UNSIGNED5 encoded stream (see fieldInfo.hpp) > 274: Array* _fieldinfo_stream; > 275: Array* _fields_status; Can you align these two field identifiers? src/hotspot/share/prims/jvmtiRedefineClasses.cpp line 3582: > 3580: } > 3581: if (update_required) { > 3582: Array* old_stream = InstanceKlass::cast(scratch_class)->fieldinfo_stream(); scratch_class should already be an InstanceKlass, i.e., the cast is not required here or below. src/hotspot/share/runtime/reflectionUtils.hpp line 29: > 27: > 28: #include "memory/allStatic.hpp" > 29: #include "oops/instanceKlass.inline.hpp" Also here: a .hpp file cannot include a .inline.hpp file. ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855 From stuefe at openjdk.org Wed Mar 8 17:10:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 Mar 2023 17:10:43 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 14:05:44 GMT, Alexey Pavlyutkin wrote: > The patch fixes error reporting to check the timeout when a user specifies an OnError handler. Previously, VMError::check_timeout() ignored the timeout in this case, and so didn't break the malloc() deadlock.
> > Verification (amd64/20.04LTS): the idea of the test is to crash a JVM running with an OnError handler of 3 successive `sleep` commands (1s, 10s, and 60s), with and without a specified timeout > > > 16:52:17 at alex@alex-VirtualBox>( echo " > public class C { > public static void main(String[] args) throws Throwable { >> while (true) Thread.sleep(1000); >> } >> } >> " >> C.java ) > 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179574 > 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 > 17:00:27 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179574.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > # Executing /bin/sh -c "sleep 60" ...
> > [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java > 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179602 > 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 > 17:02:41 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179602.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > > ------ Timeout during error reporting after 11 s. ------ > > 17:02:54 at alex@alex-VirtualBox> > > > Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` Thinking this through some more, I'm starting to doubt we do the right thing here. 
It is certainly convoluted: we *are* the reporting thread here, so what happens is that: - after fork_and_exec, we call check_timeout - check_timeout will signal ourselves - we receive the signal - we enter the secondary signal handler recursively - we then re-enter VMError::report_and_die - we then note that this is a timeout, print an error message and call os::die. That is too complicated for my taste, and I don't know if there are any hidden issues with VMError::check_timeout(), since now we may call it from two threads: the reporting thread and the watcher thread. That function was not intended for concurrent usage. And before thinking about the correct behavior, we need to clarify if the protection we grant an OnError invocation extends to the whole chain of error scripts. Right now we say OnError scripts should not be interrupted. Okay, but what about the next OnError script? If the user specifies several OnError scripts, should they all get a chance to run to completion? Because denying the follow-up error scripts a chance to run feels weirdly arbitrary. Either those scripts are essential, or they aren't. If they are, all should run. If they are not, it should be okay to kill the JVM *while it is waiting for the child process to finish* - this would make the patch simpler, and one could argue that this would be the correct behavior anyway. Personally, I think that killing the JVM while it is in waitpid waiting for the child is probably benign. The child would be reparented, possibly zombified on badly set up systems, but that's it. It would probably run to completion. src/hotspot/share/utilities/vmError.cpp line 1360: > 1358: > 1359: namespace { > 1360: class ForkAndExecCheckPoint : public StackObj { Nit: checkpoint sounds quite specific, and nothing is checked here.
src/hotspot/share/utilities/vmError.cpp line 1362:

> 1360: class ForkAndExecCheckPoint : public StackObj {
> 1361: NONCOPYABLE( ForkAndExecCheckPoint );
> 1362: static int _in_progress;

Must be volatile, I think.

src/hotspot/share/utilities/vmError.cpp line 1366:

> 1364: ForkAndExecCheckPoint() {
> 1365: assert(Atomic::load(&_in_progress) == 0, "fork_and_exec() is already in progress");
> 1366: Atomic::store(&_in_progress, 1);

I'd do a CAS.

src/hotspot/share/utilities/vmError.cpp line 1692:

> 1690: {
> 1691: // suspend timeout detection and run the command
> 1692: ForkAndExecCheckPoint chechpoint;

Typo ("chechpoint").

------------- PR: https://git.openjdk.org/jdk/pull/12925

From stuefe at openjdk.org Wed Mar 8 17:21:18 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 8 Mar 2023 17:21:18 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 14:05:44 GMT, Alexey Pavlyutkin wrote:

> The patch fixes error reporting to check the timeout in the case when a user specifies an OnError handler. Before, VMError::check_timeout() ignored the timeout in this case, and so didn't break the malloc() deadlock.
> > Verification (amd64/20.04LTS): the idea of the test is to crash JVM running with error hander of 3 successive `sleep` commands for 1s, 10s, and 60s with and without specified timeout > > > 16:52:17 at alex@alex-VirtualBox>( echo " > public class C { > public static void main(String[] args) throws Throwable { >> while (true) Thread.sleep(1000); >> } >> } >> " >> C.java ) > 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179574 > 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 > 17:00:27 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179574.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > # Executing /bin/sh -c "sleep 60" ... 
> > [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java > 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179602 > 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 > 17:02:41 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179602.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > > ------ Timeout during error reporting after 11 s. 
------ > > 17:02:54 at alex@alex-VirtualBox> > > > Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` Pinging @dholmes-ora for input and opinion ------------- PR: https://git.openjdk.org/jdk/pull/12925 From qamai at openjdk.org Wed Mar 8 17:24:50 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Mar 2023 17:24:50 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v3] In-Reply-To: References: Message-ID: > `Vector::slice` is a method at the top-level class of the Vector API that concatenates the 2 inputs into an intermediate composite and extracts a window equal to the size of the inputs into the result. It is used in vector conversion methods where the part number is not 0 to slice the parts to the correct positions. Slicing is also used in text processing such as utf8 and utf16 validation. x86 starting from SSSE3 has `palignr` which does vector slicing very efficiently. As a result, I think it is beneficial to add a C2 node for this operation as well as intrinsify `Vector::slice` method. > > A slice is currently implemented as `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires preparation of the index vector and the blending mask. Even with the preparations being hoisted out of the loops, microbenchmarks show improvement using the slice instrinsics. Some have tremendous increases in throughput due to the limitation that a mask of length 2 cannot currently be intrinsified, leading to falling back to the Java implementations. > > Please take a look and have some reviews. Thank you very much. 
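To make the slice semantics concrete, here is a scalar sketch (plain C++, not the Vector API or the proposed C2 node), assuming 0 <= origin <= N: the result is the N-lane window starting at lane origin of the concatenation [a, b]. The second function computes the same result the way the description says slice is currently implemented, as a rotation by origin followed by a blend. Both function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Reference semantics: take N lanes starting at `origin` from the
// 2N-lane concatenation [a, b]. Precondition: 0 <= origin <= N.
template <typename T, std::size_t N>
std::array<T, N> slice(std::size_t origin, const std::array<T, N>& a,
                       const std::array<T, N>& b) {
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t j = origin + i;  // index into the concatenation
    r[i] = (j < N) ? a[j] : b[j - N];
  }
  return r;
}

// Same result as rotate-then-blend, mirroring
// v2.rearrange(iota).blend(v1.rearrange(iota), blendMask).
template <typename T, std::size_t N>
std::array<T, N> slice_rotate_blend(std::size_t origin, const std::array<T, N>& a,
                                    const std::array<T, N>& b) {
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    std::size_t src = (origin + i) % N;  // rotated "iota" shuffle index
    bool from_a = i < N - origin;        // blend mask: lane still inside `a`
    r[i] = from_a ? a[src] : b[src];
  }
  return r;
}
```

The rotate-and-blend form needs both an index vector and a mask, which is the preparation cost that a single palignr-style instruction avoids.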
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  add comments explaining anonymous classes

------------- Changes: - all: https://git.openjdk.org/jdk/pull/12909/files - new: https://git.openjdk.org/jdk/pull/12909/files/65409f13..c31fdfe8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12909&range=01-02 Stats: 21 lines in 7 files changed: 21 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12909.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12909/head:pull/12909 PR: https://git.openjdk.org/jdk/pull/12909

From qamai at openjdk.org Wed Mar 8 17:24:55 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Mar 2023 17:24:55 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v2] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 16:23:16 GMT, Paul Sandoz wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>>
>> address reviews
>
> Java changes look good. The HotSpot code looks well structured but I will let others comment on the specifics.

@PaulSandoz Thanks for your review. I have added a comment explaining the rationale behind the anonymous class usage.

------------- PR: https://git.openjdk.org/jdk/pull/12909

From duke at openjdk.org Wed Mar 8 18:03:20 2023 From: duke at openjdk.org (Alexey Pavlyutkin) Date: Wed, 8 Mar 2023 18:03:20 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 15:49:06 GMT, Thomas Stuefe wrote:

>> The patch fixes error reporting to check the timeout in the case when a user specifies an OnError handler. Before, VMError::check_timeout() ignored the timeout in this case, and so didn't break the malloc() deadlock.
>> >> Verification (amd64/20.04LTS): the idea of the test is to crash JVM running with error hander of 3 successive `sleep` commands for 1s, 10s, and 60s with and without specified timeout >> >> >> 16:52:17 at alex@alex-VirtualBox>( echo " >> public class C { >> public static void main(String[] args) throws Throwable { >>> while (true) Thread.sleep(1000); >>> } >>> } >>> " >> C.java ) >> 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & >> [2] 179574 >> 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 >> 17:00:27 at alex@alex-VirtualBox># >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 >> # >> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 >> # >> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) >> # >> # An error report file with more information is saved as: >> # /home/alex/jdk/hs_err_pid179574.log >> # >> # If you would like to submit a bug report, please visit: >> # https://bugreport.java.com/bugreport/crash.jsp >> # >> # >> # -XX:OnError="sleep 1;sleep 10;sleep 60" >> # Executing /bin/sh -c "sleep 1" ... >> # Executing /bin/sh -c "sleep 10" ... >> # Executing /bin/sh -c "sleep 60" ... 
>> >> [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java >> 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & >> [2] 179602 >> 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 >> 17:02:41 at alex@alex-VirtualBox># >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 >> # >> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 >> # >> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) >> # >> # An error report file with more information is saved as: >> # /home/alex/jdk/hs_err_pid179602.log >> # >> # If you would like to submit a bug report, please visit: >> # https://bugreport.java.com/bugreport/crash.jsp >> # >> # >> # -XX:OnError="sleep 1;sleep 10;sleep 60" >> # Executing /bin/sh -c "sleep 1" ... >> # Executing /bin/sh -c "sleep 10" ... >> >> ------ Timeout during error reporting after 11 s. ------ >> >> 17:02:54 at alex@alex-VirtualBox> >> >> >> Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` > > src/hotspot/share/utilities/vmError.cpp line 1360: > >> 1358: >> 1359: namespace { >> 1360: class ForkAndExecCheckPoint : public StackObj { > > Nit: checkpoint sounds quite specific, and here is nothing checked. 
Also, this just guards only OnError usages of fork_and_exec, not possible other usages, so maybe "OnErrorInProgress" or something?

OK.

> src/hotspot/share/utilities/vmError.cpp line 1362:
>
>> 1360: class ForkAndExecCheckPoint : public StackObj {
>> 1361: NONCOPYABLE( ForkAndExecCheckPoint );
>> 1362: static int _in_progress;
>
> Must be volatile, I think.

Sure.

> src/hotspot/share/utilities/vmError.cpp line 1366:
>
>> 1364: ForkAndExecCheckPoint() {
>> 1365: assert(Atomic::load(&_in_progress) == 0, "fork_and_exec() is already in progress");
>> 1366: Atomic::store(&_in_progress, 1);
>
> I'd do a CAS.

Initially I used xchg() there, but it implies a full memory fence. On the other hand, report_and_die() seems to be the last place to care about performance.

------------- PR: https://git.openjdk.org/jdk/pull/12925

From duke at openjdk.org Wed Mar 8 18:14:07 2023 From: duke at openjdk.org (Alexey Pavlyutkin) Date: Wed, 8 Mar 2023 18:14:07 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 14:05:44 GMT, Alexey Pavlyutkin wrote:

> The patch fixes error reporting to check the timeout in the case when a user specifies an OnError handler. Before, VMError::check_timeout() ignored the timeout in this case, and so didn't break the malloc() deadlock.
> > Verification (amd64/20.04LTS): the idea of the test is to crash JVM running with error hander of 3 successive `sleep` commands for 1s, 10s, and 60s with and without specified timeout > > > 16:52:17 at alex@alex-VirtualBox>( echo " > public class C { > public static void main(String[] args) throws Throwable { >> while (true) Thread.sleep(1000); >> } >> } >> " >> C.java ) > 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179574 > 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 > 17:00:27 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179574.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > # Executing /bin/sh -c "sleep 60" ... 
> > [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java > 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179602 > 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 > 17:02:41 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179602.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > > ------ Timeout during error reporting after 11 s. ------ > > 17:02:54 at alex@alex-VirtualBox> > > > Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` > Thinking this through some more, I'm starting to doubt we do the right thing here. 
It is certainly convoluted: we _are_ the reporting thread here, so what happens is that: > > * after fork_and_exec, we call check_timeout > * check_timeout will signal ourself > * we receive the signal > * we enter the secondary signal handler recursively > * we then re-enter VMError::report_and_die > * we then note that this is a timeout and print an error message and call os::die. > > That is too complicated for my taste, and I don't know if there are any hidden issues with VM::check_timeout(). Since now, we call it from two threads, possibly, the reporting thread and the watcher thread. That function was not intended for concurrent usage. > > And before thinking about the correct behavior, we need to clarify if the protection we grant an OnError invocation extends to the whole chain of error scripts. Right now we say OnError scripts should not be interrupted. Okay, but what about the next OnError script? If the user specifies several OnError scripts, should they all get a chance to run to finish? > > Because denying the follow-up error scripts a chance to run feels weirdly arbitrary. Either those scripts are essential, or they aren't. If they are, all should run. If they are not, it should be okay to kill the JVM _while it is waiting for the child process to finish_ - this would make the patch simpler, and people argue that this would be the correct behavior anyway. > > Personally, I think that killing the JVM while it is in waitpid waiting for the child is probably benign. Child would be reparented, possibly zombified on badly set up systems, but that's it. It would probably run to completion. 
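A minimal sketch of bounding the whole chain of OnError commands by a single deadline that is checked between commands, so that a running command is never interrupted and no signal is sent to the reporting thread. The names run_commands_until and Clock, and the use of std::chrono, are hypothetical stand-ins for HotSpot's get_reporting_start_time(), get_current_timestamp() and ErrorLogTimeout; this is not the patch's code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

using Clock = std::chrono::steady_clock;

// Runs commands in order, but stops starting new ones once the overall
// deadline (reporting_start + timeout) has passed. Returns how many ran.
std::size_t run_commands_until(const std::vector<std::function<void()>>& commands,
                               Clock::time_point reporting_start,
                               std::chrono::seconds timeout) {
  const Clock::time_point expiration = reporting_start + timeout;
  std::size_t started = 0;
  for (const auto& cmd : commands) {
    if (Clock::now() > expiration) {
      break;  // deadline passed: skip the remaining commands
    }
    cmd();    // stands in for fork_and_exec() plus waiting for the child
    ++started;
  }
  return started;
}
```

With this shape, each command runs to completion, and the timeout only decides whether the next one is started.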
OK, let's replace check_timeout() with something like:

    jlong expiration = get_reporting_start_time() + (jlong)ErrorLogTimeout * TIMESTAMP_TO_SECONDS_FACTOR;
    if (get_current_timestamp() > expiration) break;

------------- PR: https://git.openjdk.org/jdk/pull/12925

From rkennke at openjdk.org Wed Mar 8 18:25:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 Mar 2023 18:25:15 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v15] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding overloading the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements).
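The thread-local lock-stack described here can be sketched as a small per-thread array of object references. This is a hypothetical illustration, not HotSpot's LockStack: the class and method names are made up, std::vector stands in for the growable array, and the CAS on the header bits is omitted. It shows why "does the current thread own this object?" is a short scan, and how a recursive attempt can be detected so that the caller inflates to a full monitor, since fast-locking does not support recursion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread record of the objects this thread has fast-locked.
class ThreadLockStack {
 public:
  // The common ownership query: a linear scan of a (typically tiny) array.
  bool contains(const void* obj) const {
    return std::find(_stack.begin(), _stack.end(), obj) != _stack.end();
  }
  // Returns false on a recursive attempt; the caller would then inflate
  // the lock to a full monitor instead of fast-locking again.
  bool try_push(const void* obj) {
    if (contains(obj)) {
      return false;
    }
    _stack.push_back(obj);  // std::vector grows on demand
    return true;
  }
  // Unlock: usually the top entry, but search to be safe.
  void remove(const void* obj) {
    auto it = std::find(_stack.begin(), _stack.end(), obj);
    assert(it != _stack.end() && "unlocking an object this thread did not fast-lock");
    if (it != _stack.end()) {
      _stack.erase(it);
    }
  }
  std::size_t size() const { return _stack.size(); }

 private:
  std::vector<const void*> _stack;
};
```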
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is answered very quickly by a scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is OK to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors.
In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can have a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | diff | aarch64 stack-locking | aarch64 fast-locking | diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
 - Inline initial LockStack stack

------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/12c2b8c3..3c9d0d82 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=13-14 Stats: 15 lines in 2 files changed: 11 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907

From pchilanomate at openjdk.org Wed Mar 8 18:48:17 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 8 Mar 2023 18:48:17 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: <9TIDrqZx-KDQ0tWE7BGiGYMSS5Ftb9y3B8OLcdPvrUs=.5cb94158-ab8b-47ea-96ba-a4e073a124d6@github.com> On Wed, 8 Mar 2023 15:37:12 GMT, Coleen Phillimore
wrote:

> That's a good fix and a good place for it. Thank you for figuring this out.
>

Thanks for the review Coleen!

------------- PR: https://git.openjdk.org/jdk/pull/12911

From mgronlun at openjdk.org Wed Mar 8 18:50:18 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 8 Mar 2023 18:50:18 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v3] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > initializationMethod = "premain" > } > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > initializationMethod = "agentmain" > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
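The initialization/initializationTime pair in these examples can be sketched as a wall-clock timestamp taken when the initialization method is invoked, plus a duration measured on a monotonic clock around the call. The struct and function names are hypothetical, and the zero duration for a missing callback mirrors the native-agent rule stated in the description; this is not JFR's implementation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical record matching the "initialization" (timestamp) and
// "initializationTime" (duration) fields of the example events.
struct AgentInitTiming {
  std::chrono::system_clock::time_point initialization{};
  std::chrono::nanoseconds initialization_time{0};
};

AgentInitTiming time_initialization(const std::function<void()>& init) {
  AgentInitTiming t;
  t.initialization = std::chrono::system_clock::now();  // when init is (or would be) invoked
  if (init) {
    // steady_clock is monotonic, so the duration is unaffected by clock adjustments.
    const auto start = std::chrono::steady_clock::now();
    init();  // e.g. premain/agentmain, or a native agent's VMInit callback
    t.initialization_time = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
  }
  // With no callback, initialization_time stays exactly 0.
  return t;
}
```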
> > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar > > The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically. > > "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language, using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also an indication of whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now).
The "initialization" is the time the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the time taken to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > Until now, there has been no direct reification of a JavaAgent in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnvs, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to which JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track which JvmtiEnvs "belong" to which JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > To implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Moving this code out of its old locations will also help reduce the size of the oversized arguments.cpp and threads.cpp. > > The previous lists that maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
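One common way to get a list that supports concurrent iteration is an append-only, lock-free linked list: a writer fully links a node before publishing it with a release compare-and-swap, and readers start from an acquire load of the head. This is a hypothetical sketch under that assumption; the description does not say how prims/agentList.hpp is implemented, and removal is deliberately not supported, so readers can never observe a freed node.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical node; fields are written before the node is published.
struct AgentNode {
  std::string name;
  AgentNode* next;
};

class AppendOnlyList {
 public:
  // Prepend: link the node first, then publish it with a release CAS.
  void add(AgentNode* node) {
    AgentNode* head = _head.load(std::memory_order_relaxed);
    do {
      node->next = head;  // on CAS failure, `head` is refreshed and we relink
    } while (!_head.compare_exchange_weak(head, node,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Lock-free iteration: the acquire load pairs with the release CAS
  // (successive CASes extend the release sequence), so every reachable
  // node is fully initialized. Newest entries are visited first.
  template <typename F>
  void for_each(F f) const {
    for (AgentNode* n = _head.load(std::memory_order_acquire); n != nullptr; n = n->next) {
      f(*n);
    }
  }

 private:
  std::atomic<AgentNode*> _head{nullptr};
};
```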
> > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/ed1ea797..26172f0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=01-02 Stats: 13 lines in 2 files changed: 5 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Wed Mar 8 18:56:55 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Wed, 8 Mar 2023 18:56:55 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command-line agents initialize during the VM startup sequence, they add to the overall startup latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > initializationMethod = "premain" > } > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > initializationMethod = "agentmain" > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar > > The event will also detail which initialization method was invoked by the JVM: "premain" for command-line agents and "agentmain" for agents loaded dynamically. > > "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language, using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library; here, it means a .so or .dll file.
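The jdk.JavaAgent description refers to the two `java.lang.instrument` entry points. As a minimal sketch, assuming a hypothetical agent class `ExampleAgent` and a made-up comma-separated `key=value` option format, an agent that could be started with `-javaagent:ExampleAgent.jar=foo=bar` or loaded dynamically looks roughly like this. Only the `premain`/`agentmain` signatures and the `Premain-Class`/`Agent-Class` manifest attributes come from the `java.lang.instrument` specification; everything else is illustrative.

```java
import java.lang.instrument.Instrumentation;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example agent. A real agent class is normally public and packaged in a jar
// whose manifest names it, e.g.:
//   Premain-Class: ExampleAgent   (used for -javaagent:ExampleAgent.jar=foo=bar)
//   Agent-Class: ExampleAgent     (used when the agent is loaded into a running VM)
class ExampleAgent {
    // Options seen by the most recent initialization; exposed only for illustration.
    static volatile Map<String, String> lastOptions = Collections.emptyMap();

    // Entry point for agents started on the command line.
    public static void premain(String agentArgs, Instrumentation inst) {
        initialize(agentArgs, inst);
    }

    // Entry point for agents loaded dynamically.
    public static void agentmain(String agentArgs, Instrumentation inst) {
        initialize(agentArgs, inst);
    }

    private static void initialize(String agentArgs, Instrumentation inst) {
        lastOptions = parseOptions(agentArgs);
        // A real agent would typically register a ClassFileTransformer on inst here.
    }

    // With -javaagent:jarpath=options, the text after the first '=' arrives as agentArgs.
    // Splitting it into comma-separated key=value pairs is only this sketch's convention.
    static Map<String, String> parseOptions(String agentArgs) {
        Map<String, String> result = new LinkedHashMap<>();
        if (agentArgs == null || agentArgs.isEmpty()) {
            return result;
        }
        for (String pair : agentArgs.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                result.put(pair, "");
            } else {
                result.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return result;
    }
}
```

In terms of the proposed event, the time spent inside `premain` or `agentmain` is what "initializationTime" would measure for such an agent.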
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, along with an indication of whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > The initialization of a native agent is performed by invoking an agent-specified callback routine, which is not detailed in the event (for now). The "initialization" is the time the JVM sent, or would have sent, the JVMTI VMInit event to a specified callback. "initializationTime" is the duration of executing that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > Until now, there has been no direct reification of a JavaAgent in the JVM, because JavaAgents are built on top of the JDK native library "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnvs, which agents request from the JVM during startup and initialization; consequently, it was not previously possible to determine which JvmtiEnv belongs to which JavaAgent. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track which JvmtiEnvs "belong" to which JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
> > To implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Moving this code out of its previous locations will help shrink the oversized arguments.cpp and threads.cpp. > > The previous lists that maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: - remove JVMPI - cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/26172f0e..355d307c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From dnsimon at openjdk.org Wed Mar 8 19:59:02 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 8 Mar 2023 19:59:02 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v4] In-Reply-To: References: Message-ID: > This PR extends JVMCI with a new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations.
The main differences from `java.lang.reflect.AnnotatedElement` are: > * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. > * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. > > To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): > > ResolvedJavaMethod method = ...; > ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); > return switch (a.kind()) { > case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; > case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The same code using the new API: > > > ResolvedJavaMethod method = ...; > ResolvedJavaType explodeLoopType = ...; > AnnotationData a = method.getAnnotationDataFor(explodeLoopType); > return switch (a.getEnum("kind").getName()) { > case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; > case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
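The second bullet (annotation data returned as a map-like object rather than as an `Annotation` instance) can be illustrated with plain reflection. This is a hypothetical sketch, not how JVMCI or `jdk.internal.vm.VMSupport` implement it: here the annotation type is still loaded in the same VM, and `Explode`/`Kind` are made-up stand-ins for `ExplodeLoop` and its `kind` element. The point is only the shape of the result: element names mapped to plain values, with enum constants reduced to their names so the consumer can switch on strings.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "annotation data as a map"; not the JVMCI implementation.
class AnnotationMaps {
    // Stand-ins for an annotation such as ExplodeLoop and its kind() element.
    enum Kind { FULL_UNROLL, FULL_UNROLL_UNTIL_RETURN }

    @Retention(RetentionPolicy.RUNTIME)
    @interface Explode {
        Kind kind() default Kind.FULL_UNROLL;
    }

    @Explode(kind = Kind.FULL_UNROLL_UNTIL_RETURN)
    static void annotated() {
    }

    // Flattens an annotation into element name -> value. Enum constants are stored by
    // name, so a consumer can switch on strings instead of on the enum type itself.
    static Map<String, Object> toMap(Annotation annotation) {
        Map<String, Object> data = new LinkedHashMap<>();
        for (Method element : annotation.annotationType().getDeclaredMethods()) {
            try {
                Object value = element.invoke(annotation);
                data.put(element.getName(), value instanceof Enum ? ((Enum<?>) value).name() : value);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException(e);
            }
        }
        return data;
    }

    // Looks up an annotation on a method of this class and returns its data map.
    static Map<String, Object> annotationDataFor(String methodName, Class<? extends Annotation> type) {
        try {
            Annotation annotation = AnnotationMaps.class.getDeclaredMethod(methodName).getAnnotation(type);
            return annotation == null ? Map.of() : toMap(annotation);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Mirrors the string-based switch in the second example of the PR description.
    static String describe(Map<String, Object> data) {
        return switch (String.valueOf(data.get("kind"))) {
            case "FULL_UNROLL" -> "unroll fully";
            case "FULL_UNROLL_UNTIL_RETURN" -> "unroll until return";
            default -> "unknown";
        };
    }
}
```

A consumer of such a map never needs the `Kind` enum class itself, which is the property the PR description cites as the benefit for libgraal.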
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: switched to use of lists and maps instead of arrays ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12810/files - new: https://git.openjdk.org/jdk/pull/12810/files/3dd5ef9c..948d3aa3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=02-03 Stats: 1241 lines in 15 files changed: 292 ins; 718 del; 231 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From duke at openjdk.org Wed Mar 8 21:10:16 2023 From: duke at openjdk.org (jsolomon8080) Date: Wed, 8 Mar 2023 21:10:16 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: <3ZipShQ_gHqbJaO1aIg8yk-OcPwKg2D6eFJghlad5DM=.29bb804e-42a7-4d2c-b99c-31c21f663f24@github.com> On Wed, 8 Mar 2023 14:05:44 GMT, Alexey Pavlyutkin wrote: > The patch fixes error reporting to check the timeout when a user specifies an OnError handler. Previously, VMError::check_timeout() ignored the timeout in this case, and so did not break the malloc() deadlock.
> > Verification (amd64/20.04LTS): the idea of the test is to crash a JVM running with an error handler of three successive `sleep` commands (1s, 10s, and 60s), with and without a specified timeout > > > 16:52:17 at alex@alex-VirtualBox>( echo " > public class C { > public static void main(String[] args) throws Throwable { >> while (true) Thread.sleep(1000); >> } >> } >> " >> C.java ) > 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179574 > 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 > 17:00:27 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179574.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > # Executing /bin/sh -c "sleep 60" ...
> > [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java > 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179602 > 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 > 17:02:41 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179602.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > > ------ Timeout during error reporting after 11 s. ------ > > 17:02:54 at alex@alex-VirtualBox> > > > Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` Hi - I'm the originator of this bug report. I'm glad this is getting fixed and I will ultimately defer to the java experts since I have never contributed code to this code base, but if it were me, I'd do the simplest thing possible. The solution presented here seems overly complex. 
I don't understand why an OnError script or an abort_hook is so special. I think it's up to the user to ensure that neither takes longer than ErrorLogTimeout, which defaults to 2 minutes. That's an eternity. Do Java users really expect to wait 2 minutes for their processes to exit? If this were my code, I would remove any guarantee about OnError and make it the responsibility of the user to set ErrorLogTimeout appropriately. I understand that there may be many users who have counted on this guarantee and you can't break them now. I also understand that I'm not aware of all the ways that OnError is used. I'm sure we will get to some solution that will fix the real problem, which is what I care about the most. Thank you. ------------- PR: https://git.openjdk.org/jdk/pull/12925 From dholmes at openjdk.org Wed Mar 8 21:34:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Mar 2023 21:34:07 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 17:18:30 GMT, Thomas Stuefe wrote: > Pinging @dholmes-ora for input and opinion I thought we were still discussing options in JBS, so this PR seems premature to me. I agree with @tstuefe's initial comment that this seems way too complex, and I'm not even sure I can figure out the control flow here.
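The behavioural difference at issue in PR 12925 (VMError::check_timeout() ignoring the timeout whenever an OnError handler is configured) can be captured in a small decision function. The Java class below is a hypothetical model written for illustration, not a transcription of vmError.cpp; the class and parameter names are made up, and treating a timeout of 0 as "no timeout" is an assumption of the model.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the timeout decision discussed in PR 12925.
// It is not HotSpot's VMError::check_timeout(); names and structure are illustrative.
final class ErrorReportTimeoutModel {
    private final long timeoutNanos;              // models -XX:ErrorLogTimeout (0 = no timeout, by assumption)
    private final boolean hasOnErrorHandler;      // models a non-empty -XX:OnError
    private final boolean exemptOnErrorHandlers;  // true models the behaviour before the patch

    ErrorReportTimeoutModel(long timeoutSeconds, boolean hasOnErrorHandler, boolean exemptOnErrorHandlers) {
        this.timeoutNanos = TimeUnit.SECONDS.toNanos(timeoutSeconds);
        this.hasOnErrorHandler = hasOnErrorHandler;
        this.exemptOnErrorHandlers = exemptOnErrorHandlers;
    }

    // Should a watchdog polling at nowNanos interrupt reporting that began at startNanos?
    boolean shouldInterrupt(long startNanos, long nowNanos) {
        if (timeoutNanos == 0) {
            return false;
        }
        if (exemptOnErrorHandlers && hasOnErrorHandler) {
            // Pre-patch behaviour: the timeout is ignored, so a reporter stuck in a
            // deadlock (for example inside malloc()) is never interrupted.
            return false;
        }
        return nowNanos - startNanos >= timeoutNanos;
    }
}
```

Setting `exemptOnErrorHandlers` to `false` corresponds to what the PR description says the patch does: the timeout is checked even when OnError is set. Whether the OnError commands should get extra time, as debated in the thread, is not modelled here.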
------------- PR: https://git.openjdk.org/jdk/pull/12925 From dcubed at openjdk.org Wed Mar 8 22:25:25 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 8 Mar 2023 22:25:25 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:47:55 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was that thread 1 asked to deopt nmethod set X and thread 2 asked for the same set or a subset of X. >> All methods would already be marked, but the actual deoptimization (making them not entrant, patching PCs on stacks, and patching post-call nops) had not been done yet. This meant that thread 2 could 'pass' thread 1. >> Most places deopt under Compile_lock, so this is not an issue there, but WB and clearCallSiteContext do not. >> >> Since a handshake may take a long time to complete, and Compile_lock is used for much more than deopts, >> the fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake, even when everything was already marked (instead of adding Compile_lock to all places). >> >> This turned out to be problematic at startup; for example, the number of deopt handshakes in a Jetty dry-run startup went from 5 to 39. >> >> This fix first adds a barrier that you cannot pass until the requested deopts have happened, and it coalesces the handshakes. >> Second, it moves the handshake part out of the Compile_lock where possible. >> >> This fixes the performance bug and reduces contention on Compile_lock, which means higher throughput in the compiler and in things such as class loading. >> >> It passes t1-t7 with flying colours! t8 has not completed yet, and I'm redoing some testing due to last-minute simplifications.
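The barrier-plus-coalescing idea in Robbin's description resembles the familiar "group commit" pattern: every requester takes a ticket, one thread performs the expensive operation on behalf of all tickets issued so far, and nobody returns until an operation that started after its own request has finished. The Java class below is a hypothetical sketch of that pattern. `CoalescingBarrier`, `requestAndWait`, and the use of a `Runnable` to stand in for a deoptimization handshake are assumptions for illustration, not HotSpot's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a request-coalescing barrier ("group commit" pattern).
// The Runnable stands in for an expensive operation such as a deopt handshake.
final class CoalescingBarrier {
    private final Runnable expensiveOp;
    private final AtomicInteger runs = new AtomicInteger();
    private long requested;      // highest ticket handed out
    private long completed;      // every ticket <= completed has been covered
    private boolean inProgress;  // is some thread currently running expensiveOp?

    CoalescingBarrier(Runnable expensiveOp) {
        this.expensiveOp = expensiveOp;
    }

    int runs() {
        return runs.get();
    }

    // Returns only after a run of expensiveOp that began after this request was
    // registered has completed, so a caller can never "pass" pending work.
    void requestAndWait() throws InterruptedException {
        long ticket;
        long target;
        synchronized (this) {
            ticket = ++requested;
            while (true) {
                if (completed >= ticket) {
                    return;                 // another thread's run covered this request
                }
                if (!inProgress) {
                    inProgress = true;
                    target = requested;     // this run covers every request made so far
                    break;
                }
                wait();
            }
        }
        boolean ok = false;
        try {
            expensiveOp.run();              // runs outside the monitor
            runs.incrementAndGet();
            ok = true;
        } finally {
            synchronized (this) {
                if (ok) {
                    completed = Math.max(completed, target);
                }
                inProgress = false;
                notifyAll();
            }
        }
    }
}
```

Because the operation runs outside the monitor, requests that arrive while it is running queue up and are all covered by the next run, which is what keeps the number of runs (handshakes, in the PR's terms) low when many threads ask at once.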
>> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Grab lock so code() is stable > - Non CHA based vtables fix Re-reviewed v06 (main merge) and v07. Still thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.org/jdk/pull/12585 From dnsimon at openjdk.org Wed Mar 8 22:59:23 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 8 Mar 2023 22:59:23 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v5] In-Reply-To: References: Message-ID: > This PR extends JVMCI with a new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: > * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. > * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. > > To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): > > ResolvedJavaMethod method = ...; > ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); > return switch (a.kind()) { > case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; > case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The same code using the new API: > > > ResolvedJavaMethod method = ...; > ResolvedJavaType explodeLoopType = ...; > AnnotationData a = method.getAnnotationDataFor(explodeLoopType); > return switch (a.getEnum("kind").getName()) { > case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; > case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ...
> } > > > The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 - switched to use of lists and maps instead of arrays - fixed whitespace - added support for inherited annotations - Merge branch 'master' into JDK-8303431 - made AnnotationDataDecoder package-private - add annotation API to JVMCI ------------- Changes: https://git.openjdk.org/jdk/pull/12810/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=04 Stats: 2326 lines in 34 files changed: 2273 ins; 23 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From dholmes at openjdk.org Wed Mar 8 23:02:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Mar 2023 23:02:17 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 18:56:55 GMT, Markus Grönlund wrote: > Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision: > > - remove JVMPI > - cleanup This seems a very large and intrusive change just to give some data about agents (sorry if that sounds flippant). IIUC you are generating events for stuff that (used to?) happens very early in the initialization process and for which you now need to load a whole heap of JFR classes - which could themselves be subject to the actions of the agent. The impact of this on the overall initialization process is very hard to gauge. src/hotspot/share/runtime/threads.cpp line 338: > 336: if (EagerXrunInit && Arguments::init_libraries_at_startup()) { > 337: create_vm_init_libraries(); > 338: } Not obvious where this went. Changes to the initialization order can be very problematic.
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Wed Mar 8 23:32:08 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 8 Mar 2023 23:32:08 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v4] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 18:56:55 GMT, Markus Gr?nlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> initialization = 12:31:15.574 (2023-03-08) >> initializationTime = 172 ms >> initializationMethod = "premain" >> } >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> initialization = 12:31:31.037 (2023-03-08) >> initializationTime = 64,1 ms >> initializationMethod = "agentmain" >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. 
>>
>> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar
>>
>> The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically.
>>
>> "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method.
>>
>> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime".
>>
>> An agent can also be written in a native programming language, using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>>     jdk.NativeAgent {
>>       startTime = 12:31:40.398 (2023-03-08)
>>       name = "jdwp"
>>       options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>       path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>>       dynamic = false
>>       initialization = 12:31:36.142 (2023-03-08)
>>       initializationTime = 0,00184 ms
>>     }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also a denotation if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true).
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now).
>> The "initialization" is the time the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> To implement this capability, it was necessary to refactor the code used to represent agents, called AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
>>
>> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp.
>>
>> The previous lists that maintain the agents (JVMTI) and libraries (JVMPI) are not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>>
>> Testing: jdk_jfr, tier 1 - 6
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:
>
>  - remove JVMPI
>  - cleanup

No need to load any JFR classes. No change to startup logic.
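The description says the previous agent lists were replaced by a single list that supports concurrent iteration. The PR text does not say how that list is built. One common way to get the property is sketched here, under the assumption that agents are only ever added and never removed. It uses an append-only singly linked list whose head is published with a release compare-and-swap and read with an acquire load. A reader iterating while an insert happens therefore sees either the old head or a fully initialized new entry. `AgentEntry` and `AgentListSketch` are hypothetical names, not the types in prims/agentList.hpp.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical agent record; the PR's real type lives in prims/agent.hpp.
struct AgentEntry {
  AgentEntry(std::string n, bool d) : name(std::move(n)), dynamic(d) {}
  std::string name;
  bool dynamic;
  AgentEntry* next = nullptr;  // written once, before the entry is published
};

// Append-only list: entries are pushed at the head and never unlinked, so
// readers iterate without a lock and never see a freed entry.
class AgentListSketch {
 public:
  AgentListSketch() = default;
  AgentListSketch(const AgentListSketch&) = delete;
  AgentListSketch& operator=(const AgentListSketch&) = delete;

  // Only safe once no other thread is using the list.
  ~AgentListSketch() {
    AgentEntry* e = head_.load(std::memory_order_acquire);
    while (e != nullptr) {
      AgentEntry* next = e->next;
      delete e;
      e = next;
    }
  }

  void add(std::string name, bool dynamic) {
    AgentEntry* e = new AgentEntry(std::move(name), dynamic);
    AgentEntry* old_head = head_.load(std::memory_order_relaxed);
    do {
      e->next = old_head;  // link fully before publishing
    } while (!head_.compare_exchange_weak(old_head, e,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // May run concurrently with add(): the acquire load pairs with the
  // release CAS, so every reachable entry is fully initialized.
  template <typename F>
  void for_each(F f) const {
    for (const AgentEntry* e = head_.load(std::memory_order_acquire);
         e != nullptr; e = e->next) {
      f(*e);
    }
  }

  int count() const {
    int n = 0;
    for_each([&n](const AgentEntry&) { ++n; });
    return n;
  }

 private:
  std::atomic<AgentEntry*> head_{nullptr};
};
```

The append-only assumption is what keeps readers lock-free in this sketch. Supporting removal would additionally require a safe memory-reclamation scheme, such as hazard pointers or epoch-based reclamation.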
-------------

PR: https://git.openjdk.org/jdk/pull/12923

From dchuyko at openjdk.org Thu Mar 9 00:18:05 2023
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Thu, 9 Mar 2023 00:18:05 GMT
Subject: RFR: 8300669: AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic [v7]
In-Reply-To:
References:
Message-ID: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com>

> This is a new AArch64 implementation of existing (1-4-byte element) stubs that are called in C2-compiled code for array fill patterns and Arrays.fill().
>
> Main variant of existing algorithm:
>
>     [Short arrays (< 8 bytes): fill by element and exit];
>     // ...
>     [align base to 8 bytes];
>     // ...
>     // fill_words
>     head_len = (cnt & 14) / 2;
>     switch (head_len) {
>       do {
>         cnt -= 16;
>         stp;
>       case 7:
>         stp;
>       case 6:
>         stp;
>       // ...
>       case 1:
>         stp;
>       case 0:
>         base += 8*16;
>       } while (cnt);
>     }
>     [(over)write a tail < 8 bytes];
>
> Even in the good case, only 16-byte GPR (STP) stores are used, and there is a jump for every 8 stores. There is always extra work to be done for misaligned targets, which especially affects small to medium lengths.
>
> The new implementation generates a fill implementation for every length up to a certain threshold (160-byte length). These implementations form a table where you jump when the remaining target length is suitable.
>
> For each table entry (target length), we can have no branches and use the largest number of the widest possible stores that best fit the detected CPU model. Currently it is SIMD STPQ for Neoverse N2 and GPR STP for the rest. The choice is made after benchmarking and is controlled by the new UseSIMDForArrayFill flag in AArch64.
>
> Main variant of the new algorithm (see more detailed description in comments):
>
>     [align data at 16 bytes];
>     while (cnt_bytes >= 128) {
>       [store 128 bytes];
>       cnt_bytes -= 128;
>     }
>     [store tail of 0..127 bytes];
>
> Both existing and proposed implementations specifically handle the zero-fill case (see comments about ZVA). The new implementation contains a path for very small arrays that can be cut to further improve the more generic case (added to avoid regressions).
>
> The check added in https://bugs.openjdk.org/browse/JDK-8298720 in StubGenerator is removed, as this is generated stub code. For the selected threshold, the increase in code size is within 8 KB.
>
> New test TestArraysFill is added to intrinsics jtreg tests. It calls optimized versions of 2-arg and 4-arg Arrays.fill() for different data types, lengths and patterns. The target data is checked to be filled with the required value, the surrounding data is checked to be intact.
>
> Existing test/micro/org/openjdk/bench/java/util/ArraysFill.java benchmark was used only initially. There are many cases and data lengths to cover. A modified version of the benchmark is attached [1] to the RFE, but not included in the change as it takes too long to complete all valuable variants.
>
> Resulting performance data are listed in the spreadsheet [2] attached to the RFE. Target processors were Graviton 3, Graviton 2, TaiShan, A72 and A53. Latest data from Altra is not included but the picture there was similar to Graviton 2 in all experiments. There is a range of target lengths with various enhancement numbers. Interesting lengths are within table implementation threshold and close to them (stepped), small lengths (all) and long lengths (1 point, they look similar). Over this voluntary selection:
>
> - No major regressions were found.
> - Geomean improvement: 11-33%
> - Median improvement: 10-48%
>
> Testing: tier1, tier2 and the new test on fastdebug aarch64 and x86.
>
> [1] https://bugs.openjdk.org/secure/attachment/102426/ArraysFill.java
> [2] https://bugs.openjdk.org/secure/attachment/102427/arrays-fill.ods

Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Merge branch 'openjdk:master' into JDK-8300669
 - Var in test
 - Merge branch 'openjdk:master' into JDK-8300669
 - Wording about alignment
 - Fixed compilation on win/mac
 - Merge branch 'openjdk:master' into JDK-8300669
 - Table based arrays_fill stub implementation for aarch64

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12222/files
 - new: https://git.openjdk.org/jdk/pull/12222/files/6e5ff006..c6c85567

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12222&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12222&range=05-06

Stats: 115119 lines in 3209 files changed: 55358 ins; 26985 del; 32776 mod
Patch: https://git.openjdk.org/jdk/pull/12222.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12222/head:pull/12222

PR: https://git.openjdk.org/jdk/pull/12222

From dholmes at openjdk.org Thu Mar 9 00:26:06 2023
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 9 Mar 2023 00:26:06 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 23:28:52 GMT, Markus Grönlund wrote:

> No need to load any JFR classes.

I thought JFR was all Java-based these days. But if no Java involved then that is good.

> No change to startup logic.

I flagged a change in my comment above.
-------------

PR: https://git.openjdk.org/jdk/pull/12923

From dholmes at openjdk.org Thu Mar 9 04:55:24 2023
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 9 Mar 2023 04:55:24 GMT
Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 7 Mar 2023 23:54:07 GMT, Ilarion Nakonechnyy wrote:

>> The proposed approach added a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, that gets called within `InstanceKlass::add_initialization_error` if the old one, `java_lang_Throwable::get_cause_with_stack_trace`, didn't succeed because of an exception during the VM call. The simple function doesn't call the VM for getting a stack trace but fills in any other information about an exception.
>>
>> Besides that, the discovering information about an exception was added to `ConstantPoolCacheEntry::save_and_throw_indy_exc` function.
>>
>> Jtreg for reproducing the issue also was added to the commit.
>> The commit was tested with tier1 tests.
>
> Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision:
>
>   Address a review notes

Two small adjustments needed and a couple of nits.

Thanks

src/hotspot/share/classfile/javaClasses.cpp line 2742:

> 2740:   // not keep classes alive in the stack trace.
> 2741:   // call this: public StackTraceElement[] getStackTrace()
> 2742:   assert(throwable.not_null(), "shouldn't be");

You moved this assert but it needs to be before the first use of throwable()

src/hotspot/share/classfile/javaClasses.cpp line 2744:

> 2742:   // symbolic stacktrace of 'throwable'.
> 2743:
> 2744:   // Now create the message with the original exception and thread name.
Existing: s/with/from/

src/hotspot/share/classfile/javaClasses.cpp line 2759:

> 2757:   Handle h_eiie = Exceptions::new_exception(current, exception_name, st.as_string());
> 2758:   // If new_exception returns a different exception while creating the exception,
> 2759:   // abandon the attempts to save the initialization error and return null.

Nit: s/attempts/attempt/

src/hotspot/share/oops/instanceKlass.cpp line 985:

> 983:   Handle init_error = java_lang_Throwable::create_initialization_error(current, exception);
> 984:
> 985:   if ( init_error.is_null()) {

Nit: extra space after (

src/hotspot/share/oops/instanceKlass.cpp line 986:

> 984:
> 985:   if ( init_error.is_null()) {
> 986:     log_trace(class, init)("Initialization error is null for class %s", external_name());

You need to move the ResourceMark from line 995 to line 984 to cover this new logging statement.

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12566

From stuefe at openjdk.org Thu Mar 9 06:23:12 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 9 Mar 2023 06:23:12 GMT
Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others
Message-ID:

Fatal error handling is subject to several timeouts:
- a global timeout (controlled via ErrorLogTimeout)
- local error reporting step timeouts.

The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting. This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout.

There are three situations where atm we suppress the global error timeout:
- if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run.
- if the user specified one or more OnError commands to run, and these did not yet run. These must have a chance to run unmolested.
- if the user (typically a developer) specified ShowMessageBoxOnError, and the error box has not yet been shown.

There is a bug, though: it also prevents the step timeout from firing if any of these conditions is true. That is plain wrong.

In addition to that, the interval the WatcherThread uses to check for timeouts should be decreased. It sits at 1 second, which is too coarse-grained.

--------

Patch:
- reworks `VMError::check_timeout()` to never block step timeouts
- adds clarifying comments
- quadruples timeout check frequency by watcher thread
- adds regression test for timeout handling with OnError

Tested locally on Linux x64.

-------------

Commit messages:
 - JDK-8303861-Error-handling-step-timeouts-should-never-be-blocked-by-OnError-and-others

Changes: https://git.openjdk.org/jdk/pull/12936/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8303861
Stats: 88 lines in 3 files changed: 55 ins; 1 del; 32 mod
Patch: https://git.openjdk.org/jdk/pull/12936.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/12936/head:pull/12936

PR: https://git.openjdk.org/jdk/pull/12936

From stuefe at openjdk.org Thu Mar 9 06:30:18 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 9 Mar 2023 06:30:18 GMT
Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die()
In-Reply-To:
References:
Message-ID: <0jlQWouDE0tJ-ysn7WFYArqrJwHFQ-hhsZKKRGdVhmU=.95c4ceb0-9b8e-4c9b-9029-473626fb5a6b@github.com>

On Wed, 8 Mar 2023 21:31:14 GMT, David Holmes wrote:

>> Pinging @dholmes-ora for input and opinion
>
> I thought we were still discussing options in JBS so this PR seems premature to me.
> I agree with @tstuefe initial comment this seems way too complex and I'm not even sure I can figure out the control flow here.

@dholmes-ora My proposal would be to be pragmatic and continue the discussion here. This has been going on too long. Switching channels again would be more confusing. I would prefer JBS or ML discussion as well, but not everyone (e.g. @jsolomon8080) has write access to JBS.

----

I thought this over some more.

A big part of this problem is a plain bug in VMError::check_timeout(): it should never block *step timeouts*. We added step timeouts back in 2016 to deal with these kinds of problems: if the JVM hangs in error reporting, give it a kick to get it going again. Do so repeatedly. This is related to the global timeout, but also works independently of it (and therefore of the question of when to honor the global timeout).

I opened https://bugs.openjdk.org/browse/JDK-8303861 to deal with this, see PR: https://github.com/openjdk/jdk/pull/12936

This may already be a big help for cases like these. In fact, it may already be enough, and we could maybe close this issue.

However, if we have a recursive malloc situation, we may hang repeatedly. The JVM will kick itself alive every time (with my patch) but this is still annoying. The root problem here is that we should not use malloc during error handling. This cannot always be avoided, but at least we should minimize malloc use. Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) Otherwise Azul may also chip in some bug fixing.

----

All these are unrelated to the question of whether OnError should be blocked or not. I realize now that if we decide to (continue to) protect OnError from timeouts, we must never act on the global timeout until all OnError steps have run.
Since these run right before VM exit, the original implementation that just blocked the global timeout altogether was actually right. So, there's that.

Maybe JDK-8303861 is already enough for cases like this. At least JDK-8303861 does not introduce any backward compatibility issues.

-------------

PR: https://git.openjdk.org/jdk/pull/12925

From kvn at openjdk.org Thu Mar 9 06:32:30 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 9 Mar 2023 06:32:30 GMT
Subject: RFR: JDK-8300783: Consolidate byteswap implementations [v14]
In-Reply-To: <-b783DPmWbWFeigKf7F7SFYddDKwErM4AdFcfRx01eM=.5794fa08-a0d7-4761-a449-8ebfd639e30d@github.com>
References: <-b783DPmWbWFeigKf7F7SFYddDKwErM4AdFcfRx01eM=.5794fa08-a0d7-4761-a449-8ebfd639e30d@github.com>
Message-ID:

On Wed, 15 Feb 2023 15:39:14 GMT, Justin King wrote:

>> Deduplicate byte swapping implementations by consolidating them into `utilities/byteswap.hpp`, following `std::byteswap` introduced in C++23. Further simplification of `Bytes` will follow in https://github.com/openjdk/jdk/pull/12078.
>
> Justin King has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision:
>
>  - Merge remote-tracking branch 'upstream/master' into byteswap
>  - Update based on review
>
>    Signed-off-by: Justin King
>  - Fix copyright
>
>    Signed-off-by: Justin King
>  - Update copyright
>
>    Signed-off-by: Justin King
>  - Add missing include
>
>    Signed-off-by: Justin King
>  - Remove unused include
>
>    Signed-off-by: Justin King
>  - Reorganize tests
>
>    Signed-off-by: Justin King
>  - Fix test
>
>    Signed-off-by: Justin King
>  - Merge remote-tracking branch 'upstream/master' into byteswap
>  - Be restrict on requiring 1, 2, 4, or 8 byte integers
>
>    Signed-off-by: Justin King
>  - ... and 14 more: https://git.openjdk.org/jdk/compare/fa0103ad...223d733b

This looks good to me.
I will run testing for the latest version before approval.

-------------

PR: https://git.openjdk.org/jdk/pull/12114

From rehn at openjdk.org Thu Mar 9 07:16:28 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Thu, 9 Mar 2023 07:16:28 GMT
Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 22:21:57 GMT, Daniel D. Daugherty wrote:

> Re-reviewed v06 (main merge) and v07. Still thumbs up.

Thank you!

-------------

PR: https://git.openjdk.org/jdk/pull/12585

From stuefe at openjdk.org Thu Mar 9 08:34:13 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 9 Mar 2023 08:34:13 GMT
Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die()
In-Reply-To: <3ZipShQ_gHqbJaO1aIg8yk-OcPwKg2D6eFJghlad5DM=.29bb804e-42a7-4d2c-b99c-31c21f663f24@github.com>
References: <3ZipShQ_gHqbJaO1aIg8yk-OcPwKg2D6eFJghlad5DM=.29bb804e-42a7-4d2c-b99c-31c21f663f24@github.com>
Message-ID:

On Wed, 8 Mar 2023 20:09:38 GMT, jsolomon8080 wrote:

> Hi - I'm the originator of this bug report. I'm glad this is getting fixed and I will ultimately defer to the java experts since I have never contributed code to this code base, but if it were me, I'd do the simplest thing possible. The solution presented here seems overly complex. I don't understand why an OnError script or an abort_hook is so special. I think it's up to the user to ensure that neither takes longer than ErrorLogTimeout, which defaults to 2 minutes. That's an eternity. Do java users really expect to wait 2 minutes for their processes to exit?

Some do, some don't; there are cases for both. But that is not the point. We must guarantee that all OnError commands are run and that the abort hook is called. Since these things happen after error reporting, we cannot interrupt error reporting. This is a backward compatibility question.
There are arguments for ErrorLogTimeout having precedence, and there are arguments for the current behavior. Both sides have good arguments, but OpenJDK tries to err on the side of compatibility here.

> If this were my code, I would remove any guarantee about OnError and make it responsibility of the user to set ErrorLogTimeout appropriately.

Nothing is stopping Azul from changing this behavior downstream in their VM, but upstream we need to find a consensus with all interested parties.

>
> I understand that there may be many users who have counted on this guarantee and you can't break them now. I also understand that I'm not aware of all the ways that OnError is used. I'm sure we will get to some solution that will fix the real problem, which I care about the most.

Thank you. Let's see if this issue is moot once the fix for JDK-8303861 is out.

Cheers, Thomas

-------------

PR: https://git.openjdk.org/jdk/pull/12925

From adinn at openjdk.org Thu Mar 9 09:18:08 2023
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 9 Mar 2023 09:18:08 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 23:28:52 GMT, Markus Grönlund wrote:

>> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - remove JVMPI
>>  - cleanup
>
> No need to load any JFR classes. No change to startup logic.

@mgronlun Why mark Java agents as command-line or dynamic using `initializationMethod = "premain"/"agentMain"` and mark native agents using `dynamic = true/false`? Why not use `dynamic` for both?
-------------

PR: https://git.openjdk.org/jdk/pull/12923

From mgronlun at openjdk.org Thu Mar 9 09:24:16 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Thu, 9 Mar 2023 09:24:16 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 23:28:52 GMT, Markus Grönlund wrote:

>> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - remove JVMPI
>>  - cleanup
>
> No need to load any JFR classes. No change to startup logic.

> @mgronlun Why mark Java agents as command-line or dynamic using `initializationMethod = "premain"/"agentMain"` and mark native agents using `dynamic = true/false`? Why not use `dynamic` for both?

Hi Andrew, that's a good question. I thought it could be derived in the JavaAgent case, because there are only two entry points, "premain" implies static and "agentmain" implies dynamic. For the native case, there is no information about the callback (I had it, but it depends on symbols), so the dynamic field is made explicit. It can also be added to the JavaAgent if that makes it clearer.

-------------

PR: https://git.openjdk.org/jdk/pull/12923

From mgronlun at openjdk.org Thu Mar 9 09:32:19 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Thu, 9 Mar 2023 09:32:19 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v4]
In-Reply-To:
References:
Message-ID: <5bzaYlM6HXfUNJITjTSIaGgcJ_51OQf6XWr07w__wUw=.d0a9ac8b-a9a1-4122-9d2f-880de717d071@github.com>

On Wed, 8 Mar 2023 22:56:31 GMT, David Holmes wrote:

>> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - remove JVMPI
>>  - cleanup
>
> src/hotspot/share/runtime/threads.cpp line 338:
>
>> 336: if (EagerXrunInit && Arguments::init_libraries_at_startup()) {
>> 337:   create_vm_init_libraries();
>> 338: }
>
> Not obvious where this went.
> Changes to the initialization order can be very problematic.

Thanks, David.

Two calls to launch XRun agents are invoked during startup, and they depend on the EagerXrunInit option. The !EagerXrunInit case is already located in create_vm(), but the EagerXrunInit case was located as the first entry in initialize_java_lang_classes(), which I thought was tucked away a bit unnecessarily. I hoisted the EagerXrunInit call from initialize_java_lang_classes() into create_vm(). It's now the call just before initialize_java_lang_classes(). Having both calls located directly in create_vm() makes this clearer.

-------------

PR: https://git.openjdk.org/jdk/pull/12923

From mgronlun at openjdk.org Thu Mar 9 09:35:17 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Thu, 9 Mar 2023 09:35:17 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 9 Mar 2023 00:23:39 GMT, David Holmes wrote:

>> No need to load any JFR classes.
>
> I thought JFR was all Java-based these days. But if no Java involved then that is good.

Ehh, no. Far from it.

>> No change to startup logic.
>
> I flagged a change in my comment above.

Thanks, pls see my reply.

-------------

PR: https://git.openjdk.org/jdk/pull/12923

From adinn at openjdk.org Thu Mar 9 09:39:16 2023
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 9 Mar 2023 09:39:16 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v4]
In-Reply-To:
References:
Message-ID: <6lvLldbtBOQUw1f_3lz6yJ8G33tpzQolD95cBHHvhNY=.0b6f4c9d-01b3-4bd4-9638-68b2e28a7a65@github.com>

On Wed, 8 Mar 2023 18:56:55 GMT, Markus Grönlund wrote:

>> Greetings,
>>
>> We are adding support to let JFR report on Agents.
>>
>> #### Design
>>
>> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime.
>> Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize.
>>
>> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html)
>>
>> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
>>
>> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples:
>>
>>     // Command line
>>     jdk.JavaAgent {
>>       startTime = 12:31:19.789 (2023-03-08)
>>       name = "JavaAgent.jar"
>>       options = "foo=bar"
>>       initialization = 12:31:15.574 (2023-03-08)
>>       initializationTime = 172 ms
>>       initializationMethod = "premain"
>>     }
>>     // Dynamic load
>>     jdk.JavaAgent {
>>       startTime = 12:31:31.158 (2023-03-08)
>>       name = "JavaAgent.jar"
>>       options = "bar=baz"
>>       initialization = 12:31:31.037 (2023-03-08)
>>       initializationTime = 64,1 ms
>>       initializationMethod = "agentmain"
>>     }
>>
>> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents.
>>
>> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar
>>
>> The event will also detail which initialization method was invoked by the JVM, "premain" for command line agents, and "agentmain" for agents loaded dynamically.
>>
>> "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method.
>>
>> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime".
>>
>> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
>>
>> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example:
>>
>>     jdk.NativeAgent {
>>       startTime = 12:31:40.398 (2023-03-08)
>>       name = "jdwp"
>>       options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>>       path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
>>       dynamic = false
>>       initialization = 12:31:36.142 (2023-03-08)
>>       initializationTime = 0,00184 ms
>>     }
>>
>> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported, and there is also a denotation if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true).
>>
>> The initialization of a native agent is performed by invoking an agent-specified callback routine which is not detailed in the event (for now). The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0.
>>
>> #### Implementation
>>
>> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
>>
>> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize.
>>
>> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp.
>>
>> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp.
>>
>> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced.
>>
>> Testing: jdk_jfr, tier 1 - 6
>>
>> Thanks
>> Markus
>
> Markus Grönlund has updated the pull request incrementally with two additional commits since the last revision:
>
>  - remove JVMPI
>  - cleanup

Yes, I appreciate that `dynamic` can be derived from `initializationMethod` -- and vice versa. However, I was approaching this semantically from the opposite end. To me the primary characteristic that the user would be interested in is whether the agent was loaded dynamically or on the command line (whatever the type of agent). The corresponding fact, for a Java agent, that it is entered, respectively, via method agentMain or preMain is a derived (implementation) detail. Is there a reason to mention that detail?
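The quoted description defines two fields. `initialization` is the moment the JVM invoked the initialization method, or sent (or would have sent) VMInit to a native agent's callback. `initializationTime` is the duration of that call, reported as 0 when no callback is registered. Below is a minimal sketch of that bookkeeping in standard C++. `InitRecord` and `time_initialization` are hypothetical names, and the JVM's own time sources are not modelled.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical per-agent bookkeeping mirroring the two event fields.
struct InitRecord {
  std::chrono::system_clock::time_point initialization;  // wall-clock start
  std::chrono::nanoseconds initialization_time{0};        // duration of the call
};

// Records when initialization was (or would have been) started and how long
// the callback ran. An empty callback yields a zero duration, matching the
// "no VMInit callback" case described for native agents.
InitRecord time_initialization(const std::function<void()>& callback) {
  InitRecord rec;
  rec.initialization = std::chrono::system_clock::now();
  if (callback) {
    // steady_clock is monotonic, so the measured duration is never negative.
    const auto start = std::chrono::steady_clock::now();
    callback();
    rec.initialization_time = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
  }
  return rec;
}
```

The sketch deliberately uses a wall clock for the timestamp and a monotonic clock for the duration. The event is emitted later as a periodic event, so its `startTime` will be later than `initialization`, as the description notes.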
-------------

PR: https://git.openjdk.org/jdk/pull/12923

From mgronlun at openjdk.org Thu Mar 9 09:47:16 2023
From: mgronlun at openjdk.org (Markus Grönlund)
Date: Thu, 9 Mar 2023 09:47:16 GMT
Subject: RFR: 8257967: JFR: Events for loaded agents [v4]
In-Reply-To: <6lvLldbtBOQUw1f_3lz6yJ8G33tpzQolD95cBHHvhNY=.0b6f4c9d-01b3-4bd4-9638-68b2e28a7a65@github.com>
References: <6lvLldbtBOQUw1f_3lz6yJ8G33tpzQolD95cBHHvhNY=.0b6f4c9d-01b3-4bd4-9638-68b2e28a7a65@github.com>
Message-ID: <1vaEi8bGZ5D5woUEmxe_zYIOR138w4N9Mwcv_Hk6-Z0=.ee6c0d43-a9bc-46d6-8c7f-7a7cf71fb704@github.com>

On Thu, 9 Mar 2023 09:36:28 GMT, Andrew Dinn wrote:

> Yes, I appreciate that `dynamic` can be derived from `initializationMethod` -- and vice versa. However, I was approaching this semantically from the opposite end. To me the primary characteristic that the user would be interested in is whether the agent was loaded dynamically or on the command line (whatever the type of agent). The corresponding fact, for a Java agent, that it is entered, respectively, via method agentMain or preMain is a derived (implementation) detail. Is there a reason to mention that detail?

That's a good point. The overall intent was to map what method was measured during initialization. That included native agent callbacks as well. It may be an unnecessary implementation detail and may restrict the possibility of growth. It is probably a better design abstraction to leave out the specific method. I dropped the callback function for the native agent, but we should also do the same for JavaAgents.
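The exchange above concludes that the entry point need not be reported. Both agent kinds can carry the same `dynamic` flag, and a Java agent's entry point follows from it: `premain` for command-line agents, `agentmain` for agents loaded at runtime. Here is a minimal sketch of such a record. `AgentInfo` and `java_entry_point` are hypothetical names, not the PR's actual event definition.

```cpp
#include <cassert>
#include <string>

// Hypothetical record shared by Java (JPLIS) and native (JVMTI) agents.
struct AgentInfo {
  std::string name;     // e.g. "JavaAgent.jar" or "jdwp"
  std::string options;  // e.g. "foo=bar"
  bool java_agent;      // JPLIS agent vs native agent
  bool dynamic;         // loaded at runtime rather than on the command line
};

// For Java agents the entry point is fully determined by `dynamic`,
// which is why reporting both would be redundant.
const char* java_entry_point(const AgentInfo& a) {
  assert(a.java_agent && "native agents have no premain/agentmain");
  return a.dynamic ? "agentmain" : "premain";
}
```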
------------- PR: https://git.openjdk.org/jdk/pull/12923 From stuefe at openjdk.org Thu Mar 9 10:09:28 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 9 Mar 2023 10:09:28 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: > Fatal error handling is subject to several timeouts: > - a global timeout (controlled via ErrorLogTimeout) > - local error reporting step timeouts. > > The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting. This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout. > > There are three situations where we currently suppress the global error timeout: > - if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run. > - if the user specified one or more OnError commands to run, and these did not yet run. These must have a chance to run unmolested. > - if the user (typically a developer) specified ShowMessageBoxOnError, and the error box has not yet been shown. > > There is a bug, though, that also prevents the step timeout from firing if any of these conditions is true. That is plain wrong. > > In addition to that, the interval at which the WatcherThread checks for timeouts should be decreased. It sits at 1 second, which is too coarse-grained. > > -------- > > Patch: > - reworks `VMError::check_timeout()` to never block step timeouts > - adds clarifying comments > - quadruples the timeout check frequency of the watcher thread > - adds a regression test for timeout handling with OnError > - additionally limits the timeout per individual error reporting step to 5 seconds. 5 seconds is usually enough to distinguish a slow error reporting step from one that is endlessly hanging. > > Tested locally on Linux x64.
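The timeout policy described above can be modelled in a few lines. This is a hypothetical Java sketch of the decision only, not the HotSpot code in `VMError::check_timeout()`. The 5-second step cap and the 250 ms polling interval (four times as often as the previous 1 second) come from the description. Deriving the uncapped step timeout as a quarter of ErrorLogTimeout is an assumption made for illustration.

```java
// Hypothetical model of the error-reporting timeout policy; not HotSpot code.
public final class ErrorReportTimeoutModel {

    enum Verdict { NONE, STEP_TIMEOUT, GLOBAL_TIMEOUT }

    // Polling interval: previously 1000 ms, now checked four times as often.
    static final long CHECK_INTERVAL_MS = 1000 / 4;

    // Cap added by the "limit step timeout to 5 seconds max" commit.
    static final long MAX_STEP_TIMEOUT_MS = 5_000;

    // Assumption: the uncapped step timeout is a quarter of the global timeout.
    static long stepTimeoutMs(long errorLogTimeoutMs) {
        return Math.min(errorLogTimeoutMs / 4, MAX_STEP_TIMEOUT_MS);
    }

    // globalSuppressed models "abort hook, OnError commands or message box still pending".
    // Only the global timeout honors it; the step timeout never does (the bug fix).
    static Verdict check(long nowMs, long reportStartMs, long stepStartMs,
                         long errorLogTimeoutMs, boolean globalSuppressed) {
        if (!globalSuppressed && nowMs - reportStartMs >= errorLogTimeoutMs) {
            return Verdict.GLOBAL_TIMEOUT;
        }
        if (nowMs - stepStartMs >= stepTimeoutMs(errorLogTimeoutMs)) {
            return Verdict.STEP_TIMEOUT;
        }
        return Verdict.NONE;
    }
}
```

For example, with a 120-second global timeout and OnError commands still pending, `check(6_000, 0, 0, 120_000, true)` returns `STEP_TIMEOUT`: the stalled step is still cut off even though the global deadline is suspended.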
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: limit step timeout to 5 seconds max ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12936/files - new: https://git.openjdk.org/jdk/pull/12936/files/0a188617..70b9add7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12936.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12936/head:pull/12936 PR: https://git.openjdk.org/jdk/pull/12936 From duke at openjdk.org Thu Mar 9 10:36:36 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 9 Mar 2023 10:36:36 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v3] In-Reply-To: References: Message-ID: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> > The inline and non-inline versions of the method are stress-tested to compare the performance difference. The statistics are drawn in the following charts. The vertical axis is in milliseconds.
> > ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) > > ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8292059: Do not inline InstanceKlass::allocate_instance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12782/files - new: https://git.openjdk.org/jdk/pull/12782/files/30a5734c..6cdd8357 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=01-02 Stats: 9 lines in 5 files changed: 6 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12782.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12782/head:pull/12782 PR: https://git.openjdk.org/jdk/pull/12782 From duke at openjdk.org Thu Mar 9 10:36:39 2023 From: duke at openjdk.org (Afshin Zafari) Date: Thu, 9 Mar 2023 10:36:39 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: References: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> Message-ID: On Mon, 6 Mar 2023 13:36:00 GMT, Stefan Karlsson wrote: > What is the motivation to change the parameter from `oop java_class` to `InstanceKlass*`? The call sites are now much noisier and harder to read. This is changed to gain performance when this function is called many times. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From mgronlun at openjdk.org Thu Mar 9 10:57:27 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 10:57:27 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v5] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs.
Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. > > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar. > > The "dynamic" field denotes whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method.
> > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tool Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. > > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. "initialization" is the time when the JVM sent, or would have sent, the JVMTI VMInit event to a specified callback. "initializationTime" is the duration of executing that specific callback. If no callback is specified for the JVMTI VMInit event, "initializationTime" will be 0. > > #### Implementation > > No reification of a JavaAgent has existed directly in the JVM, as JavaAgents are built on top of the JDK native library "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnvs, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before.
> > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnvs "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. > > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from its older locations will help reduce the sizes of the oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: remove implementation details ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/355d307c..f0c04055 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=03-04 Stats: 16 lines in 3 files changed: 9 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From redestad at openjdk.org Thu Mar 9 11:11:24 2023 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 9 Mar 2023 11:11:24 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v3] In-Reply-To:
<7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> References: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> Message-ID: On Thu, 9 Mar 2023 10:36:36 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method is stress tested to compare the performance difference. >> The `oop java_class` input parameter is changed to `InstanceKlass *` to gain more performance. >> The statistics are drawn in the following charts. The vertical axis is in milliseconds. >> >> ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) >> >> ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() src/hotspot/share/prims/jni.cpp line 967: > 965: > 966: instanceOop i = InstanceKlass::allocate_instance( > 967: InstanceKlass::cast(java_lang_Class::as_Klass(JNIHandles::resolve_non_null(clazz))), Perhaps a utility method would be nice, to reduce some of the clutter. Just folding the `as_Klass` into a new method `InstanceKlass::cast_from_oop(..)` (or just `from_oop`) would cut away a fair chunk and could be used to similar effect elsewhere (I count 21 cases where this would be applicable). ------------- PR: https://git.openjdk.org/jdk/pull/12782 From mgronlun at openjdk.org Thu Mar 9 11:33:18 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 11:33:18 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v6] In-Reply-To: References: Message-ID: Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/f0c04055..80f22257 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=04-05 Stats: 5 lines in 2 files changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 11:48:23 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 11:48:23 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v7] In-Reply-To: References: Message-ID: <3giPMTlZQoURo1LkJ6Bq5TAxt0WsrXqj_LFopPBon7U=.daf0ce0c-e4dd-4df6-b8f2-2f4408954a64@github.com>
Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/80f22257..d0609bfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=05-06 Stats: 15 lines in 3 files changed: 1 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Thu Mar 9 11:56:10 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 11:56:10 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v8] In-Reply-To: References: Message-ID: Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: more cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/d0609bfb..db48fe8d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=06-07 Stats: 4 lines in 3 files changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From stefank at openjdk.org Thu Mar 9 13:40:24 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Mar 2023 13:40:24 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v2] In-Reply-To: References: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> Message-ID: On Thu, 9 Mar 2023 10:31:58 GMT, Afshin Zafari wrote: > > What is the motivation to change the parameter from `oop java_class` to `InstanceKlass*`? The call sites are now much noisier and harder to read. > > This is changed to gain performance when this function is called many times. Could you explain how that helps the performance?
------------- PR: https://git.openjdk.org/jdk/pull/12782 From coleenp at openjdk.org Thu Mar 9 13:55:49 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 13:55:49 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: <82_ePE_6SI4ZWzv5OZ0Ygs5BrQEQ6fKGrp_ZRsLuPAM=.a4f3e612-30b3-4988-a777-979d3ef0e8ae@github.com> On Tue, 7 Mar 2023 13:47:55 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was that thread 1 asked to deopt nmethod set X and thread 2 asked for the same set or a subset of X. >> All methods would already be marked, but the actual deoptimization (making nmethods not entrant, patching PCs on stacks, and patching post-call nops) was not done yet, which meant thread 2 could 'pass' thread 1. >> Most places did deopt under Compile_lock, so this is not an issue there, but WB and clearCallSiteContext do not. >> >> A handshake may take a long time to complete, and Compile_lock is used for much more than deopts, so the fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake, even when everything was already marked (rather than adding Compile_lock to all places). >> >> This turned out to be problematic during startup; for example, the number of deopt handshakes in a jetty dry-run startup went from 5 to 39. >> >> This fix first adds a barrier that you do not pass until the requested deopts have happened, and it coalesces the handshakes. >> Second, it moves the handshake part out of the Compile_lock where possible. >> >> This means we fix the performance bug and reduce the contention on Compile_lock, meaning higher throughput in the compiler and in things such as class loading. >> >> It passes t1-t7 with flying colours! t8 has not completed yet, and I'm redoing some testing due to last-minute simplifications.
>> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Grab lock so code() is stable > - Non CHA based vtables fix Thank you for the comment. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12585 From mdoerr at openjdk.org Thu Mar 9 14:02:35 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 Mar 2023 14:02:35 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v12] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule").
These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select the correct IEEE 754 format. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs the precise size. Storing 8 bytes and loading 4 bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: The merge change has messed up some includes in the tests. Revert.
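Martin's remark that on Big Endian "storing 8 bytes and loading 4 bytes yields different values" can be checked with a small host-side C++ sketch. This is an illustration only, not code from the PR. `host_is_little_endian` and `load4_after_store8` are hypothetical helpers: the second stores a 64-bit value into an 8-byte slot, then reads back only the first 4 bytes, as a 32-bit load from the slot's base address would.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if the lowest-addressed byte holds the least
// significant byte (little-endian host, e.g. ppc64le or x86_64).
inline bool host_is_little_endian() {
  const std::uint16_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}

// Hypothetical helper: an 8-byte store into a slot followed by a 4-byte
// load from the same base address (the mismatch described above).
inline std::uint32_t load4_after_store8(std::uint64_t value) {
  unsigned char slot[8];
  std::memcpy(slot, &value, sizeof value);    // 8-byte store
  std::uint32_t narrow = 0;
  std::memcpy(&narrow, slot, sizeof narrow);  // 4-byte load from offset 0
  return narrow;
}
```

For `0x1122334455667788`, a little-endian host reads back `0x55667788` (the low half), while a big-endian host reads `0x11223344` (the high half). That difference is one reason the type, or at least its width, has to travel with each `VMStorage`.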
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/f75a240d..a2eed058 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=10-11 Stats: 44 lines in 6 files changed: 15 ins; 21 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From eosterlund at openjdk.org Thu Mar 9 14:10:36 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 9 Mar 2023 14:10:36 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:47:55 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. >> All methods will already be marked, but the actual deoptimizing (making not entrant, patching PCs on stacks and patching post-call nops) was not done yet. This meant thread 2 could 'pass' thread 1. >> Most places did deopt under Compile_lock, so this is not an issue, but WB and clearCallSiteContext do not. >> >> A handshake may take a long time to complete, and Compile_lock is used for much more than deopts. >> Therefore the fix in https://bugs.openjdk.org/browse/JDK-8299074 always emitted a handshake even when everything was already marked, rather than adding Compile_lock to all places. >> >> This turned out to be problematic during startup; for example, the number of deopt handshakes in a jetty dry-run startup went from 5 to 39. >> >> This fix first adds a barrier that you do not pass until the requested deopts have happened, and it coalesces the handshakes. >> Secondly, it moves the handshake part out of the Compile_lock where possible.
>> >> This means we fix the performance bug and reduce the contention on Compile_lock, meaning higher throughput in the compiler and in things such as class loading. >> >> It passes t1-t7 with flying colours! t8 has not completed yet, and I'm redoing some testing due to last-minute simplifications. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Grab lock so code() is stable > - Non CHA based vtables fix Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.org/jdk/pull/12585 From rehn at openjdk.org Thu Mar 9 14:48:42 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Mar 2023 14:48:42 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: <82_ePE_6SI4ZWzv5OZ0Ygs5BrQEQ6fKGrp_ZRsLuPAM=.a4f3e612-30b3-4988-a777-979d3ef0e8ae@github.com> References: <82_ePE_6SI4ZWzv5OZ0Ygs5BrQEQ6fKGrp_ZRsLuPAM=.a4f3e612-30b3-4988-a777-979d3ef0e8ae@github.com> Message-ID: On Thu, 9 Mar 2023 13:52:25 GMT, Coleen Phillimore wrote: > Thank you for the comment. Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/12585 From rehn at openjdk.org Thu Mar 9 14:48:45 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Mar 2023 14:48:45 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: <3EeDW0z2xQyX9iDR_e3a8gUrx-u_LYYN0ykAcbDfx84=.6b07a31e-c717-474b-8756-6c10c0a7d3c6@github.com> On Thu, 9 Mar 2023 14:07:18 GMT, Erik Österlund wrote: > Looks good. Thanks!
------------- PR: https://git.openjdk.org/jdk/pull/12585 From rehn at openjdk.org Thu Mar 9 14:48:48 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Mar 2023 14:48:48 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:47:55 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. >> All methods will already be marked, but the actual deoptimizing (making not entrant, patching PCs on stacks and patching post-call nops) was not done yet. This meant thread 2 could 'pass' thread 1. >> Most places did deopt under Compile_lock, so this is not an issue, but WB and clearCallSiteContext do not. >> >> A handshake may take a long time to complete, and Compile_lock is used for much more than deopts. >> Therefore the fix in https://bugs.openjdk.org/browse/JDK-8299074 always emitted a handshake even when everything was already marked, rather than adding Compile_lock to all places. >> >> This turned out to be problematic during startup; for example, the number of deopt handshakes in a jetty dry-run startup went from 5 to 39. >> >> This fix first adds a barrier that you do not pass until the requested deopts have happened, and it coalesces the handshakes. >> Secondly, it moves the handshake part out of the Compile_lock where possible. >> >> This means we fix the performance bug and reduce the contention on Compile_lock, meaning higher throughput in the compiler and in things such as class loading. >> >> It passes t1-t7 with flying colours! t8 has not completed yet, and I'm redoing some testing due to last-minute simplifications. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Grab lock so code() is stable > - Non CHA based vtables fix It has now passed t1-t7.
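The barrier-plus-coalescing idea in Robbin's description can be modelled in standard C++. This is a hypothetical sketch, not the HotSpot implementation. `DeoptCoalescer`, `mark_new` and `handshake` are invented names, marking is done under a plain `std::mutex` rather than `Compile_lock`, and a generation counter stands in for the set of marked nmethods. A requester that marks nothing new and finds no pending handshake returns immediately. A requester whose marks are already covered by an in-flight handshake waits for that handshake instead of issuing another one.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>

// Hypothetical model (not HotSpot code) of a deopt barrier that coalesces
// handshakes across concurrent requesters.
class DeoptCoalescer {
 public:
  // mark_new: marks work and returns true if anything new was marked.
  // handshake: the expensive operation that makes all marks take effect.
  void deoptimize(const std::function<bool()>& mark_new,
                  const std::function<void()>& handshake) {
    std::unique_lock<std::mutex> lock(mutex_);
    // Even if nothing new was marked, the caller must not pass until every
    // earlier mark is covered by a finished handshake (the barrier).
    const std::uint64_t target = mark_new() ? ++requested_ : requested_;
    while (completed_ < target) {
      if (!in_progress_) {
        in_progress_ = true;
        const std::uint64_t covers = requested_;  // marks this handshake sees
        lock.unlock();
        handshake();  // runs without holding the lock
        lock.lock();
        ++handshakes_;
        completed_ = covers;
        in_progress_ = false;
        cv_.notify_all();
      } else {
        cv_.wait(lock);  // piggyback on the handshake already running
      }
    }
  }

  int handshakes() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return handshakes_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::uint64_t requested_ = 0;  // generation of the newest mark
  std::uint64_t completed_ = 0;  // newest generation covered by a finished handshake
  bool in_progress_ = false;
  int handshakes_ = 0;
};
```

With N threads calling `deoptimize` concurrently, this model issues between 1 and N handshakes. No caller returns before a handshake whose snapshot includes its marks has finished. Marks made while a handshake is running trigger a follow-up handshake rather than being silently skipped.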
------------- PR: https://git.openjdk.org/jdk/pull/12585 From kbarrett at openjdk.org Thu Mar 9 15:43:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 9 Mar 2023 15:43:06 GMT Subject: RFR: 8303900: Rename BitMap search functions Message-ID: Please review this renaming of the following functions in BitMap: get_next_one_offset => find_first_set_bit get_next_zero_offset => find_first_clear_bit get_next_one_offset_aligned_right => find_first_set_bit_aligned_right Note that ShenandoahMarkBitMap::get_next_one_offset is not being renamed. For some reason that class contains a copy of a sizable chunk of the code from some version of BitMap. (Not sure why it doesn't use an internal BitMapView rather than code copying.) Testing: mach5 tier1-5 ------------- Commit messages: - update gtests - update shenandoah uses - update zgc uses - update gc/shared uses - update Parallel uses - update G1 uses - rename functions Changes: https://git.openjdk.org/jdk/pull/12951/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12951&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303900 Stats: 75 lines in 12 files changed: 0 ins; 0 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/12951.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12951/head:pull/12951 PR: https://git.openjdk.org/jdk/pull/12951 From mdoerr at openjdk.org Thu Mar 9 15:51:56 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 Mar 2023 15:51:56 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v13] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. 
Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select the correct IEEE 754 format. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs the precise size. Storing 8 bytes and loading 4 bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java).
As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Remove LinuxPPC64CallArranger.java because it doesn't contain anything. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/a2eed058..fb87284c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=11-12 Stats: 44 lines in 3 files changed: 0 ins; 38 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From coleenp at openjdk.org Thu Mar 9 15:52:29 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 15:52:29 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v3] In-Reply-To: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> References: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> Message-ID: <_mI_uKQkRL0ORBaxslEozgsHhICHMymcOtsxQFfMKqI=.cae696ed-3dc8-4011-84fb-c638b10a4c3f@github.com> On Thu, 9 Mar 2023 10:36:36 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method are stress tested to compare the performance difference. >> The `oop java_class` input parameter is changed to `InstanceKlass *` to gain more performance. >> The statistics are drawn in the following charts. The vertical axis is in milliseconds.
>> >> ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) >> >> ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() There was a small performance difference (loss) with making the static InstanceKlass::allocate_instance function not inlined. The experiment to effectively inline the conversion from oop java_class to InstanceKlass regained this performance in Afshin's testing. I agree the parameter change is messy and with Claes that we should have a utility function - in instanceKlass.inline.hpp. Maybe call it from_class like: InstanceKlass* InstanceKlass::from_class(oop java_class) { ... } ------------- PR: https://git.openjdk.org/jdk/pull/12782 From coleenp at openjdk.org Thu Mar 9 15:56:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 15:56:12 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 19:25:58 GMT, Matias Saavedra Silva wrote: >> src/hotspot/share/oops/cpCache.cpp line 727: >> >>> 725: set_reference_map(nullptr); >>> 726: #if INCLUDE_CDS >>> 727: if (_initial_entries != nullptr) { >> >> @iklam with moving invokedynamic entries out, do you still need to save initialized entries ? Does invokehandle need this? (Should have separate RFE if more cleanup is possible) > > This along with the previous comment about `_invokedynamic_references_map` would probably be better suited for their own RFE. I think the scope of this PR should be limited to the indy structure and its implementation, so any changes related to invokehandle can be traced more easily. ok, that's fine. 
------------- PR: https://git.openjdk.org/jdk/pull/12778 From mdoerr at openjdk.org Thu Mar 9 15:56:19 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 Mar 2023 15:56:19 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: On Wed, 1 Mar 2023 06:27:19 GMT, Martin Doerr wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 68: >> >>> 66: public abstract class CallArranger { >>> 67: // Linux PPC64 Little Endian uses ABI v2. >>> 68: private static final boolean useABIv2 = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN; >> >> Now that I'm here. This could be a potentially interesting case for having 2 subclasses of CallArranger: one for `useABIv2 == true` and one for `false`. > > Yeah, let's wait until we know what changes we need for AIX (and Big Endian linux). I think having a ABIV1CallArranger (for Big Endian linux and AIX) and a ABIV2CallArranger (linux ppc64le) will make sense. That's why I have removed the LinuxPPC64CallArranger.java with https://github.com/openjdk/jdk/pull/12708/commits/fb87284c1d3df946db378d196d7f48cd3acbab01. I guess a good time for such cleanup will be after the code for all variants is available (not part of this PR). ------------- PR: https://git.openjdk.org/jdk/pull/12708 From coleenp at openjdk.org Thu Mar 9 16:03:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 16:03:53 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 15:04:29 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 53: >> >>> 51: >>> 52: #undef __ >>> 53: #define __ Disassembler::hook(__FILE__, __LINE__, _masm)-> >> >> What is this? 
Is this something useful for debugging the template interpreter? Probably doesn't belong with this change but might be nice to have (?) @reinrich > > Yes, this is really useful when debugging the template interpreter. It annotates the disassembly with the generator source code. It helped track down a bug in the PPC part of this PR. Other platforms have it too. > > Example: > > invokedynamic 186 invokedynamic [0x00003fff80075a00, 0x00003fff80075dc8] 968 bytes > > -------------------------------------------------------------------------------- > 0x00003fff80075a00: std r17,0(r15) ;;@FILE: src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp > ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); > 0x00003fff80075a04: addi r15,r15,-8 > 0x00003fff80075a08: b 0x00003fff80075a40 ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); > 0x00003fff80075a0c: stfs f15,0(r15) ;; 2186: fep = __ pc(); __ push_f(); __ b(L); > 0x00003fff80075a10: addi r15,r15,-8 > 0x00003fff80075a14: b 0x00003fff80075a40 ;; 2186: fep = __ pc(); __ push_f(); __ b(L); > 0x00003fff80075a18: stfd f15,-8(r15) ;; 2187: dep = __ pc(); __ push_d(); __ b(L); > 0x00003fff80075a1c: addi r15,r15,-16 > 0x00003fff80075a20: b 0x00003fff80075a40 ;; 2187: dep = __ pc(); __ push_d(); __ b(L); > 0x00003fff80075a24: li r0,0 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); > 0x00003fff80075a28: std r0,0(r15) > 0x00003fff80075a2c: std r17,-8(r15) > 0x00003fff80075a30: addi r15,r15,-16 > 0x00003fff80075a34: b 0x00003fff80075a40 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); > 0x00003fff80075a38: stw r17,0(r15) ;; 2189: __ align(32, 12, 24); // align L > ;; 2191: iep = __ pc(); __ push_i(); > 0x00003fff80075a3c: addi r15,r15,-8 > 0x00003fff80075a40: li r21,1 ;; 2192: vep = __ pc(); > ;; 2193: __ bind(L); > ;;@FILE: src/hotspot/share/interpreter/templateInterpreterGenerator.cpp > ;; 366: __ verify_FPU(1, t->tos_in()); > ;;@FILE: src/hotspot/cpu/ppc/templateTable_ppc_64.cpp > ;; 2293: __ load_resolved_indy_entry(cache, index); >
0x00003fff80075a44: lwax r21,r14,r21 > 0x00003fff80075a48: nand r21,r21,r21 > 0x00003fff80075a4c: ld r31,40(r27) > 0x00003fff80075a50: rldicr r21,r21,4,59 > 0x00003fff80075a54: addi r21,r21,8 > 0x00003fff80075a58: add r31,r31,r21 > 0x00003fff80075a5c: ld r22,0(r31) ;; 2294: __ ld_ptr(method, in_bytes(ResolvedIndyEntry::method_offset()), cache); > 0x00003fff80075a60: cmpdi r22,0 ;; 2297: __ cmpdi(CCR0, method, 0); > 0x00003fff80075a64: bne- 0x00003fff80075b94 ;; 2298: __ bne(CCR0, resolved);,bo=0b00100[no_hint] > 0x00003fff80075a68: li r4,186 ;; 2304: __ li(R4_ARG2, code); > 0x00003fff80075a6c: ld r11,0(r1) ;; 2305: __ call_VM(noreg, entry, R4_ARG2, true); This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Does it affect performance when generating the template interpreter? Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From coleenp at openjdk.org Thu Mar 9 16:03:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 16:03:54 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com> On Thu, 9 Mar 2023 16:00:53 GMT, Coleen Phillimore wrote: >> Yes, this is really useful when debugging the template interpreter. It annotates the disassembly with the generator source code. It helped track down a bug in the PPC part of this PR. Other platforms have it too.
>> >> Example: >> >> invokedynamic 186 invokedynamic [0x00003fff80075a00, 0x00003fff80075dc8] 968 bytes >> >> -------------------------------------------------------------------------------- >> 0x00003fff80075a00: std r17,0(r15) ;;@FILE: src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp >> ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); >> 0x00003fff80075a04: addi r15,r15,-8 >> 0x00003fff80075a08: b 0x00003fff80075a40 ;; 2185: aep = __ pc(); __ push_ptr(); __ b(L); >> 0x00003fff80075a0c: stfs f15,0(r15) ;; 2186: fep = __ pc(); __ push_f(); __ b(L); >> 0x00003fff80075a10: addi r15,r15,-8 >> 0x00003fff80075a14: b 0x00003fff80075a40 ;; 2186: fep = __ pc(); __ push_f(); __ b(L); >> 0x00003fff80075a18: stfd f15,-8(r15) ;; 2187: dep = __ pc(); __ push_d(); __ b(L); >> 0x00003fff80075a1c: addi r15,r15,-16 >> 0x00003fff80075a20: b 0x00003fff80075a40 ;; 2187: dep = __ pc(); __ push_d(); __ b(L); >> 0x00003fff80075a24: li r0,0 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); >> 0x00003fff80075a28: std r0,0(r15) >> 0x00003fff80075a2c: std r17,-8(r15) >> 0x00003fff80075a30: addi r15,r15,-16 >> 0x00003fff80075a34: b 0x00003fff80075a40 ;; 2188: lep = __ pc(); __ push_l(); __ b(L); >> 0x00003fff80075a38: stw r17,0(r15) ;; 2189: __ align(32, 12, 24); // align L >> ;; 2191: iep = __ pc(); __ push_i(); >> 0x00003fff80075a3c: addi r15,r15,-8 >> 0x00003fff80075a40: li r21,1 ;; 2192: vep = __ pc(); >> ;; 2193: __ bind(L); >> ;;@FILE: src/hotspot/share/interpreter/templateInterpreterGenerator.cpp >> ;; 366: __ verify_FPU(1, t->tos_in()); >> ;;@FILE: src/hotspot/cpu/ppc/templateTable_ppc_64.cpp >> ;; 2293: __ load_resolved_indy_entry(cache, index); >> 0x00003fff80075a44: lwax r21,r14,r21 >> 0x00003fff80075a48: nand r21,r21,r21 >> 0x00003fff80075a4c: ld r31,40(r27) >> 0x00003fff80075a50: rldicr r21,r21,4,59 >> 0x00003fff80075a54: addi r21,r21,8 >> 0x00003fff80075a58: add r31,r31,r21 >> 0x00003fff80075a5c: ld r22,0(r31) ;; 2294: __ ld_ptr(method, 
in_bytes(ResolvedIndyEntry::method_offset()), cache); >> 0x00003fff80075a60: cmpdi r22,0 ;; 2297: __ cmpdi(CCR0, method, 0); >> 0x00003fff80075a64: bne- 0x00003fff80075b94 ;; 2298: __ bne(CCR0, resolved);,bo=0b00100[no_hint] >> 0x00003fff80075a68: li r4,186 ;; 2304: __ li(R4_ARG2, code); >> 0x00003fff80075a6c: ld r11,0(r1) ;; 2305: __ call_VM(noreg, entry, R4_ARG2, true); > > This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Does it affect performance when generating the template interpreter? Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. This looks cool. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Thu Mar 9 16:47:09 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 9 Mar 2023 16:47:09 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com> References: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com> Message-ID: On Thu, 9 Mar 2023 16:01:21 GMT, Coleen Phillimore wrote: >> This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Does it affect performance when generating the template interpreter? Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. > > This looks cool. > This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). Ok. > Does it affect performance when generating the template interpreter? I didn't think it would affect performance if the interpreter is not printed. I have not measured it though. 
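Richard's point about cost can be illustrated with a simplified model of what a `__`-style hook macro does. `Asm`, `hook`, `generate` and `print_annotated` are hypothetical stand-ins, not HotSpot's `Disassembler::hook` or `MacroAssembler`, whose details may differ. In this model the hook only records `__FILE__`/`__LINE__` before each emitted instruction, and the annotation is formatted only when the code is printed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical assembler that tags each emitted instruction with the
// generator source position that was current when it was emitted.
struct Asm {
  struct Insn {
    std::string text;
    const char* file;
    int line;
  };
  std::vector<Insn> code;
  const char* cur_file = "";
  int cur_line = 0;

  void li(int reg, int imm) {
    emit("li r" + std::to_string(reg) + "," + std::to_string(imm));
  }
  void emit(std::string text) { code.push_back({std::move(text), cur_file, cur_line}); }
};

// Records where the generator is, then returns the assembler so the macro
// below can chain the real instruction call with "->". Two stores, no I/O.
inline Asm* hook(const char* file, int line, Asm* masm) {
  masm->cur_file = file;
  masm->cur_line = line;
  return masm;
}

// Same shape as the PPC macro quoted above (HotSpot's naming convention).
#define __ hook(__FILE__, __LINE__, _masm)->

inline void generate(Asm* _masm) {
  __ li(4, 186);
  __ li(5, 0);
}

#undef __

// The annotation is only formatted here, when the code is actually printed.
inline void print_annotated(const Asm& masm) {
  for (const Asm::Insn& insn : masm.code) {
    std::printf("%-12s ;; %s:%d\n", insn.text.c_str(), insn.file, insn.line);
  }
}
```

Under this model, generating code without printing pays only for recording a pointer and an int per instruction. Whether the real hook is equally cheap was not measured in the thread.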
> Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. Yes, you do. It does not work with the AbstractDisassembler, which produces a hex dump of the machine code. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From mgronlun at openjdk.org Thu Mar 9 16:58:42 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 9 Mar 2023 16:58:42 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v9] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: > > // Command line > jdk.JavaAgent { > startTime = 12:31:19.789 (2023-03-08) > name = "JavaAgent.jar" > options = "foo=bar" > dynamic = false > initialization = 12:31:15.574 (2023-03-08) > initializationTime = 172 ms > } > > // Dynamic load > jdk.JavaAgent { > startTime = 12:31:31.158 (2023-03-08) > name = "JavaAgent.jar" > options = "bar=baz" > dynamic = true > initialization = 12:31:31.037 (2023-03-08) > initializationTime = 64,1 ms > } > > The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar. > > The "dynamic" field denotes whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
> > To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: > > jdk.NativeAgent { > startTime = 12:31:40.398 (2023-03-08) > name = "jdwp" > options = "transport=dt_socket,server=y,address=any,onjcmd=y" > dynamic = false > initialization = 12:31:36.142 (2023-03-08) > initializationTime = 0,00184 ms > path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" > } > > The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. > > #### Implementation > > Until now, there has been no reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp.
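The native-agent timing rule described above ("initializationTime" is the duration of the VMInit callback, and 0 if none is specified) can be modelled in a few lines. This is a hypothetical sketch: `VMInitCallback`, `AgentInitTiming` and `time_vm_init` are illustrative names, not HotSpot's `Agent` class. It uses `std::chrono::steady_clock` instead of JFR's own tick source, and a null function pointer models "no VMInit callback specified".

```cpp
#include <cassert>
#include <chrono>

// Hypothetical model of the timing described in the PR text; not HotSpot code.
using VMInitCallback = void (*)();

struct AgentInitTiming {
  std::chrono::steady_clock::time_point initialization;  // when VMInit was (or would have been) sent
  std::chrono::nanoseconds initialization_time{0};       // callback duration; stays 0 without a callback
};

inline AgentInitTiming time_vm_init(VMInitCallback callback) {
  AgentInitTiming timing;
  timing.initialization = std::chrono::steady_clock::now();
  if (callback != nullptr) {
    callback();
    timing.initialization_time = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - timing.initialization);
  }
  return timing;
}
```

Because the timestamp is taken even when no callback exists, every native agent still gets an "initialization" value, which matches the "sent or would have sent" wording in the event description.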
> > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. > > The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: handle multiple envs with same VMInit callback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/db48fe8d..abeaa324 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=07-08 Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From kvn at openjdk.org Thu Mar 9 17:26:30 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Mar 2023 17:26:30 GMT Subject: RFR: JDK-8300783: Consolidate byteswap implementations [v14] In-Reply-To: <-b783DPmWbWFeigKf7F7SFYddDKwErM4AdFcfRx01eM=.5794fa08-a0d7-4761-a449-8ebfd639e30d@github.com> References: <-b783DPmWbWFeigKf7F7SFYddDKwErM4AdFcfRx01eM=.5794fa08-a0d7-4761-a449-8ebfd639e30d@github.com> Message-ID: On Wed, 15 Feb 2023 15:39:14 GMT, Justin King wrote: >> Deduplicate byte swapping implementations by consolidating them into `utilities/byteswap.hpp`, following `std::byteswap` introduced in C++23. Further simplification of `Bytes` will follow in https://github.com/openjdk/jdk/pull/12078. > > Justin King has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 24 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into byteswap > - Update based on review > > Signed-off-by: Justin King > - Fix copyright > > Signed-off-by: Justin King > - Update copyright > > Signed-off-by: Justin King > - Add missing include > > Signed-off-by: Justin King > - Remove unused include > > Signed-off-by: Justin King > - Reorganize tests > > Signed-off-by: Justin King > - Fix test > > Signed-off-by: Justin King > - Merge remote-tracking branch 'upstream/master' into byteswap > - Be restrict on requiring 1, 2, 4, or 8 byte integers > > Signed-off-by: Justin King > - ... and 14 more: https://git.openjdk.org/jdk/compare/044b3f10...223d733b Tier1-5 testing passed clean. Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12114 From mdoerr at openjdk.org Thu Mar 9 17:29:37 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 Mar 2023 17:29:37 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v14] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
> > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation of single-precision values in FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also requires the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Introduce ABIv2CallArranger for linux ppc64le.
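The Big Endian remark in item 1 above can be demonstrated with a small C++ sketch: store an 8-byte value, then read 4 bytes back from the same address. On a little-endian host the narrow load sees the low-order half; on a big-endian host it sees the high-order half. The function names are illustrative only and are not part of the FFM implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reports whether the host stores the least significant byte first.
bool host_is_little_endian() {
  const std::uint16_t probe = 0x0102;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 0x02;
}

// Performs an 8-byte store followed by a 4-byte load from the same address,
// which is exactly the size mismatch that breaks on Big Endian.
std::uint32_t store8_load4(std::uint64_t value) {
  unsigned char slot[8];
  std::memcpy(slot, &value, sizeof(slot));      // 8-byte store
  std::uint32_t narrow;
  std::memcpy(&narrow, slot, sizeof(narrow));   // 4-byte load, same address
  return narrow;
}
```

For example, with `0x1122334455667788` the load yields `0x55667788` on a little-endian host and `0x11223344` on a big-endian one, which is why the stub needs the precise size of each value.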
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/fb87284c..2e4e269e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=12-13 Stats: 178 lines in 8 files changed: 99 ins; 58 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From stefank at openjdk.org Thu Mar 9 17:37:04 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Mar 2023 17:37:04 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v3] In-Reply-To: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> References: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> Message-ID: On Thu, 9 Mar 2023 10:36:36 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method are stress tested to compare the performance difference. >> The `oop java_class` input parameter is changed to `InstanceKlass *` to gain more performance. >> The statistics are drawn in the following charts. The vertical axis is in milliseconds. >> >> ![chart (2)](https://user-images.githubusercontent.com/4697012/221848555-2884313e-9d26-41c9-a265-3f1ce295b17b.png) >> >> ![chart (3)](https://user-images.githubusercontent.com/4697012/221863810-94118677-b4af-468f-90c6-5ea365ae3588.png) > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() That's really surprising.
I also don't see how any of the proposed changes could affect the GC so much. This makes me suspicious of the performance claims. Could you redo the benchmarking and give us more information about: 1) which benchmarks were run, 2) what the benchmark scores and GC metrics were, and 3) what the run-to-run variance in the scores and metrics was ------------- PR: https://git.openjdk.org/jdk/pull/12782 From mdoerr at openjdk.org Thu Mar 9 17:42:50 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 Mar 2023 17:42:50 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: On Mon, 6 Mar 2023 16:38:37 GMT, Jorn Vernee wrote: >> @JornVernee: Thanks a lot for your detailed review! I have quite a few TODOs which include: >> - Include my tests for the HFA corner cases. >> - Try to improve handling of the overlapping registers as you suggested. >> - Check nesting of HFA. >> >> There will surely be more when looking into Big Endian support after merging with your recent work on https://github.com/openjdk/panama-foreign/compare/foreign-memaccess+abi...JornVernee:panama-foreign:OOB >> We should get rid of oversized accesses on PPC64, too. >> Thanks for sharing your plans to intrinsify `linkToNative` in C2 later. I guess we should do more preparation work on all platforms when that gets addressed. > > @TheRealMDoerr I've moved the support for structs/unions that are not a power of 2 in size to this repo, so you should be able to merge the master branch to get it now. @JornVernee: Thanks! I've merged in your changes. TestArrayStructs is not yet completely working. I will need to investigate. I think I've done most other things you had requested. You may want to take a look at my recent commits.
------------- PR: https://git.openjdk.org/jdk/pull/12708 From stefank at openjdk.org Thu Mar 9 18:04:40 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Mar 2023 18:04:40 GMT Subject: RFR: 8303900: Rename BitMap search functions In-Reply-To: References: Message-ID: <4GIB9OwfDN2SPR-vA9cUIN2eNhWjqxV47tBLlsoEWuM=.278dc6e2-93f6-438d-8938-8bf0f35c5599@github.com> On Thu, 9 Mar 2023 15:34:05 GMT, Kim Barrett wrote: > Please review this renaming of the following functions in BitMap: > > get_next_one_offset => find_first_set_bit > get_next_zero_offset => find_first_clear_bit > get_next_one_offset_aligned_right => find_first_set_bit_aligned_right > > Note that ShenandoahMarkBitMap::get_next_one_offset is not being renamed. For > some reason that class contains a copy of a sizable chunk of the code from > some version of BitMap. (Not sure why it doesn't use an internal BitMapView > rather than code copying.) > > Testing: > mach5 tier1-5 Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.org/jdk/pull/12951 From kvn at openjdk.org Thu Mar 9 18:09:08 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Mar 2023 18:09:08 GMT Subject: RFR: JDK-8300783: Consolidate byteswap implementations [v8] In-Reply-To: References: <4T6ba7HVAkPmaau2WD3FRRyOlmEz7MDX5nz2UM-rfms=.58f59fc2-6030-4d9f-914f-5f37df4fb95e@github.com> Message-ID: On Mon, 6 Mar 2023 16:27:26 GMT, Justin King wrote: >> CI run was fine wrt these changes. > > @dholmes-ora Poke. :) @jcking, as Committer, you can integrate it now. ------------- PR: https://git.openjdk.org/jdk/pull/12114 From dcubed at openjdk.org Thu Mar 9 18:54:49 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 9 Mar 2023 18:54:49 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v5] In-Reply-To: <-ImSJ7DFBeTtKn-R9IJcFE8wreHtVHYxWBv743xPa8s=.6ced1034-d7e1-4e23-a53d-81cbda44361a@github.com> References: <-ImSJ7DFBeTtKn-R9IJcFE8wreHtVHYxWBv743xPa8s=.6ced1034-d7e1-4e23-a53d-81cbda44361a@github.com> Message-ID: On Mon, 30 Jan 2023 14:30:41 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 1336: >> >>> 1334: // Success! Return inflated monitor. >>> 1335: if (own) { >>> 1336: assert(current->is_Java_thread(), "must be: checked in is_lock_owned()"); >> >> `is_lock_owned()` currently does this: >> >> >> static bool is_lock_owned(Thread* thread, oop obj) { >> assert(UseFastLocking, "only call this with fast-locking enabled"); >> return thread->is_Java_thread() ? reinterpret_cast(thread)->lock_stack().contains(obj) : false; >> } >> >> >> so I would not say "checked in is_locked_owned()" since `is_locked_owned()` does >> not enforce that the caller is a JavaThread. > > If it's not a Java thread, `is_lock_owned()` returns `false`, and we wouldn't end up in the `if (own)` branch. Okay, I get it. `is_lock_owned()` only returns `true` when called by a JavaThread that owns the monitor. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From sviswanathan at openjdk.org Thu Mar 9 18:55:57 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 9 Mar 2023 18:55:57 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v11] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 08:51:53 GMT, Jan Kratochvil wrote: >> My OCA has already been processed/approved. I am not an Author yet, but my Author request is being processed (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing to GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686.
> > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Fix #endif comment - found by dholmes-ora. > - Merge branch 'master' into modulo > - Fix win32 broken build. > - Merge remote-tracking branch 'origin/master' into modulo > - Always include the _WIN64 workaround - a review by dholmes-ora. > - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. > - Uppercase L - a review by turbanoff. > - Fix copyright author. > - Fix WIN32 vs. WIN64. > - Update according to the upstream review by David Holmes. > - ... and 1 more: https://git.openjdk.org/jdk/compare/285b0bd1...e4ff04dc src/hotspot/share/runtime/sharedRuntime.cpp line 238: > 236: #endif > 237: > 238: #if !defined(TARGET_COMPILER_gcc) || defined(_WIN64) The aarch64 and other builds are now broken with missing SharedRuntime::frem and SharedRuntime::drem. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From dcubed at openjdk.org Thu Mar 9 19:00:56 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 19:00:56 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v15] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 18:25:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. 
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When overflow would occur, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
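The lock-stack idea described in the quoted proposal can be modelled in a few lines of C++. This is a simplified sketch under stated assumptions, not the HotSpot implementation: the object header is a bare `std::atomic<std::uintptr_t>`, the low bits `01` mean unlocked and `00` fast-locked, and the lock stack is a `std::vector` rather than a small per-thread array grown by the runtime. A recursive attempt simply fails, standing in for inflation to a full monitor.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uintptr_t kLockMask = 0x3;
constexpr std::uintptr_t kUnlocked = 0x1;    // low bits 01
constexpr std::uintptr_t kFastLocked = 0x0;  // low bits 00

// Simplified object: only a header word (assumption for the sketch).
struct Object {
  std::atomic<std::uintptr_t> mark{kUnlocked};
};

// Per-thread stack of objects this thread has fast-locked.
class LockStack {
 public:
  void push(Object* o) { _entries.push_back(o); }
  void pop() { _entries.pop_back(); }
  Object* top() const { return _entries.empty() ? nullptr : _entries.back(); }
  std::size_t size() const { return _entries.size(); }
  // "Does the current thread own this lock?" is a short linear scan.
  bool contains(const Object* o) const {
    return std::find(_entries.begin(), _entries.end(), o) != _entries.end();
  }
 private:
  std::vector<Object*> _entries;
};

// CAS the low header bits from 01 to 00, then record the object.
// Returns false if the object is already locked (recursive or contended),
// which is where a real implementation would inflate to a monitor.
bool try_fast_lock(Object* o, LockStack& ls) {
  std::uintptr_t expected = o->mark.load();
  if ((expected & kLockMask) != kUnlocked) {
    return false;
  }
  const std::uintptr_t locked = (expected & ~kLockMask) | kFastLocked;
  if (!o->mark.compare_exchange_strong(expected, locked)) {
    return false;
  }
  ls.push(o);
  return true;
}

// Structured locking: the object being unlocked is on top of the stack.
// A failed CAS would mean the lock was inflated meanwhile (not handled here).
bool fast_unlock(Object* o, LockStack& ls) {
  assert(ls.top() == o);
  std::uintptr_t expected = o->mark.load();
  const std::uintptr_t unlocked = (expected & ~kLockMask) | kUnlocked;
  const bool ok = o->mark.compare_exchange_strong(expected, unlocked);
  ls.pop();
  return ok;
}
```

The design point this illustrates is that ownership lives in the thread-local stack, so the header keeps only the lock state bits and never holds a stack pointer.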
When a thread attempts to re-lock an object it has already fast-locked, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thereby handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed.
Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Inline initial LockStack stack This project is still currently baselined on jdk-21+10-761. I was expecting this merge: [Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2](https://github.com/openjdk/jdk/pull/10907/commits/3c9d0d822fc15a196c4b8920b89ad6d3d0547101) to sync in the latest main baseline bits, but apparently it did not. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From pchilanomate at openjdk.org Thu Mar 9 19:19:24 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 9 Mar 2023 19:19:24 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode Message-ID: Please review this small fix.
A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active jvmtiVTMSTransitionDisabler. The code checks for active "all-disablers", but it is missing the check for active "single disablers". I attached a simple reproducer to the bug, which I used to test the patch. I was not sure it was worth adding a test, so the patch contains just the fix. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/12956/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12956&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303908 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12956.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12956/head:pull/12956 PR: https://git.openjdk.org/jdk/pull/12956 From fparain at openjdk.org Thu Mar 9 19:27:32 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 9 Mar 2023 19:27:32 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: Message-ID: <89PNDhKGJttxXQg3Izv3dvSM33KewEekz4kmwVUjQXo=.d7fc365c-9783-4e5a-ac04-ba770a51c43c@github.com> On Mon, 27 Feb 2023 21:37:34 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
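To illustrate the proposed layout, here is a hypothetical C++17 sketch of a dedicated entry type with named fields in place of the overloaded f1/f2 slots, together with the index-to-byte-offset computation a reviewer later in this thread suggests doing with a shift when the entry size is a power of two. Only `method` corresponds to an accessor quoted in this thread; the other fields are assumptions, chosen so the struct is typically 16 bytes on a 64-bit target. This is not the real `ResolvedIndyEntry`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical entry with named fields instead of the overloaded f1/f2 slots.
// Only `method` mirrors an accessor quoted in this thread; the rest are assumptions.
struct IndyEntrySketch {
  const void* method;                       // resolved target, nullptr if unresolved
  std::uint16_t resolved_references_index;  // assumed field
  std::uint16_t cpool_index;                // assumed field
  std::uint16_t number_of_parameters;       // assumed field
  std::uint8_t return_type;                 // assumed field
  std::uint8_t flags;                       // assumed field
};

constexpr bool is_power_of_2(std::size_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

constexpr unsigned exact_log2(std::size_t x) {
  unsigned n = 0;
  while (x > 1) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Byte offset of entry `index` within the entry array: a shift when the
// entry size is a power of two, a multiply otherwise (e.g. on 32-bit targets).
std::size_t entry_offset(std::size_t index) {
  constexpr std::size_t size = sizeof(IndyEntrySketch);
  if constexpr (is_power_of_2(size)) {
    return index << exact_log2(size);
  } else {
    return index * size;
  }
}
```

Because the branch is selected at compile time, each target only ever contains one of the two scaling strategies, which is the same idea as guarding the `shll`/`imull` choice on `is_power_of_2(sizeof(...))`.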
> > Any code that previously accessed invokedynamic data from ConstantPoolCacheEntry will be changed to access this new array and structure instead. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV. src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1844: > 1842: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) > 1843: mov(tmp, sizeof(ResolvedIndyEntry)); > 1844: mul(index, index, tmp); On 64-bit platforms, sizeof(ResolvedIndyEntry) is 16, a power of two, so a shift instruction could be used instead of a multiply instruction (with an assert in case the size of ResolvedIndyEntry is changed). src/hotspot/cpu/x86/interp_masm_x86.cpp line 2075: > 2073: movptr(cache, Address(rbp, frame::interpreter_frame_cache_offset * wordSize)); > 2074: movptr(cache, Address(cache, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); > 2075: imull(index, index, sizeof(ResolvedIndyEntry)); // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) A shift instruction could be used when sizeof(ResolvedIndyEntry) is a power of two. It is on x86_64 platforms but not on x86_32 platforms (both are using this file). Suggested change: if (is_power_of_2(sizeof(ResolvedIndyEntry))) { shll(index, log2i(sizeof(ResolvedIndyEntry))); } else { imull(index, index, sizeof(ResolvedIndyEntry)); // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) } src/hotspot/cpu/x86/templateTable_x86.cpp line 2747: > 2745: address entry = CAST_FROM_FN_PTR(address, InterpreterRuntime::resolve_from_cache); > 2746: __ movl(method, code); // this is essentially Bytecodes::_invokedynamic > 2747: __ call_VM(noreg, entry, method); // Example uses temp = rbx. In this case rbx is method The comment is confusing and seems to need an update.
The register 'method' is used, but its content is not the method anymore, it is the bytecode. src/hotspot/cpu/x86/templateTable_x86.cpp line 2770: > 2768: // since the parameter_size includes it. > 2769: __ push(rbx); > 2770: __ mov(rbx, index); Why is the index (rdx) copied to rbx instead of using the index (rdx) register directly to call load_resolved_reference_at_index() ? The method doesn't modify the content of the register. src/hotspot/share/interpreter/bootstrapInfo.cpp line 67: > 65: assert(_indy_index != -1, ""); > 66: // Check if method is not null > 67: if ( _pool->resolved_indy_entry_at(_indy_index)->method() != nullptr) { _pool->resolved_reference_from_indy(_indy_index) is repeated 5 times. Using a local variable would make the code easier to read. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From jcking at openjdk.org Thu Mar 9 19:42:37 2023 From: jcking at openjdk.org (Justin King) Date: Thu, 9 Mar 2023 19:42:37 GMT Subject: Integrated: JDK-8300783: Consolidate byteswap implementations In-Reply-To: References: Message-ID: <24UOZ9Bl9l7k6FwS4HXol82TdrCcf2XkGTSnPKy3Q0Q=.301ce9ec-3f43-4356-a8e2-621a60422b76@github.com> On Fri, 20 Jan 2023 14:55:40 GMT, Justin King wrote: > Deduplicate byte swapping implementations by consolidating them into `utilities/byteswap.hpp`, following `std::byteswap` introduced in C++23. Further simplification of `Bytes` will follow in https://github.com/openjdk/jdk/pull/12078. This pull request has now been integrated. 
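As a rough illustration of the `std::byteswap`-style helper mentioned in the quoted description, here is a portable C++17 sketch restricted to 1-, 2-, 4- and 8-byte integers, echoing the "requiring 1, 2, 4, or 8 byte integers" commit in this PR. It is a generic shift-and-mask version, not the contents of HotSpot's `utilities/byteswap.hpp`, which may use compiler intrinsics instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reverses the byte order of an integer, in the spirit of C++23 std::byteswap.
template <typename T>
constexpr T byteswap_sketch(T value) {
  static_assert(std::is_integral<T>::value, "integral types only");
  static_assert(sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8,
                "only 1, 2, 4 or 8 byte integers are supported");
  using U = std::make_unsigned_t<T>;
  U in = static_cast<U>(value);
  U out = 0;
  // Move the lowest remaining byte of `in` into the lowest position of `out`,
  // shifting previously copied bytes up by one byte each iteration.
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    out = static_cast<U>((out << 8) | (in & 0xFF));
    in = static_cast<U>(in >> 8);
  }
  return static_cast<T>(out);
}
```

Since the function is `constexpr`, results can also be checked at compile time, e.g. `static_assert(byteswap_sketch<std::uint16_t>(0x1234) == 0x3412);`.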
Changeset: a9dba565 Author: Justin King URL: https://git.openjdk.org/jdk/commit/a9dba565688a29bef8626488c47519008dcadbe8 Stats: 1561 lines in 31 files changed: 507 ins; 1023 del; 31 mod 8300783: Consolidate byteswap implementations Reviewed-by: kbarrett, kvn ------------- PR: https://git.openjdk.org/jdk/pull/12114 From kbarrett at openjdk.org Thu Mar 9 20:17:20 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 9 Mar 2023 20:17:20 GMT Subject: RFR: 8303810: Restore attribute positions after JDK-8303839 to match JDK-8302124 [v3] In-Reply-To: References: Message-ID: <8Ohq_fYWdoYmD7bvYKKFgIbc8L0SkFE5U22q9t2ylmE=.948ef550-ba01-4eb7-8b3d-3cd8a45cf4ae@github.com> On Thu, 9 Mar 2023 11:54:44 GMT, Julian Waters wrote: >> [JDK-8303839](https://bugs.openjdk.org/browse/JDK-8303839)'s revert of [[noreturn]] attributes also moved their already existing attributes back to behind their corresponding methods, they should be restored to where [JDK-8302124](https://bugs.openjdk.org/browse/JDK-8302124) requires them to be. Also fixes attribute from [JDK-8292269](https://bugs.openjdk.org/browse/JDK-8292269) > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > debug.hpp Please don't do this. This change will just make the redo harder by introducing merge conflicts.
------------- PR: https://git.openjdk.org/jdk/pull/12918 From kbarrett at openjdk.org Thu Mar 9 20:24:16 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 9 Mar 2023 20:24:16 GMT Subject: RFR: 8274400: HotSpot Style Guide should permit use of alignof [v7] In-Reply-To: References: <3IdeZGaGcFB1-DbVugn0P9v8vRQMLURzHFRSpAm46NA=.09bf4dd8-b652-49c3-b9a5-ad3f58dfbcc3@github.com> Message-ID: On Mon, 27 Feb 2023 13:02:02 GMT, Julian Waters wrote: >> This has been open and stable for a while. Anyone else? > >> This has been open and stable for a while. Anyone else? > > Also if there is anyone else coming to review this, consider looking at https://github.com/openjdk/jdk/pull/11431 as well, it's also been collecting dust for quite a while too ;-; @TheShermanTanker - this is ready to be integrated, as it has been approved by @vnkozlov (HotSpot lead). ------------- PR: https://git.openjdk.org/jdk/pull/11761 From kbarrett at openjdk.org Thu Mar 9 20:27:42 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 9 Mar 2023 20:27:42 GMT Subject: RFR: JDK-8303817: Add constexpr for natural malloc alignment In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:23:53 GMT, Thomas Stuefe wrote: > I miss having an easy way to get the alignment the libc guarantees for malloc. Let's add this, and we can use this right away with NMT's MallocHeader. > > Tests: manually built and tested x64 linux (clang + GCC) and x86 linux. src/hotspot/share/utilities/globalDefinitions.hpp line 1313: > 1311: long double d; > 1312: }; > 1313: constexpr int minimum_malloc_alignment = alignof(AllTheBigTypes); How is this different from `alignof(std::max_align_t)`? (Which we can use - JDK-8274400 (approved, just noticed author hasn't integrated yet) and JDK-8297912.) 
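A small C++17 sketch of the comparison raised in the question above: a union of large scalar types versus `std::max_align_t`. Only the `long double` member is visible in the quoted diff, so the union's other members here are assumptions. Since `std::max_align_t` is specified to be at least as strictly aligned as every scalar type, it covers anything such a union can contain, and `malloc` returns memory suitable for objects of fundamental alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical reconstruction; only the long double member appears in the
// quoted diff, the other members are assumptions.
union AllTheBigTypesSketch {
  long long ll;
  double d;
  long double ld;
  void* p;
};

constexpr std::size_t union_alignment = alignof(AllTheBigTypesSketch);
constexpr std::size_t max_align = alignof(std::max_align_t);

// max_align_t is at least as strictly aligned as every scalar type.
static_assert(max_align >= union_alignment, "max_align_t covers the union");

bool is_aligned_to(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

On common 64-bit Linux toolchains both values typically come out as 16, which is why the two approaches look interchangeable there.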
------------- PR: https://git.openjdk.org/jdk/pull/12921 From coleenp at openjdk.org Thu Mar 9 20:43:09 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Mar 2023 20:43:09 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry In-Reply-To: References: <0TIZDGiEpwyU6vRDvWCTiQBoyekNHwa5QoJr4ViPh0U=.7a45b939-ce28-4936-871b-dfc4c0b3a5ec@github.com> Message-ID: On Thu, 9 Mar 2023 16:44:21 GMT, Richard Reingruber wrote: >> This looks cool. > >> This change should be in a further RFE though (and you can explain it there so we can maybe use it in the other platforms too). > > Ok. > >> Does it affect performance when generating the template interpreter? > > I didn't think it would affect performance if the interpreter is not printed. I have not measured it though. > >> Do you need to have hsdis in the LD_LIBRARY_PATH environment variable to use this? I see it's already used by default in one place. > > Yes you do. It is not working with the AbstractDisassembler which produces a hex dump of the machine code. I was searching in the wrong directory, which is why I didn't find this and apparently I reviewed this change in 2018. We can leave this change here. Sorry for the noise. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From fparain at openjdk.org Thu Mar 9 20:51:38 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 9 Mar 2023 20:51:38 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 15:46:12 GMT, Coleen Phillimore wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. 
>> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which alreay present in the JDK (because of pack200) and the JVM (because JIT compulers use it to comrpess debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distincts flag categories: immutable flags from the class file, immutable fields defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > src/hotspot/share/classfile/classFileParser.cpp line 1491: > >> 1489: _temp_field_info = new GrowableArray(total_fields); >> 1490: >> 1491: ResourceMark rm(THREAD); > > Is the ResourceMark ok here or should it go before allocating _temp_field_info ? _temp_field_info must survive after ClassFileParser::parse_fields() has returned, so definitively after the allocation of _temp_field_info. That being said, I don't see any reason to have a ResourceMark here, probably a remain of some debugging/tracing code. I'll remove it. 
------------- PR: https://git.openjdk.org/jdk/pull/12855 From stuefe at openjdk.org Thu Mar 9 20:52:29 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 9 Mar 2023 20:52:29 GMT Subject: RFR: JDK-8303817: Add constexpr for natural malloc alignment In-Reply-To: References: Message-ID: <3487w225FRkI0CtznUs0ihccA9XUZgjZoURy8y3DA0M=.a947cc34-066c-49ca-bfa5-a5819c0c5100@github.com> On Wed, 8 Mar 2023 11:23:53 GMT, Thomas Stuefe wrote: > I miss having an easy way to get the alignment the libc guarantees for malloc. Let's add this, and we can use this right away with NMT's MallocHeader. > > Tests: manually built and tested x64 linux (clang + GCC) and x86 linux. Did not realize that std::max_align_t was an option. Sure, I'll do that. ------------- PR: https://git.openjdk.org/jdk/pull/12921 From stuefe at openjdk.org Thu Mar 9 20:52:31 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 9 Mar 2023 20:52:31 GMT Subject: Withdrawn: JDK-8303817: Add constexpr for natural malloc alignment In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 11:23:53 GMT, Thomas Stuefe wrote: > I miss having an easy way to get the alignment the libc guarantees for malloc. Let's add this, and we can use this right away with NMT's MallocHeader. > > Tests: manually built and tested x64 linux (clang + GCC) and x86 linux. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12921 From fparain at openjdk.org Thu Mar 9 21:02:05 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 9 Mar 2023 21:02:05 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 15:50:05 GMT, Coleen Phillimore wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code due to a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> [...] > > src/hotspot/share/classfile/classFileParser.cpp line 1608: > >> 1606: fflags.update_injected(true); >> 1607: AccessFlags aflags; >> 1608: FieldInfo fi(aflags, (u2)(injected[n].name_index), (u2)(injected[n].signature_index), 0, fflags); > > I don't know why there's a cast here until I read more. If the FieldInfo name_index and signature_index fields are only u2 sized, could you pass this as an int and then in the constructor assert that the value doesn't overflow u2 instead? The type of name_index and signature_index is const vmSymbolID, because the names and signatures of injected fields do not come from a constant pool, but from the vmSymbol array.
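The reviewer's alternative (pass an `int` and check the range once in the constructor instead of casting at each call site) can be sketched as below. All names here are hypothetical and the class is heavily reduced: `u2` is assumed to be an unsigned 16-bit type, and `SymbolIdSketch` stands in for a scoped enum such as `vmSymbolID`. Because a scoped enum does not convert implicitly, some explicit conversion remains at the call site either way, which matches the author's reply.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using u2 = uint16_t;  // assumption: mirrors HotSpot's unsigned 16-bit u2

// Hypothetical helper: narrows an int to u2 and asserts that nothing is lost.
inline u2 checked_u2(int value) {
  assert(value >= 0 && value <= static_cast<int>(std::numeric_limits<u2>::max()) &&
         "index does not fit in u2");
  return static_cast<u2>(value);
}

// Hypothetical scoped enum standing in for vmSymbolID (values are arbitrary).
enum class SymbolIdSketch : int { object_name = 3, int_signature = 17 };

// Hypothetical, heavily reduced stand-in for FieldInfo: the range check lives
// in the constructor rather than at every call site.
class FieldInfoSketch {
  u2 _name_index;
  u2 _signature_index;
 public:
  FieldInfoSketch(int name_index, int signature_index)
      : _name_index(checked_u2(name_index)),
        _signature_index(checked_u2(signature_index)) {}
  u2 name_index() const { return _name_index; }
  u2 signature_index() const { return _signature_index; }
};
```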
------------- PR: https://git.openjdk.org/jdk/pull/12855 From rkennke at openjdk.org Thu Mar 9 21:08:16 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 9 Mar 2023 21:08:16 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array.
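The lock-stack idea just described can be sketched as a single-threaded illustration. All names (`ObjSketch`, `LockStackSketch`, `fast_lock`, `fast_unlock`) are hypothetical, and this is not HotSpot code. The object is reduced to an atomic mark word whose low two bits are assumed to be 01 when unlocked and 00 when fast-locked. A `std::vector` stands in for the per-thread lock-stack. Every case the fast path cannot handle (recursion, a lost CAS race, not owning the lock) simply returns `false` where the real VM would call into the runtime and inflate.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Assumed lock-bit encoding in the low two bits of the mark word.
constexpr uintptr_t kLockMask   = 0x3;
constexpr uintptr_t kUnlocked   = 0x1;  // 01: neutral / unlocked
constexpr uintptr_t kFastLocked = 0x0;  // 00: fast-locked (owner found via lock-stacks)

// Hypothetical object: just a mark word. Both CASes preserve the upper bits.
struct ObjSketch {
  std::atomic<uintptr_t> mark{kUnlocked};
};

// Hypothetical per-thread lock-stack: objects this thread has fast-locked.
class LockStackSketch {
  std::vector<ObjSketch*> _elems;  // usually only a handful of entries
 public:
  // "Does the current thread own me?" is a linear scan of a tiny array.
  bool contains(const ObjSketch* o) const {
    return std::find(_elems.begin(), _elems.end(), o) != _elems.end();
  }
  void push(ObjSketch* o) { _elems.push_back(o); }
  void remove(ObjSketch* o) {
    // Search from the top: unlocking is normally in reverse locking order.
    auto it = std::find(_elems.rbegin(), _elems.rend(), o);
    if (it != _elems.rend()) _elems.erase(std::next(it).base());
  }
  std::size_t size() const { return _elems.size(); }
};

// Returns false whenever the slow path (runtime call / inflation) is needed.
bool fast_lock(ObjSketch* obj, LockStackSketch& ls) {
  uintptr_t m = obj->mark.load(std::memory_order_relaxed);
  if ((m & kLockMask) != kUnlocked) return false;  // already locked, e.g. recursion
  uintptr_t locked = (m & ~kLockMask) | kFastLocked;
  if (!obj->mark.compare_exchange_strong(m, locked, std::memory_order_acquire)) {
    return false;  // lost a race with another thread
  }
  ls.push(obj);  // no owner pointer in the header: ownership lives in the lock-stack
  return true;
}

bool fast_unlock(ObjSketch* obj, LockStackSketch& ls) {
  uintptr_t m = obj->mark.load(std::memory_order_relaxed);
  if ((m & kLockMask) != kFastLocked || !ls.contains(obj)) return false;
  uintptr_t unlocked = (m & ~kLockMask) | kUnlocked;
  if (!obj->mark.compare_exchange_strong(m, unlocked, std::memory_order_release)) {
    return false;
  }
  ls.remove(obj);
  return true;
}
```

Unlike stack-locking, the header never contains a pointer into a thread's stack. This is why, as the description goes on to explain, a contending thread that inflates the lock cannot tell who the owner is without searching the lock-stacks.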
More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thereby handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code.
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: - Merge branch 'master' into JDK-8291555-v2 - Various small fixes and improvements - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 - Use realloc instead of malloc+copy when growing the lock-stack - Inline initial LockStack stack - Fix interpreter asymmetric fast-locking - Fix merge error (move done label into correct places) - Merge branch 'master' into JDK-8291555-v2 - Small fixes - Fix anon owner in fast-path, avoid runtime call (aarch64) - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 ------------- Changes: https://git.openjdk.org/jdk/pull/10907/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=15 Stats: 2090 lines in 74 files changed: 1327 ins; 94 del; 669 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From fparain at openjdk.org Thu Mar 9 21:12:11 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 9 Mar 2023 21:12:11 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: <0Ayok_tvKtFDl6deMwvAIUcT8MFkA8fIXSCHqwXKYts=.2cf55fae-34ae-4ce3-b733-b32df0acaf45@github.com> References: Message-ID: <0Ayok_tvKtFDl6deMwvAIUcT8MFkA8fIXSCHqwXKYts=.2cf55fae-34ae-4ce3-b733-b32df0acaf45@github.com> On Wed, 8 Mar 2023 16:05:57 GMT, Coleen Phillimore wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code due to a lack of strong typing and semantic overloading, and poor memory efficiency.
>> >> [...] > > src/hotspot/share/classfile/fieldLayoutBuilder.cpp line 554: > >> 552: FieldInfo ctrl = _field_info->at(0); >> 553: FieldGroup* group = nullptr; >> 554: FieldInfo tfi = *it; > > What's the 't' in tfi? Maybe a longer variable name would be helpful here. At some point there was a TempFieldInfo type, hence the name. Renamed to fieldinfo. > src/hotspot/share/classfile/javaClasses.cpp line 871: > >> 869: // a new UNSIGNED5 stream, and substitute it to the old FieldInfo stream. >> 870: >> 871: int java_fields; > > Can you put InstanceKlass* ik = InstanceKlass::cast(k); here and use that so there's only one cast? Sure, done.
------------- PR: https://git.openjdk.org/jdk/pull/12855 From matsaave at openjdk.org Thu Mar 9 21:18:19 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 9 Mar 2023 21:18:19 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: References: Message-ID: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
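The rewriting step described above (a dedicated array of resolved invokedynamic entries, with each bytecode operand rewritten to index into it) can be sketched as follows. Every name here is hypothetical and the entry is far simpler than the one in the patch. The sketch only shows the shape of the idea: named fields instead of the overloaded f1/f2, and one entry per invokedynamic instruction. Each instruction is its own call site, even when two of them share a constant pool index.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified per-call-site entry with explicitly named fields.
struct ResolvedIndyEntrySketch {
  uint16_t cp_index;             // original CONSTANT_InvokeDynamic index
  const void* target = nullptr;  // set once the call site is linked
  bool is_resolved() const { return target != nullptr; }
};

// Hypothetical view of one invokedynamic instruction's operand.
struct IndySiteSketch {
  uint32_t operand;  // before rewriting: CP index; after: index into the entry array
};

// Rewriter pass: every invokedynamic instruction gets its own entry, and its
// operand is replaced by the index of that entry.
std::vector<ResolvedIndyEntrySketch> rewrite_indy_sites(std::vector<IndySiteSketch>& sites) {
  std::vector<ResolvedIndyEntrySketch> entries;
  entries.reserve(sites.size());
  for (IndySiteSketch& site : sites) {
    entries.push_back(ResolvedIndyEntrySketch{static_cast<uint16_t>(site.operand)});
    site.operand = static_cast<uint32_t>(entries.size() - 1);
  }
  return entries;
}
```

After rewriting, the interpreter can go straight from the operand to the entry, without decoding an f1/f2 pair whose meaning depends on the entry kind.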
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Interpreter optimization and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/829517d6..c2d87e59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=00-01 Stats: 47 lines in 10 files changed: 11 ins; 13 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From dcubed at openjdk.org Thu Mar 9 21:45:12 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 21:45:12 GMT Subject: RFR: 8274400: HotSpot Style Guide should permit use of alignof [v7] In-Reply-To: References: <3IdeZGaGcFB1-DbVugn0P9v8vRQMLURzHFRSpAm46NA=.09bf4dd8-b652-49c3-b9a5-ad3f58dfbcc3@github.com> Message-ID: On Tue, 24 Jan 2023 09:21:37 GMT, Julian Waters wrote: >> The alignof operator was added by C++11. It returns the alignment for the given type. Various metaprogramming usages exist, in particular when using std::aligned_storage. Use of this operator should be permitted in HotSpot code. > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Oof Marked as reviewed by dcubed (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/11761 From dcubed at openjdk.org Thu Mar 9 22:02:41 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 22:02:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 21:08:16 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
>> [...] > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 99 commits: > > - Merge branch 'master' into JDK-8291555-v2 > - Various small fixes and improvements > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use realloc instead of malloc+copy when growing the lock-stack > - Inline initial LockStack stack > - Fix interpreter asymmetric fast-locking > - Fix merge error (move done label into correct places) > - Merge branch 'master' into JDK-8291555-v2 > - Small fixes > - Fix anon owner in fast-path, avoid runtime call (aarch64) > - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 This project is currently based on jdk-21+14-1079. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Thu Mar 9 22:27:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 9 Mar 2023 22:27:11 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: <9TvBQ8ScbPmz7uMFo8fWYly239z03-tYBsNWV1N0s4A=.344c11c7-c375-4e75-a4b2-7ed1592971bc@github.com> On Tue, 7 Mar 2023 22:14:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. The Method instance representing Continuation.enterSpecial() is replaced by a new Method during redefinition of the Continuation class. The already existing nmethod for it is not used, but a new one will be generated the first time enterSpecial() is resolved after redefinition. This means we could have more than one nmethod representing enterSpecial(), in particular, one generated before redefinition took place, and one after it. Now, when walking the stack, if we find a return barrier pc (as in Continuation::is_return_barrier_entry()) and want to keep walking the physical stack, then we know the sender will be the enterSpecial frame, so we create it by calling ContinuationEntry::to_frame().
This method assumes there can only be one nmethod associated with enterSpecial(), so we hit an assert later on. See the bug for more details of the crash. > > As I mentioned in the bug we don't need to rely on this assumption since we can re-read the updated value from _enter_special. But reading both _enter_special and _return_pc means we would need some kind of synchronization since to_frame() could be called concurrently with set_enter_code(). To avoid that we could just read _return_pc and calculate the blob from it each time, but I'm also assuming that overhead is undesired and that's why the static variable was introduced. Alternatively _enter_special could be read and _return_pc could be derived from it (by adding an extra field in the nmethod class). But if we go this route I think we would need to do a small fix on thaw too. After redefinition and before a new call to resolve enterSpecial(), the last thaw call for some virtual thread would create an entry frame with an old _return_pc (see ThawBase::new_entry_frame() and ThawBase::patch_return()). I'm not sure about the lifecycle of the old CodeBlob but it seems it could have been already removed if enterSpecial was not found while traversing everybody's stack. Maybe there are more issues. > > The simple solution implemented here is just to disallow redefinition of the Continuation class altogether. Another less restrictive option would be to keep the already generated enterSpecial nmethod, if there is one. I can also investigate one of the routes mentioned previously if desired. > > I tested the fix with the simple reproducer I added to the bug and also with the previously crashing HelidonAppTest.java test. > > Thanks, > Patricio Looks good. Thank you for taking care about it! Serguei, Thanks ------------- Marked as reviewed by sspitsyn (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12911 From pchilanomate at openjdk.org Thu Mar 9 22:38:10 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 9 Mar 2023 22:38:10 GMT Subject: RFR: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: <9TvBQ8ScbPmz7uMFo8fWYly239z03-tYBsNWV1N0s4A=.344c11c7-c375-4e75-a4b2-7ed1592971bc@github.com> References: <9TvBQ8ScbPmz7uMFo8fWYly239z03-tYBsNWV1N0s4A=.344c11c7-c375-4e75-a4b2-7ed1592971bc@github.com> Message-ID: On Thu, 9 Mar 2023 22:24:00 GMT, Serguei Spitsyn wrote: > Looks good. Thank you for taking care about it! Serguei, Thanks > Thanks for the review Serguei! ------------- PR: https://git.openjdk.org/jdk/pull/12911 From pchilanomate at openjdk.org Thu Mar 9 22:57:22 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 9 Mar 2023 22:57:22 GMT Subject: Integrated: 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 22:14:39 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. [...] > > Thanks, > Patricio This pull request has now been integrated.
Changeset: 8b740b46 Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/8b740b46091c853c7cb66c361deda6dfbb2cedc8 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8302779: HelidonAppTest.java fails with "assert(_cb == CodeCache::find_blob(pc())) failed: Must be the same" or SIGSEGV Reviewed-by: coleenp, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/12911 From inakonechnyy at openjdk.org Thu Mar 9 23:12:00 2023 From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy) Date: Thu, 9 Mar 2023 23:12:00 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v7] In-Reply-To: References: Message-ID: > The proposed approach adds a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, which is called within `InstanceKlass::add_initialization_error` if the existing `java_lang_Throwable::get_cause_with_stack_trace` does not succeed because of an exception during the VM call. The simple function does not call into the VM to get a stack trace, but it fills in all other information about the exception. > > Besides that, retrieval of the exception information was also added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. > > A jtreg test reproducing the issue was also added to the commit. > The commit was tested with tier1 tests.
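The fallback in the quoted description works like this: try the richer cause first, and if building it fails, still record a cause, just without a stack trace. The small C++ model below illustrates that idea. It is not HotSpot code. `CauseInfo`, `cause_with_stack_trace()`, `cause_simple()` and `record_initialization_error()` are hypothetical stand-ins for `java_lang_Throwable::get_cause_with_stack_trace`, `java_lang_Throwable::get_cause_simple` and `InstanceKlass::add_initialization_error`. An empty `std::optional` stands in for a VM call that left an exception pending.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical record of the original error saved when class initialization fails.
struct CauseInfo {
  std::string class_name;
  std::string message;
  bool has_stack_trace;
};

// Stand-in for the expensive path: building the stack trace needs a call back into
// the VM, which can itself fail (modelled here by the vm_call_fails flag).
std::optional<CauseInfo> cause_with_stack_trace(const std::string& cls,
                                                const std::string& msg,
                                                bool vm_call_fails) {
  if (vm_call_fails) {
    return std::nullopt;  // the VM call left an exception pending: no cause built
  }
  return CauseInfo{cls, msg, true};
}

// Stand-in for the cheap path: keeps the class name and message, makes no VM call.
CauseInfo cause_simple(const std::string& cls, const std::string& msg) {
  return CauseInfo{cls, msg, false};
}

// Prefer the rich cause, but never drop the cause entirely.
CauseInfo record_initialization_error(const std::string& cls,
                                      const std::string& msg,
                                      bool vm_call_fails) {
  if (std::optional<CauseInfo> rich = cause_with_stack_trace(cls, msg, vm_call_fails)) {
    return *rich;
  }
  return cause_simple(cls, msg);
}
```

In this model the cheap path makes no VM call. The recorded cause is therefore never lost just because building the stack trace failed.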
Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision: Some corrections ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12566/files - new: https://git.openjdk.org/jdk/pull/12566/files/bd4df11d..5e575962 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=05-06 Stats: 7 lines in 2 files changed: 1 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12566.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12566/head:pull/12566 PR: https://git.openjdk.org/jdk/pull/12566 From dcubed at openjdk.org Thu Mar 9 23:23:26 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 23:23:26 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 21:08:16 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check for and deal with this situation when trying to access the mark-word. And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) >> >> What the original stack-locking does is basically to push onto the stack a stack-lock consisting only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>> >> This change basically reverses stack-locking: it still CASes the lowest two header bits to 00 to indicate 'fast-locked', but it does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure, basically a small array of oops associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered very quickly by scanning the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is OK to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When an overflow would occur, the locking fast-paths call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread and record it in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
What we do instead is record a special marker, ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself. It sets the owner to itself and then properly exits the monitor, thus handing it over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer), and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can have a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast-path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> | x86_64 | | | | aarch64 | | >> -- | -- | -- | -- | -- | -- | -- | -- >> Benchmark | stack-locking | fast-locking | diff | | stack-locking | fast-locking | diff >> AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: > > - Merge branch 'master' into JDK-8291555-v2 > - Various small fixes and improvements > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use realloc instead of malloc+copy when growing the lock-stack > - Inline initial LockStack stack > - Fix interpreter asymmetric fast-locking > - Fix merge error (move done label into correct places) > - Merge branch 'master' into JDK-8291555-v2 > - Small fixes > - Fix anon owner in fast-path, avoid runtime call (aarch64) > - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 Another chunk of partial review.
This time I did the src/hotspot/cpu/aarch64 and src/hotspot/cpu/arm files: src/hotspot/cpu/aarch64/aarch64.ad src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp src/hotspot/cpu/aarch64/stubRoutines_aarch64.cpp src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp src/hotspot/cpu/arm/c1_LIRAssembler_arm.cpp src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp src/hotspot/cpu/aarch64/aarch64.ad line 3856: > 3854: // Indicate success at cont. > 3855: __ cmp(oop, oop); > 3856: __ b(count); This code does `b(count)`, and the `count` label is after the `cont` label, so the comment on L3854 doesn't quite make sense. Perhaps: `// Indicate success on completion.` src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp line 68: > 66: > 67: int C2CheckLockStackStub::max_size() const { > 68: return 20; nit - a comment explaining this choice of literal value ('20') would be useful src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp line 79: > 77: > 78: int C2HandleAnonOMOwnerStub::max_size() const { > 79: return 20; nit - a comment explaining this choice of literal value ('20') would be useful src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5491: > 5489: > 5490: > 5491: __ pop_call_clobbered_registers(); nit - double blank lines here for a reason? src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp line 56: > 54: } > 55: > 56: void C1_MacroAssembler::build_frame(int frame_size_in_bytes, int bang_size_in_bytes, int max_monitors) { So the `max_monitors` param is added, but it is not used. Is someone else doing the 32-bit ARM port?
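The lock-stack scheme in the quoted PR description can be sketched as a simplified, single-threaded C++ model. Its main elements are a small per-thread array of locked objects, a linear scan to answer "does the current thread own this?", growth on demand, and inflation on recursive locking. This is not the HotSpot code. `Object`, `LockStack`, `try_fast_lock()` and `try_fast_unlock()` are hypothetical names. The mark word is reduced to its two low lock bits (01 unlocked, 00 fast-locked), and the CAS is modelled with `std::atomic`. Contention, ANONYMOUS_OWNER handling, inflation itself and GC root scanning are all left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Low two mark-word bits: 01 = unlocked, 00 = fast-locked.
constexpr std::uintptr_t kUnlocked = 0b01;
constexpr std::uintptr_t kLockMask = 0b11;

struct Object {
  std::atomic<std::uintptr_t> mark{kUnlocked};  // hypothetical: only the lock bits are modelled
};

// Hypothetical per-thread lock-stack: a small array of locked objects.
class LockStack {
 public:
  // "Does the current thread own this lock?" is a short linear scan.
  bool contains(const Object* o) const {
    for (const Object* e : stack_) {
      if (e == o) return true;
    }
    return false;
  }
  void push(Object* o) { stack_.push_back(o); }  // std::vector grows on demand
  Object* top() const { return stack_.empty() ? nullptr : stack_.back(); }
  void pop() { stack_.pop_back(); }
  std::size_t size() const { return stack_.size(); }

 private:
  std::vector<Object*> stack_;
};

enum class LockResult { FastLocked, NeedsInflation };

// Fast path: CAS the low bits from 01 to 00, then record the object on the lock-stack.
LockResult try_fast_lock(Object& o, LockStack& ls) {
  if (ls.contains(&o)) {
    // Recursive locking is not supported by fast-locking: inflate to a full monitor.
    return LockResult::NeedsInflation;
  }
  std::uintptr_t expected = (o.mark.load() & ~kLockMask) | kUnlocked;
  std::uintptr_t desired = expected & ~kLockMask;  // low bits 00 = fast-locked
  if (o.mark.compare_exchange_strong(expected, desired)) {
    ls.push(&o);
    return LockResult::FastLocked;
  }
  // Locked by another thread (or already inflated): take the slow path.
  return LockResult::NeedsInflation;
}

// Fast unlock: in this model only the most recently locked object may be fast-unlocked.
bool try_fast_unlock(Object& o, LockStack& ls) {
  if (ls.top() != &o) return false;
  std::uintptr_t expected = o.mark.load() & ~kLockMask;
  std::uintptr_t desired = expected | kUnlocked;
  if (!o.mark.compare_exchange_strong(expected, desired)) return false;
  ls.pop();
  return true;
}
```

The owner query needs no pointer in the mark word; it only scans the owning thread's array. Here `std::vector` growth stands in for the on-demand lock-stack growth that the PR implements differently (its commit list mentions realloc).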
------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 9 23:23:30 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 9 Mar 2023 23:23:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v15] In-Reply-To: References: Message-ID: <6OKumxjpgT6L0uYvNnYr4tZO3VjvC_ixqFoaRo7bBHs=.8bad7e67-e663-4371-aab8-830df61bbf0e@github.com> On Wed, 8 Mar 2023 18:25:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. [...] > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Inline initial LockStack stack src/hotspot/cpu/aarch64/aarch64.ad line 3914: > 3912: // Indicate success at cont. > 3913: __ cmp(oop, oop); > 3914: __ b(count); This code does `b(count)` and the `count` label is after the `cont` label so the comment on L3912 doesn't quite make sense. Perhaps: `// Indicate success on completion.` ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Fri Mar 10 01:06:31 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 10 Mar 2023 01:06:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 21:08:16 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. [...] > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: > > - Merge branch 'master' into JDK-8291555-v2 > - Various small fixes and improvements > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use realloc instead of malloc+copy when growing the lock-stack > - Inline initial LockStack stack > - Fix interpreter asymmetric fast-locking > - Fix merge error (move done label into correct places) > - Merge branch 'master' into JDK-8291555-v2 > - Small fixes > - Fix anon owner in fast-path, avoid runtime call (aarch64) > - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 Another chunk of partial review. This time I did the src/hotspot/cpu/x86 files: src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp src/hotspot/cpu/x86/c1_MacroAssembler_x86.hpp src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp src/hotspot/cpu/x86/interp_masm_x86.cpp src/hotspot/cpu/x86/macroAssembler_x86.cpp src/hotspot/cpu/x86/macroAssembler_x86.hpp src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp src/hotspot/cpu/x86/stubGenerator_x86_64.cpp src/hotspot/cpu/x86/stubGenerator_x86_64.hpp src/hotspot/cpu/x86/stubRoutines_x86.cpp src/hotspot/cpu/x86/stubRoutines_x86.hpp src/hotspot/cpu/x86/x86_32.ad src/hotspot/cpu/x86/x86_64.ad src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 77: > 75: > 76: int C2CheckLockStackStub::max_size() const { > 77: return 10; nit - a comment explaining this choice of literal value ('10') would be useful src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 89: > 87: #ifdef _LP64 > 88: int C2HandleAnonOMOwnerStub::max_size() const { > 89: return 17; nit - a comment explaining this choice of literal value ('17') would be useful src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 624: > 622: jmp(COUNT); > 623: #else > 624: // We can not emit the
lock-stack-check in verified_entry() because we don't have enough nit typo: s/can not/cannot/ src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 632: > 630: jmp(COUNT); > 631: bind(slow); > 632: testptr(objReg, objReg); // ZF=0 to indicate failure nit perhaps: // force ZF=0 to indicate failure src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 694: > 692: // Invariant: tmpReg == 0. tmpReg is EAX which is the implicit cmpxchg comparand. > 693: lock(); > 694: cmpxchgptr(thread, Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); Now that `fast_lock` is being passed in a `thread` register, you've switched from using `scrReg` to using `thread`. Of course, this means that the comment from L685 -> L691 needs to be revisited/rewritten. Unfortunately, the GitHub UI doesn't let me highlight from L685 -> L690 as part of this comment. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 815: > 813: jccb(Assembler::zero, Stacked); > 814: if (UseFastLocking) { > 815: // If the owner is ANONYMOUS, we need to fix it - in an outline stube. nit typo: s/outline stube/outline stub/ src/hotspot/cpu/x86/interp_masm_x86.cpp line 1357: > 1355: cmpptr(obj_reg, Address(tmp, -oopSize)); > 1356: jcc(Assembler::notEqual, slow_case); > 1357: // Try to swing header from locked to unlock. nit typo: s/locked to unlock./locked to unlocked./ src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9733: > 9731: #else > 9732: const Register thread = rax; > 9733: get_thread(rax); Other places that use this idiom do it like this: const Register thread = rax; get_thread(thread); ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Fri Mar 10 01:16:28 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Fri, 10 Mar 2023 01:16:28 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v6] In-Reply-To: References: Message-ID: On Mon, 27 Feb 2023 00:29:26 GMT, David Holmes wrote: >> Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Comment fixes >> - Include/fwd fixes > > In `ciEnv::register_method` we take the `Compile_lock` to ensure `add_to_hierarchy` can't run: > > // Prevent SystemDictionary::add_to_hierarchy from running > // and invalidating our dependencies until we install this method. > // No safepoints are allowed. Otherwise, class redefinition can occur in between. > MutexLocker ml(Compile_lock); > NoSafepointVerifier nsv; > > does moving the deopt outside of the `Compile_lock` affect that? I've just tried to figure out whether @dholmes-ora's concerns have been addressed. The last messages that I see from David are from 2023.02.26, so about 2 weeks ago, but I can't tell for sure whether @robehn has addressed those remaining comments or not. Sorry, but the GitHub UI is really messing me up here. ------------- PR: https://git.openjdk.org/jdk/pull/12585 From dholmes at openjdk.org Fri Mar 10 02:14:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 02:14:20 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: On Tue, 7 Mar 2023 13:47:55 GMT, Robbin Ehn wrote: >> Hi all, please consider. >> >> The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. >> All methods will already have been marked, but the actual deoptimizing (making not entrant, patching PCs on stacks and patching post-call nops) was not done yet. This meant thread 2 could 'pass' thread 1. >> Most places did deopt under Compile_lock, so this is not an issue there, but WB and clearCallSiteContext do not.
>> >> Since a handshake may take a long time to complete, and Compile_lock is used for much more than deopts, >> the fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake even when everything was already marked (instead of adding Compile_lock to all places). >> >> This turned out to be problematic at startup: for example, the number of deopt handshakes in a jetty dry-run startup went from 5 to 39. >> >> This fix first adds a barrier that you do not pass until the requested deopts have happened, and it coalesces the handshakes. >> Secondly, it moves the handshake part out of the Compile_lock where possible. >> >> This means we fix the performance bug and reduce the contention on Compile_lock, giving higher throughput in the compiler and in things such as class loading. >> >> It passes t1-t7 with flying colours! t8 still not completed and I'm redoing some testing due to last minute simplifications. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: > > - Grab lock so code() is stable > - Non CHA based vtables fix I think my previous comments have been addressed. Thanks src/hotspot/share/classfile/systemDictionary.cpp line 1618: > 1616: > 1617: // In case we are not using CHA based vtables we need to make sure the loaded > 1618: // deopt is completed before anyone link this class. nit s/link/links/ src/hotspot/share/classfile/systemDictionary.cpp line 1623: > 1621: if (!UseVtableBasedCHA) { > 1622: k->init_monitor()->lock(); > 1623: } You could use a `MutexLocker` to manage this too.
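The barrier described in the quoted PR text ("you do not pass until the requested deopts have happened", with handshakes coalesced) resembles a group-commit pattern. The Java sketch below is a hypothetical, simplified model, not HotSpot's implementation: `CoalescingDeoptBarrier`, `requestAndWait` and the `Runnable` standing in for the deopt handshake are invented names. A caller marks its work, then waits until a handshake that covers its request has finished; requests that arrive while a handshake is running are served together by the next one.

```java
/**
 * Hypothetical sketch of a coalescing deopt barrier (not HotSpot code).
 * A caller first marks its work, then calls requestAndWait(). The call
 * returns only after a handshake that snapshotted this request has finished.
 * Requests arriving while a handshake runs share one follow-up handshake.
 */
final class CoalescingDeoptBarrier {
    private long requested = 0;         // ticket handed to the most recent request
    private long completed = 0;         // highest ticket covered by a finished handshake
    private boolean inProgress = false; // is some thread running a handshake right now?
    private int handshakeCount = 0;     // how many handshakes were actually executed

    void requestAndWait(Runnable handshake) throws InterruptedException {
        long target;
        synchronized (this) {
            long ticket = ++requested;
            // While a handshake runs, wait: if it covers our ticket we return
            // when it finishes; otherwise we may lead the next one.
            while (inProgress && completed < ticket) {
                wait();
            }
            if (completed >= ticket) {
                return; // another thread's handshake already covered this request
            }
            inProgress = true;  // become the leader
            target = requested; // coalesce: cover every request issued so far
        }
        handshake.run();        // expensive part, outside the lock (assumed not to throw)
        synchronized (this) {
            completed = target;
            inProgress = false;
            handshakeCount++;
            notifyAll();        // wake waiters; those with ticket <= target can return
        }
    }

    synchronized int handshakeCount() {
        return handshakeCount;
    }
}
```

With this shape, back-to-back requests from one thread still get one handshake each, while N concurrent requesters trigger between 1 and N handshakes.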
------------- PR: https://git.openjdk.org/jdk/pull/12585 From dholmes at openjdk.org Fri Mar 10 02:27:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 02:27:13 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v7] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 23:12:00 GMT, Ilarion Nakonechnyy wrote: >> The proposed approach added a new function for getting the cause of an exception -`java_lang_Throwable::get_cause_simple `, that gets called within `InstanceKlass::add_initialization_error` if an old one `java_lang_Throwable::get_cause_with_stack_trace` didn't succeed because of an exception during the VM call. The simple function doesn't call the VM for getting a stack trace but fills in any other information about an exception. >> >> Besides that, the discovering information about an exception was added to `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. >> >> Jtreg for reproducing the issue also was added to the commit. >> The commit was tested with tier1 tests. > > Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision: > > Some corrections Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12566 From dholmes at openjdk.org Fri Mar 10 03:24:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 03:24:06 GMT Subject: RFR: 8303810: Restore attribute positions after JDK-8303839 to match JDK-8302124 [v3] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 11:54:44 GMT, Julian Waters wrote: >> [JDK-8303839](https://bugs.openjdk.org/browse/JDK-8303839)'s revert of [[noreturn]] attributes also moved their already existing attributes back to behind their corresponding methods, they should be restored to where [JDK-8302124](https://bugs.openjdk.org/browse/JDK-8302124) requires them to be. 
Also fixes attribute from [JDK-8292269](https://bugs.openjdk.org/browse/JDK-8292269) > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > debug.hpp I agree with Kim. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/12918 From sspitsyn at openjdk.org Fri Mar 10 03:54:01 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 10 Mar 2023 03:54:01 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote: > Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active jvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers". > I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix. > > Thanks, > Patricio This looks good. Thank you for catching and taking care about it! Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12956 From dholmes at openjdk.org Fri Mar 10 04:40:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 04:40:13 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote: > Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active jvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers". 
> I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix. > > Thanks, > Patricio src/hotspot/share/prims/jvmtiThreadState.cpp line 372: > 370: java_lang_Thread::dec_VTMS_transition_disable_count(vth()); > 371: Atomic::dec(&_VTMS_transition_disable_for_one_count); > 372: if (_VTMS_transition_disable_for_one_count == 0 || _is_SR) { Sorry I don't understand why this `_is_SR` check was removed. I admit I can't really figure out what this field means anyway, but there is nothing in the issue description that suggests this also needs changing - and it is now different to `VTMS_transition_enable_for_all`. ------------- PR: https://git.openjdk.org/jdk/pull/12956 From dholmes at openjdk.org Fri Mar 10 05:27:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 05:27:18 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: <13XU6KtsVFH4NbZMqa0OcTAMEleA-Ia1Oy_34b5IV-I=.ee32fceb-ec0e-40ad-8c64-fdd57dc99941@github.com> On Fri, 10 Mar 2023 02:09:54 GMT, David Holmes wrote: >> Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Grab lock so code() is stable >> - Non CHA based vtables fix > > src/hotspot/share/classfile/systemDictionary.cpp line 1623: > >> 1621: if (!UseVtableBasedCHA) { >> 1622: k->init_monitor()->lock(); >> 1623: } > > You could use a `MutexLocker` to manage this too. Doh! No you can't this is a monitor not a mutex. 
------------- PR: https://git.openjdk.org/jdk/pull/12585 From dholmes at openjdk.org Fri Mar 10 06:40:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 06:40:05 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 10:09:28 GMT, Thomas Stuefe wrote: >> Fatal error handling is subject to several timeouts: >> - a global timeout (controlled via ErrorLogTimeout) >> - local error reporting step timeouts. >> >> The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting. This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout. >> >> There are three situations where atm we suppress the global error timeout: >> - if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run. >> - if the user specified one or more OnError commands to run, and these did not yet run. These must have a chance to run unmolested. >> - if the user (typically developer) specified ShowMessageBoxOnError, and the error box has not yet been shown >> >> There is a bug though, that also prevents the step timeout from firing if either condition is true. That is plain wrong. >> >> In addition to that, the test interval WatcherThread uses to check for timeouts should be decreased. It sits at 1 second, which is too coarse-grained. >> >> -------- >> >> Patch: >> - reworks `VMError::check_timeout()` to never block step timeouts >> - adds clarifying comments >> - quadruples timeout check frequency by watcher thread >> - adds regression test for timeout handling with OnError >> - additionally limits timeout per individual error reporting step to 5 seconds. 5 seconds is usually enough to distinguish a slow error reporting step from one that is endlessly hanging. >> >> Tested locally on Linux x64. 
> > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > limit step timeout to 5 seconds max Changes seem fine. Thanks for the clear explanation. src/hotspot/share/runtime/nonJavaThread.cpp line 274: > 272: > 273: // Wait a second, then recheck for timeout. > 274: os::naked_short_sleep(999); Harmless change but I don't see why we need sub-second resolution when the ErrorLogTimeout is in seconds. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12936 From stuefe at openjdk.org Fri Mar 10 07:23:15 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 Mar 2023 07:23:15 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: <74f_tU9_sqUx333OTutJ1T_b1HLSssABZfHSNAhcuvU=.f8eb353d-863f-4a4a-9270-09d906eae2d0@github.com> On Fri, 10 Mar 2023 06:27:28 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> limit step timeout to 5 seconds max > > src/hotspot/share/runtime/nonJavaThread.cpp line 274: > >> 272: >> 273: // Wait a second, then recheck for timeout. >> 274: os::naked_short_sleep(999); > > Harmless change but I don't see why we need sub-second resolution when the ErrorLogTimeout is in seconds. This matters because it also applies to each step timeout. If we check only once per second, we overshoot each timeout by up to one second. This overshooting happens for every step timeout. If we are in a situation like here where we ignore the global step timeout, and we keep running into deadlocks e.g. in malloc, we will encounter a lot of step timeouts; these would add up.
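The cumulative-overshoot argument can be made concrete with a small model. The Java sketch below is hypothetical (not HotSpot code) and rests on stated assumptions: the watcher checks at fixed multiples of `pollMs`, every step hangs until the watcher interrupts it, and the next step starts `resumeMs` after that interruption. The 5-second step timeout matches the limit introduced by this PR; the step count and the 1 ms resume delay are arbitrary.

```java
/**
 * Hypothetical model (not HotSpot code): a watcher that only checks for
 * timeouts every pollMs notices a hung error-reporting step up to pollMs late,
 * and that lateness accumulates over consecutive hung steps.
 */
final class StepTimeoutOvershoot {
    /**
     * Returns the time (ms) at which the watcher interrupts the last of
     * numSteps steps, each of which hangs until interrupted. Watcher checks
     * happen at multiples of pollMs; each next step starts resumeMs later.
     */
    static long lastInterruptMs(int numSteps, long stepTimeoutMs, long pollMs, long resumeMs) {
        if (numSteps < 1 || stepTimeoutMs < 1 || pollMs < 1 || resumeMs < 0) {
            throw new IllegalArgumentException("invalid model parameters");
        }
        long stepStart = 0;
        long interruptedAt = 0;
        for (int i = 0; i < numSteps; i++) {
            long deadline = stepStart + stepTimeoutMs;
            // The watcher notices at its first check at or after the deadline
            // (round the deadline up to a multiple of pollMs).
            interruptedAt = ((deadline + pollMs - 1) / pollMs) * pollMs;
            stepStart = interruptedAt + resumeMs;
        }
        return interruptedAt;
    }
}
```

Under these assumptions, ten consecutive hung steps are all interrupted by 50009 ms with an ideal watcher (`pollMs = 1`), by 59000 ms with a one-second poll, and by 52250 ms with a four-times-faster 250 ms poll: roughly 9 s versus 2.2 s of accumulated overshoot.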
------------- PR: https://git.openjdk.org/jdk/pull/12936 From stuefe at openjdk.org Fri Mar 10 07:36:17 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 Mar 2023 07:36:17 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: <74f_tU9_sqUx333OTutJ1T_b1HLSssABZfHSNAhcuvU=.f8eb353d-863f-4a4a-9270-09d906eae2d0@github.com> References: <74f_tU9_sqUx333OTutJ1T_b1HLSssABZfHSNAhcuvU=.f8eb353d-863f-4a4a-9270-09d906eae2d0@github.com> Message-ID: On Fri, 10 Mar 2023 07:20:00 GMT, Thomas Stuefe wrote: >> src/hotspot/share/runtime/nonJavaThread.cpp line 274: >> >>> 272: >>> 273: // Wait a second, then recheck for timeout. >>> 274: os::naked_short_sleep(999); >> >> Harmless change but I don't see why we need sub-second resolution when the ErrorLogTimeout is in seconds. > > This matters because it also applies to each step timeout. If we check only once per second, we overshoot each timeout by up to one second. This overshooting happens for every step timeout. If we are in a situation like here where we ignore the global step timeout, and we keep running into deadlocks e.g. in malloc, we will encounter a lot of step timeouts; these would add up. Note that this matters even more in the context of https://github.com/openjdk/jdk/pull/11017, since that will increase the granularity for each error reporting step, potentially exposing us to a lot more individual timeouts. ------------- PR: https://git.openjdk.org/jdk/pull/12936 From rkennke at openjdk.org Fri Mar 10 09:41:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 09:41:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v17] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation.
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed.
This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thereby handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks.
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fixes in response to Daniel's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/f9f93b36..51a00e91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=15-16 Stats: 9 lines in 3 files changed: 6 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 10 09:41:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 09:41:27 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v16] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 23:17:39 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 99 commits: >> >> - Merge branch 'master' into JDK-8291555-v2 >> - Various small fixes and improvements >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use realloc instead of malloc+copy when growing the lock-stack >> - Inline initial LockStack stack >> - Fix interpreter asymmetric fast-locking >> - Fix merge error (move done label into correct places) >> - Merge branch 'master' into JDK-8291555-v2 >> - Small fixes >> - Fix anon owner in fast-path, avoid runtime call (aarch64) >> - ... and 89 more: https://git.openjdk.org/jdk/compare/5726d31e...f9f93b36 > > src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp line 56: > >> 54: } >> 55: >> 56: void C1_MacroAssembler::build_frame(int frame_size_in_bytes, int bang_size_in_bytes, int max_monitors) { > > So the `max_monitors` param is added, but no use of it. > Is someone else doing the 32-bit ARM port?
Hopefully :-) I currently can't do it, though. With fast-locking (and the rest of Lilliput) behind an experimental flag, this is probably ok for now? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 10 09:55:03 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 09:55:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v18] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks.
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thereby handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions.
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fixes in response to Daniel's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/51a00e91..8ba676a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=16-17 Stats: 20 lines in 4 files changed: 6 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From mgronlun at openjdk.org Fri Mar 10 10:43:23 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Fri, 10 Mar 2023 10:43:23 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: > Greetings, > > We are adding support to let JFR report on Agents. > > #### Design > > An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. > > A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html) > > A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
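For readers unfamiliar with `java.lang.instrument`, the following is a minimal, hypothetical JavaAgent; `MinimalAgent` and its counting transformer are illustrative names, not part of the PR. The JVM calls `premain` before `main()` when the agent is given on the command line, and `agentmain` when it is loaded into a running VM; the text after the first `=` in `-javaagent:MinimalAgent.jar=foo=bar` arrives as `agentArgs`. Presumably it is this `premain`/`agentmain` call whose start and duration the proposed `jdk.JavaAgent` event reports as `initialization` and `initializationTime`.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal illustrative JavaAgent (hypothetical example, not part of the PR).
 * Packaged in a jar whose manifest names this class in Premain-Class
 * (command-line loading) and/or Agent-Class (dynamic loading).
 */
public final class MinimalAgent {
    static final AtomicLong classesSeen = new AtomicLong();

    /** Observes every class definition but never modifies it. */
    static final class CountingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            classesSeen.incrementAndGet();
            return null; // null means "leave the class bytes unchanged"
        }
    }

    /** Called before main() for -javaagent:MinimalAgent.jar=foo=bar (agentArgs is "foo=bar"). */
    public static void premain(String agentArgs, Instrumentation inst) {
        install(agentArgs, inst);
    }

    /** Called when the agent is loaded into an already running VM. */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        install(agentArgs, inst);
    }

    private static void install(String agentArgs, Instrumentation inst) {
        System.err.println("MinimalAgent options: " + agentArgs);
        inst.addTransformer(new CountingTransformer());
    }
}
```

The work done inside `install` is what would make such an agent's initialization slow or fast; this example does almost nothing there.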
> > To report on JavaAgents, JFR will add the new event type jdk.JavaAgent, and events will look similar to these two examples:
>
> // Command line
> jdk.JavaAgent {
>   startTime = 12:31:19.789 (2023-03-08)
>   name = "JavaAgent.jar"
>   options = "foo=bar"
>   dynamic = false
>   initialization = 12:31:15.574 (2023-03-08)
>   initializationTime = 172 ms
> }
>
> // Dynamic load
> jdk.JavaAgent {
>   startTime = 12:31:31.158 (2023-03-08)
>   name = "JavaAgent.jar"
>   options = "bar=baz"
>   dynamic = true
>   initialization = 12:31:31.037 (2023-03-08)
>   initializationTime = 64,1 ms
> }
>
> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. > > For a JavaAgent event, the agent's name is the .jar file containing the instrumentation code. The options are the options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar. > > The "dynamic" field denotes whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). > > "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. > > "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". > > An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
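The event's `name` and `options` fields correspond to the two halves of the `-javaagent:<jarpath>[=<options>]` argument. The hypothetical C++ helper below performs that split. It assumes only the first `=` separates the jar path from the options. `parse_javaagent` and `AgentSpec` are illustrative names, not JFR's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: the two values shown as "name" and "options"
// in the jdk.JavaAgent example.
struct AgentSpec {
  std::string name;
  std::string options;
};

// Splits "-javaagent:<jarpath>[=<options>]". Only the first '=' is a
// separator, so "JavaAgent.jar=foo=bar" yields options "foo=bar".
AgentSpec parse_javaagent(const std::string& arg) {
  const std::string prefix = "-javaagent:";
  std::string spec = arg;
  if (arg.compare(0, prefix.size(), prefix) == 0) {
    spec = arg.substr(prefix.size());
  }
  const std::string::size_type eq = spec.find('=');
  if (eq == std::string::npos) {
    return AgentSpec{spec, ""};  // no options given
  }
  return AgentSpec{spec.substr(0, eq), spec.substr(eq + 1)};
}
```

For example, `parse_javaagent("-javaagent:JavaAgent.jar=foo=bar")` produces the `name = "JavaAgent.jar"` and `options = "foo=bar"` pair seen in the command-line example above.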
> > To report on native agents, JFR will add the new event type jdk.NativeAgent, and events will look similar to this example:
>
> jdk.NativeAgent {
>   startTime = 12:31:40.398 (2023-03-08)
>   name = "jdwp"
>   options = "transport=dt_socket,server=y,address=any,onjcmd=y"
>   dynamic = false
>   initialization = 12:31:36.142 (2023-03-08)
>   initializationTime = 0,00184 ms
>   path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll"
> }
>
> The layout of the event type is very similar to the jdk.JavaAgent event, but it additionally reports the path to the native library. > > The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" field is the time at which the JVM sent, or would have sent, the JVMTI VMInit event to a specified callback. "initializationTime" is the duration of executing that specific callback. If no callback is specified for the JVMTI VMInit event, "initializationTime" will be 0. > > #### Implementation > > Until now, there has been no direct reification of a JavaAgent in the JVM, because JavaAgents are built on top of the JDK native library "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is the JvmtiEnvs that agents request from the JVM during startup and initialization. As such, it was previously not possible to map a JvmtiEnv to the JavaAgent it belongs to. > > Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track which JvmtiEnvs "belong" to which JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. > > When implementing this capability, it was necessary to refactor AgentLibrary, the code used to represent agents. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp.
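One way to picture the many-to-one mapping from environments to JavaAgents is to attribute every environment created while a given agent is loading to that agent. The C++ sketch below illustrates that idea under stated assumptions. It is not the PR's implementation, and it ignores concurrency. `EnvId`, `JavaAgentInfo`, and `AgentEnvRegistry` are hypothetical stand-ins for `JvmtiEnv*` and HotSpot's agent records.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using EnvId = int;  // hypothetical stand-in for a JvmtiEnv*

struct JavaAgentInfo {
  std::string name;     // e.g. "JavaAgent.jar"
  std::string options;  // e.g. "foo=bar"
};

class AgentEnvRegistry {
 public:
  // Called around the (hypothetical) load/initialization window of one agent.
  void begin_loading(const JavaAgentInfo* agent) { loading_ = agent; }
  void end_loading() { loading_ = nullptr; }

  // Called whenever an environment is created; environments created
  // outside any agent's load window stay unattributed.
  void on_env_created(EnvId env) {
    if (loading_ != nullptr) {
      owner_[env] = loading_;
    }
  }

  // Many-to-one lookup: several environments may map to the same agent.
  const JavaAgentInfo* agent_for(EnvId env) const {
    auto it = owner_.find(env);
    return it == owner_.end() ? nullptr : it->second;
  }

 private:
  const JavaAgentInfo* loading_ = nullptr;
  std::unordered_map<EnvId, const JavaAgentInfo*> owner_;
};
```

Once an environment is attributed, the agent's name and options can be reported, and the load window itself bounds the initialization time being measured.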
> > The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Moving this code out of its old locations also helps reduce the size of the oversized arguments.cpp and threads.cpp. > > The two previous lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not safe for concurrent iteration. They are therefore replaced by a single list that allows concurrent iteration. > > Testing: jdk_jfr, tier 1 - 6 > > Thanks > Markus Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: more cleanup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12923/files - new: https://git.openjdk.org/jdk/pull/12923/files/abeaa324..741b8686 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12923&range=08-09 Stats: 12 lines in 3 files changed: 1 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12923.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12923/head:pull/12923 PR: https://git.openjdk.org/jdk/pull/12923 From redestad at openjdk.org Fri Mar 10 12:05:18 2023 From: redestad at openjdk.org (Claes Redestad) Date: Fri, 10 Mar 2023 12:05:18 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v3] In-Reply-To: References: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> Message-ID: <2UIY6-oXHq0ZoyH15FivSR88Zcn6jhoM2HNCGiEeReE=.9f69d990-7884-4f7b-bdb9-03dc9e958518@github.com> On Thu, 9 Mar 2023 17:33:29 GMT, Stefan Karlsson wrote: > That's really surprising. I also don't see how any of the proposed changes could affect the GC so much. This makes me suspicious of the performance claims. > > Could you redo the benchmarking and give us more information about: > > 1. What benchmarks were run? > 2. What were the benchmark scores and GC metrics? > 3.
What was the run-to-run variance in the scores and metrics?

FWIW, we had a discussion about this earlier, and I'm as skeptical as you are about the results listed in this PR's description. I also pointed to the microbenchmark `./test/micro/org/openjdk/bench/vm/lambda/capture/Capture0.java`, which I originally used to motivate moving `allocate_instance(oop, TRAPS)` to inline.hpp. I ran the numbers and saw that there's no longer any discernible effect from moving it back to .cpp:

Benchmark           Mode  Cnt   Score   Error  Units
Capture0.lambda_01  avgt   12  65,492 ± 1,675  ns/op  # master
Capture0.lambda_01  avgt   12  65,022 ± 1,241  ns/op  # pr/12782

This was before the `oop -> InstanceKlass*` change. @afshin-zafari then ran that experiment and saw a small win on that microbenchmark from the parameter change. While the win is probably too small to matter, I didn't find it objectionable, since it seems reasonable to move oop conversions earlier. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From dholmes at openjdk.org Fri Mar 10 12:12:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Mar 2023 12:12:15 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: <74f_tU9_sqUx333OTutJ1T_b1HLSssABZfHSNAhcuvU=.f8eb353d-863f-4a4a-9270-09d906eae2d0@github.com> Message-ID: On Fri, 10 Mar 2023 07:33:30 GMT, Thomas Stuefe wrote: >> This matters because it also applies to each step timeout. If we check only once per second, we overshoot each timeout by up to one second. This overshooting happens for every step timeout. If we are in a situation like this one, where we ignore the global step timeout and keep running into deadlocks (e.g. in malloc), we will encounter a lot of step timeouts, and these would add up.
> > Note that this matters even more in the context of https://github.com/openjdk/jdk/pull/11017, since that will increase the granularity for each error reporting step, potentially exposing us to a lot more individual timeouts. I thought there was a limit of three step timeouts before we gave up? And the step timeout is a quarter of the global timeout, so it is still many seconds. ------------- PR: https://git.openjdk.org/jdk/pull/12936 From rkennke at openjdk.org Fri Mar 10 12:25:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 12:25:09 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v19] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code paths) while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check for and deal with this situation when trying to access the mark word. And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) > > The original stack-locking pushes a stack-lock, consisting only of the displaced header, onto the stack and CASes a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: it still CASes the lowest two header bits to 00 to indicate 'fast-locked', but does *not* overload the upper bits with a stack pointer. Instead, it pushes the object reference to a thread-local lock-stack.
This is a new structure, basically a small array of oops associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered very quickly by scanning the array. More complex queries like 'which thread owns X?' are not performed on very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is OK to do more complex operations (as we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When the lock-stack would overflow, locking fast-paths call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread and record it in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is record a special marker, ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows that the owner must be itself, sets the owner to itself, and then properly exits the monitor, thereby handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer, or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can have a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer into the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects, we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 Δ | aarch64 stack-locking | aarch64 fast-locking | aarch64 Δ |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use nullptr instead of NULL in touched code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/8ba676a0..8aad280a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=17-18 Stats: 10 lines in 6 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 10 12:36:28 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 12:36:28 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v20] In-Reply-To: References: Message-ID: Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Use nullptr instead of NULL in touched code (x86) - Use nullptr instead of NULL in touched code (riscv) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/8aad280a..27a4b107 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=18-19 Stats: 17 lines in 8 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From aboldtch at openjdk.org Fri Mar 10 12:39:03 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Mar 2023 12:39:03 GMT Subject: RFR: 8303900: Rename BitMap search functions In-Reply-To: References: Message-ID: <1EhzQNYr_PTQylNyTm4u4VPII9VuSt26QJmm7pORNFs=.02f76a7f-7cd4-45d5-8602-ac8613fe8c81@github.com> On Thu, 9 Mar 2023 15:34:05 GMT, Kim Barrett wrote: > Please review this renaming of the following functions in BitMap: > > get_next_one_offset => find_first_set_bit > get_next_zero_offset => find_first_clear_bit > get_next_one_offset_aligned_right => find_first_set_bit_aligned_right > > Note that ShenandoahMarkBitMap::get_next_one_offset is not being renamed. For > some reason that class contains a copy of a sizable chunk of the code from > some version of BitMap. (Not sure why it doesn't use an internal BitMapView > rather than code copying.) > > Testing: > mach5 tier1-5 lgtm ------------- Marked as reviewed by aboldtch (Committer).
PR: https://git.openjdk.org/jdk/pull/12951 From rkennke at openjdk.org Fri Mar 10 12:45:12 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 12:45:12 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 - Use nullptr instead of NULL in touched code (shared) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/27a4b107..5fe2afcf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=19-20 Stats: 8 lines in 6 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Fri Mar 10 13:14:25 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 Mar 2023 13:14:25 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: <74f_tU9_sqUx333OTutJ1T_b1HLSssABZfHSNAhcuvU=.f8eb353d-863f-4a4a-9270-09d906eae2d0@github.com> Message-ID: <3tSPj-Axlj3cUMcIHVqcolQHo3Kj9XtPFDUjT285wrQ=.5b5e1098-c89f-4403-9b4f-ca438f6dc899@github.com> On Fri, 10 Mar 2023 12:09:34 GMT, David Holmes wrote: >> Note that this matters even more in the context of https://github.com/openjdk/jdk/pull/11017, since that will increase the granularity for each error reporting step, potentially exposing us to a lot more individual timeouts. > > I thought there was a limit of three step timeouts before we gave up? And the step timeout is a quarter of the global timeout, so it is still many seconds. There is no such limit. There is an implicit limit of four step timeouts, since the step timeout was 1/4 of ErrorLogTimeout.
But - that does not apply if we ignore ErrorLogTimeout - in this patch, I curtailed the maximum step timeout time to 5 seconds (see https://github.com/openjdk/jdk/blob/70b9add7685b3424ac3fee55597d470aa76c8b1b/src/hotspot/share/utilities/vmError.cpp#L1807-L1809), exactly to prevent it from growing very large for very large ErrorLogTimeouts. So, with my patch: Each step timeout is either ErrorLogTimeout/4 or 5 seconds, whichever is shorter. While writing this, I noted that this also holds true if we then go and ignore ErrorLogTimeout. Should I then just use 5 seconds? Not sure. ------------- PR: https://git.openjdk.org/jdk/pull/12936 From coleenp at openjdk.org Fri Mar 10 13:19:15 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 10 Mar 2023 13:19:15 GMT Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v7] In-Reply-To: References: Message-ID: <5OMkVd6g1moN3jUlReWecYiWbF7USb30A4Y1UpxAifM=.0afc8558-eaa5-4375-8a77-a1e7dd826a87@github.com> On Thu, 9 Mar 2023 23:12:00 GMT, Ilarion Nakonechnyy wrote: >> The proposed approach added a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, that gets called within `InstanceKlass::add_initialization_error` if the old one, `java_lang_Throwable::get_cause_with_stack_trace`, didn't succeed because of an exception during the VM call. The simple function doesn't call the VM for getting a stack trace but fills in any other information about an exception. >> >> Besides that, logic for discovering information about an exception was added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. >> >> A jtreg test reproducing the issue was also added to the commit. >> The commit was tested with tier1 tests. > > Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision: > > Some corrections This looks really good, except for the name, which I don't like. Thanks for cleaning this up.
src/hotspot/share/classfile/javaClasses.cpp line 2758: > 2756: > 2757: Symbol* exception_name = vmSymbols::java_lang_ExceptionInInitializerError(); > 2758: Handle h_eiie = Exceptions::new_exception(current, exception_name, st.as_string()); I don't like the h_eiie name. I keep trying to pronounce it in English. How about "initialization_error" or "init_error"? ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12566 From tschatzl at openjdk.org Fri Mar 10 14:06:16 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 10 Mar 2023 14:06:16 GMT Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code Message-ID: Hi all, please review this refactoring that replaces various casts in GC and more-or-less GC-related code that are used to get all bits set in a uint/size_t with the available constants from cstdint. The ones in ZGC files were skipped on request. Testing: local compilation, gha Thanks, Thomas ------------- Commit messages: - Initial version Changes: https://git.openjdk.org/jdk/pull/12973/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12973&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303963 Stats: 15 lines in 13 files changed: 0 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/12973.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12973/head:pull/12973 PR: https://git.openjdk.org/jdk/pull/12973 From jcking at openjdk.org Fri Mar 10 14:10:11 2023 From: jcking at openjdk.org (Justin King) Date: Fri, 10 Mar 2023 14:10:11 GMT Subject: RFR: JDK-8300582: Introduce interface for unaligned memory accesses [v11] In-Reply-To: References: Message-ID: > Introduce interface for unaligned memory accesses `UnalignedAccess`, consolidate the byte swapping implementation to `byteswap`, and switch to a generic implementation of `Bytes`. Justin King has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: Propose unaligned access interface Signed-off-by: Justin King ------------- Changes: https://git.openjdk.org/jdk/pull/12078/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12078&range=10 Stats: 319 lines in 2 files changed: 319 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12078.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12078/head:pull/12078 PR: https://git.openjdk.org/jdk/pull/12078 From ayang at openjdk.org Fri Mar 10 14:24:07 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 10 Mar 2023 14:24:07 GMT Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 12:58:42 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring that replaces various casts in GC more-or-less related to get all bits set in an uint/size_t with the available constants from cstdint. > The ones in ZGC files were skipped on request. > > Testing: local compilation, gha > > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12973 From rkennke at openjdk.org Fri Mar 10 15:01:18 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Mar 2023 15:01:18 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 10:09:28 GMT, Thomas Stuefe wrote: >> Fatal error handling is subject to several timeouts: >> - a global timeout (controlled via ErrorLogTimeout) >> - local error reporting step timeouts. >> >> The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting. This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout. 
>> >> There are three situations where atm we suppress the global error timeout: >> - if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run. >> - if the user specified one or more OnError commands to run, and these did not yet run. These must have a chance to run unmolested. >> - if the user (typically a developer) specified ShowMessageBoxOnError, and the error box has not yet been shown >> >> There is a bug, though: it also prevents the step timeout from firing if any of these conditions is true. That is plain wrong. >> >> In addition to that, the interval the WatcherThread uses to check for timeouts should be decreased. It sits at 1 second, which is too coarse-grained. >> >> -------- >> >> Patch: >> - reworks `VMError::check_timeout()` to never block step timeouts >> - adds clarifying comments >> - quadruples the frequency at which the WatcherThread checks for timeouts >> - adds a regression test for timeout handling with OnError >> - additionally limits the timeout per individual error reporting step to 5 seconds. 5 seconds is usually enough to distinguish a slow error reporting step from one that is endlessly hanging. >> >> Tested locally on Linux x64. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > limit step timeout to 5 seconds max Looks good to me. I only have 2 comments. If you change this, I don't need another review. Thank you! Roman src/hotspot/share/utilities/vmError.cpp line 1787: > 1785: // Global timeout hit? > 1786: if (!ignore_global_timeout) { > 1787: const jlong reporting_start_time_l = get_reporting_start_time(); What is the meaning of _l here? Is it to indicate the type of the variable? If so, I would suggest removing it. (I know it is pre-existing. I leave this up to you.)
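To make the timeout rules discussed in this PR concrete, here is a minimal C++ sketch. It is a hypothetical, simplified model, not the actual `VMError::check_timeout()` in `vmError.cpp`: the names `step_timeout_seconds`, `TimeoutState` and `check_timeout`, the use of whole seconds, and the `>=` comparisons are assumptions made for illustration. It encodes two rules stated above: each step gets ErrorLogTimeout/4 or 5 seconds, whichever is shorter, and the global timeout may be suppressed (abort hook, pending OnError commands, ShowMessageBoxOnError) while step timeouts are always checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the timeout rules discussed in JDK-8303861.
// Names and structure are illustrative only; this is not HotSpot's vmError.cpp.

// Cap on a single error reporting step, as introduced by the patch.
constexpr int64_t kMaxStepTimeoutSeconds = 5;

// Each step gets ErrorLogTimeout/4 or 5 seconds, whichever is shorter.
int64_t step_timeout_seconds(int64_t error_log_timeout_seconds) {
  return std::min(error_log_timeout_seconds / 4, kMaxStepTimeoutSeconds);
}

struct TimeoutState {
  int64_t now;                 // current time, in seconds
  int64_t reporting_start;     // when error reporting began
  int64_t step_start;          // when the current reporting step began
  int64_t error_log_timeout;   // value of ErrorLogTimeout, in seconds
  bool ignore_global_timeout;  // e.g. abort hook or OnError commands still pending
};

enum class Timeout { None, Global, Step };

// The global timeout may be suppressed, but the step timeout is always checked.
Timeout check_timeout(const TimeoutState& s) {
  if (!s.ignore_global_timeout &&
      s.now - s.reporting_start >= s.error_log_timeout) {
    return Timeout::Global;
  }
  if (s.now - s.step_start >= step_timeout_seconds(s.error_log_timeout)) {
    return Timeout::Step;
  }
  return Timeout::None;
}
```

For example, with ErrorLogTimeout=120 a step would be abandoned after 5 seconds rather than 30, and that still happens when the global timeout is being ignored, which is the behavior the bug described above prevented.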
test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java line 40: > 38: > 39: /* > 40: * @test TimeoutInErrorHandlingTest-default Isn't the better way to name this just '@test TimeoutInErrorHandlingTest' and then '@id=default' and '@id=with-on-error'? ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.org/jdk/pull/12936 From fparain at openjdk.org Fri Mar 10 15:31:16 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 10 Mar 2023 15:31:16 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 16:25:07 GMT, Coleen Phillimore wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store that metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > src/hotspot/share/oops/fieldStreams.hpp line 104: > >> 102: AccessFlags flags; >> 103: flags.set_flags(field()->access_flags()); >> 104: return flags; > > Did this use to do this for a reason?
> > src/hotspot/share/oops/fieldStreams.hpp line 104: > >> 102: AccessFlags flags; >> 103: flags.set_flags(field()->access_flags()); >> 104: return flags; > > Did this used to do this for a reason? Using the setter rather than the constructor filters out the VM defined flags and keeps only the flags from the class file. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From rehn at openjdk.org Fri Mar 10 15:36:07 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 10 Mar 2023 15:36:07 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v9] In-Reply-To: References: Message-ID: <4USx1viY-9Jbm_bqdXyiQOVEBXzqb-tQbbTTZxzBNM4=.c46edf13-8beb-456f-b43f-5678de1edf15@github.com> > Hi all, please consider. > > The original issue was when thread 1 asked to deopt nmethod set X and thread 2 asked for the same or a subset of X. > All method will already be marked, but the actual deoptimizing, not entrant, patching PC on stacks and patching post call nops, was not done yet. Which meant thread 2 could 'pass' thread 1. > Most places did deopt under Compile_lock, thus this is not an issue, but WB and clearCallSiteContext do not. > > Since a handshakes may take long before completion and Compile_lock is used for so much more than deopts. > The fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake even when everything was already marked. (instead of adding Compile_lock to all places) > > This turnout to be problematic in the startup, for example the number of deopt handshakes in jetty dry run startup went from 5 to 39 handshakes. > > This fix first adds a barrier for which you do not pass until the requested deopts have happened and it coalesces the handshakes. > Secondly it moves handshakes part out of the Compile_lock where it is possible. > > Which means we fix the performance bug and we reduce the contention on Compile_lock, meaning higher throughput in compiler and things such as class-loading. 
> > It passes t1-t7 with flying colours! t8 still not completed and I'm redoing some testing due to last minute simplifications. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Nit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12585/files - new: https://git.openjdk.org/jdk/pull/12585/files/be2d3b5e..c39ff2c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12585&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12585&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12585.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12585/head:pull/12585 PR: https://git.openjdk.org/jdk/pull/12585 From rehn at openjdk.org Fri Mar 10 15:39:33 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 10 Mar 2023 15:39:33 GMT Subject: RFR: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms [v8] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 02:01:12 GMT, David Holmes wrote: >> Robbin Ehn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Grab lock so code() is stable >> - Non CHA based vtables fix > > src/hotspot/share/classfile/systemDictionary.cpp line 1618: > >> 1616: >> 1617: // In case we are not using CHA based vtables we need to make sure the loaded >> 1618: // deopt is completed before anyone link this class. > > nit s/link/links/ Fixed, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/12585 From pchilanomate at openjdk.org Fri Mar 10 17:03:16 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 10 Mar 2023 17:03:16 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 03:51:05 GMT, Serguei Spitsyn wrote: > This looks good. Thank you for catching and taking care about it! 
Serguei > Thanks for the review Serguei! ------------- PR: https://git.openjdk.org/jdk/pull/12956 From pchilanomate at openjdk.org Fri Mar 10 17:11:18 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 10 Mar 2023 17:11:18 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 04:34:13 GMT, David Holmes wrote: >> Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active jvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers". >> I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix. >> >> Thanks, >> Patricio > > src/hotspot/share/prims/jvmtiThreadState.cpp line 372: > >> 370: java_lang_Thread::dec_VTMS_transition_disable_count(vth()); >> 371: Atomic::dec(&_VTMS_transition_disable_for_one_count); >> 372: if (_VTMS_transition_disable_for_one_count == 0 || _is_SR) { > > Sorry I don't understand why this `_is_SR` check was removed. I admit I can't really figure out what this field means anyway, but there is nothing in the issue description that suggests this also needs changing - and it is now different to `VTMS_transition_enable_for_all`. A JvmtiVTMSTransitionDisabler instance that is a "single disabler" only blocks other virtual threads trying to transition or JvmtiVTMSTransitionDisabler monopolists. Both of them will check for _VTMS_transition_disable_for_one_count (the JvmtiVTMSTransitionDisabler monopolist was missing that check) so just checking when that counter is zero is enough. In fact, for a "single disabler" _is_SR is always false so that check wasn't doing anything. 
Yes, this is not actually needed for the fix, but I caught it when looking at which condition we use to wait and which one to notify. Sorry for not explaining that part. And looking closer at VTMS_transition_enable_for_all(), I now see the check for _is_SR is not doing anything either, because if _VTMS_transition_disable_for_all_count was not zero after the decrement, then this can't be a JvmtiVTMSTransitionDisabler monopolist, i.e. _is_SR will be false. When a monopolist is running, all other "disable all" JvmtiVTMSTransitionDisabler instances, if any, will be waiting in the first "while (_SR_mode)" loop in VTMS_transition_disable_for_all(), so _VTMS_transition_disable_for_all_count will be one throughout the monopolist's run. So this should be an assert after the decrement: assert(!_is_SR || _VTMS_transition_disable_for_all_count == 0, ""). ------------- PR: https://git.openjdk.org/jdk/pull/12956 From duke at openjdk.org Fri Mar 10 17:15:48 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 10 Mar 2023 17:15:48 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v4] In-Reply-To: References: Message-ID: <7FX3eaP7l70pTU91DWMpuQVeDjpcekd3WUo0VFbMwow=.eb5e9998-b17d-40b7-b95e-0c28afd84f16@github.com> > The inline and non-inline versions of the method are tested to compare the performance difference.
> ### Test > `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8292059: Do not inline InstanceKlass::allocate_instance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12782/files - new: https://git.openjdk.org/jdk/pull/12782/files/6cdd8357..fe3de0b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=02-03 Stats: 21 lines in 4 files changed: 2 ins; 10 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/12782.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12782/head:pull/12782 PR: https://git.openjdk.org/jdk/pull/12782 From duke at openjdk.org Fri Mar 10 17:15:54 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 10 Mar 2023 17:15:54 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v3] In-Reply-To: References: <7Aide4lBzCPDqqanrD8I9SsT6LXneQ8CKSU4os4lH-Q=.38f1186f-01b0-4975-ad16-fcfd4eb6c031@github.com> Message-ID: <_0vSxDffz2IQ3MAftdfuqQ5qTeV8d1yjrnS0TT2peOM=.9fdd060b-1ee9-492d-8756-bdfd7588d8ac@github.com> On Thu, 9 Mar 2023 11:07:57 GMT, Claes Redestad wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8292059: Do not inline InstanceKlass::allocate_instance() > > src/hotspot/share/prims/jni.cpp line 967: > >> 965: >> 966: instanceOop i = InstanceKlass::allocate_instance( >> 967: InstanceKlass::cast(java_lang_Class::as_Klass(JNIHandles::resolve_non_null(clazz))), > > Perhaps it would be nice with a utility method to reduce some of the clutter. Just folding the `as_Klass` into a new method `instanceKlass::cast_from_oop(..)` (or just `from_oop`) would cut away a fair chunk and could be used to similar effect elsewhere (counting 21 cases where this would be applicable). 
A specific RFE has been created for this parameter change. The change has been reverted and is no longer part of this PR. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From coleenp at openjdk.org Fri Mar 10 17:35:27 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 10 Mar 2023 17:35:27 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v4] In-Reply-To: <7FX3eaP7l70pTU91DWMpuQVeDjpcekd3WUo0VFbMwow=.eb5e9998-b17d-40b7-b95e-0c28afd84f16@github.com> References: <7FX3eaP7l70pTU91DWMpuQVeDjpcekd3WUo0VFbMwow=.eb5e9998-b17d-40b7-b95e-0c28afd84f16@github.com> Message-ID: <2jx5pf1wpCPJErB50jnyF7nMAPr75prYVjnYw6vpErs=.3461994b-c424-4732-aa73-3c3326bbb32d@github.com> On Fri, 10 Mar 2023 17:15:48 GMT, Afshin Zafari wrote: >> The inline and non-inline versions of the method are tested to compare the performance difference. >> ### Test >> `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() You need to revert the changes in jni.cpp and instanceKlass.hpp. ------------- Changes requested by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12782 From kbarrett at openjdk.org Fri Mar 10 20:36:58 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 10 Mar 2023 20:36:58 GMT Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 12:58:42 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring that replaces various casts in GC and more-or-less GC-related code that are used to get all bits set in a uint/size_t with the available constants from cstdint. > The ones in ZGC files were skipped on request. > > Testing: local compilation, gha > > Thanks, > Thomas Looks good. ------------- Marked as reviewed by kbarrett (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12973 From kbarrett at openjdk.org Fri Mar 10 21:10:30 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 10 Mar 2023 21:10:30 GMT Subject: RFR: 8303900: Rename BitMap search functions [v2] In-Reply-To: References: Message-ID: > Please review this renaming of the following functions in BitMap: > > get_next_one_offset => find_first_set_bit > get_next_zero_offset => find_first_clear_bit > get_next_one_offset_aligned_right => find_first_set_bit_aligned_right > > Note that ShenandoahMarkBitMap::get_next_one_offset is not being renamed. For > some reason that class contains a copy of a sizable chunk of the code from > some version of BitMap. (Not sure why it doesn't use an internal BitMapView > rather than code copying.) > > Testing: > mach5 tier1-5 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge branch 'master' into find-rename - update gtests - update shenandoah uses - update zgc uses - update gc/shared uses - update Parallel uses - update G1 uses - rename functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12951/files - new: https://git.openjdk.org/jdk/pull/12951/files/94fa65a5..c503e584 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12951&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12951&range=00-01 Stats: 63148 lines in 577 files changed: 54630 ins; 2410 del; 6108 mod Patch: https://git.openjdk.org/jdk/pull/12951.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12951/head:pull/12951 PR: https://git.openjdk.org/jdk/pull/12951 From kbarrett at openjdk.org Fri Mar 10 21:10:33 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 10 Mar 2023 21:10:33 GMT Subject: RFR: 8303900: Rename BitMap search functions [v2] In-Reply-To: 
<4GIB9OwfDN2SPR-vA9cUIN2eNhWjqxV47tBLlsoEWuM=.278dc6e2-93f6-438d-8938-8bf0f35c5599@github.com> References: <4GIB9OwfDN2SPR-vA9cUIN2eNhWjqxV47tBLlsoEWuM=.278dc6e2-93f6-438d-8938-8bf0f35c5599@github.com> Message-ID: On Thu, 9 Mar 2023 18:01:57 GMT, Stefan Karlsson wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Merge branch 'master' into find-rename >> - update gtests >> - update shenandoah uses >> - update zgc uses >> - update gc/shared uses >> - update Parallel uses >> - update G1 uses >> - rename functions > > Looks good. Thanks for reviews @stefank and @xmas92 ------------- PR: https://git.openjdk.org/jdk/pull/12951 From kbarrett at openjdk.org Fri Mar 10 21:20:56 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 10 Mar 2023 21:20:56 GMT Subject: Integrated: 8303900: Rename BitMap search functions In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 15:34:05 GMT, Kim Barrett wrote: > Please review this renaming of the following functions in BitMap: > > get_next_one_offset => find_first_set_bit > get_next_zero_offset => find_first_clear_bit > get_next_one_offset_aligned_right => find_first_set_bit_aligned_right > > Note that ShenandoahMarkBitMap::get_next_one_offset is not being renamed. For > some reason that class contains a copy of a sizable chunk of the code from > some version of BitMap. (Not sure why it doesn't use an internal BitMapView > rather than code copying.) > > Testing: > mach5 tier1-5 This pull request has now been integrated. 
Changeset: 21169285 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/2116928528c0554b2ba0171bd7968ab693972804 Stats: 75 lines in 12 files changed: 0 ins; 0 del; 75 mod 8303900: Rename BitMap search functions Reviewed-by: stefank, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/12951 From jvernee at openjdk.org Sat Mar 11 02:23:24 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Sat, 11 Mar 2023 02:23:24 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: On Thu, 9 Mar 2023 17:40:02 GMT, Martin Doerr wrote: >> @TheRealMDoerr I've moved the support for structs/unions that are not a power of 2 in size to this repo, so you should be able to merge the master branch to get it now. > > @JornVernee: Thanks! I've merged in your changes. TestArrayStructs is not yet completely working. I will need to investigate. I think I've done most other things you had requested. You may want to take a look at my recent commits. @TheRealMDoerr I've been keeping up with the changes you're making. I just have to look at the new test for HFAs you've added (next week). Besides fixing the TestArrayStructs test, do you have anything else you still want to add to this PR? ------------- PR: https://git.openjdk.org/jdk/pull/12708 From jvernee at openjdk.org Sat Mar 11 02:23:25 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Sat, 11 Mar 2023 02:23:25 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> Message-ID: <0RrqDfVTh0EmUK3xCInQa1gXbZwdLpd_jY-3TYgdqwo=.e1ca03f4-8e66-4d3d-97f2-59f5bbd7287b@github.com> On Fri, 3 Mar 2023 10:59:32 GMT, Martin Doerr wrote: >> Thanks!
I need to find extra time for this. Sounds like a good idea and I may be able to get rid of some nasty code. > > Done by https://github.com/openjdk/jdk/pull/12708/commits/98e242c24c07ea977b7709b9f8d0c10ce87e84c0 (using a record instead of a `VMStorage[][]` because I think this is more readable). Note that it's a bit more complicated. I couldn't use your `dup` trick, because I need to put the value into a GP reg and one half of it into an FP reg. The Panama code doesn't support that (IllegalArgumentException: Invalid operand type: interface java.lang.foreign.MemorySegment. float expected). Thanks, your solution looks good. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From iklam at openjdk.org Sat Mar 11 02:34:22 2023 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 11 Mar 2023 02:34:22 GMT Subject: RFR: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase [v5] In-Reply-To: <9GAFUt_y36yObC0oOhzxNC05Y8Ja_fkUPxvhZCuFSPY=.255df6aa-d417-4277-9799-a9208e758158@github.com> References: <9GAFUt_y36yObC0oOhzxNC05Y8Ja_fkUPxvhZCuFSPY=.255df6aa-d417-4277-9799-a9208e758158@github.com> Message-ID: On Sun, 29 Jan 2023 06:41:10 GMT, Ioi Lam wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Add lambda API unlink(Function f) per reviewers' request. > > I updated [JDK-8301296](https://bugs.openjdk.org/browse/JDK-8301296) to clarify the problems with the current ResourceHashtable design and my proposal for fixing them. > > I think the above proposed fixes shouldn't block the progress of this PR, which is just an optimization that maintains the current behavior. So let's continue the discussions here. > > There are two parts of this PR: > > [1] For the optimization (counting the number of entries and exiting the loop early), do you have any performance data that shows this is beneficial?
> > For this optimization to be beneficial, we must have two conditions -- (a) the table is too large so it's likely to have many unused entries, and (b) the hash function is bad so most of the unused entries are at the end of the table. > > For (a), maybe it's better to change the table to ResizeableResourceHashtable? > For (b), maybe you can also print out the occupancy of the table in your traces like this one (in your earlier PR https://github.com/openjdk/jdk/pull/12016) > > > [12.824s][debug][hashtables] table_size = 109, break at 68 > > > If we have many entries (e.g., 40) but they all pack to the front end of the table, that means we have a bad hash function. > > [2] As I suggested earlier, we should consolidate the code to use a single unlink loop, so you can apply this counting optimization in a single place. > > I am not quite sure why you would need the following in your "wrapper" functions: > > > if (clean) { > *ptr = node->_next; > } else { > ptr = &(node->_next); > } > > > and > > > if (node->_next == nullptr) { > // nullify the bucket when reach the end of linked list. > *ptr = nullptr; > } > > > I wrote a version of the consolidated loop that's much simpler. It also aligns with the old code so the diff is more readable: > > https://github.com/openjdk/jdk/compare/master...iklam:jdk:8301136-resource-hash-unlink-all-suggestion > > Note that I have both `unlink_all()` and `unlink_all(Function function)`, that's because the current API allows the user function to do two things (a) check if the entry should be freed, (b) perform special clean up on the entry. So if you want to free all entries but need to perform special clean up, you'd call `unlink_all(Function function)`. > @iklam > > Like you said, I mixed up 2 things. My real intention is to introduce a new api `unlink_all()` because I used it in my project. > > I understand that you want me to refactor unlink(Iterator) using lambda. 
In order to have an efficient unlink_all() and dtor, I have to factor out unlink_impl(). It's private and acts as the algorithmic template. > > What should I do now? Should I split this JBS issue into 2? One is for the optimization and the other one for "unlink_all()"? that would make thing easier to review. > > thanks, --lx I think it?s best to split this PR into two. We can first address the unlink_all issue and do the refactoring in a subsequent PR. Thanks ------------- PR: https://git.openjdk.org/jdk/pull/12213 From stuefe at openjdk.org Sat Mar 11 14:35:35 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 14:35:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: <0Ui89e1xMwZGBUG7hiyTfpr7FD9lJ5sXJJZg846XG54=.2f7843f0-f62b-43d1-8f4a-f21db7cc3666@github.com> On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
>> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) I'm looking into the arm32 port. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From jwaters at openjdk.org Sat Mar 11 14:36:26 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 14:36:26 GMT Subject: RFR: 8274400: HotSpot Style Guide should permit use of alignof [v7] In-Reply-To: References: <3IdeZGaGcFB1-DbVugn0P9v8vRQMLURzHFRSpAm46NA=.09bf4dd8-b652-49c3-b9a5-ad3f58dfbcc3@github.com> Message-ID: On Thu, 9 Mar 2023 20:21:33 GMT, Kim Barrett wrote: >>> This has been open and stable for a while. Anyone else? >> >> Also if there is anyone else coming to review this, consider looking at https://github.com/openjdk/jdk/pull/11431 as well, it's also been collecting dust for quite a while too ;-; > > @TheShermanTanker - this is ready to be integrated, as it has been approved by @vnkozlov (HotSpot lead). @kimbarrett Oh, my mistake, will push in a moment.
In the meantime, is it ok if I ask how to progress with [8250269: Replace ATTRIBUTE_ALIGNED with alignas](https://github.com/openjdk/jdk/pull/11431)? ------------- PR: https://git.openjdk.org/jdk/pull/11761 From jwaters at openjdk.org Sat Mar 11 14:40:31 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 14:40:31 GMT Subject: RFR: 8303810: Restore attribute positions after JDK-8303839 to match JDK-8302124 [v3] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 11:54:44 GMT, Julian Waters wrote: >> [JDK-8303839](https://bugs.openjdk.org/browse/JDK-8303839)'s revert of [[noreturn]] attributes also moved their already existing attributes back to behind their corresponding methods, they should be restored to where [JDK-8302124](https://bugs.openjdk.org/browse/JDK-8302124) requires them to be. Also fixes attribute from [JDK-8292269](https://bugs.openjdk.org/browse/JDK-8292269) > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > debug.hpp Alright ------------- PR: https://git.openjdk.org/jdk/pull/12918 From jwaters at openjdk.org Sat Mar 11 14:40:34 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 14:40:34 GMT Subject: Integrated: 8274400: HotSpot Style Guide should permit use of alignof In-Reply-To: <3IdeZGaGcFB1-DbVugn0P9v8vRQMLURzHFRSpAm46NA=.09bf4dd8-b652-49c3-b9a5-ad3f58dfbcc3@github.com> References: <3IdeZGaGcFB1-DbVugn0P9v8vRQMLURzHFRSpAm46NA=.09bf4dd8-b652-49c3-b9a5-ad3f58dfbcc3@github.com> Message-ID: On Thu, 22 Dec 2022 04:39:47 GMT, Julian Waters wrote: > The alignof operator was added by C++11. It returns the alignment for the given type. Various metaprogramming usages exist, in particular when using std::aligned_storage. Use of this operator should be permitted in HotSpot code. This pull request has now been integrated. 
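The PR description notes that `alignof` is mostly useful in metaprogramming, in particular together with `std::aligned_storage`. As a rough sketch of that pattern (plain C++11, not HotSpot code; `Vec4`, `LazySlot` and `is_aligned_for` are hypothetical names chosen for illustration), the alignment of `T` feeds directly into the storage type so that an object can be constructed in place later:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical type with a stricter-than-default alignment requirement.
struct alignas(16) Vec4 {
  float x, y, z, w;
};

// alignof yields a compile-time constant, usable in static_assert and templates.
static_assert(alignof(Vec4) == 16, "alignas(16) should give 16-byte alignment");
static_assert(alignof(char) == 1, "char always has alignment 1");

// Hypothetical holder: raw storage sized and aligned for one T, constructed on demand.
template <typename T>
class LazySlot {
  typename std::aligned_storage<sizeof(T), alignof(T)>::type _storage;
  T* _obj = nullptr;  // non-null once an object has been constructed

 public:
  LazySlot() = default;
  LazySlot(const LazySlot&) = delete;
  LazySlot& operator=(const LazySlot&) = delete;

  template <typename... Args>
  T* construct(Args&&... args) {
    assert(_obj == nullptr && "already constructed");
    _obj = ::new (static_cast<void*>(&_storage)) T{std::forward<Args>(args)...};
    return _obj;
  }

  T* get() const { return _obj; }

  ~LazySlot() {
    if (_obj != nullptr) _obj->~T();
  }
};

// Returns true if p satisfies the alignment requirement of T.
template <typename T>
bool is_aligned_for(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A plain `alignas(T) unsigned char buf[sizeof(T)];` achieves the same effect, and `std::aligned_storage` is deprecated as of C++23; in either form, `alignof(T)` (explicitly or implicitly) is what supplies the alignment value.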
Changeset: a06426a5 Author: Julian Waters URL: https://git.openjdk.org/jdk/commit/a06426a52f16c08c95b1c0270a5fc40721921022 Stats: 18 lines in 2 files changed: 16 ins; 0 del; 2 mod 8274400: HotSpot Style Guide should permit use of alignof Reviewed-by: kbarrett, kvn, dholmes, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/11761 From jwaters at openjdk.org Sat Mar 11 14:40:33 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 14:40:33 GMT Subject: Withdrawn: 8303810: Restore attribute positions after JDK-8303839 to match JDK-8302124 In-Reply-To: References: Message-ID: <6nQIo08AWqpAXCLVOr30cO8O5FtizMZc7sp8fu94TRY=.c85be9a6-ba66-49d5-ad89-77e7d60cc936@github.com> On Wed, 8 Mar 2023 08:39:08 GMT, Julian Waters wrote: > [JDK-8303839](https://bugs.openjdk.org/browse/JDK-8303839)'s revert of [[noreturn]] attributes also moved their already existing attributes back to behind their corresponding methods, they should be restored to where [JDK-8302124](https://bugs.openjdk.org/browse/JDK-8302124) requires them to be. Also fixes attribute from [JDK-8292269](https://bugs.openjdk.org/browse/JDK-8292269) This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/12918 From jwaters at openjdk.org Sat Mar 11 14:55:56 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 14:55:56 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v9] In-Reply-To: References: Message-ID: > Cleans up some code in compilerWarnings_*.hpp files to be slightly neater Julian Waters has updated the pull request incrementally with four additional commits since the last revision: - compilerWarnings_visCPP.hpp - Spelling - compilerWarnings_gcc.hpp - macros.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12255/files - new: https://git.openjdk.org/jdk/pull/12255/files/8ba0ff35..30561ee1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12255&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12255&range=07-08 Stats: 9 lines in 3 files changed: 0 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12255.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12255/head:pull/12255 PR: https://git.openjdk.org/jdk/pull/12255 From jwaters at openjdk.org Sat Mar 11 15:12:22 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 15:12:22 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v9] In-Reply-To: References: Message-ID: On Sat, 11 Mar 2023 14:55:56 GMT, Julian Waters wrote: >> Cleans up some code in compilerWarnings_*.hpp files to be slightly neater > > Julian Waters has updated the pull request incrementally with four additional commits since the last revision: > > - compilerWarnings_visCPP.hpp > - Spelling > - compilerWarnings_gcc.hpp > - macros.hpp Alright, in that case I'll ditch the PRAGMA utility and simply keep this as a tidy-up of both compilerWarnings.hpp files ------------- PR: https://git.openjdk.org/jdk/pull/12255 From stuefe at openjdk.org Sat Mar 11 15:17:30 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 15:17:30 GMT Subject: RFR: 8291555:
Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: <9Z5c4yco0VZ_8emD3C43P3LYtJfItTHkRyj9MCsKcNg=.b4342519-99bb-45d0-8d4b-169624e3ff2d@github.com> On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) Proposal for omitting the lockstack size check (at least in 75% of all times): - We know that Thread as well as grown lockstack backing buffers start at malloc-aligned boundaries. Practically this is 16 (64-bit), 4-8 (32-bit). So at the very least 4. - Make the initial lockstack this size. Define it so that initial slot stack starts at offset 0. - Load the current slot pointer as you do now. Check the lowest 2 bits. If all are zero, go the slower path (load the current limit and compare against limit, ...). - If bit 0 or 1 are set, you can omit this check. You are done since you have not yet reached the limit. - You can expand this proposal to any alignment you like. You need to declare the lockstack slots with `alignof(X)`, and the compiler will take care that the *initial* slot stack is always well aligned. As for larger slot stacks, we will have to allocate them in an aligned fashion using posix_memalign (we need this as NMT-wrapped version, but thats trivial) ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Sat Mar 11 16:00:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sat, 11 Mar 2023 16:00:33 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) > Proposal for omitting the lockstack size check (at least in 75% of all times): > > * We know that Thread as well as grown lockstack backing buffers start at malloc-aligned boundaries. Practically this is 16 (64-bit), 4-8 (32-bit). So at the very least 4. > * Make the initial lockstack this size.
Define it so that initial slot stack starts at offset 0. > * Load the current slot pointer as you do now. Check the lowest 2 bits. If all are zero, go the slower path (load the current limit and compare against limit, ...). > * If bit 0 or 1 are set, you can omit this check. You are done since you have not yet reached the limit. > * You can expand this proposal to any alignment you like. You need to declare the lockstack slots with `alignof(X)`, and the compiler will take care that the _initial_ slot stack is always well aligned. As for larger slot stacks, we will have to allocate them in an aligned fashion using posix_memalign (we need this as NMT-wrapped version, but thats trivial) This would only work when pushing a single slot, right? Have you seen what we're doing in the compiled (C1 and C2) paths (in x86_64 and aarch64)? There we're doing a (conservative) estimate how many lock-slots are needed in the method, and check for enough slots upon method entry once, and then elide the check altogether in the lock-enter implementation. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Sat Mar 11 16:14:30 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 16:14:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: <6JDIBigQkgXxOQDE0UEeZhX8ountKQAliKpynUVzcbY=.abf3a7f5-6744-437c-aee0-86a197248a62@github.com> On Sat, 11 Mar 2023 15:57:53 GMT, Roman Kennke wrote: > > Proposal for omitting the lockstack size check (at least in 75% of all times): > > > > * We know that Thread as well as grown lockstack backing buffers start at malloc-aligned boundaries. Practically this is 16 (64-bit), 4-8 (32-bit). So at the very least 4. > > * Make the initial lockstack this size. Define it so that initial slot stack starts at offset 0. > > * Load the current slot pointer as you do now. Check the lowest 2 bits. 
If all are zero, go to the slower path (load the current limit and compare against limit, ...). > > * If bit 0 or 1 is set, you can omit this check. You are done since you have not yet reached the limit. > > * You can expand this proposal to any alignment you like. You need to declare the lockstack slots with `alignas(X)`, and the compiler will take care that the _initial_ slot stack is always well aligned. As for larger slot stacks, we will have to allocate them in an aligned fashion using posix_memalign (we need an NMT-wrapped version of this, but that's trivial). > > This would only work when pushing a single slot, right? Have you seen what we're doing in the compiled (C1 and C2) paths (in x86_64 and aarch64)? There we're doing a (conservative) estimate of how many lock-slots are needed in the method, check for enough slots once upon method entry, and then elide the check altogether in the lock-enter implementation. Yeah, I just realized this myself. I started working on the template interpreter first, where we push single stack slots. There it may still make sense. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Sat Mar 11 16:30:38 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 16:30:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word.
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking.
Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!)
a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark is at https://github.com/rkennke/fastlockbench and numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> | x86_64 | | | | aarch64 | | >> -- | -- | -- | -- | -- | -- | -- | -- >> | stack-locking | fast-locking | | | stack-locking | fast-locking | >> AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% |
| 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) Not a full review, just stuff I stumbled over while looking into the arm port. General notes: I dislike the "Fast" moniker for UseFastLocking. Old thin locks were not particularly slow either. Also, I believe I have seen places where "fast locking/unlocking" was used before to describe stack-based locking. Can we name this something like "NewStyleThinLocks" or similar?
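The quoted description reduces the fast path to two steps: CAS the low two mark-word bits from `01` (unlocked) to `00` (fast-locked), then record ownership by pushing the object onto a per-thread lock stack instead of storing a stack pointer in the header. The following is a minimal C++ model of that idea under simplifying assumptions. `ObjectSketch`, `LockStackModel`, `fast_lock` and `fast_unlock` are hypothetical names, not HotSpot code, and every `false` return simply stands for "take the slow path" (contention, recursion, or inflation).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the scheme, not HotSpot code.
// Low two mark-word bits: 01 = unlocked, 00 = fast-locked. Any other pattern
// (e.g. an inflated monitor) is left to the slow path.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kFastLocked = 0b00;

struct ObjectSketch {
  // Upper bits (hash, age, ...) are preserved by both CASes below.
  std::atomic<std::uintptr_t> mark{kUnlocked};
};

// Per-thread stack of the objects this thread has fast-locked.
struct LockStackModel {
  std::vector<ObjectSketch*> slots;

  // "Does the current thread own me?" is a linear scan of a short array.
  bool contains(const ObjectSketch* obj) const {
    for (const ObjectSketch* o : slots) {
      if (o == obj) return true;
    }
    return false;
  }
};

// Returns false when the caller must take the slow path. A recursive attempt
// also fails here, because the header bits are already 00.
bool fast_lock(ObjectSketch* obj, LockStackModel& ls) {
  std::uintptr_t mark = obj->mark.load(std::memory_order_relaxed);
  if ((mark & kLockMask) != kUnlocked) return false;
  const std::uintptr_t locked = (mark & ~kLockMask) | kFastLocked;
  if (!obj->mark.compare_exchange_strong(mark, locked, std::memory_order_acquire)) {
    return false;  // lost a race with another thread
  }
  ls.slots.push_back(obj);  // ownership lives on the lock stack, not in the header
  return true;
}

// Handles only the common LIFO case; anything else goes to the slow path.
bool fast_unlock(ObjectSketch* obj, LockStackModel& ls) {
  if (ls.slots.empty() || ls.slots.back() != obj) return false;
  std::uintptr_t mark = obj->mark.load(std::memory_order_relaxed);
  if ((mark & kLockMask) != kFastLocked) return false;  // e.g. inflated meanwhile
  const std::uintptr_t unlocked = (mark & ~kLockMask) | kUnlocked;
  if (!obj->mark.compare_exchange_strong(mark, unlocked, std::memory_order_release)) {
    return false;
  }
  ls.slots.pop_back();
  return true;
}
```

In this model a second `fast_lock` on an object the thread already holds fails, mirroring the point that recursion currently inflates to a full monitor; the ANONYMOUS_OWNER hand-off on inflation is not modeled.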
src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 821: > 819: call_VM(noreg, > 820: CAST_FROM_FN_PTR(address, InterpreterRuntime::monitorenter), > 821: UseFastLocking ? obj_reg : lock_reg); The first call to InterpreterRuntime::monitorenter, under UseHeavyMonitors: Please add an assert for !UseFastLocking. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234: > 6232: > 6233: // Load (object->mark() | 1) into hdr > 6234: orr(hdr, hdr, markWord::unlocked_value); I wondered why this is needed. Should we not have the header of an unlocked object in hdr? Or is this a safeguard against a misuse of this function (called with the header of an already locked object)? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6246: > 6244: str(obj, Address(t1, 0)); > 6245: add(t1, t1, oopSize); > 6246: str(t1, Address(rthread, JavaThread::lock_stack_current_offset())); This, and its counterpart pop, may be worth factoring out. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6267: > 6265: ldr(t1, Address(rthread, JavaThread::lock_stack_current_offset())); > 6266: sub(t1, t1, oopSize); > 6267: str(t1, Address(rthread, JavaThread::lock_stack_current_offset())); Good comments, helpful. src/hotspot/share/opto/c2_CodeStubs.hpp line 89: > 87: }; > 88: > 89: class C2CheckLockStackStub : public C2CodeStub { Badly named, please reconsider. This does not check the lock stack, it grows it. "Check" sounds like a non-modifying state verification. Proposal: C2EnsureLockStackSizeStub. src/hotspot/share/runtime/globals.hpp line 1983: > 1981: \ > 1982: product(bool, UseFastLocking, false, EXPERIMENTAL, \ > 1983: "Use fast-locking instead of stack-locking") \ Please ergo-disable this for UseHeavyMonitors. One less thing to think about.
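The push/pop sequences flagged above for factoring out, together with the lockStack.hpp suggestion to keep only the capacity comparison inline and move the growth out of line, follow a common pattern. Below is a hedged C++ sketch of that split. `LockStackSketch`, `ensure_capacity`, `ObjSketch`, `oop_t` and the doubling growth policy are hypothetical illustrations, not the actual `LockStack` API, and the `_base`/`_current`/`_limit` layout is only inferred from the offsets used in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct ObjSketch {};       // hypothetical stand-in for a Java object
using oop_t = ObjSketch*;  // hypothetical stand-in for oop

class LockStackSketch {
  oop_t* _base;
  oop_t* _current;  // next free slot
  oop_t* _limit;    // one past the last slot

  // Slow path, kept out of line: reallocate to at least min_capacity slots.
  void grow(std::size_t min_capacity);

 public:
  explicit LockStackSketch(std::size_t initial_capacity = 8) {
    assert(initial_capacity > 0);
    _base = static_cast<oop_t*>(std::malloc(initial_capacity * sizeof(oop_t)));
    if (_base == nullptr) std::abort();
    _current = _base;
    _limit = _base + initial_capacity;
  }
  ~LockStackSketch() { std::free(_base); }
  LockStackSketch(const LockStackSketch&) = delete;
  LockStackSketch& operator=(const LockStackSketch&) = delete;

  std::size_t size() const { return static_cast<std::size_t>(_current - _base); }
  std::size_t capacity() const { return static_cast<std::size_t>(_limit - _base); }

  // Inline fast path: one comparison; grow() is called only when it fails.
  void ensure_capacity(std::size_t additional) {
    if (static_cast<std::size_t>(_limit - _current) < additional) {
      grow(size() + additional);
    }
  }

  // Factored-out push/pop; callers must have ensured capacity beforehand.
  void push(oop_t o) {
    assert(_current < _limit);
    *_current++ = o;
  }
  oop_t pop() {
    assert(_current > _base);
    return *--_current;
  }
};

void LockStackSketch::grow(std::size_t min_capacity) {
  std::size_t new_capacity = capacity() * 2;
  if (new_capacity < min_capacity) new_capacity = min_capacity;
  const std::size_t used = size();
  void* p = std::realloc(_base, new_capacity * sizeof(oop_t));
  if (p == nullptr) std::abort();
  _base = static_cast<oop_t*>(p);
  _current = _base + used;
  _limit = _base + new_capacity;
}
```

As Roman describes for the compiled paths, a caller can do the equivalent of a single `ensure_capacity(n)` at method entry, after which each individual `push` needs no bounds check.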
src/hotspot/share/runtime/lockStack.hpp line 46: > 44: void grow(size_t min_capacity); > 45: > 46: void validate(const char* msg) const PRODUCT_RETURN; nit: these functions seem to be normally called "verify". src/hotspot/share/runtime/lockStack.hpp line 52: > 50: static ByteSize limit_offset() { return byte_offset_of(LockStack, _limit); } > 51: > 52: static void ensure_lock_stack_size(oop* _required_limit); I would split this: do the comparison inline, and only the actual growth in the cpp file. src/hotspot/share/runtime/lockStack.hpp line 64: > 62: > 63: // GC support > 64: inline void oops_do(OopClosure* cl); Does this need to be non-const? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From duke at openjdk.org Sat Mar 11 16:31:06 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sat, 11 Mar 2023 16:31:06 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v5] In-Reply-To: References: Message-ID: <0mRQdiYkOqYL4mvfGOiUrsFGDUgTq_FMbBrV9DmBZWo=.1630b1b8-31e9-4f76-9b1e-e202711c8714@github.com> > The inline and not-inline versions of the method are tested to compare the performance difference.
> ### Test > `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8292059: Do not inline InstanceKlass::allocate_instance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12782/files - new: https://git.openjdk.org/jdk/pull/12782/files/fe3de0b5..0ef3159a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=03-04 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12782.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12782/head:pull/12782 PR: https://git.openjdk.org/jdk/pull/12782 From duke at openjdk.org Sat Mar 11 16:32:23 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sat, 11 Mar 2023 16:32:23 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v4] In-Reply-To: <2jx5pf1wpCPJErB50jnyF7nMAPr75prYVjnYw6vpErs=.3461994b-c424-4732-aa73-3c3326bbb32d@github.com> References: <7FX3eaP7l70pTU91DWMpuQVeDjpcekd3WUo0VFbMwow=.eb5e9998-b17d-40b7-b95e-0c28afd84f16@github.com> <2jx5pf1wpCPJErB50jnyF7nMAPr75prYVjnYw6vpErs=.3461994b-c424-4732-aa73-3c3326bbb32d@github.com> Message-ID: <8uKidgmfe4DIvzbGilcwcCp4gvxWclNOosj_pQa5bpc=.33aefbc3-5328-493e-917c-d984e55d5f18@github.com> On Fri, 10 Mar 2023 17:31:58 GMT, Coleen Phillimore wrote: > You need to revert the changes in jni.cpp and instanceKlass.hpp. They are reverted. 
------------- PR: https://git.openjdk.org/jdk/pull/12782 From stuefe at openjdk.org Sat Mar 11 16:37:24 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 16:37:24 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: <1lMEsynz434ly8ccXRxooqKSnAoo-fLV7V3XYCubq5w=.45f5a9a8-6034-4c84-8b20-bb8d93194d86@github.com> On Fri, 10 Mar 2023 14:54:20 GMT, Roman Kennke wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> limit step timeout to 5 seconds max > > src/hotspot/share/utilities/vmError.cpp line 1787: > >> 1785: // Global timeout hit? >> 1786: if (!ignore_global_timeout) { >> 1787: const jlong reporting_start_time_l = get_reporting_start_time(); > > What is the meaning of _l here? Is it to indicate the type of the variable? If so, I would suggest to remove it. (I know it is pre-existing. I leave this up to you.) This is a remnant of the old implementation. I'll change it. ------------- PR: https://git.openjdk.org/jdk/pull/12936 From stuefe at openjdk.org Sat Mar 11 16:44:05 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 16:44:05 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v3] In-Reply-To: References: Message-ID: > Fatal error handling is subject to several timeouts: > - a global timeout (controlled via ErrorLogTimeout) > - local error reporting step timeouts. > > The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting. This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout. > > There are three situations where atm we suppress the global error timeout: > - if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run. 
> - if the user specified one or more OnError commands to run, and these have not yet run. These must have a chance to run unmolested. > - if the user (typically developer) specified ShowMessageBoxOnError, and the error box has not yet been shown > > There is a bug, though, that also prevents the step timeout from firing if any of these conditions is true. That is plain wrong. > > In addition to that, the test interval WatcherThread uses to check for timeouts should be decreased. It sits at 1 second, which is too coarse-grained. > > -------- > > Patch: > - reworks `VMError::check_timeout()` to never block step timeouts > - adds clarifying comments > - quadruples timeout check frequency by watcher thread > - adds regression test for timeout handling with OnError > - additionally limits timeout per individual error reporting step to 5 seconds. 5 seconds is usually enough to distinguish a slow error reporting step from one that is endlessly hanging. > > Tested locally on Linux x64. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: feedback roman ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12936/files - new: https://git.openjdk.org/jdk/pull/12936/files/70b9add7..2c21ac35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12936.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12936/head:pull/12936 PR: https://git.openjdk.org/jdk/pull/12936 From stuefe at openjdk.org Sat Mar 11 16:44:09 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 11 Mar 2023 16:44:09 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 14:56:43 GMT, Roman Kennke wrote: >> Thomas Stuefe has updated the pull request
incrementally with one additional commit since the last revision: >> >> limit step timeout to 5 seconds max > > test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java line 40: > >> 38: >> 39: /* >> 40: * @test TimeoutInErrorHandlingTest-default > > Isn't the better way to name this just '@test TimeoutInErrorHandlingTest' and then '@id=default' and '@id=with-on-error' ? Not sure what you mean? There is no @id jtreg tag. https://openjdk.org/jtreg/tag-spec.html ------------- PR: https://git.openjdk.org/jdk/pull/12936 From kbarrett at openjdk.org Sat Mar 11 16:54:15 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 11 Mar 2023 16:54:15 GMT Subject: RFR: 8304016: Add BitMap find_last suite of functions Message-ID: Please review this change that adds functions to BitMap for finding the last set/clear bit in a range. Testing: mach5 tier1, including new gtesting for the new functions. ------------- Commit messages: - find_last_set/clear_bit Changes: https://git.openjdk.org/jdk/pull/12988/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12988&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304016 Stats: 216 lines in 3 files changed: 160 ins; 24 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/12988.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12988/head:pull/12988 PR: https://git.openjdk.org/jdk/pull/12988 From rkennke at openjdk.org Sat Mar 11 16:57:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sat, 11 Mar 2023 16:57:27 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: On Sat, 11 Mar 2023 16:37:21 GMT, Thomas Stuefe wrote: > Not sure what you mean? There is no @id jtreg tag. https://openjdk.org/jtreg/tag-spec.html Sorry, I got this wrong. The way to do it is @test id=$somestring. 
See for example: https://github.com/openjdk/jdk/blob/a06426a52f16c08c95b1c0270a5fc40721921022/test/hotspot/jtreg/gc/stress/gcold/TestGCOldWithShenandoah.java#L27 ------------- PR: https://git.openjdk.org/jdk/pull/12936 From jwaters at openjdk.org Sat Mar 11 17:17:06 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 17:17:06 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v10] In-Reply-To: References: Message-ID: > Cleans up some code in compilerWarnings_*.hpp files to be slightly neater Julian Waters has updated the pull request incrementally with one additional commit since the last revision: Revert to initial name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12255/files - new: https://git.openjdk.org/jdk/pull/12255/files/30561ee1..a3185f7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12255&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12255&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12255.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12255/head:pull/12255 PR: https://git.openjdk.org/jdk/pull/12255 From kbarrett at openjdk.org Sat Mar 11 17:17:11 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 11 Mar 2023 17:17:11 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v9] In-Reply-To: References: Message-ID: <1_2C-gE39fNXfVNmm4mvB7Y4ACOd5icaB-1Q7-eTVJo=.12608e78-be5d-4153-9f5c-9c2c2213c6cb@github.com> On Sat, 11 Mar 2023 14:55:56 GMT, Julian Waters wrote: >> Cleans up some code in compilerWarnings_*.hpp files to be slightly neater > > Julian Waters has updated the pull request incrementally with four additional commits since the last revision: > > - compilerWarnings_visCPP.hpp > - Spelling > - compilerWarnings_gcc.hpp > - macros.hpp Changes requested by kbarrett (Reviewer). 
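For context on the inline comment that follows: the macro works because `_Pragma` requires a string literal, which `STR` produces by stringizing the whole `GCC diagnostic ignored <option>` token sequence. The argument must therefore itself be a quoted option such as `"-Wunused-variable"`, which the name `option_string` makes explicit. Here is a minimal standalone sketch, assuming `STR` is a plain `#a` stringizer (the real definition lives in utilities/macros.hpp and may differ in detail). `expanded_pragma_text` and `suppressed_warning_demo` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstring>

// Assumption: a plain stringizing helper standing in for HotSpot's STR.
#define STR(a) #a

// Same shape as the macro under review; the argument must be a string literal.
#define PRAGMA_DISABLE_GCC_WARNING(option_string) \
  _Pragma(STR(GCC diagnostic ignored option_string))

// Stringizing escapes the embedded quotes, yielding the text _Pragma expects.
inline const char* expanded_pragma_text() {
  return STR(GCC diagnostic ignored "-Wunused-variable");
}

inline int suppressed_warning_demo() {
  _Pragma("GCC diagnostic push")
  PRAGMA_DISABLE_GCC_WARNING("-Wunused-variable")
  int unused = 0;  // would otherwise be reported under -Wunused-variable
  _Pragma("GCC diagnostic pop")
  return 42;
}
```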
src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 37: > 35: #endif > 36: > 37: #define PRAGMA_DISABLE_GCC_WARNING(option) _Pragma(STR(GCC diagnostic ignored option)) s/option/option_string/ as before, to make it a little more obvious that the argument is a string literal. ------------- PR: https://git.openjdk.org/jdk/pull/12255 From jwaters at openjdk.org Sat Mar 11 17:17:12 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 11 Mar 2023 17:17:12 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v9] In-Reply-To: <1_2C-gE39fNXfVNmm4mvB7Y4ACOd5icaB-1Q7-eTVJo=.12608e78-be5d-4153-9f5c-9c2c2213c6cb@github.com> References: <1_2C-gE39fNXfVNmm4mvB7Y4ACOd5icaB-1Q7-eTVJo=.12608e78-be5d-4153-9f5c-9c2c2213c6cb@github.com> Message-ID: On Sat, 11 Mar 2023 17:01:00 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request incrementally with four additional commits since the last revision: >> >> - compilerWarnings_visCPP.hpp >> - Spelling >> - compilerWarnings_gcc.hpp >> - macros.hpp > > src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 37: > >> 35: #endif >> 36: >> 37: #define PRAGMA_DISABLE_GCC_WARNING(option) _Pragma(STR(GCC diagnostic ignored option)) > > s/option/option_string/ as before, to make it a little more obvious that the argument is a string literal. Alright ------------- PR: https://git.openjdk.org/jdk/pull/12255 From dcubed at openjdk.org Sat Mar 11 18:17:34 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Sat, 11 Mar 2023 18:17:34 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v18] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 09:55:03 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking.
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact.
But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark is at https://github.com/rkennke/fastlockbench and numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> | x86_64 | | | | aarch64 | | >> -- | -- | -- | -- | -- | -- | -- | -- >> | stack-locking | fast-locking | | | stack-locking | fast-locking | >> AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% |
| 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fixes in response to Daniel's review src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 684: > 682: xorptr(tmpReg, tmpReg); > 683: > 684: // Appears unlocked - try to swing _owner from null to curren thread. nit typo: s/curren thread/current thread/ ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Sat Mar 11 18:45:35 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Sat, 11 Mar 2023 18:45:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: <4Wi21QdftsyLGcwMc0P4ho3ZS6VZP4gP0MxWkok_gtM=.8c93b6b3-1f48-4979-833e-275278de9d98@github.com> On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array.
More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When it would overflow, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code.
The combination of two conditions leads to regressions without stack- or fast-locking: 1. the workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. the workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, once the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> Benchmark | x86_64 | | | | aarch64 | | >> -- | -- | -- | -- | -- | -- | -- | -- >> | stack-locking | fast-locking | diff | | stack-locking | fast-locking | diff >> AkkaUct | 841.884 | 836.948 | 0.59% |  | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% |  | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% |  | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% |  | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% |  | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% |  | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% |  | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% |  | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% |  | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% |  | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% |  | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% |  | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% |  | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% |  | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% |  | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% |  | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% |  | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |  | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% |  | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% |  | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | 
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) Another partial review. This time I reviewed these files: src/hotspot/share/c1/c1_Compilation.cpp src/hotspot/share/c1/c1_Compilation.hpp src/hotspot/share/c1/c1_GraphBuilder.cpp src/hotspot/share/c1/c1_LIRAssembler.cpp src/hotspot/share/c1/c1_MacroAssembler.hpp src/hotspot/share/c1/c1_Runtime1.cpp src/hotspot/share/interpreter/interpreterRuntime.cpp src/hotspot/share/opto/c2_CodeStubs.hpp src/hotspot/share/opto/compile.cpp src/hotspot/share/opto/compile.hpp src/hotspot/share/opto/locknode.cpp src/hotspot/share/opto/parse1.cpp src/hotspot/share/opto/compile.hpp line 637: > 635: void push_monitor() { _max_monitors++; } > 636: void reset_max_monitors() { _max_monitors = 0; } > 637: uint max_monitors() { return _max_monitors; } The prevailing style in this file appears to have some indenting after the ')' and before the '{'. It's somewhat inconsistent as to how much, but mostly more than a single space. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Sat Mar 11 18:52:31 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Sat, 11 Mar 2023 18:52:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) I was pleasantly surprised at how few C1 and C2 changes were needed. Nice! At this point, I have not reviewed the 'ppc', 'riscv' or s390 files so I've done a first pass review of 63 of the 74 files. I'll have to double check that I didn't miss anything that I need to review and I'll have to do another crawl thru review pass after letting the code percolate in my brain for a few days without looking at it.
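To make the lock-stack idea from the 8291555 description concrete, here is a toy, single-process C++ model. It is a sketch under stated assumptions, not HotSpot code: `ToyObject`, `LockStack`, `fast_lock` and `fast_unlock` are hypothetical names, the "mark word" is a plain `std::atomic<std::uintptr_t>`, and only the low-bit convention (01 = unlocked, 00 = fast-locked), the thread-local array of locked objects, the linear "do I own this?" scan and the "recursion means take the slow path" rule come from the PR text. Inflation, ANONYMOUS_OWNER handling and GC-root scanning are deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of the lock-stack idea; all names are hypothetical, not HotSpot's.
constexpr std::uintptr_t kLockMask     = 0b11;  // low two mark-word bits
constexpr std::uintptr_t kUnlockedBits = 0b01;  // 01 = unlocked
constexpr std::uintptr_t kLockedBits   = 0b00;  // 00 = fast-locked

struct ToyObject {
  // Upper bits stand in for hash/age etc.; fast-locking never overwrites
  // them with a stack pointer, unlike classic stack-locking.
  std::atomic<std::uintptr_t> mark{(std::uintptr_t{0x1234} << 2) | kUnlockedBits};
};

// Per-thread stack of fast-locked objects. The PR notes it usually stays
// tiny (3-5 entries), so a linear scan answers "do I own this?".
class LockStack {
 public:
  bool contains(const ToyObject* o) const {
    for (const ToyObject* e : _elems) {
      if (e == o) return true;
    }
    return false;
  }
  void push(ToyObject* o) { _elems.push_back(o); }  // std::vector grows on demand
  ToyObject* top() const { return _elems.empty() ? nullptr : _elems.back(); }
  void pop() { _elems.pop_back(); }
  std::size_t size() const { return _elems.size(); }

 private:
  std::vector<ToyObject*> _elems;
};

// Returns false when the caller must take a slow path instead: the object is
// not currently unlocked, the CAS lost a race, or the lock is recursive
// (which the PR handles by inflating to a full monitor).
bool fast_lock(ToyObject* obj, LockStack& ls) {
  if (ls.contains(obj)) return false;                     // recursive: not supported
  std::uintptr_t mark = obj->mark.load();
  if ((mark & kLockMask) != kUnlockedBits) return false;  // locked or inflated
  const std::uintptr_t locked = (mark & ~kLockMask) | kLockedBits;
  if (!obj->mark.compare_exchange_strong(mark, locked)) return false;
  ls.push(obj);
  return true;
}

// This toy only releases the innermost fast-lock (the top of the stack).
bool fast_unlock(ToyObject* obj, LockStack& ls) {
  if (ls.top() != obj) return false;
  std::uintptr_t mark = obj->mark.load();
  const std::uintptr_t unlocked = (mark & ~kLockMask) | kUnlockedBits;
  if (!obj->mark.compare_exchange_strong(mark, unlocked)) return false;
  ls.pop();
  return true;
}
```

The design point this mirrors is that ownership is recorded on the thread's side (the lock-stack), not in the header. The header's upper bits therefore stay intact while the object is fast-locked, which is the property the PR says Lilliput needs.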
------------- PR: https://git.openjdk.org/jdk/pull/10907 From kbarrett at openjdk.org Sun Mar 12 10:22:21 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 12 Mar 2023 10:22:21 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v10] In-Reply-To: References: Message-ID: <5WVBbaPCyTGUNbEr9drvxyofnDZdjPR8zVFX-sBcxuY=.05410d48-b8ae-4e0d-980b-9d8779ad15e1@github.com> On Sat, 11 Mar 2023 17:17:06 GMT, Julian Waters wrote: >> Cleans up some code in compilerWarnings_*.hpp files to be slightly neater > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to initial name Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.org/jdk/pull/12255 From duke at openjdk.org Sun Mar 12 13:21:18 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Sun, 12 Mar 2023 13:21:18 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v12] In-Reply-To: References: Message-ID: > My OCA has already been processed/approved. I am not an Author yet, but my Author request is currently being processed (sent to Rob McKenna). > I ran regression tests on x86_64 OpenJDK-8. I will leave other regression testing to GHA. > The patch (and the former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Fix missing SharedRuntime::frem and SharedRuntime::drem on aarch64. - bugreported by sviswa7. - Merge branch 'master' into modulo - Fix #endif comment - found by dholmes-ora. - Merge branch 'master' into modulo - Fix win32 broken build. - Merge remote-tracking branch 'origin/master' into modulo - Always include the _WIN64 workaround - a review by dholmes-ora.
- Remove comments to be moved to JBS (Bug System) - a review by jddarcy. - Uppercase L - a review by turbanoff. - Fix copyright author. - ... and 3 more: https://git.openjdk.org/jdk/compare/9f63957b...f03d4cdf ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/e4ff04dc..f03d4cdf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=10-11 Stats: 1676 lines in 129 files changed: 670 ins; 408 del; 598 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From duke at openjdk.org Sun Mar 12 13:21:21 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Sun, 12 Mar 2023 13:21:21 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v11] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:53:13 GMT, Sandhya Viswanathan wrote: >> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - Fix #endif comment - found by dholmes-ora. >> - Merge branch 'master' into modulo >> - Fix win32 broken build. >> - Merge remote-tracking branch 'origin/master' into modulo >> - Always include the _WIN64 workaround - a review by dholmes-ora. >> - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. >> - Uppercase L - a review by turbanoff. >> - Fix copyright author. >> - Fix WIN32 vs. WIN64. >> - Update according to the upstream review by David Holmes. >> - ... 
and 1 more: https://git.openjdk.org/jdk/compare/f101d786...e4ff04dc > > src/hotspot/share/runtime/sharedRuntime.cpp line 238: > >> 236: #endif >> 237: >> 238: #if !defined(TARGET_COMPILER_gcc) || defined(_WIN64) > > The aarch64 and other builds are now broken with missing SharedRuntime::frem and SharedRuntime::drem. True, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From dholmes at openjdk.org Mon Mar 13 01:27:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Mar 2023 01:27:22 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration Message-ID: When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. 
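The registration scheme described for 8303150 (a static `num_arguments()` per command, with the old dynamic parser count kept only as a debug-build cross-check) can be sketched as below. This is a simplified, hypothetical model rather than HotSpot's DCmd framework: the classes `ExampleParserDCmd` and `ExampleNoArgDCmd` and the argument names are invented, `NDEBUG` stands in for HotSpot's debug-build switch, and because the exact template arguments of the PR's `ENABLE_IF(...)` are not shown, the `std::is_convertible<T*, DCmd*>` constraint here is an assumption.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Simplified, hypothetical stand-ins for the DCmd hierarchy.
struct DCmd {
  virtual ~DCmd() = default;
};

// Stand-in for a parser that counts the arguments registered at construction.
struct DCmdParser {
  std::vector<std::string> args;
  int num_arguments() const { return static_cast<int>(args.size()); }
};

struct DCmdWithParser : DCmd {
  DCmdParser _dcmdparser;
};

// Parser-based command: registers three arguments when constructed and
// now also states that count statically.
struct ExampleParserDCmd : DCmdWithParser {
  ExampleParserDCmd() { _dcmdparser.args = {"first", "second", "third"}; }
  static int num_arguments() { return 3; }
};

// A command without a parser already needed a static num_arguments().
struct ExampleNoArgDCmd : DCmd {
  static int num_arguments() { return 0; }
};

// Registration-time query: release builds never construct a command object.
template <typename T,
          std::enable_if_t<std::is_convertible<T*, DCmd*>::value, int> = 0>
int get_num_arguments() {
  const int n = T::num_arguments();
#ifndef NDEBUG
  // Debug-only cross-check: the static count must match the parser's
  // dynamic count, so the two cannot silently drift apart.
  if constexpr (std::is_convertible<T*, DCmdWithParser*>::value) {
    T instance;
    assert(instance._dcmdparser.num_arguments() == n);
  }
#endif
  return n;
}
```

Release builds pay only for one static call per registration. If someone adds a parser argument without updating the static count, a debug build fails at registration time, which matches the safety net the description relies on.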
In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. Testing: tiers 1-4 Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. Thanks. ------------- Commit messages: - 8303150: DCmd framework unnecessarily creates a DCmd instance on registration Changes: https://git.openjdk.org/jdk/pull/12994/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12994&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303150 Stats: 99 lines in 12 files changed: 34 ins; 57 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12994.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12994/head:pull/12994 PR: https://git.openjdk.org/jdk/pull/12994 From dholmes at openjdk.org Mon Mar 13 05:37:25 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Mar 2023 05:37:25 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote: > Please review this small fix. A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active jvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers". > I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix. > > Thanks, > Patricio Looks good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
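The rule behind 8303908 is that a suspender (the monopolist) must not proceed while any disabler is active, whether a "single" or an "all" disabler. It can be modeled with one mutex and one condition variable. This is a hypothetical sketch: the class and its members only loosely mirror `_VTMS_transition_disable_for_one_count`, `_VTMS_transition_disable_for_all_count` and `_SR_mode`, and the real HotSpot code is structured differently.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of the transition-disabler counters; not HotSpot code.
class TransitionDisablerModel {
 public:
  // "Single disabler": blocks transitions of one virtual thread.
  void disable_for_one() {
    std::unique_lock<std::mutex> lk(_m);
    _cv.wait(lk, [this] { return !_sr_mode; });  // a running monopolist excludes everyone
    ++_for_one;
  }
  void enable_for_one() {
    std::lock_guard<std::mutex> lk(_m);
    if (--_for_one == 0) _cv.notify_all();
  }

  // Ordinary "all disabler".
  void disable_for_all() {
    std::unique_lock<std::mutex> lk(_m);
    _cv.wait(lk, [this] { return !_sr_mode; });
    ++_for_all;
  }
  void enable_for_all() {
    std::lock_guard<std::mutex> lk(_m);
    if (--_for_all == 0) _cv.notify_all();
  }

  // Monopolist (suspend/resume mode): proceeds only once NO disabler of
  // either kind is active. Waiting on _for_all alone models the missing check.
  void disable_for_all_sr() {
    std::unique_lock<std::mutex> lk(_m);
    _cv.wait(lk, [this] { return !_sr_mode; });
    _sr_mode = true;  // keep newcomers out while we drain
    _cv.wait(lk, [this] { return no_active_disablers(); });
    ++_for_all;
  }
  void enable_for_all_sr() {
    std::lock_guard<std::mutex> lk(_m);
    --_for_all;
    _sr_mode = false;
    _cv.notify_all();
  }

  // Non-blocking view of the monopolist's wait condition.
  bool monopolist_may_proceed() {
    std::lock_guard<std::mutex> lk(_m);
    return no_active_disablers();
  }

 private:
  bool no_active_disablers() const { return _for_all == 0 && _for_one == 0; }

  std::mutex _m;
  std::condition_variable _cv;
  bool _sr_mode = false;
  int _for_one = 0;
  int _for_all = 0;
};
```

Because `enable_for_one()` notifies when its count drops to zero, a monopolist blocked behind single disablers is woken. This corresponds to the review point that checking the single-disabler counter for zero is sufficient and that `_is_SR` adds nothing there.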
PR: https://git.openjdk.org/jdk/pull/12956 From dholmes at openjdk.org Mon Mar 13 05:37:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Mar 2023 05:37:28 GMT Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 17:06:58 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/prims/jvmtiThreadState.cpp line 372: >> >>> 370: java_lang_Thread::dec_VTMS_transition_disable_count(vth()); >>> 371: Atomic::dec(&_VTMS_transition_disable_for_one_count); >>> 372: if (_VTMS_transition_disable_for_one_count == 0 || _is_SR) { >> >> Sorry, I don't understand why this `_is_SR` check was removed. I admit I can't really figure out what this field means anyway, but there is nothing in the issue description that suggests this also needs changing - and it is now different to `VTMS_transition_enable_for_all`. > > A JvmtiVTMSTransitionDisabler instance that is a "single disabler" only blocks other virtual threads trying to transition or JvmtiVTMSTransitionDisabler monopolists. Both of them will check for _VTMS_transition_disable_for_one_count (the JvmtiVTMSTransitionDisabler monopolist was missing that check) so just checking whether that counter is zero is enough. In fact, for a "single disabler" _is_SR is always false so that check wasn't doing anything. Yes, this is not actually needed for the fix, but I caught this when looking at which condition we use to wait and which one to notify; sorry for not explaining that part. > > And looking closer at VTMS_transition_enable_for_all() now I see the check for _is_SR is not doing anything either, because if _VTMS_transition_disable_for_all_count was not zero after the decrement then this can't be a JvmtiVTMSTransitionDisabler monopolist, i.e. _is_SR will be false.
When a monopolist is running, all other "disable all" JvmtiVTMSTransitionDisabler instances, if any, will be waiting in the first "while (_SR_mode)" loop in VTMS_transition_disable_for_all(), so _VTMS_transition_disable_for_all_count will be one throughout the monopolist's run. So this should be an assert after the decrement: assert(!_is_SR || _VTMS_transition_disable_for_all_count == 0, ""). Thanks for clarifying - I was puzzled by the way `is_SR` was being used. ------------- PR: https://git.openjdk.org/jdk/pull/12956 From stuefe at openjdk.org Mon Mar 13 05:50:24 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 Mar 2023 05:50:24 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: On Sat, 11 Mar 2023 16:54:47 GMT, Roman Kennke wrote: >> Not sure what you mean? There is no @id jtreg tag. https://openjdk.org/jtreg/tag-spec.html > >> Not sure what you mean? There is no @id jtreg tag. https://openjdk.org/jtreg/tag-spec.html > > Sorry, I got this wrong. The way to do it is @test id=$somestring. See for example: > > https://github.com/openjdk/jdk/blob/a06426a52f16c08c95b1c0270a5fc40721921022/test/hotspot/jtreg/gc/stress/gcold/TestGCOldWithShenandoah.java#L27 Oh, you are right. It's also plain wrong; that specifier does nothing and gets ignored. I'll switch to "id" ------------- PR: https://git.openjdk.org/jdk/pull/12936 From stuefe at openjdk.org Mon Mar 13 05:57:01 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 Mar 2023 05:57:01 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v4] In-Reply-To: References: Message-ID: > Fatal error handling is subject to several timeouts: > - a global timeout (controlled via ErrorLogTimeout) > - local error reporting step timeouts. > > The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting.
This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout. > > There are three situations where we currently suppress the global error timeout: > - if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run. > - if the user specified one or more OnError commands to run, and these did not yet run. These must have a chance to run unmolested. > - if the user (typically a developer) specified ShowMessageBoxOnError, and the error box has not yet been shown > > There is a bug, though, that also prevents the step timeout from firing if any of these conditions is true. That is plain wrong. > > In addition to that, the test interval WatcherThread uses to check for timeouts should be decreased. It sits at 1 second, which is too coarse-grained. > > -------- > > Patch: > - reworks `VMError::check_timeout()` to never block step timeouts > - adds clarifying comments > - quadruples timeout check frequency by watcher thread > - adds regression test for timeout handling with OnError > - additionally limits timeout per individual error reporting step to 5 seconds. 5 seconds is usually enough to distinguish a slow error reporting step from one that is endlessly hanging. > > Tested locally on Linux x64.
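The corrected timeout policy in JDK-8303861 can be summarized as a pure function: the global timeout is suppressed while any of the three listed conditions holds, but the per-step timeout never is, and each step's budget is capped at 5 seconds. The sketch below is hypothetical and does not reproduce `VMError::check_timeout()`; the struct, enum and field names are invented, and the configured per-step budget `step_timeout_s` is taken as an input because the PR text does not say how HotSpot derives it.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inputs; names are illustrative, not HotSpot's.
struct ErrorReportState {
  double now_s;                // current time (seconds)
  double report_start_s;       // when error reporting started
  double step_start_s;         // when the current reporting step started
  double error_log_timeout_s;  // global budget (cf. ErrorLogTimeout)
  double step_timeout_s;       // configured per-step budget (assumed input)
  bool abort_hook_pending;     // embedding launcher's abort hook not yet run
  bool on_error_pending;       // OnError commands not yet run
  bool message_box_pending;    // ShowMessageBoxOnError box not yet shown
};

enum class TimeoutAction { None, InterruptStep, GlobalTimeout };

constexpr double kMaxStepTimeoutS = 5.0;  // per-step cap described in the PR

TimeoutAction check_timeout(const ErrorReportState& s) {
  // Global timeout: suppressed while any of the three conditions holds.
  const bool global_suppressed =
      s.abort_hook_pending || s.on_error_pending || s.message_box_pending;
  if (!global_suppressed &&
      s.now_s - s.report_start_s >= s.error_log_timeout_s) {
    return TimeoutAction::GlobalTimeout;
  }
  // Step timeout: never suppressed (suppressing it too was the bug).
  const double step_budget = std::min(s.step_timeout_s, kMaxStepTimeoutS);
  if (s.now_s - s.step_start_s >= step_budget) {
    return TimeoutAction::InterruptStep;
  }
  return TimeoutAction::None;
}
```

A watcher thread would call such a check periodically. Going by the text, quadrupling the check frequency from the current 1 second interval means polling roughly every 250 ms, so a hung step is noticed soon after its 5 second cap expires.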
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: correct test names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12936/files - new: https://git.openjdk.org/jdk/pull/12936/files/2c21ac35..52f382db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12936.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12936/head:pull/12936 PR: https://git.openjdk.org/jdk/pull/12936 From dholmes at openjdk.org Mon Mar 13 06:02:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Mar 2023 06:02:33 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v4] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 05:57:01 GMT, Thomas Stuefe wrote: > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > correct test names test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java line 51: > 49: > 50: /* > 51: * @test id=@with-on-error Do you really want the `@` in the name? ------------- PR: https://git.openjdk.org/jdk/pull/12936 From stefank at openjdk.org Mon Mar 13 06:38:29 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 Mar 2023 06:38:29 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v5] In-Reply-To: <0mRQdiYkOqYL4mvfGOiUrsFGDUgTq_FMbBrV9DmBZWo=.1630b1b8-31e9-4f76-9b1e-e202711c8714@github.com> References: <0mRQdiYkOqYL4mvfGOiUrsFGDUgTq_FMbBrV9DmBZWo=.1630b1b8-31e9-4f76-9b1e-e202711c8714@github.com> Message-ID: On Sat, 11 Mar 2023 16:31:06 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method are tested to compare the performance difference. >> ### Test >> `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() Changes requested by stefank (Reviewer).
src/hotspot/share/oops/oop.inline.hpp line 37: > 35: #include "oops/compressedOops.inline.hpp" > 36: #include "oops/instanceKlass.hpp" > 37: #include "oops/klass.inline.hpp" I don't see anything in oop.inline.hpp that needs klass.inline.hpp. src/hotspot/share/prims/jni.cpp line 61: > 59: #include "oops/instanceKlass.inline.hpp" > 60: #include "oops/instanceOop.hpp" > 61: #include "oops/klass.inline.hpp" jni.cpp uses klass.inline.hpp, so this line needs to stay. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From stefank at openjdk.org Mon Mar 13 06:38:31 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 Mar 2023 06:38:31 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v5] In-Reply-To: References: <8koli6nAbt8Rx4Je8MRic0dPloLTb9IiUyw25BvUI0s=.07267dc7-d2a1-4426-8876-1e41b1a248ac@github.com> Message-ID: On Mon, 6 Mar 2023 13:30:31 GMT, Stefan Karlsson wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8292059: Do not inline InstanceKlass::allocate_instance() > > src/hotspot/share/oops/instanceKlass.inline.hpp line 28: > >> 26: #define SHARE_OOPS_INSTANCEKLASS_INLINE_HPP >> 27: >> 28: #include "oops/instanceKlass.hpp" > > All these includes should *not* be removed. This file still doesn't *explicitly* include everything it uses. It should also keep including "oops/instanceKlass.hpp" (with the blank line after it, as per our style guide). ------------- PR: https://git.openjdk.org/jdk/pull/12782 From dholmes at openjdk.org Mon Mar 13 06:51:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Mar 2023 06:51:33 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents.
>> >> #### Design >> >> An Agent is a library that uses instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup latency in getting the VM ready. The events will report the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent, and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initialization = 12:31:15.574 (2023-03-08) >> initializationTime = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initialization = 12:31:31.037 (2023-03-08) >> initializationTime = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar.
>> >> The "dynamic" field denotes whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). >> >> "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent, and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initialization = 12:31:36.142 (2023-03-08) >> initializationTime = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the time taken to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. >> >> #### Implementation >> >> Previously, there was no direct reification of a JavaAgent in the JVM, because Java agents are built on top of the JDK native library "instrument", using a many-to-one mapping.
At the level of the JVM, the only representation of agents after startup is through JvmtiEnvs, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to which JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track which JvmtiEnvs "belong" to which JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Moving this code out of its old locations also helps reduce the size of the already oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iteration. A single list that allows concurrent iteration is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > more cleanup Still working through this. A few minor comments below. src/hotspot/share/prims/agent.cpp line 34: > 32: } > 33: > 34: static const char* allocate_copy(const char* str) { Why not just use `os::strdup`? src/hotspot/share/prims/agentList.cpp line 64: > 62: void AgentList::add_xrun(const char* name, char* options, bool absolute_path) { > 63: Agent* agent = new Agent(name, options, absolute_path); > 64: agent->_is_xrun = true; Why direct access of private field instead of having a setter like other parts of the Agent API?
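The PR merges the old "agents" and "libraries" lists into one list that tolerates concurrent iteration. A common way to get that property is a prepend-only linked list whose head is published with release/acquire atomics. The sketch below assumes that design and is not the actual `AgentList` implementation: `AgentNode` and `ConcurrentAgentList` are hypothetical, and state changes go through setters (`set_xrun()`, a private `set_next()`) rather than direct field access, in line with the review comment about `_is_xrun`.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

class ConcurrentAgentList;

// Hypothetical agent record: fields are private and changed via setters.
// Configure an agent before publishing it with add().
class AgentNode {
 public:
  AgentNode(std::string name, std::string options)
      : _name(std::move(name)), _options(std::move(options)) {}

  const std::string& name() const { return _name; }
  const std::string& options() const { return _options; }
  bool is_xrun() const { return _is_xrun; }
  void set_xrun() { _is_xrun = true; }
  const AgentNode* next() const { return _next; }

 private:
  friend class ConcurrentAgentList;  // only the list links nodes together
  void set_next(AgentNode* next) { _next = next; }

  std::string _name;
  std::string _options;
  bool _is_xrun = false;
  AgentNode* _next = nullptr;
};

// Prepend-only list. Nodes are never unlinked while the list is alive, and a
// node's _next is fixed before the node is published, so readers can iterate
// without locks while another thread adds agents.
class ConcurrentAgentList {
 public:
  ConcurrentAgentList() = default;
  ConcurrentAgentList(const ConcurrentAgentList&) = delete;
  ConcurrentAgentList& operator=(const ConcurrentAgentList&) = delete;

  ~ConcurrentAgentList() {
    AgentNode* node = _head.load(std::memory_order_acquire);
    while (node != nullptr) {
      AgentNode* next = node->_next;
      delete node;
      node = next;
    }
  }

  // Takes ownership of agent and publishes it at the head of the list.
  void add(AgentNode* agent) {
    AgentNode* old_head = _head.load(std::memory_order_relaxed);
    do {
      agent->set_next(old_head);
    } while (!_head.compare_exchange_weak(old_head, agent,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Visits agents newest-first; safe to run concurrently with add().
  template <typename F>
  void for_each(F visit) const {
    for (const AgentNode* node = _head.load(std::memory_order_acquire);
         node != nullptr; node = node->next()) {
      visit(*node);
    }
  }

 private:
  std::atomic<AgentNode*> _head{nullptr};
};
```

Prepending keeps `add()` to a single CAS, at the cost of visiting agents newest-first; an implementation that must report agents in command-line order would have to append or reverse instead.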
src/hotspot/share/prims/agentList.cpp line 227: > 225: * store data in their JvmtiEnv local storage. > 226: * > 227: * Please see JPLISAgent.c in module java.instrument, see JPLISAgent.h and JPLISAgent.c. No need to mention the .c file twice. src/hotspot/share/prims/agentList.cpp line 419: > 417: const jint err = (*on_load_entry)(&main_vm, const_cast<char*>(agent->options()), NULL); > 418: if (err != JNI_OK) { > 419: vm_exit_during_initialization("-Xrun library failed to init", agent->name()); Do you need to be back in `_thread_in_vm` before exiting? src/hotspot/share/prims/agentList.cpp line 542: > 540: > 541: // Invoke the Agent_OnAttach function > 542: JavaThread* THREAD = JavaThread::current(); // For exception macros. Nit: just use `current` rather than `THREAD` and don't use the exception macros. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From dholmes at openjdk.org Mon Mar 13 06:51:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Mar 2023 06:51:37 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v9] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 16:58:42 GMT, Markus Grönlund wrote: > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > handle multiple envs with same VMInit callback src/hotspot/share/prims/agent.cpp line 41: > 39: char* copy = AllocateHeap(length + 1, mtInternal); > 40: strncpy(copy, str, length + 1); > 41: assert(strncmp(copy, str, length + 1) == 0, "invariant"); Unclear what you are checking here. Don't you trust strncpy? ------------- PR: https://git.openjdk.org/jdk/pull/12923 From stuefe at openjdk.org Mon Mar 13 06:53:27 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 Mar 2023 06:53:27 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v4] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 05:59:47 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> correct test names > > test/hotspot/jtreg/runtime/ErrorHandling/TimeoutInErrorHandlingTest.java line 51: > >> 49: >> 50: /* >> 51: * @test id=@with-on-error > > Do you really want the `@` in the name? Argh. Typo, sorry. New keyboard.
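Regarding the `strncmp` assert in `allocate_copy`: assuming `length` is `strlen(str)` (the `length + 1` allocation suggests so), `strncpy(copy, str, length + 1)` copies every character plus the terminating NUL. The copy is therefore byte-for-byte identical, and the assert can only fire if `strncpy` itself is broken. A portable equivalent of such a helper is sketched below. `copy_c_string` is a hypothetical name, and `std::malloc` stands in for HotSpot's `AllocateHeap` (the earlier review comment suggests `os::strdup` instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for allocate_copy(): duplicate a NUL-terminated
// string into freshly allocated memory. Copying length + 1 bytes includes
// the terminator, which is also what strncpy(copy, str, length + 1) does
// when length == strlen(str), so a follow-up
// strncmp(copy, str, length + 1) cannot report a mismatch.
char* copy_c_string(const char* str) {
  assert(str != nullptr);
  const std::size_t length = std::strlen(str);
  char* copy = static_cast<char*>(std::malloc(length + 1));
  if (copy == nullptr) {
    return nullptr;  // the caller decides how to handle allocation failure
  }
  std::memcpy(copy, str, length + 1);
  return copy;
}
```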
------------- PR: https://git.openjdk.org/jdk/pull/12936 From rehn at openjdk.org Mon Mar 13 09:38:13 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 Mar 2023 09:38:13 GMT Subject: Integrated: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms In-Reply-To: References: Message-ID: On Thu, 16 Feb 2023 08:38:42 GMT, Robbin Ehn wrote: > Hi all, please consider. > > The original issue occurred when thread 1 asked to deoptimize nmethod set X and thread 2 asked for the same set or a subset of X. > All methods would already be marked, but the actual deoptimization (making them not entrant, patching PCs on stacks, and patching post-call nops) was not done yet, which meant thread 2 could 'pass' thread 1. > Most places deoptimized under Compile_lock, so this was not an issue there, but WB and clearCallSiteContext do not. > > Because handshakes may take a long time to complete, and Compile_lock is used for much more than deopts, the fix in https://bugs.openjdk.org/browse/JDK-8299074 instead always emitted a handshake, even when everything was already marked (instead of adding Compile_lock to all places). > > This turned out to be problematic during startup; for example, the number of deopt handshakes in a jetty dry-run startup went from 5 to 39. > > This fix first adds a barrier that a thread does not pass until the requested deopts have happened, and it coalesces the handshakes. > Second, it moves the handshake part out of the Compile_lock where possible. > > As a result, we fix the performance bug and reduce contention on Compile_lock, giving higher throughput in the compiler and in things such as class loading. > > It passes t1-t7 with flying colours! t8 has not completed yet, and I'm redoing some testing due to last-minute simplifications. > > Thanks, Robbin This pull request has now been integrated.
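The "barrier plus coalesced handshakes" idea in this change can be modeled with generation counters. Each requester marks its methods, takes a ticket, and does not return until a handshake that started after its ticket was issued has completed; whichever waiter finds no handshake running performs one on behalf of every request registered so far. This is a minimal sketch under those assumptions, using `std::mutex` and `std::condition_variable`; `DeoptBarrier` and its callback are hypothetical and not the HotSpot implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical model of "do not return until your deopts are done" with
// handshake coalescing. Callers mark their nmethods first, then call
// request(). A request is satisfied by any handshake that *started* after
// it was registered, and one handshake can satisfy many waiting requests.
class DeoptBarrier {
 public:
  explicit DeoptBarrier(std::function<void()> handshake)
      : _handshake(std::move(handshake)) {}

  void request() {
    std::unique_lock<std::mutex> lock(_mutex);
    const std::uint64_t ticket = ++_requested;  // generation of this request
    while (_completed < ticket) {
      if (_in_progress) {
        // The running handshake may have started before our marks were
        // made, so wait for it to finish and re-check.
        _cv.wait(lock);
        continue;
      }
      // No handshake running: run one covering every request so far.
      _in_progress = true;
      const std::uint64_t covers = _requested;
      lock.unlock();
      _handshake();  // the expensive part runs without holding the lock
      lock.lock();
      _in_progress = false;
      _completed = covers;
      ++_handshakes;
      _cv.notify_all();
    }
  }

  std::uint64_t handshake_count() const {
    std::lock_guard<std::mutex> lock(_mutex);
    return _handshakes;
  }

 private:
  mutable std::mutex _mutex;
  std::condition_variable _cv;
  std::function<void()> _handshake;
  std::uint64_t _requested = 0;
  std::uint64_t _completed = 0;
  std::uint64_t _handshakes = 0;
  bool _in_progress = false;
};
```

With concurrent callers, every request registered while a handshake is running is served together by the next handshake, so a burst of requests shares handshakes instead of issuing one each.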
Changeset: c183fce9 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/c183fce9543ca15f5db632babecdb7797d0745e4 Stats: 449 lines in 24 files changed: 215 ins; 109 del; 125 mod 8300926: Several startup regressions ~6-70% in 21-b6 all platforms Reviewed-by: eosterlund, dcubed, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/12585 From adinn at openjdk.org Mon Mar 13 09:48:43 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 13 Mar 2023 09:48:43 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote: > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > more cleanup src/hotspot/share/jfr/metadata/metadata.xml line 1182: > 1180: > 1181: > 1182: @mgronlun A somewhat drive-by comment. It might be clearer if you renamed these event fields and accessors, and also the corresponding fields and accessors in class Agent, as `initializationTime` and `initializationDuration`. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From adinn at openjdk.org Mon Mar 13 09:52:38 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 13 Mar 2023 09:52:38 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 06:29:11 GMT, David Holmes wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> more cleanup > > src/hotspot/share/prims/agentList.cpp line 64: > >> 62: void AgentList::add_xrun(const char* name, char* options, bool absolute_path) { >> 63: Agent* agent = new Agent(name, options, absolute_path); >> 64: agent->_is_xrun = true; > > Why direct access of private field instead of having a setter like other parts of the Agent API? N.B. that also applies to accesses/updates of field _next.
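The two timing values these events carry, under the naming suggested above, amount to a wall-clock timestamp taken when initialization is invoked and an elapsed duration measured around the initialization callback. The following is a minimal sketch with `std::chrono`, assuming a hypothetical `AgentTiming` record. As the PR description states for native agents, the duration stays zero when there is no VMInit callback.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical per-agent timing record using the field names proposed in
// the review: a wall-clock point in time plus an elapsed duration.
struct AgentTiming {
  std::chrono::system_clock::time_point initialization_time{};
  std::chrono::steady_clock::duration initialization_duration{};

  // Records when initialization was (or would have been) invoked, then
  // times the callback with a monotonic clock. Without a callback, for
  // example a native agent with no VMInit handler, the duration stays zero.
  void run_initialization(const std::function<void()>& callback) {
    initialization_time = std::chrono::system_clock::now();
    if (!callback) {
      return;
    }
    const auto start = std::chrono::steady_clock::now();
    callback();
    initialization_duration = std::chrono::steady_clock::now() - start;
  }
};
```

Using a monotonic clock for the duration keeps it immune to wall-clock adjustments, while the timestamp stays comparable to the event's wall-clock `startTime`.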
------------- PR: https://git.openjdk.org/jdk/pull/12923 From tschatzl at openjdk.org Mon Mar 13 09:59:40 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 13 Mar 2023 09:59:40 GMT Subject: RFR: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 14:20:54 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> please review this refactoring, which replaces various casts in GC (and more-or-less GC-related) code that are used to get a uint/size_t with all bits set by the corresponding constants from cstdint. >> The ones in ZGC files were skipped on request. >> >> Testing: local compilation, gha >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @kimbarrett for your reviews ------------- PR: https://git.openjdk.org/jdk/pull/12973 From tschatzl at openjdk.org Mon Mar 13 09:59:42 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 13 Mar 2023 09:59:42 GMT Subject: Integrated: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 12:58:42 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring, which replaces various casts in GC (and more-or-less GC-related) code that are used to get a uint/size_t with all bits set by the corresponding constants from cstdint. > The ones in ZGC files were skipped on request. > > Testing: local compilation, gha > > Thanks, > Thomas This pull request has now been integrated.
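As an illustration of the kind of substitution this refactoring makes: hand-written all-bits-set encodings such as `(size_t)-1` or `~0u` have exactly the values of the named constants (`SIZE_MAX` from `<cstdint>`, `UINT_MAX` from `<climits>`), so the change improves readability without changing any values. The spellings and the `index_of` helper below are generic examples, not code from the patch.

```cpp
#include <cassert>
#include <climits>  // UINT_MAX
#include <cstddef>
#include <cstdint>  // SIZE_MAX
#include <limits>

// Converting -1 to an unsigned type wraps to its maximum value, and ~0 of
// an unsigned type has every bit set, so these spellings are equivalent.
static_assert(static_cast<std::size_t>(-1) == SIZE_MAX, "all bits set");
static_assert(~static_cast<std::size_t>(0) == SIZE_MAX, "all bits set");
static_assert(static_cast<unsigned int>(-1) == UINT_MAX, "all bits set");
static_assert(~0u == UINT_MAX, "all bits set");
static_assert(std::numeric_limits<std::size_t>::max() == SIZE_MAX, "same value");

// Typical use: a "not found" sentinel spelled with the named constant.
constexpr std::size_t kNotFound = SIZE_MAX;

std::size_t index_of(const int* values, std::size_t count, int target) {
  assert(values != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i) {
    if (values[i] == target) {
      return i;
    }
  }
  return kNotFound;
}
```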
Changeset: b575e54b Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/b575e54bc96c8fc413893dbbe91d0b5ce0192179 Stats: 15 lines in 13 files changed: 0 ins; 2 del; 13 mod 8303963: Replace various encodings of UINT/SIZE_MAX in gc code Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/12973 From stuefe at openjdk.org Mon Mar 13 10:15:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 13 Mar 2023 10:15:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered by a quick scan of the array. More complex queries like 'which thread owns X?' are not performed on performance-critical paths (they usually occur in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, sets the owner to itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fastpath, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects, we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Use nullptr instead of NULL in touched code (shared) More comments src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 884: > 882: fast_unlock(obj_reg, header_reg, swap_reg, rscratch1, slow_case); > 883: b(count); > 884: bind(slow_case); Small nit: move the bind to where the slow case actually is? src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 892: > 890: // Test for recursion > 891: cbz(header_reg, count); > 892: Not your patch, but I found it interesting that arm does actually zero out the object slot in the BasicLock on the stack. I assume that is not needed, right?
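The core of the fast-locking scheme from the PR description above can be modeled as follows: CAS the low header bits from unlocked (assumed here to be `01`, as in HotSpot's mark word) to fast-locked `00`, push the object onto a per-thread lock-stack, answer "does the current thread own it?" by scanning that stack, and fall back to inflation on recursion or contention. This is a simplified, hypothetical sketch. `Object`, `LockStack`, `fast_lock` and `fast_unlock` are stand-ins rather than HotSpot's `markWord`/`ObjectMonitor` code, inflation and the `ANONYMOUS_OWNER` handoff are left out, and the capacity check is kept inline with growth in a separate function, which is one possible shape for the `ensure_lock_stack_size` split discussed in this thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified header: only the two low lock bits are modeled
// (0b10 would mean an inflated monitor, which is not modeled here).
constexpr std::uintptr_t kLockMask = 0b11;
constexpr std::uintptr_t kUnlocked = 0b01;    // neutral header
constexpr std::uintptr_t kFastLocked = 0b00;  // owner is on some lock-stack

struct Object {
  std::atomic<std::uintptr_t> header{kUnlocked};
};

// Per-thread stack of fast-locked objects.
class LockStack {
 public:
  // Cheap inline check; only the rare growth path is a separate function.
  void ensure_capacity(std::size_t required) {
    if (required > _elems.capacity()) {
      grow(required);
    }
  }
  void push(Object* obj) { _elems.push_back(obj); }
  bool contains(const Object* obj) const {
    for (const Object* e : _elems) {
      if (e == obj) return true;
    }
    return false;
  }
  void remove(const Object* obj) {
    // Search from the top: unlocking is usually in LIFO order.
    for (std::size_t i = _elems.size(); i > 0; --i) {
      if (_elems[i - 1] == obj) {
        _elems.erase(_elems.begin() + static_cast<std::ptrdiff_t>(i - 1));
        return;
      }
    }
  }
  std::size_t size() const { return _elems.size(); }

 private:
  void grow(std::size_t required) { _elems.reserve(required * 2); }
  std::vector<Object*> _elems;
};

enum class LockResult { kFastLocked, kNeedsInflation };

// monitorenter fast path. Recursion and contention are not handled here;
// the caller would inflate to a full monitor instead.
LockResult fast_lock(Object* obj, LockStack& stack) {
  if (stack.contains(obj)) {
    return LockResult::kNeedsInflation;  // recursive locking
  }
  stack.ensure_capacity(stack.size() + 1);  // check before locking
  std::uintptr_t h = obj->header.load(std::memory_order_relaxed);
  if ((h & kLockMask) != kUnlocked) {
    return LockResult::kNeedsInflation;  // owned by another thread
  }
  const std::uintptr_t locked = (h & ~kLockMask) | kFastLocked;
  if (!obj->header.compare_exchange_strong(h, locked, std::memory_order_acquire)) {
    return LockResult::kNeedsInflation;  // lost a race
  }
  stack.push(obj);
  return LockResult::kFastLocked;
}

// monitorexit fast path: only the owner (object on its stack) may unlock.
bool fast_unlock(Object* obj, LockStack& stack) {
  std::uintptr_t h = obj->header.load(std::memory_order_relaxed);
  if ((h & kLockMask) != kFastLocked || !stack.contains(obj)) {
    return false;
  }
  const std::uintptr_t unlocked = (h & ~kLockMask) | kUnlocked;
  if (!obj->header.compare_exchange_strong(h, unlocked, std::memory_order_release)) {
    return false;  // header changed, e.g. inflated by a contender
  }
  stack.remove(obj);
  return true;
}
```

Ownership checks here never touch other threads' data, which mirrors the point in the description that the common "do I own it?" question needs only a scan of the current thread's own lock-stack.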
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6220:

> 6218: // - obj: the object to be locked
> 6219: // - hdr: the header, already loaded from obj, will be destroyed
> 6220: // - t1, t2, t3: temporary registers, will be destroyed

Adapt comment: we don't use t3.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6249:

> 6247: }
> 6248:
> 6249: void MacroAssembler::fast_unlock(Register obj, Register hdr, Register t1, Register t2, Label& slow) {

Could you add a comment here too, as you did for lock?

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Mon Mar 13 10:15:40 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 13 Mar 2023 10:15:40 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com>
References: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com>
Message-ID: <58K_81BGoa-o4Pq1LtVhV_fOyYvSIxV1e66syvpefro=.9496823a-207e-467b-917d-f0f8852cda5c@github.com>

On Sat, 11 Mar 2023 14:53:29 GMT, Thomas Stuefe wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>>  - Use nullptr instead of NULL in touched code (shared)
>
> src/hotspot/share/runtime/lockStack.hpp line 52:
>
>> 50: static ByteSize limit_offset() { return byte_offset_of(LockStack, _limit); }
>> 51:
>> 52: static void ensure_lock_stack_size(oop* _required_limit);
>
> I would split this, do the comparison inline, only the actual growth in the cpp file.

Just realized that this interface is actually a bit odd, since we pass a wish pointer that has nothing to do with either the current state or the final result. In fact, the pointer could at the moment point into the lock stack of a different thread.
So this is "the pointer that would designate the end of the LockStack if the lockstack were enlarged *in-place*". Maybe add a comment like that.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Mon Mar 13 10:24:39 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 13 Mar 2023 10:24:39 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21]
In-Reply-To:
References:
Message-ID: <6jWzeHbL7AH2PDn3-k_3B8jWKfVs3VEG9up7pw265n0=.3ce51a4e-4148-433f-991f-606802d79d50@github.com>

On Fri, 10 Mar 2023 12:45:12 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread.
>> Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots of) such checks at locking sites.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
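The lock-stack ownership check and the ANONYMOUS_OWNER hand-off described above can be modelled in a few lines. This is a minimal single-threaded sketch, not the PR's code: `Object`, `LockStack`, `Monitor`, `kAnonymousOwner`, `inflate_by_contender` and `exit_monitor` are hypothetical names, a `std::vector` stands in for the growable oop array, and dropping the lock-stack entry during the fix-up is an assumption of this model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object {};  // stands in for an oop

// Hypothetical per-thread lock stack: a small, growable array of the
// objects this thread has fast-locked.
class LockStack {
  std::vector<Object*> elems_;  // grows on demand, like the described lock-stack
 public:
  void push(Object* o) { elems_.push_back(o); }
  void remove(Object* o) {
    for (std::size_t i = elems_.size(); i-- > 0;) {
      if (elems_[i] == o) {
        elems_.erase(elems_.begin() + static_cast<std::ptrdiff_t>(i));
        return;
      }
    }
  }
  // "Does the current thread own me?" is a linear scan, which stays cheap
  // because the array typically holds only a few entries.
  bool contains(const Object* o) const {
    for (const Object* e : elems_) {
      if (e == o) return true;
    }
    return false;
  }
  std::size_t size() const { return elems_.size(); }
};

struct JavaThread {
  LockStack lock_stack;
};

// Sentinel recorded by a contender that cannot tell who holds the fast-lock.
JavaThread* const kAnonymousOwner = reinterpret_cast<JavaThread*>(1);

struct Monitor {
  Object* obj = nullptr;
  JavaThread* owner = nullptr;
};

// Contending thread: inflate the fast-locked object to a full monitor.
void inflate_by_contender(Monitor& m, Object* o) {
  m.obj = o;
  m.owner = kAnonymousOwner;  // the real owner is unknown here
}

// Lock holder at monitorexit.
void exit_monitor(JavaThread* self, Monitor& m) {
  if (m.owner == kAnonymousOwner) {
    // Only the fast-lock holder can get here, so the anonymous owner is us:
    // claim ownership and drop the now-inflated object from the lock stack.
    assert(self->lock_stack.contains(m.obj));
    m.owner = self;
    self->lock_stack.remove(m.obj);
  }
  assert(m.owner == self);
  m.owner = nullptr;  // release; a waiting contender could now take over
}
```

The design point the sketch tries to capture is that the contender never needs to know the owner: correctness relies on the fact that only the lock holder can execute the matching monitorexit.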
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking:
>>
>> 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer), and
>> 2. The workload churns such locks.
>>
>> IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol.
>> This would be implemented in a separate PR.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | --- | --- | --- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | --- | --- | --- | --- | --- | --- | --- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>  - Use nullptr instead of NULL in touched code (shared)

Lockstack::grow: I would add a mode (either tied to ASSERT or as a stand-alone switch, I prefer the latter) to give us just as many slots as needed and no more. With NMT on, this will give us trailing canaries so we have overwriter alerts on the next grow.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From fparain at openjdk.org Mon Mar 13 12:09:10 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Mon, 13 Mar 2023 12:09:10 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v2]
In-Reply-To:
References:
Message-ID:

> Please review this change re-implementing the FieldInfo data structure.
>
> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency.
>
> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>
> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>
> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>
> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>
> Tested with mach5, tier 1 to 7.
>
> Thank you.

Frederic Parain has updated the pull request incrementally with one additional commit since the last revision:

  Addressing comments from first reviews

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12855/files
  - new: https://git.openjdk.org/jdk/pull/12855/files/42a4d6a0..ce1180ef

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=00-01

  Stats: 111 lines in 13 files changed: 40 ins; 22 del; 49 mod
  Patch: https://git.openjdk.org/jdk/pull/12855.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855

PR: https://git.openjdk.org/jdk/pull/12855

From inakonechnyy at openjdk.org Mon Mar 13 12:45:34 2023
From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy)
Date: Mon, 13 Mar 2023 12:45:34 GMT
Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v7]
In-Reply-To:
References:
Message-ID: <6rT55JPqgHHPzlDCiUb_SHK1YtQ0rDQlpZ7X3z73uow=.b6ae4f58-bb8f-4a40-9539-4b51f3b0c049@github.com>

On Thu, 9 Mar 2023 23:12:00 GMT, Ilarion Nakonechnyy wrote:

>> The proposed approach adds a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, which gets called within `InstanceKlass::add_initialization_error` if the existing `java_lang_Throwable::get_cause_with_stack_trace` didn't succeed because of an exception during the VM call. The simple function doesn't call the VM for getting a stack trace but fills in any other information about an exception.
>>
>> Besides that, discovery of information about an exception was added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function.
>>
>> A jtreg test reproducing the issue was also added to the commit.
>> The commit was tested with tier1 tests.
>
> Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision:
>
>   Some corrections

Thank you for the review, thanks!

-------------

PR: https://git.openjdk.org/jdk/pull/12566

From inakonechnyy at openjdk.org Mon Mar 13 13:48:05 2023
From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy)
Date: Mon, 13 Mar 2023 13:48:05 GMT
Subject: RFR: 8302491: NoClassDefFoundError omits the original cause of an error [v8]
In-Reply-To:
References:
Message-ID:

> The proposed approach adds a new function for getting the cause of an exception, `java_lang_Throwable::get_cause_simple`, which gets called within `InstanceKlass::add_initialization_error` if the existing `java_lang_Throwable::get_cause_with_stack_trace` didn't succeed because of an exception during the VM call. The simple function doesn't call the VM for getting a stack trace but fills in any other information about an exception.
>
> Besides that, discovery of information about an exception was added to the `ConstantPoolCacheEntry::save_and_throw_indy_exc` function.
>
> A jtreg test reproducing the issue was also added to the commit.
> The commit was tested with tier1 tests.
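The fallback described above (try the full capture, and if that capture itself fails, record a simpler cause without asking the VM for a stack trace) can be sketched as follows. Everything here is hypothetical: `CauseInfo`, `ThrownException`, `capture_with_stack_trace`, `capture_simple` and `record_init_error` are illustrative names, and an empty `std::optional` stands in for HotSpot's pending-exception checks.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record of why class initialization failed.
struct CauseInfo {
  std::string class_name;
  std::string message;
  std::vector<std::string> stack_trace;  // empty when captured the simple way
};

// Stands in for the original exception object.
struct ThrownException {
  std::string class_name;
  std::string message;
  bool stack_trace_call_fails;  // simulates an exception during the VM call
};

// Rich path: needs a call into the VM to fill in the stack trace, which may fail.
std::optional<CauseInfo> capture_with_stack_trace(const ThrownException& e) {
  if (e.stack_trace_call_fails) return std::nullopt;
  return CauseInfo{e.class_name, e.message, {"frame0", "frame1"}};
}

// Simple path: no VM call, just copy what is already available.
CauseInfo capture_simple(const ThrownException& e) {
  return CauseInfo{e.class_name, e.message, {}};
}

// Always records *some* cause instead of silently dropping it.
CauseInfo record_init_error(const ThrownException& e) {
  if (auto rich = capture_with_stack_trace(e)) return *rich;
  return capture_simple(e);
}
```

The point of the design is that the class name and message survive even when the richer capture fails, which is the information the bug title says was being omitted.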
Ilarion Nakonechnyy has updated the pull request incrementally with one additional commit since the last revision:

  minor correction

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12566/files
  - new: https://git.openjdk.org/jdk/pull/12566/files/5e575962..e55cb339

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12566&range=06-07

  Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/12566.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12566/head:pull/12566

PR: https://git.openjdk.org/jdk/pull/12566

From jwaters at openjdk.org Mon Mar 13 14:30:41 2023
From: jwaters at openjdk.org (Julian Waters)
Date: Mon, 13 Mar 2023 14:30:41 GMT
Subject: RFR: 8301244: Tidy up compiler specific warnings files [v8]
In-Reply-To:
References:
Message-ID:

On Wed, 1 Feb 2023 07:46:06 GMT, David Holmes wrote:

>> Julian Waters has updated the pull request incrementally with four additional commits since the last revision:
>>
>>  - gcc
>>  - VISCPP
>>  - Comment
>>  - Re-add PRAGMA
>
> So can I assume you have some uses for this in the pipeline?

Thanks Kim, waiting for a second reviewer (@dholmes-ora perhaps?)

-------------

PR: https://git.openjdk.org/jdk/pull/12255

From pchilanomate at openjdk.org Mon Mar 13 15:49:24 2023
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 13 Mar 2023 15:49:24 GMT
Subject: RFR: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode
In-Reply-To:
References:
Message-ID:

On Mon, 13 Mar 2023 05:34:06 GMT, David Holmes wrote:

> Looks good. Thanks.

Thanks for the review David!
-------------

PR: https://git.openjdk.org/jdk/pull/12956

From sviswanathan at openjdk.org Mon Mar 13 16:14:30 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 13 Mar 2023 16:14:30 GMT
Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v12]
In-Reply-To:
References:
Message-ID:

On Sun, 12 Mar 2023 13:21:18 GMT, Jan Kratochvil wrote:

>> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna).
>> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA.
>> The patch (and former GCC performance regression) affects only x86_64+i686.
>
> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision:
>
>  - Fix missing SharedRuntime::frem and SharedRuntime::drem on aarch64.
>  - bugreported by sviswa7.
>  - Merge branch 'master' into modulo
>  - Fix #endif comment - found by dholmes-ora.
>  - Merge branch 'master' into modulo
>  - Fix win32 broken build.
>  - Merge remote-tracking branch 'origin/master' into modulo
>  - Always include the _WIN64 workaround - a review by dholmes-ora.
>  - Remove comments to be moved to JBS (Bug System) - a review by jddarcy.
>  - Uppercase L - a review by turbanoff.
>  - Fix copyright author.
>  - ... and 3 more: https://git.openjdk.org/jdk/compare/37d28306...f03d4cdf

Looks good to me.

-------------

Marked as reviewed by sviswanathan (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12508

From sviswanathan at openjdk.org Mon Mar 13 16:14:32 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 13 Mar 2023 16:14:32 GMT
Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v9]
In-Reply-To: <_z7GiZvqh8NpSWzxLMKrcQB_0g_xz8lneN9LOI6DedI=.6453aa68-1f70-4662-897c-04537214cd8a@github.com>
References: <_z7GiZvqh8NpSWzxLMKrcQB_0g_xz8lneN9LOI6DedI=.6453aa68-1f70-4662-897c-04537214cd8a@github.com>
Message-ID: <4kBiCRBRZjc1gAPnVYcQnqherkdMyzBbyLVZGLcf_nc=.1bcd7bdd-35c2-480d-ab81-4c38a7c7a35c@github.com>

On Sat, 4 Mar 2023 01:11:33 GMT, Sandhya Viswanathan wrote:

> Very nice performance increase. The only concern I have is that the x87 fpu control (using fldcw instruction) is not set in 64 bit builds by the JVM anymore explicitly. It is only set in the 32bit builds. Maybe @iwanowww or @shipilev have some thoughts on this.

The default value of the x87 fpu control word (0x37F) works for this case, so it should be ok.

-------------

PR: https://git.openjdk.org/jdk/pull/12508

From mdoerr at openjdk.org Mon Mar 13 16:23:21 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 13 Mar 2023 16:23:21 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]
In-Reply-To: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>
References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>
Message-ID:

On Sat, 11 Mar 2023 02:20:31 GMT, Jorn Vernee wrote:

>> @JornVernee: Thanks! I've merged in your changes. TestArrayStructs is not yet completely working. I will need to investigate. I think I've done most other things you had requested. You may want to take a look at my recent commits.
>
> @TheRealMDoerr I've been keeping up with the changes you're making. I just have to look at the new test for HFAs you've added (next week).
>
> Besides fixing the TestArrayStructs test, do you have anything else you still want to add to this PR?
@JornVernee: I guess I should add a couple of Upcalls to my new test. Otherwise, I have only planned to fix `TestArrayStructs`. Further changes (and maybe new tests) can still get done when working on Big Endian / AIX or when there is demand.

I'm currently wondering about the `TestArrayStructs` failures. Passing arrays with up to 7 elements seems to work fine. When I pass 8 elements, the last element of `capturedArgs` gets observed as 0. When I pass more than 8 elements, elements 5 and 6 of `capturedArgs` get observed as 0. `DowncallLinker.invokeInterpBindings` has the correct `args`, but `UpcallLinker.invokeInterpBindings` doesn't receive the correct values as `lowLevelArgs`. They contain the wrong zeros. The remaining elements look correct. Do you have an idea what could be going wrong? Otherwise, I'll have to continue debugging.

-------------

PR: https://git.openjdk.org/jdk/pull/12708

From fparain at openjdk.org Mon Mar 13 16:26:06 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Mon, 13 Mar 2023 16:26:06 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3]
In-Reply-To:
References:
Message-ID:

> Please review this change re-implementing the FieldInfo data structure.
>
> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency.
>
> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
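For reference, the core of the UNSIGNED5 scheme mentioned above fits in a few lines. This sketch assumes the classic parameters from the pack200 format (at most five bytes per 32-bit value; a byte below L = 192 ends a value, while bytes 192..255 carry 6 payload bits and signal that more bytes follow). The JVM's own UNSIGNED5 support may add details, such as an offset that keeps encoded bytes non-zero, so the bytes produced here are not claimed to match the `FieldInfo` stream. `write_u5` and `read_u5` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Classic UNSIGNED5 parameters: H = 64 "high" codes per byte, L = 192 "low" codes.
constexpr uint32_t kLgH = 6;
constexpr uint32_t kH = 1u << kLgH;  // 64
constexpr uint32_t kL = 256 - kH;    // 192
constexpr int kMaxBytes = 5;

void write_u5(std::vector<uint8_t>& out, uint32_t value) {
  uint32_t sum = value;
  for (int i = 0;; ++i) {
    if (sum < kL || i == kMaxBytes - 1) {  // final byte (fits in 8 bits here)
      out.push_back(static_cast<uint8_t>(sum));
      return;
    }
    sum -= kL;
    out.push_back(static_cast<uint8_t>(kL + (sum % kH)));  // continuation byte
    sum >>= kLgH;
  }
}

// Reads one value starting at 'pos' and advances 'pos' past it.
uint32_t read_u5(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint32_t b = in[pos++];
  uint32_t sum = b;
  uint32_t shift = kLgH;
  for (int i = 1; b >= kL && i < kMaxBytes; ++i) {
    b = in[pos++];
    sum += b << shift;  // byte i contributes b * 64^i
    shift += kLgH;
  }
  return sum;
}
```

Small values such as field indexes below 192 take a single byte, which is where the density gain over a fixed-width record comes from. For example, 200 encodes as the two bytes 200, 0, and 1000 as 232, 12 (232 + 12 * 64 = 1000).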
>
> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>
> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>
> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>
> Tested with mach5, tier 1 to 7.
>
> Thank you.

Frederic Parain has updated the pull request incrementally with one additional commit since the last revision:

  SA additional caching from Chris Plummer

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12855/files
  - new: https://git.openjdk.org/jdk/pull/12855/files/ce1180ef..322b626d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=01-02

  Stats: 78 lines in 2 files changed: 35 ins; 34 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/12855.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855

PR: https://git.openjdk.org/jdk/pull/12855

From jvernee at openjdk.org Mon Mar 13 16:40:18 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Mon, 13 Mar 2023 16:40:18 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3]
In-Reply-To:
References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com>
Message-ID: <1OXigMWnfCHCkxCI3D29NpgIpgG36Ltlnry1eytOPow=.5f423637-0a12-454e-a31c-57b2fc11123f@github.com>

On Mon, 13 Mar 2023 16:20:22 GMT, Martin Doerr wrote:

> I guess I should add a couple of Upcalls to my new test. Otherwise, I have only planned to fix TestArrayStructs. Further changes (and maybe new tests) can still get done when working on Big Endian / AIX or when there is demand.

Okay.
The reason I ask is because we are looking to get started with the JEP PR, but it might be nice to wrap up this PR first, to avoid any merge conflicts. I also have https://github.com/openjdk/jdk/pull/12883 which would need changes to the PPC call arranger/linker as well (though, it's a pretty simple refactor). I'm trying to figure out what the best order to do things in is.

> I'm currently wondering about the TestArrayStructs failures. Passing arrays with up to 7 elements seems to work fine. When I pass 8 elements, the last element of capturedArgs gets observed as 0. When I pass more than 8 elements, elements 5 and 6 of capturedArgs get observed as 0. DowncallLinker.invokeInterpBindings has the correct args, but UpcallLinker.invokeInterpBindings doesn't receive the correct values as lowLevelArgs. They contain the wrong zeros. The remaining elements look correct. Do you have an idea what could be going wrong? Otherwise, I'll have to continue debugging.

This sounds like there might be a mismatch between the Java and native side. I suggest looking at the assembly generated for the native function for the failing case, and seeing if it matches what is generated by CallArranger. Here is also where adding a CallArranger test can be useful (in test/jdk/java/foreign/callarranger), to test whether the resulting bindings match your expectation for that function descriptor.

Also, you might want to check the layout the native compiler uses for the particular struct, and verify that it matches the Java side (i.e. there's no weird padding or something, it's just a struct of 8 bytes).

-------------

PR: https://git.openjdk.org/jdk/pull/12708

From jcking at openjdk.org Mon Mar 13 16:47:32 2023
From: jcking at openjdk.org (Justin King)
Date: Mon, 13 Mar 2023 16:47:32 GMT
Subject: RFR: JDK-8303184: ZGC incompatible with ASan
Message-ID:

Update ZGC to work with ASan and fix missing LSan root region registration for ZGC.
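The LSan part of this change (registering `MmapArrayAllocator` allocations, per the commit list) relies on the sanitizer's root-region API: memory obtained directly from `mmap` is not scanned by LeakSanitizer, so heap pointers stored only there can be reported as leaks unless the region is registered. A minimal sketch, assuming a POSIX system: `reserve_array` and `release_array` are hypothetical helpers, not the patch's code, and the `__lsan_*` calls from `<sanitizer/lsan_interface.h>` are compiled in only when building with AddressSanitizer (which includes LeakSanitizer).

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Detect an ASan build (GCC defines __SANITIZE_ADDRESS__; Clang exposes __has_feature).
#if defined(__SANITIZE_ADDRESS__)
#  define SKETCH_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_ASAN 1
#  endif
#endif

#ifdef SKETCH_ASAN
#  include <sanitizer/lsan_interface.h>
#endif

// Hypothetical helper: map an anonymous region and let LSan scan it for pointers.
void* reserve_array(std::size_t bytes) {
  void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return nullptr;
#ifdef SKETCH_ASAN
  __lsan_register_root_region(p, bytes);
#endif
  return p;
}

// Unregister before unmapping so LSan never scans a dead mapping.
void release_array(void* p, std::size_t bytes) {
#ifdef SKETCH_ASAN
  __lsan_unregister_root_region(p, bytes);
#endif
  munmap(p, bytes);
}
```

Without sanitizers the helpers reduce to plain `mmap`/`munmap`, so the registration costs nothing in regular builds.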
Currently all ZGC tests will fail on x86 with ASan enabled, as it is unable to reserve the address regions necessary due to overlap with ASan. x86 does not appear to have the address layout detection logic of the other architectures. Other alternatives are to port the address layout detection logic to x86 (I was not comfortable doing this) or to just disable ZGC when building Hotspot with ASan.

-------------

Commit messages:
 - Merge remote-tracking branch 'upstream/master' into asan-zgc
 - Register MmapArrayAllocator allocations with LSan
 - Force ZGC to layout 3 with ASan

Changes: https://git.openjdk.org/jdk/pull/13000/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13000&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8303184
  Stats: 18 lines in 2 files changed: 16 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13000.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/13000/head:pull/13000

PR: https://git.openjdk.org/jdk/pull/13000

From coleenp at openjdk.org Mon Mar 13 16:51:20 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 13 Mar 2023 16:51:20 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 13 Mar 2023 16:26:06 GMT, Frederic Parain wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>>
>> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency.
>>
>> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed.
>> The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>>
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>>
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>>
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format.
>>
>> Tested with mach5, tier 1 to 7.
>>
>> Thank you.
>
> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision:
>
>   SA additional caching from Chris Plummer

Mostly minor comments, but one .inline.hpp is still included in an hpp file.

I should point out that I only skimmed the SA and JVMCI changes.

src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2653:

> 2651: }
> 2652: InstanceKlass* iklass = InstanceKlass::cast(klass);
> 2653: if (index < 0 ||index > iklass->total_fields_count()) {

nit: need space after ||

src/hotspot/share/oops/fieldInfo.cpp line 45:

> 43: }
> 44:
> 45: void FieldInfo::print_from_growable_array(GrowableArray* array, outputStream* os, ConstantPool* cp) {

For consistency, can you make the outputStream parameter first?

src/hotspot/share/oops/instanceKlass.hpp line 32:

> 30: #include "oops/constMethod.hpp"
> 31: #include "oops/constantPool.hpp"
> 32: #include "oops/fieldInfo.inline.hpp"

This shouldn't have an inline.hpp inclusion.
src/hotspot/share/runtime/vmStructs.cpp line 2304:

> 2302: declare_constant(FieldInfo::FieldFlags::_ff_generic) \
> 2303: declare_constant(FieldInfo::FieldFlags::_ff_stable) \
> 2304: declare_constant(FieldInfo::FieldFlags::_ff_contended) \

If there are flags that SA doesn't use, like contended, I don't think they should be included in the information that we pass to SA.

-------------

Changes requested by coleenp (Reviewer).

PR: https://git.openjdk.org/jdk/pull/12855

From coleenp at openjdk.org Mon Mar 13 16:51:25 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 13 Mar 2023 16:51:25 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 9 Mar 2023 20:49:04 GMT, Frederic Parain wrote:

>> src/hotspot/share/classfile/classFileParser.cpp line 1491:
>>
>>> 1489: _temp_field_info = new GrowableArray(total_fields);
>>> 1490:
>>> 1491: ResourceMark rm(THREAD);
>>
>> Is the ResourceMark ok here or should it go before allocating _temp_field_info?
>
> _temp_field_info must survive after ClassFileParser::parse_fields() has returned, so definitely after the allocation of _temp_field_info. That being said, I don't see any reason to have a ResourceMark here, probably a remnant of some debugging/tracing code. I'll remove it.

ok, good. The ResourceMark might be a problem with the GrowableArray if it grows.

>> src/hotspot/share/classfile/classFileParser.cpp line 1608:
>>
>>> 1606: fflags.update_injected(true);
>>> 1607: AccessFlags aflags;
>>> 1608: FieldInfo fi(aflags, (u2)(injected[n].name_index), (u2)(injected[n].signature_index), 0, fflags);
>>
>> I don't know why there's a cast here until I read more. If the FieldInfo name_index and signature_index fields are only u2 sized, could you pass this as an int and then in the constructor assert that the value doesn't overflow u2 instead?
>
> The type of name_index and signature_index is const vmSymbolID, because the names and signatures of injected fields do not come from a constant pool, but from the vmSymbol array.

ok the cast is fine here.

>> src/hotspot/share/oops/fieldStreams.hpp line 104:
>>
>>> 102: AccessFlags flags;
>>> 103: flags.set_flags(field()->access_flags());
>>> 104: return flags;
>>
>> Did this used to do this for a reason?
>
> Using the setter rather than the constructor filters out the VM defined flags and keeps only the flags from the class file.

I see, thanks.

-------------

PR: https://git.openjdk.org/jdk/pull/12855

From coleenp at openjdk.org Mon Mar 13 16:51:29 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 13 Mar 2023 16:51:29 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 15:53:03 GMT, Coleen Phillimore wrote:

>> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   SA additional caching from Chris Plummer
>
> src/hotspot/share/classfile/classFileParser.cpp line 1634:
>
>> 1632: for(int i = 0; i < _temp_field_info->length(); i++) {
>> 1633: name = _temp_field_info->adr_at(i)->name(_cp);
>> 1634: sig = _temp_field_info->adr_at(i)->signature(_cp);
>
> This checking for duplicates looks like a good candidate for a separate function because parse_fields is so long. I'm adding this comment to remember to file an RFE to look into making this function shorter and factor out this code.
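As an illustration of the refactoring suggested above, the duplicate check could become a small standalone helper. This is a hypothetical sketch, not HotSpot code: `FieldEntry` and `has_duplicate_field` are invented names, and a `std::unordered_set` of combined name/signature keys stands in for whatever table the parser actually uses. The rule it enforces is the JVMS one that no two fields in a class file may share both name and descriptor.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a parsed field: name and signature as strings.
struct FieldEntry {
  std::string name;
  std::string signature;
};

// Returns true if two fields share both name and signature. A field with the
// same name but a different descriptor is allowed by the class file format.
bool has_duplicate_field(const std::vector<FieldEntry>& fields) {
  std::unordered_set<std::string> seen;
  for (const FieldEntry& f : fields) {
    // '\0' is assumed never to occur in names or signatures, so it is used
    // as an unambiguous separator between the two parts of the key.
    std::string key = f.name + '\0' + f.signature;
    if (!seen.insert(key).second) return true;  // insert fails on a repeat
  }
  return false;
}
```

A hash set keeps the check linear in the number of fields, instead of comparing every pair.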
Filed a cleanup RFE https://bugs.openjdk.org/browse/JDK-8304069 ------------- PR: https://git.openjdk.org/jdk/pull/12855 From jvernee at openjdk.org Mon Mar 13 16:57:43 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 13 Mar 2023 16:57:43 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v14] In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 17:29:37 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not completely. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1.
Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Introduce ABIv2CallArranger for linux ppc64le. test/jdk/java/foreign/TestHFA.java line 32: > 30: * @enablePreview > 31: * @requires ((os.arch == "amd64" | os.arch == "x86_64") & sun.arch.data.model == "64") | os.arch == "aarch64" | os.arch == "ppc64le" | os.arch == "riscv64" > 32: * @requires !vm.musl Not sure if this test should be disabled on musl? 
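Item 1 above notes that on Big Endian, storing 8 bytes and then loading 4 bytes yields a different value than on Little Endian. The effect can be reproduced without the FFM implementation at all. The following is a minimal sketch, not code from the PR: it uses a plain `java.nio.ByteBuffer` to store a 64-bit value and read back only the first 4 bytes under each byte order. The class name `EndianWidthDemo` is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: shows why a 4-byte load after an 8-byte store
// depends on byte order, so the precise access width matters on Big Endian.
class EndianWidthDemo {
    // Store an 8-byte long at offset 0, then load only 4 bytes from offset 0.
    static int storeLongLoadInt(long value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(8).order(order);
        buf.putLong(0, value); // 8-byte store
        return buf.getInt(0);  // 4-byte load from the same address
    }

    public static void main(String[] args) {
        // Little Endian: the low-order bytes come first, so the int load sees 42.
        System.out.println(storeLongLoadInt(42L, ByteOrder.LITTLE_ENDIAN));
        // Big Endian: the first 4 bytes are the high-order half, which is 0 here.
        System.out.println(storeLongLoadInt(42L, ByteOrder.BIG_ENDIAN));
    }
}
```

On Little Endian the 4-byte load happens to return the "right" value, which can mask an incorrect width. On Big Endian it returns the high-order half, so the access width has to match the value's type.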
------------- PR: https://git.openjdk.org/jdk/pull/12708 From inakonechnyy at openjdk.org Mon Mar 13 17:29:44 2023 From: inakonechnyy at openjdk.org (Ilarion Nakonechnyy) Date: Mon, 13 Mar 2023 17:29:44 GMT Subject: Integrated: 8302491: NoClassDefFoundError omits the original cause of an error In-Reply-To: References: Message-ID: <1TzQu3SHM08hktYVu5nfIDA9IvklKfOBdl_PvZ5iX_c=.77c1323c-a476-4037-a7ef-b36fb4eb248a@github.com> On Tue, 14 Feb 2023 21:58:01 GMT, Ilarion Nakonechnyy wrote: > The proposed approach added a new function for getting the cause of an exception -`java_lang_Throwable::get_cause_simple `, that gets called within `InstanceKlass::add_initialization_error` if an old one `java_lang_Throwable::get_cause_with_stack_trace` didn't succeed because of an exception during the VM call. The simple function doesn't call the VM for getting a stack trace but fills in any other information about an exception. > > Besides that, the discovering information about an exception was added to `ConstantPoolCacheEntry::save_and_throw_indy_exc` function. > > Jtreg for reproducing the issue also was added to the commit. > The commit was tested with tier1 tests. This pull request has now been integrated. 
Changeset: 56851075 Author: Ilarion Nakonechnyy Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/5685107579f0f00b5eae881311315cec34c1ddcb Stats: 57 lines in 3 files changed: 21 ins; 13 del; 23 mod 8302491: NoClassDefFoundError omits the original cause of an error Reviewed-by: coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/12566 From fparain at openjdk.org Mon Mar 13 17:32:35 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 17:32:35 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: Message-ID: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> On Mon, 13 Mar 2023 16:41:05 GMT, Coleen Phillimore wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> SA additional caching from Chris Plummer > > src/hotspot/share/runtime/vmStructs.cpp line 2304: > >> 2302: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ >> 2303: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ >> 2304: declare_constant(FieldInfo::FieldFlags::_ff_contended) \ > > If there are flags that SA doesn't use, like contended, I don't think they should be included in the information that we pass to SA. The contended flag is required to be able to decode the compressed stream, because it signals the presence of an optional part of a field description. The only flag that was not required for the decoding of the stream and was not used by the SA was Stable, and I'll remove it in the next commit.
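Frederic's point is that a decoder must understand the contended flag because the flag controls the shape of the record. A hypothetical sketch makes this concrete. The record layout, names, and bit value below are assumptions for illustration and are not HotSpot's actual FieldInfo stream format. In the sketch, a field record carries an extra "contended group" element only when a flag bit is set. A reader that ignored that bit would misread every following record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field-record stream: nameIndex, sigIndex, offset, flags,
// followed by a contended group ONLY when FF_CONTENDED is set in flags.
class OptionalPartStream {
    static final int FF_CONTENDED = 1 << 2; // assumed bit position, illustration only

    record Field(int nameIndex, int sigIndex, int offset, int flags, int contendedGroup) {}

    static int[] encode(List<Field> fields) {
        List<Integer> out = new ArrayList<>();
        for (Field f : fields) {
            out.add(f.nameIndex());
            out.add(f.sigIndex());
            out.add(f.offset());
            out.add(f.flags());
            if ((f.flags() & FF_CONTENDED) != 0) {
                out.add(f.contendedGroup()); // optional element
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    static List<Field> decode(int[] stream, int count) {
        List<Field> result = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int name = stream[pos++];
            int sig = stream[pos++];
            int offset = stream[pos++];
            int flags = stream[pos++];
            // The decoder must test the flag to know whether one more element follows.
            int group = (flags & FF_CONTENDED) != 0 ? stream[pos++] : 0;
            result.add(new Field(name, sig, offset, flags, group));
        }
        return result;
    }
}
```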
------------- PR: https://git.openjdk.org/jdk/pull/12855 From xliu at openjdk.org Mon Mar 13 17:32:39 2023 From: xliu at openjdk.org (Xin Liu) Date: Mon, 13 Mar 2023 17:32:39 GMT Subject: Withdrawn: 8301136: Improve unlink() and unlink_all() of ResourceHashtableBase In-Reply-To: References: Message-ID: On Thu, 26 Jan 2023 08:15:37 GMT, Xin Liu wrote: > 1. Apply the same idea of JDK-8300184 to unlink(). > 2. Because ResourceHashtableBase doesn't support copy assignment, client of it has to purge all elements first when it needs to assign it. We would like provide a specialized version called 'unlink_all()'. We don't need to update each node's _next in this case. We only nullify all buckets. > 3. This patch also provides a specialized version of unlink_all() for destructor. We don't even update buckets. it's dead anyway. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12213 From fparain at openjdk.org Mon Mar 13 17:45:17 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 17:45:17 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 01:19:50 GMT, David Holmes wrote: > When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. > > Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). 
So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. > > A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. > > In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. > > Testing: tiers 1-4 > > Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. > > Thanks. The concern during the initial implementation was that the value returned by num_arguments() and the real number of arguments could become out of sync, but your solution to check that they are consistent only on debug builds addresses this concern. Thank you for fixing this! ------------- Marked as reviewed by fparain (Committer). 
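The registration scheme described above can be modelled outside HotSpot. It pairs a statically known argument count with a consistency check that runs only in debug builds. The following is a hypothetical Java analogue; `Command`, `PrintCommand`, `CommandFactory`, and the `debugChecks` switch are illustrative names, not the DCmd C++ API. With checks off, registration never constructs a command. With checks on, it builds one instance and compares the dynamic count against the static one, much like a debug-build assert.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical analogue of the DCmd change: registration uses a static count;
// the dynamic count (which needs an instance) is only computed for debug checks.
abstract class Command {
    abstract List<String> declaredArguments(); // source of the dynamic count
}

class PrintCommand extends Command {
    static final int NUM_ARGUMENTS = 2; // statically known count
    static int instancesCreated = 0;    // lets the demo observe instantiation

    PrintCommand() {
        instancesCreated++;
    }

    @Override
    List<String> declaredArguments() {
        return List.of("-all", "filter");
    }
}

class CommandFactory {
    final String name;
    final int numArguments;
    final Supplier<? extends Command> constructor;

    CommandFactory(String name, int staticNumArguments,
                   Supplier<? extends Command> constructor, boolean debugChecks) {
        if (debugChecks) {
            // Debug-only: build an instance and verify the static count.
            int dynamicCount = constructor.get().declaredArguments().size();
            if (dynamicCount != staticNumArguments) {
                throw new IllegalStateException(name + ": static count " + staticNumArguments
                        + " does not match dynamic count " + dynamicCount);
            }
        }
        this.name = name;
        this.numArguments = staticNumArguments;
        this.constructor = constructor;
    }
}
```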
PR: https://git.openjdk.org/jdk/pull/12994 From cjplummer at openjdk.org Mon Mar 13 18:02:28 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 13 Mar 2023 18:02:28 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> References: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> Message-ID: On Mon, 13 Mar 2023 17:29:28 GMT, Frederic Parain wrote: >> src/hotspot/share/runtime/vmStructs.cpp line 2304: >> >>> 2302: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ >>> 2303: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ >>> 2304: declare_constant(FieldInfo::FieldFlags::_ff_contended) \ >> >> If there are flags that SA doesn't use, like contended, I don't think they should be included in the information that we pass to SA. > > The contended flag is required to be able to decode the compressed stream, because it signals the presence of an optional part of a field description. The only flag that was not required for the decoding of the stream and was not used by the SA was Stable, and I'll remove it in the next commit. Leaving it in allows the field to be displayed if an SA user ever dumps a FieldFlags object. Generally speaking it is good to keep these structs complete, or at least complete with any info that might be useful when debugging with SA. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From rkennke at openjdk.org Mon Mar 13 18:43:41 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 13 Mar 2023 18:43:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v22] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed.
This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks.
IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Re-design LockStack for faster lock-stack depth-check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/5fe2afcf..0b7be891 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=20-21 Stats: 251 lines in 34 files changed: 28 ins; 144 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From eosterlund at openjdk.org Mon Mar 13 18:43:55 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 13 Mar 2023 18:43:55 GMT Subject: RFR: JDK-8303184: ZGC incompatible with ASan In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 16:37:41 GMT, Justin King wrote: > Update ZGC to work with ASan and fix missing LSan root region registration for ZGC. > > Currently all ZGC tests will fail on x86 with ASan enabled, as it is unable to reserve the address regions necessary due to overlap with ASan. x86 does not appear to have the address layout detection logic of the other architectures. Other alternatives are to port the address layout detection logic to x86 (I was not comfortable doing this) or to just disable ZGC when building Hotspot with ASan. Have you tried whether this is a problem with generational ZGC? (zgc_generational branch in the zgc repo) ------------- PR: https://git.openjdk.org/jdk/pull/13000 From fparain at openjdk.org Mon Mar 13 18:51:17 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 18:51:17 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote: > Please review this change re-implementing the FieldInfo data structure.
> > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code due to a lack of strong typing and semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you.
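UNSIGNED5 has its own byte-range parameters, which are not reproduced here. As a plainly named stand-in, the minimal sketch below uses the simpler unsigned LEB128 varint scheme (7 payload bits per byte, high bit set when more bytes follow) to show the same density idea. Small values such as constant-pool indices, offsets, and flag words typically fit in one byte each, instead of occupying a fixed-width slot. This is not the encoding used by the PR. The class name `Leb128` is illustrative.

```java
import java.io.ByteArrayOutputStream;

// Unsigned LEB128: a stand-in for UNSIGNED5 (NOT the same encoding),
// used only to show how variable-length ints shrink typical field metadata.
class Leb128 {
    static byte[] encode(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            int rest = v;
            while ((rest & ~0x7F) != 0) {        // more than 7 significant bits remain
                out.write((rest & 0x7F) | 0x80); // low 7 bits plus continuation bit
                rest >>>= 7;                     // unsigned shift, so negative ints terminate
            }
            out.write(rest);                     // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] bytes, int count) {
        int[] result = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            result[i] = value;
        }
        return result;
    }
}
```

For example, a record of (name index 5, signature index 6, offset 12, flags 2) encodes to 4 bytes, whereas four fixed 16-bit slots would take 8. Values of 128 and above, such as 300, spill into a second byte.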
Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: Fixes includes and style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/322b626d..12b4f1b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=02-03 Stats: 9 lines in 5 files changed: 3 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Mon Mar 13 18:51:20 2023 From: fparain at openjdk.org (Frederic Parain) Date: Mon, 13 Mar 2023 18:51:20 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v3] In-Reply-To: References: <-zydTVMcZURxcBOz8Rj5Gi0DUtqkANHuYnSUStzY0dY=.2c84dc16-4d7a-44fc-a27c-0ffb9f56bec8@github.com> Message-ID: On Mon, 13 Mar 2023 17:59:03 GMT, Chris Plummer wrote: >> The contended flag is required to be able to decode the compressed stream, because it signals the presence of an optional part of a field description. The only flag that was not required for the decoding of the stream and was not used by the SA was Stable, and I'll remove it in the next commit. > > Leaving it in allows the field to be displayed if an SA user ever dumps a FieldFlags object. Generally speaking it is good to keep these structs complete, or at least complete with any info that might be useful when debugging with SA. The "stable" flag and the related methods have been preserved in the last commit. 
------------- PR: https://git.openjdk.org/jdk/pull/12855 From jcking at openjdk.org Mon Mar 13 19:04:53 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 13 Mar 2023 19:04:53 GMT Subject: RFR: JDK-8303184: ZGC incompatible with ASan In-Reply-To: References: Message-ID: <6j54xRLZM57ODtKz4cAqdDkG2xVwr1fUImlvp35W-4A=.9969f3da-2403-4b44-8f11-2a8da4734fcc@github.com> On Mon, 13 Mar 2023 18:41:27 GMT, Erik Österlund wrote: > Have you tried whether this is a problem with generational ZGC? (zgc_generational branch in the zgc repo) I looked at the generational ZGC code; it appears to be afflicted by the same thing. https://github.com/openjdk/zgc/blob/zgc_generational/src/hotspot/cpu/x86/gc/z/zAddress_x86.cpp ------------- PR: https://git.openjdk.org/jdk/pull/13000 From kvn at openjdk.org Mon Mar 13 19:26:08 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 Mar 2023 19:26:08 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v5] In-Reply-To: References: Message-ID: <94K1Ap5beXB44rCRuHL4PZfbSFizd93ZrFlyG9sTXj4=.e89d2ec7-f8ff-4be7-8f19-dbc2034f1934@github.com> On Wed, 8 Mar 2023 22:59:23 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal.
>> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 > - switched to use of lists and maps instead of arrays > - fixed whitespace > - added support for inherited annotations > - Merge branch 'master' into JDK-8303431 > - made AnnotationDataDecoder package-private > - add annotation API to JVMCI test/jdk/jdk/internal/vm/TestTranslatedException.java line 61: > 59: encodeDecode(throwable); > 60: } > 61: Why this was removed? 
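The description above passes annotation data from the HotSpot heap to the libgraal heap as a byte array. The actual `VMSupport` encoding is not shown in this thread. The following hypothetical sketch illustrates only the general round-trip idea: it uses `DataOutputStream`/`DataInputStream` to serialize an annotation type name plus a map of element names to string values. Real annotation data also needs typed values (primitives, enums, arrays, nested annotations), which this sketch deliberately ignores. `AnnotationBytes` and its wire format are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified wire format (not VMSupport's):
//   [annotation type name][element count]([element name][element value as string])*
class AnnotationBytes {
    record Decoded(String type, Map<String, String> elements) {}

    static byte[] encode(String annotationType, Map<String, String> elements) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(annotationType);
            out.writeInt(elements.size());
            for (Map.Entry<String, String> e : elements.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return bytes.toByteArray();
    }

    static Decoded decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String type = in.readUTF();
            int count = in.readInt();
            Map<String, String> elements = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                elements.put(name, in.readUTF());
            }
            return new Decoded(type, elements);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The receiving side only needs the byte array and the decoder, not the annotation's class. That mirrors, in a much simplified form, why a map-like result avoids loading annotation types in libgraal.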
------------- PR: https://git.openjdk.org/jdk/pull/12810 From kvn at openjdk.org Mon Mar 13 19:31:20 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 Mar 2023 19:31:20 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v5] In-Reply-To: References: Message-ID: <4aRnY_K2BIlWr4MII9PaMop0Qi2gqcvSSn7Heqlxuvw=.063e4544-5fcd-4fab-9b2c-3bcaeffcd381@github.com> On Wed, 8 Mar 2023 22:59:23 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. 
This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 > - switched to use of lists and maps instead of arrays > - fixed whitespace > - added support for inherited annotations > - Merge branch 'master' into JDK-8303431 > - made AnnotationDataDecoder package-private > - add annotation API to JVMCI I looked on Hotspot, JVMCI and VMSupport.java changes. But you need to ask Tom to look on JVMCI changes in details. And someone from core-libs who familiar with Annotations have to comment on your implementation in general because I am not expert. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12810 From rkennke at openjdk.org Mon Mar 13 20:02:45 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 13 Mar 2023 20:02:45 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
> > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
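The lock-stack behaviour described above can be shown in a toy, single-threaded Java model. The lock-stack is a small per-thread array that answers "does the current thread own me?" with a linear scan, grows on overflow, and refuses a recursive fast-lock. This is a sketch under stated assumptions, not the HotSpot implementation. Plain Java objects stand in for oops, there is no header CAS, the growth policy is arbitrary, locking is assumed to be strictly nested, and "inflate to a full monitor" is only signalled by a `false` return value. `LockStack` here is an illustrative class, not HotSpot's type of the same name.

```java
import java.util.Arrays;

// Toy model of a per-thread lock-stack (no header CAS, no real monitors).
class LockStack {
    private Object[] elems = new Object[4]; // initial capacity: arbitrary choice for the sketch
    private int top = 0;

    // "Does the current thread own me?": linear scan of a typically tiny array.
    boolean contains(Object o) {
        for (int i = 0; i < top; i++) {
            if (elems[i] == o) return true; // identity comparison, like comparing oops
        }
        return false;
    }

    // Returns false for a recursive attempt; per the description, the real
    // scheme would inflate the lock to a full monitor in that case.
    boolean tryFastLock(Object o) {
        if (contains(o)) return false;
        if (top == elems.length) {
            elems = Arrays.copyOf(elems, elems.length * 2); // grow on overflow
        }
        elems[top++] = o;
        return true;
    }

    // Assumes strictly nested locking, so the object being unlocked is on top.
    void unlock(Object o) {
        if (top == 0 || elems[top - 1] != o) throw new IllegalMonitorStateException();
        elems[--top] = null;
    }

    int depth() {
        return top;
    }
}
```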
> > One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header.
This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | diff | aarch64 stack-locking | aarch64 fast-locking | diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: X86 parts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/0b7be891..75db4f0a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=21-22 Stats: 143 lines in 14 files changed: 0 ins; 124 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From pchilanomate at openjdk.org Mon Mar 13 20:18:03 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 13 Mar 2023 20:18:03 GMT Subject: Integrated: 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 18:55:06 GMT, Patricio Chilano Mateo wrote: > Please review this small fix.
A suspender is a JvmtiVTMSTransitionDisabler monopolist, meaning VTMS_transition_disable_for_all() should not return while there is any active jvmtiVTMSTransitionDisabler. The code though is checking for active "all-disablers" but it's missing the check for active "single disablers". > I attached a simple reproducer to the bug which I used to test the patch. Not sure if it was worth adding a test so the patch contains just the fix. > > Thanks, > Patricio This pull request has now been integrated. Changeset: a8f662ec Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/a8f662ecb2cf13ba7fa499b9a9150da4318306a8 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8303908: Add missing check in VTMS_transition_disable_for_all() for suspend mode Reviewed-by: sspitsyn, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/12956 From dnsimon at openjdk.org Mon Mar 13 20:27:09 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 Mar 2023 20:27:09 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v5] In-Reply-To: <94K1Ap5beXB44rCRuHL4PZfbSFizd93ZrFlyG9sTXj4=.e89d2ec7-f8ff-4be7-8f19-dbc2034f1934@github.com> References: <94K1Ap5beXB44rCRuHL4PZfbSFizd93ZrFlyG9sTXj4=.e89d2ec7-f8ff-4be7-8f19-dbc2034f1934@github.com> Message-ID: On Mon, 13 Mar 2023 19:23:39 GMT, Vladimir Kozlov wrote: >> Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 >> - switched to use of lists and maps instead of arrays >> - fixed whitespace >> - added support for inherited annotations >> - Merge branch 'master' into JDK-8303431 >> - made AnnotationDataDecoder package-private >> - add annotation API to JVMCI > > test/jdk/jdk/internal/vm/TestTranslatedException.java line 61: > >> 59: encodeDecode(throwable); >> 60: } >> 61: > > Why this was removed? Because it does exactly the same thing as `encodeDecodeTest`. 
It should have been cleaned up in the original PR that introduced this test. ------------- PR: https://git.openjdk.org/jdk/pull/12810 From coleenp at openjdk.org Mon Mar 13 20:42:29 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 20:42:29 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: <3YhDOFYnbJ4QsrXUQbUQfFbXHb75eK1Mowuv9yYaXqE=.62fc2fd0-5384-440d-919a-1c59e9b2f3fb@github.com> On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extensibility, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you.
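To illustrate why a compressed stream is denser than fixed-size records, here is a sketch of a Pack200-style (B=5, H=64) variable-length unsigned encoding, the coding family the description names as the origin of `unsigned5`. This is an illustration under our own assumptions, not HotSpot's `UNSIGNED5` code, which may differ in details; `write_u5` and `read_u5` are hypothetical names. Byte values below 192 terminate a number, so small values such as field indexes or flag words take a single byte, and any `uint32_t` fits in at most five bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack200-style (B=5, H=64) variable-length encoding of unsigned 32-bit values.
// Illustrative only; the real UNSIGNED5 implementation may differ in details.
namespace varint_sketch {

constexpr uint32_t kH = 64;        // continuation bytes carry 6 payload bits
constexpr uint32_t kL = 256 - kH;  // byte values below 192 end a number
constexpr int kMaxLength = 5;      // enough for any uint32_t

void write_u5(std::vector<uint8_t>& out, uint32_t value) {
  uint32_t v = value;
  for (int i = 0; i < kMaxLength; i++) {
    if (v < kL || i == kMaxLength - 1) {
      out.push_back(static_cast<uint8_t>(v));  // terminating byte
      return;
    }
    v -= kL;
    out.push_back(static_cast<uint8_t>(kL + v % kH));  // continuation byte
    v /= kH;
  }
}

// Reads one value starting at pos and advances pos past it.
uint32_t read_u5(const std::vector<uint8_t>& in, size_t& pos) {
  uint32_t sum = 0;
  uint32_t scale = 1;
  for (int i = 0; i < kMaxLength; i++) {
    uint32_t b = in[pos++];
    sum += b * scale;
    if (b < kL) break;  // a "low" byte ends the number
    scale *= kH;
  }
  return sum;
}

}  // namespace varint_sketch
```

For example, 191 encodes as one byte, 192 as two bytes, and `0xFFFFFFFF` as five, so a stream of mostly small field attributes stays much smaller than 96 bits per field.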
> > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Fixes includes and style All my comments are addressed. Thank you! This is significant work. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855 From mdoerr at openjdk.org Mon Mar 13 21:00:31 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 13 Mar 2023 21:00:31 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: <1OXigMWnfCHCkxCI3D29NpgIpgG36Ltlnry1eytOPow=.5f423637-0a12-454e-a31c-57b2fc11123f@github.com> References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> <1OXigMWnfCHCkxCI3D29NpgIpgG36Ltlnry1eytOPow=.5f423637-0a12-454e-a31c-57b2fc11123f@github.com> Message-ID: <8I_sDu55v8ThrFyF4mMWbKOOKeGkfYvqo66JLNnULUY=.830fe48d-b65f-437c-82fa-9e7dac76c7ab@github.com> On Mon, 13 Mar 2023 16:37:18 GMT, Jorn Vernee wrote: > > I'm currently wondering about the TestArrayStructs failures. Passing arrays with up to 7 elements seems to work fine. When I pass 8 elements, the last element of capturedArgs gets observed as 0. When I pass more than 8 elements, element 5 and 6 of capturedArgs get observed as 0. > > DowncallLinker.invokeInterpBindings has the correct args, but UpcallLinker.invokeInterpBindings doesn't receive the correct values as lowLevelArgs. They contain the wrong zeros. The remaining elements look correct. > > Do you have an idea what could be going wrong? Otherwise, I'll have to continue debugging. > > This sounds like there might be a mismatch between the Java and native side. I suggest looking at the assembly generated for the native function for the failing case, and seeing if it matches what is generated by CallArranger. 
Here is also where adding a CallArranger test can be useful (in test/jdk/java/foreign/callarranger), to test whether the resulting bindings match your expectation for that function descriptor. > > Also, you might want to check the layout the native compiler uses for the particular struct, and verify that it matches the Java side. (i.e. there's no weird padding or something, it's just a struct of 8 bytes). Note that argument and return value passing works. I'm getting all values back. So, the native side seems to be ok. Only (one or two) values in `returnBox` are broken. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From coleenp at openjdk.org Mon Mar 13 21:07:10 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 21:07:10 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: <33uTpOt8ALbYOl5axezzxriVn4V1h860H3YWEbJ-PDY=.429dbaaa-e75e-47fb-88aa-3bd451ee4662@github.com> On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
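The layout proposed in the description (a dedicated array of resolved invokedynamic entries, with each `invokedynamic` operand rewritten to an index into that array) can be modelled roughly as below. All names (`indy_sketch::IndyEntry`, `Insn`, `rewrite_indy`) are hypothetical; only the opcode value 0xba for `invokedynamic` comes from the JVM specification. As a design assumption, the entry publishes its method pointer last with a release store, so a lock-free reader that observes a non-null method also observes the other fields; the `resolvedIndyEntry.hpp` excerpt quoted later in this digest describes a similar ordering, but this code is our own illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace indy_sketch {

constexpr uint8_t kInvokeDynamic = 0xba;  // invokedynamic opcode (186) per the JVM spec

struct Method { const char* name; };

// One resolved call site. A null method means "not resolved yet".
struct IndyEntry {
  uint16_t cpool_index = 0;  // constant-pool index the operand originally held
  bool has_appendix = false;
  std::atomic<Method*> method{nullptr};

  // Lock-free readers: the acquire load pairs with the release store in fill_in().
  bool is_resolved() const { return method.load(std::memory_order_acquire) != nullptr; }

  // Write the plain fields first and publish the method last.
  void fill_in(Method* m, bool appendix) {
    has_appendix = appendix;
    method.store(m, std::memory_order_release);
  }
};

// Simplified instruction: an opcode plus a 16-bit operand.
struct Insn {
  uint8_t opcode;
  uint16_t operand;
};

// Gives every invokedynamic its own entry and rewrites its operand to that entry's index.
inline std::vector<IndyEntry> rewrite_indy(std::vector<Insn>& code) {
  size_t count = 0;
  for (const Insn& insn : code) {
    if (insn.opcode == kInvokeDynamic) count++;
  }
  std::vector<IndyEntry> entries(count);  // entries are default-constructed in place
  uint16_t next = 0;
  for (Insn& insn : code) {
    if (insn.opcode == kInvokeDynamic) {
      entries[next].cpool_index = insn.operand;
      insn.operand = next++;
    }
  }
  return entries;
}

}  // namespace indy_sketch
```

After rewriting, the interpreter no longer decodes overloaded `f1`/`f2` fields: the operand indexes directly into an array whose element type says exactly what it holds.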
>> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments src/hotspot/cpu/x86/interp_masm_x86.cpp line 2075: > 2073: movptr(cache, Address(rbp, frame::interpreter_frame_cache_offset * wordSize)); > 2074: movptr(cache, Address(cache, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); > 2075: if (is_power_of_2(sizeof(ResolvedIndyEntry))) { This was a good suggestion but I wonder if we should assert ResolvedIndyEntry is a power of 2 so we know if we change the size and make it go the slower path? Or is 32 bit not a power of two and we need this? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From coleenp at openjdk.org Mon Mar 13 21:17:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 21:17:06 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments I have a couple of very minor comments. This change is great. Thank you! src/hotspot/cpu/x86/templateTable_x86.cpp line 2798: > 2796: bool is_invokevirtual, > 2797: bool is_invokevfinal, /*unused*/ > 2798: bool is_invokedynamic /*unused*/) { Can you remove the parameter since the s390 port is here? src/hotspot/share/oops/resolvedIndyEntry.hpp line 112: > 110: set_flags(has_appendix); > 111: // Set the method last since it is read lock free. > 112: // Resolution is indicated by whether or not he method is set. typo: he -> the ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Mon Mar 13 21:26:06 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 13 Mar 2023 21:26:06 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <33uTpOt8ALbYOl5axezzxriVn4V1h860H3YWEbJ-PDY=.429dbaaa-e75e-47fb-88aa-3bd451ee4662@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> <33uTpOt8ALbYOl5axezzxriVn4V1h860H3YWEbJ-PDY=.429dbaaa-e75e-47fb-88aa-3bd451ee4662@github.com> Message-ID: On Mon, 13 Mar 2023 21:04:22 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Interpreter optimization and comments > > src/hotspot/cpu/x86/interp_masm_x86.cpp line 2075: > >> 2073: movptr(cache, Address(rbp, frame::interpreter_frame_cache_offset * wordSize)); >> 2074: movptr(cache, Address(cache, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); >> 2075: if (is_power_of_2(sizeof(ResolvedIndyEntry))) { > > This was a good suggestion but I wonder if we should assert ResolvedIndyEntry is a power of 2 so we know if we change the size and make it go the slower path? Or is 32 bit not a power of two and we need this? Currently the structure is a power of two on 64 bits but this is not the case on 32 bit systems. 
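The interpreter code quoted above chooses a shift when `sizeof(ResolvedIndyEntry)` is a power of two and otherwise falls back to a multiply, which is why an unconditional power-of-two assert would trip on 32-bit builds. The sketch below shows both paths in plain C++. The helper names and `HypotheticalEntry` (one pointer plus 8 bytes of small fields) are our own assumptions and not the real `ResolvedIndyEntry` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace scale_sketch {

constexpr bool is_pow2(size_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Exponent of a power of two (only meaningful when is_pow2(x)).
constexpr int log2_exact(size_t x) {
  int n = 0;
  while ((size_t{1} << n) < x) n++;
  return n;
}

// Byte offset of element `index` in an array of `elem_size`-byte entries.
inline size_t element_offset(size_t index, size_t elem_size) {
  if (is_pow2(elem_size)) {
    return index << log2_exact(elem_size);  // fast path: a single shift
  }
  return index * elem_size;                 // general path: a multiply
}

// Illustrative entry: one pointer plus 8 bytes of small fields.
struct HypotheticalEntry {
  void* method;
  uint16_t index_a;
  uint16_t index_b;
  uint16_t param_count;
  uint8_t return_type;
  uint8_t flags;
};

}  // namespace scale_sketch
```

With 8-byte pointers this hypothetical entry is 16 bytes (shift path); with 4-byte pointers it is 12 bytes (multiply path), which lines up with the reply that the real structure is a power of two only on 64-bit.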
------------- PR: https://git.openjdk.org/jdk/pull/12778 From dnsimon at openjdk.org Mon Mar 13 21:57:43 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 Mar 2023 21:57:43 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extensibility, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you.
> > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Fixes includes and style src/hotspot/share/jvmci/jvmciEnv.cpp line 1439: > 1437: JNIAccessMark jni(this, THREAD); > 1438: jobject result = jni()->NewObject(JNIJVMCI::FieldInfo::clazz(), > 1439: JNIJVMCI::VMFlag::constructor(), `JNIJVMCI::VMFlag::constructor()` is the wrong constructor. src/hotspot/share/jvmci/jvmciEnv.hpp line 149: > 147: }; > 148: > 149: extern JNIEXPORT jobjectArray c2v_getDeclaredFieldsInfo(JNIEnv* env, jobject, jobject, jlong); What's the purpose of this declaration? I don't think you need it or the `friend` declaration below since `new_FieldInfo(FieldInfo* fieldinfo, JVMCI_TRAPS)` is public. src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 416: > 414: declare_constant(FieldInfo::FieldFlags::_ff_injected) \ > 415: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ > 416: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ I don't think `_ff_generic` is used in the JVMCI Java code so this entry can be deleted. Please double check the other entries. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 814: > 812: HotSpotResolvedObjectTypeImpl resolvedHolder; > 813: try { > 814: resolvedHolder = compilerToVM().resolveFieldInPool(this, index, (HotSpotResolvedJavaMethodImpl) method, (byte) opcode, info); Please update the javadoc for `CompilerToVM.resolveFieldInPool` to reflect the expanded definition of `info`. 
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java line 88: > 86: > 87: /** > 88: * Lazily initialized cache for FieldInfo nit: missing `.` at end of sentence src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaField.java line 48: > 46: * Returns VM internal flags associated with this field > 47: */ > 48: int getInternalModifiers(); We've never exposed the internal modifiers before in public JVMCI API and we should refrain from doing so until there's a good reason to do so. Please remove this method. test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaField.java line 97: > 95: > 96: @Test > 97: public void getInternalModifiersTest() { No need for this test since the `getInternalModifiers` method should be removed. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From dnsimon at openjdk.org Mon Mar 13 22:06:46 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 13 Mar 2023 22:06:46 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments As communicated to Matias earlier via email, the JVMCI changes in this PR look fine. ------------- Marked as reviewed by dnsimon (Committer). PR: https://git.openjdk.org/jdk/pull/12778 From coleenp at openjdk.org Mon Mar 13 22:06:47 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 13 Mar 2023 22:06:47 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments @dougxc Can you have a look at the jvmci changes in this PR also? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dcubed at openjdk.org Mon Mar 13 23:01:17 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 Mar 2023 23:01:17 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v22] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 18:43:41 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>> >> One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header.
This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? 
| 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Re-design LockStack for faster lock-stack depth-check src/hotspot/share/runtime/lockStack.cpp line 35: > 33: int LockStack::end_offset() { > 34: return in_bytes(JavaThread::lock_stack_base_offset()) + CAPACITY * oopSize; > 35: } It's a little odd to have function `end_offset()` be defined before the `LockStack::LockStack()` constructor. src/hotspot/share/runtime/lockStack.hpp line 51: > 49: public: > 50: static ByteSize offset_offset() { return byte_offset_of(LockStack, _offset); } > 51: static ByteSize base_offset() { return byte_offset_of(LockStack, _base); } nit - you might want to align these '{' indents. src/hotspot/share/runtime/lockStack.inline.hpp line 84: > 82: if (_base[i] == o) { > 83: validate("post-contains"); > 84: validate("post-contains"); Two `validate()` calls in a row. Probably a cut-n-paste error. 
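For readers following the lock-stack comments above, the fast path described in the quoted PR text can be modelled in a few lines. This is a simplified, single-threaded C++ sketch, not the HotSpot code: `Object`, `FastLockStack`, `fast_lock()`, `fast_unlock()` and the fixed `kCapacity` of 8 are hypothetical names, and inflation, ANONYMOUS_OWNER handling, GC root scanning and lock-stack growth are all omitted. It assumes the header encoding the description relies on (low bits `01` = unlocked, `00` = fast-locked) and calls `validate()` only once in `contains()`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical object: only the low two bits of its header word are modelled.
struct Object {
  std::atomic<std::uintptr_t> header{0x1};  // starts unlocked (0b01)
};

constexpr std::uintptr_t kLockMask      = 0x3;  // the two lowest header bits
constexpr std::uintptr_t kUnlockedValue = 0x1;  // 0b01: unlocked
constexpr std::uintptr_t kLockedValue   = 0x0;  // 0b00: fast-locked

// Simplified per-thread lock-stack: a small fixed-capacity array of object references.
class FastLockStack {
 public:
  static constexpr std::size_t kCapacity = 8;

  bool can_push() const { return _top < kCapacity; }
  bool is_empty() const { return _top == 0; }

  Object* top() const {
    assert(!is_empty());
    return _base[_top - 1];
  }

  void push(Object* o) {
    assert(can_push());
    _base[_top++] = o;
    validate("post-push");
  }

  void pop() {
    assert(!is_empty());
    _base[--_top] = nullptr;
    validate("post-pop");
  }

  // "Does the current thread own o?" is answered by a linear scan of the small array.
  bool contains(const Object* o) const {
    validate("pre-contains");  // a single validation call is sufficient
    for (std::size_t i = 0; i < _top; i++) {
      if (_base[i] == o) {
        return true;
      }
    }
    return false;
  }

 private:
  void validate(const char* where) const {
    (void)where;  // a real implementation might report the call site
    assert(_top <= kCapacity);
  }

  Object* _base[kCapacity] = {};
  std::size_t _top = 0;
};

// monitorenter fast path: CAS the low bits 01 -> 00, then record the object on the lock-stack.
// Returns false when a slow path would be needed (full stack, already locked, lost CAS race).
bool fast_lock(Object* o, FastLockStack& ls) {
  if (!ls.can_push()) {
    return false;
  }
  std::uintptr_t mark = o->header.load();
  if ((mark & kLockMask) != kUnlockedValue) {
    return false;  // already locked (possibly recursively by us): slow path would inflate
  }
  std::uintptr_t locked = (mark & ~kLockMask) | kLockedValue;
  if (!o->header.compare_exchange_strong(mark, locked)) {
    return false;
  }
  ls.push(o);
  return true;
}

// monitorexit fast path: only the most recently fast-locked object is released here.
bool fast_unlock(Object* o, FastLockStack& ls) {
  if (ls.is_empty() || ls.top() != o) {
    return false;
  }
  std::uintptr_t mark = o->header.load();
  std::uintptr_t unlocked = (mark & ~kLockMask) | kUnlockedValue;
  if (!o->header.compare_exchange_strong(mark, unlocked)) {
    return false;
  }
  ls.pop();
  return true;
}
```

A second `fast_lock()` on an object that is already fast-locked fails because its header no longer ends in `01`; in the quoted design, that recursive case falls back to inflating a full monitor.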
src/hotspot/share/runtime/synchronizer.cpp line 514:

> 512: }
> 513: }
> 514: } else {

Consider adding a comment after L513 and before L514:
` // No room on the lock_stack so fall-through to inflate-enter.`

src/hotspot/share/runtime/vmStructs.cpp line 704:

> 702: nonstatic_field(JavaThread, _lock_stack, LockStack) \
> 703: nonstatic_field(LockStack, _offset, int) \
> 704: nonstatic_field(LockStack, _base[0], oop) \

It surprises me that you can specify `_base[0]` here.

nit: the indent before the backslash is off.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Mon Mar 13 23:01:08 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Mon, 13 Mar 2023 23:01:08 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23]

On Mon, 13 Mar 2023 20:02:45 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> X86 parts

Reviewed v21 changes except for riscv.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Mon Mar 13 23:13:56 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Mon, 13 Mar 2023 23:13:56 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23]

On Mon, 13 Mar 2023 20:02:45 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> X86 parts

Also reviewed v22 changes; no comments on those.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From fparain at openjdk.org Mon Mar 13 23:28:41 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Mon, 13 Mar 2023 23:28:41 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2]

On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry.
>>
>> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>>
>> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>>
>> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>>
>> This change supports the following platforms: x86, aarch64, PPC, and RISCV
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
> Interpreter optimization and comments

Marked as reviewed by fparain (Committer).

-------------

PR: https://git.openjdk.org/jdk/pull/12778

From cjplummer at openjdk.org Tue Mar 14 01:28:57 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Tue, 14 Mar 2023 01:28:57 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4]

On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>>
>> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency.
>>
>> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>>
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>>
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>>
>> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have also been updated to deal with the new FieldInfo format.
>>
>> Tested with mach5, tier 1 to 7.
>>
>> Thank you.
>
> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision:
>
> Fixes includes and style

Changes requested by cjplummer (Reviewer).

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 75:

> 73: int initialValueIndex;
> 74: int genericSignatureIndex;
> 75: int contendedGroup;

It seems that these should all be shorts. All the getter methods are casting them to short.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 99:

> 97: if (fieldIsInitialized(fieldInfoValues.fieldFlags)) fieldInfoValues.initialValueIndex = crs.readInt(); // read initial value index
> 98: if (fieldIsGeneric(fieldInfoValues.fieldFlags)) fieldInfoValues.genericSignatureIndex = crs.readInt(); // read generic signature index
> 99: if (fieldIsContended(fieldInfoValues.fieldFlags)) fieldInfoValues.contendedGroup = crs.readInt(); // read contended group

Column width is too wide. These lines would be easier to read if you made each one multiple lines with curly braces.
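The comment above concerns reading optional entries from the compressed field stream only when the corresponding flag is set. The C++ sketch below illustrates that pattern together with a pack200-style UNSIGNED5 coding (at most 5 bytes per 32-bit value with H = 64, so byte values 0 to 191 terminate a number). It is an illustration under stated assumptions, not the FieldInfo format: HotSpot's own UNSIGNED5 variant may differ in details, and the record layout, `FieldRecord`, `write_u5`/`read_u5` and the `kHasInitialValue`/`kIsGeneric`/`kIsContended` flag names are hypothetical. The conditional reads use multi-line braced `if` statements, as the reviewer suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack200-style UNSIGNED5 parameters (assumed): at most 5 bytes, H = 64 "high" byte
// values per position, so the L = 192 "low" byte values terminate a number.
constexpr std::uint32_t kH = 64;
constexpr std::uint32_t kL = 256 - kH;
constexpr int kMaxBytes = 5;

void write_u5(std::vector<std::uint8_t>& out, std::uint32_t value) {
  std::uint32_t sum = value;
  for (int i = 0; i < kMaxBytes - 1 && sum >= kL; i++) {
    sum -= kL;
    out.push_back(static_cast<std::uint8_t>(kL + sum % kH));  // "more bytes follow"
    sum /= kH;
  }
  out.push_back(static_cast<std::uint8_t>(sum));  // final byte
}

std::uint32_t read_u5(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  std::uint64_t sum = 0;
  std::uint64_t weight = 1;
  for (int i = 0; i < kMaxBytes; i++) {
    std::uint32_t b = in.at(pos++);
    sum += b * weight;
    if (b < kL || i == kMaxBytes - 1) {
      break;
    }
    weight *= kH;
  }
  return static_cast<std::uint32_t>(sum);
}

// Hypothetical flag bits announcing which optional entries follow in the stream.
enum : std::uint32_t {
  kHasInitialValue = 1u << 0,
  kIsGeneric       = 1u << 1,
  kIsContended     = 1u << 2,
};

// Hypothetical record layout, not the real FieldInfo stream format.
struct FieldRecord {
  std::uint32_t name_index = 0;
  std::uint32_t signature_index = 0;
  std::uint32_t offset = 0;
  std::uint32_t flags = 0;
  std::uint32_t initial_value_index = 0;      // present only with kHasInitialValue
  std::uint32_t generic_signature_index = 0;  // present only with kIsGeneric
  std::uint32_t contended_group = 0;          // present only with kIsContended
};

void write_field(std::vector<std::uint8_t>& out, const FieldRecord& f) {
  write_u5(out, f.name_index);
  write_u5(out, f.signature_index);
  write_u5(out, f.offset);
  write_u5(out, f.flags);
  if (f.flags & kHasInitialValue) {
    write_u5(out, f.initial_value_index);
  }
  if (f.flags & kIsGeneric) {
    write_u5(out, f.generic_signature_index);
  }
  if (f.flags & kIsContended) {
    write_u5(out, f.contended_group);
  }
}

FieldRecord read_field(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  FieldRecord f;
  f.name_index = read_u5(in, pos);
  f.signature_index = read_u5(in, pos);
  f.offset = read_u5(in, pos);
  f.flags = read_u5(in, pos);
  if (f.flags & kHasInitialValue) {
    f.initial_value_index = read_u5(in, pos);
  }
  if (f.flags & kIsGeneric) {
    f.generic_signature_index = read_u5(in, pos);
  }
  if (f.flags & kIsContended) {
    f.contended_group = read_u5(in, pos);
  }
  return f;
}
```

With this coding, values below 192 (such as most small constant-pool indices) take a single byte, and absent optional entries take none, which is one plausible source of the memory-density gain the PR describes.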
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 107: > 105: int javafieldsCount = crs.readInt(); // read num_java_fields > 106: int VMFieldsCount = crs.readInt(); // read num_injected_fields; > 107: int numFields = javafieldsCount + VMFieldsCount; VMFieldsCount -> vmFieldsCount, or maybe just use num_java_fields and num_injected_fields src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 285: > 283: public short getFieldNameIndex(int index) { > 284: if (index >= getJavaFieldsCount()) throw new IndexOutOfBoundsException("not a Java field;"); > 285: return (short)getField(index).getNameIndex(); Cast to short not needed src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 303: > 301: public short getFieldSignatureIndex(int index) { > 302: if (index >= getJavaFieldsCount()) throw new IndexOutOfBoundsException("not a Java field;"); > 303: return (short)getField(index).getGenericSignatureIndex(); Cast to short is not needed src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 321: > 319: public short getFieldInitialValueIndex(int index) { > 320: if (index >= getJavaFieldsCount()) throw new IndexOutOfBoundsException("not a Java field;"); > 321: return (short)getField(index).getInitialValueIndex(); cast to short is not needed src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 325: > 323: > 324: public int getFieldOffset(int index) { > 325: return (int)getField(index).getOffset(); Cast to int is not needed ------------- PR: https://git.openjdk.org/jdk/pull/12855 From fyang at openjdk.org Tue Mar 14 01:44:03 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Mar 2023 01:44:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v22] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 18:43:41 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking 
implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. 
This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? 
>> AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69%
>> Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21%
>> Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59%
>> ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15%
>> GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11%
>> LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97%
>> MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01%
>> NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54%
>> PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11%
>> FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40%
>> FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12%
>> ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34%
>> Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40%
>> RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74%
>> Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24%
>> ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58%
>> ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53%
>> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37%
>> Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97%
>> FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39%
>> FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04%
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Re-design LockStack for faster lock-stack depth-check

src/hotspot/cpu/aarch64/c2_CodeStubs_aarch64.cpp line 87:

> 85: __ sub(t, t, oopSize);
> 86: __ str(t, Address(rthread, JavaThread::lock_stack_offset_offset()));
> 87:

It looks to me that the '_offset' of LockStack should be updated with ldrw, subw and strw instructions here.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From cjplummer at openjdk.org Tue Mar 14 02:03:32 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 14 Mar 2023 02:03:32 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID:

On Mon, 13 Mar 2023 18:51:17 GMT, Frederic Parain wrote:

>> Please review this change re-implementing the FieldInfo data structure.
>>
>> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency.
>>
>> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>>
>> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818
>>
>> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Fixes includes and style src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 108: > 106: CLASS_STATE_INITIALIZATION_ERROR = db.lookupIntConstant("InstanceKlass::initialization_error").intValue(); > 107: // We need a new fieldsCache each time we attach. > 108: fieldsCache = new HashMap(); This should probably be a WeakHashMap. I tried it and it seems to work (or at least didn't cause any problems). However, when doing a heap dump I didn't notice the table being any smaller on exit when it was made weak, even though there were numerous GCs while dumping the heap. The key is the Address of the hotspot InstanceKlass instance, and this Address is referenced by the SA InstanceKlass mirror. So theoretically, when the reference to the mirror goes away, the cache entry can be cleared. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From jvernee at openjdk.org Tue Mar 14 03:05:41 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 14 Mar 2023 03:05:41 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v3] In-Reply-To: <8I_sDu55v8ThrFyF4mMWbKOOKeGkfYvqo66JLNnULUY=.830fe48d-b65f-437c-82fa-9e7dac76c7ab@github.com> References: <8b3vVrV22RuhdRoRYacXV0ZeghFGgKkC8S_z-iMrzAQ=.dd84b743-8b51-4281-8f5f-f9eff6207bc7@github.com> <1OXigMWnfCHCkxCI3D29NpgIpgG36Ltlnry1eytOPow=.5f423637-0a12-454e-a31c-57b2fc11123f@github.com> <8I_sDu55v8ThrFyF4mMWbKOOKeGkfYvqo66JLNnULUY=.830fe48d-b65f-437c-82fa-9e7dac76c7ab@github.com> Message-ID: On Mon, 13 Mar 2023 20:57:22 GMT, Martin Doerr wrote: > Note that argument and return value passing works. I'm getting all values back.
So, the native side seems to be ok. Only (one or two) values in `returnBox` are broken.

You mean you tested by returning each element of the array one by one? If so, that rules out an issue with the downcall, or the struct not being in the right format (in the register), I think.

I suggest checking the bindings generated for the upcall, and seeing if they match what the native code does. This could be done in `jshell`, for instance:

$ .\build\windows-fastdebug\images\jdk\bin\jshell.exe --enable-preview '--add-exports=java.base/jdk.internal.foreign.abi.x64.windows=ALL-UNNAMED' '--add-exports=java.base/jdk.internal.foreign.abi=ALL-UNNAMED'
jshell> import java.lang.foreign.*
jshell> import java.lang.invoke.*
jshell> import static java.lang.foreign.ValueLayout.*
jshell> import jdk.internal.foreign.abi.x64.windows.CallArranger
jshell> MemoryLayout.structLayout(MemoryLayout.sequenceLayout(8, JAVA_BYTE))
$5 ==> [[8:b8]]
jshell> FunctionDescriptor.of($5, $5, JAVA_BYTE, JAVA_BYTE, JAVA_BYTE, JAVA_BYTE, JAVA_BYTE, JAVA_BYTE, JAVA_BYTE, JAVA_BYTE)
$6 ==> ([[8:b8]]b8b8b8b8b8b8b8b8)[[8:b8]]
jshell> CallArranger.getBindings($6.toMethodType(), $6, true)
$7 ==> Bindings[callingSequence=jdk.internal.foreign.abi.CallingSequence@396e2f39, isInMemoryReturn=false]
jshell> System.out.println($7.callingSequence().asString())
CallingSequence: {
  callerMethodType: (long,int,int,int,int,int,int,int,int)long
  calleeMethodType: (MemorySegment,byte,byte,byte,byte,byte,byte,byte,byte)MemorySegment
  FunctionDescriptor: ([[8:b8]]b8b8b8b8b8b8b8b8)[[8:b8]]
  Argument Bindings:
    0: [Allocate[size=8, alignment=1], Dup[], VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=15, indexOrOffset=1, debugName=rcx], type=long], BufferStore[offset=0, type=long, byteWidth=8]]
    1: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=15, indexOrOffset=2, debugName=rdx], type=int], INT_TO_BYTE]
    2: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=15, indexOrOffset=8, debugName=r8], type=int], INT_TO_BYTE]
    3: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=15, indexOrOffset=9, debugName=r9], type=int], INT_TO_BYTE]
    4: [VMLoad[storage=VMStorage[type=3, segmentMaskOrSize=8, indexOrOffset=0, debugName=Stack@0], type=int], INT_TO_BYTE]
    5: [VMLoad[storage=VMStorage[type=3, segmentMaskOrSize=8, indexOrOffset=8, debugName=Stack@8], type=int], INT_TO_BYTE]
    6: [VMLoad[storage=VMStorage[type=3, segmentMaskOrSize=8, indexOrOffset=16, debugName=Stack@16], type=int], INT_TO_BYTE]
    7: [VMLoad[storage=VMStorage[type=3, segmentMaskOrSize=8, indexOrOffset=24, debugName=Stack@24], type=int], INT_TO_BYTE]
    8: [VMLoad[storage=VMStorage[type=3, segmentMaskOrSize=8, indexOrOffset=32, debugName=Stack@32], type=int], INT_TO_BYTE]
  Return bindings: [BufferLoad[offset=0, type=long, byteWidth=8], VMStore[storage=VMStorage[type=0, segmentMaskOrSize=15, indexOrOffset=0, debugName=rax], type=long]]
}

------------- PR: https://git.openjdk.org/jdk/pull/12708 From dholmes at openjdk.org Tue Mar 14 04:50:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 04:50:27 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v10] In-Reply-To: References: Message-ID: On Sat, 11 Mar 2023 17:17:06 GMT, Julian Waters wrote: >> Cleans up some code in compilerWarnings_*.hpp files to be slightly neater > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > Revert to initial name Seems okay. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12255 From dholmes at openjdk.org Tue Mar 14 04:56:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 04:56:27 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v12] In-Reply-To: References: Message-ID: On Sun, 12 Mar 2023 13:21:18 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved.
I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Fix missing SharedRuntime::frem and SharedRuntime::drem on aarch64. > - bugreported by sviswa7. > - Merge branch 'master' into modulo > - Fix #endif comment - found by dholmes-ora. > - Merge branch 'master' into modulo > - Fix win32 broken build. > - Merge remote-tracking branch 'origin/master' into modulo > - Always include the _WIN64 workaround - a review by dholmes-ora. > - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. > - Uppercase L - a review by turbanoff. > - Fix copyright author. > - ... and 3 more: https://git.openjdk.org/jdk/compare/9f5d42ca...f03d4cdf These changes seem good to go now, but I'd like to run them through our CI testing first. Thanks. ------------- PR: https://git.openjdk.org/jdk/pull/12508 From dholmes at openjdk.org Tue Mar 14 05:12:21 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 05:12:21 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 17:42:34 GMT, Frederic Parain wrote: >> When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. 
For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. >> >> Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. >> >> A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. >> >> In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. >> >> Testing: tiers 1-4 >> >> Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. >> >> Thanks. > > The concern during the initial implementation was that the value returned by num_arguments() and the real number of arguments could become out of sync, but your solution to check that they are consistent only on debug builds addresses this concern. > Thank you for fixing this! Thanks for the review @fparain ! 
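The registration pattern described in this thread (a static `num_arguments()` used in all builds, cross-checked against the parser only in debug builds) can be sketched in standalone C++17. This is a hypothetical model, not HotSpot code: `DCmd`, `DCmdParser`, `DCmdWithParser` and `get_num_arguments` are simplified stand-ins for the classes named above, `NoArgsDCmd` and `TwoArgsDCmd` are invented commands, and `NDEBUG` plays the role of HotSpot's debug-build `ASSERT` macro.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical, simplified stand-ins for HotSpot's DCmd classes.
struct DCmd {
  virtual ~DCmd() = default;
};

struct DCmdParser {
  std::vector<std::string> args;
  void add_argument(const std::string& name) { args.push_back(name); }
  // Dynamic count: counts what was registered at construction time.
  int num_arguments() const { return static_cast<int>(args.size()); }
};

struct DCmdWithParser : DCmd {
  DCmdParser _dcmdparser;
};

// A command without a parser only provides the static count.
struct NoArgsDCmd : DCmd {
  static int num_arguments() { return 0; }
};

// A command with a parser registers its arguments in the constructor
// and also states the same count statically.
struct TwoArgsDCmd : DCmdWithParser {
  TwoArgsDCmd() {
    _dcmdparser.add_argument("filename");
    _dcmdparser.add_argument("-all");
  }
  static int num_arguments() { return 2; }
};

// Overload for commands that do not use a parser.
template <typename T,
          std::enable_if_t<!std::is_convertible<T*, DCmdWithParser*>::value, int> = 0>
int get_num_arguments() {
  return T::num_arguments();
}

// Overload for parser-based commands: release builds never construct T;
// debug builds (NDEBUG undefined) build one to cross-check the static count.
template <typename T,
          std::enable_if_t<std::is_convertible<T*, DCmdWithParser*>::value, int> = 0>
int get_num_arguments() {
#ifndef NDEBUG
  T tmp;
  assert(tmp._dcmdparser.num_arguments() == T::num_arguments() &&
         "static num_arguments() out of sync with the parser");
#endif
  return T::num_arguments();
}
```

If someone adds an argument in `TwoArgsDCmd`'s constructor but forgets to update its static count, the assert fires the first time the command is registered in a debug build, which is the safety net the PR relies on.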
------------- PR: https://git.openjdk.org/jdk/pull/12994 From dholmes at openjdk.org Tue Mar 14 05:29:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 05:29:23 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v5] In-Reply-To: References: Message-ID: <_TLQFyghr0kX0qSFHw11hfIw9FQu4lsajzyHvfWVRHs=.60b65b6a-7b46-4334-a9c0-f69e3065f854@github.com> On Thu, 2 Mar 2023 13:13:56 GMT, Jan Kratochvil wrote: >> You can't move the _WIN64 workaround to the sharedRuntime_x86.cpp file as that code is also used by Windows-Aarch64. Whether it needs the workaround or not is another matter, but unless proven otherwise we have to assume it does. > >> You can't move the _WIN64 workaround to the sharedRuntime_x86.cpp file as that code is also used by Windows-Aarch64. Whether it needs the workaround or not is another matter, but unless proven otherwise we have to assume it does. > > I hope/believe the bug affects only amd64 and not aarch64. I want to verify it but I have some difficulty getting remote access to such Windows boxes, I am working on the verification. @jankratochvil can you update your master branch to latest and re-merge please. Unfortunately the current set of changes has a bug ([JDK-8302189](https://bugs.openjdk.org/browse/JDK-8302189)) that prevents building on macOS. 
------------- PR: https://git.openjdk.org/jdk/pull/12508 From dholmes at openjdk.org Tue Mar 14 05:53:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 05:53:28 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: <5bzaYlM6HXfUNJITjTSIaGgcJ_51OQf6XWr07w__wUw=.d0a9ac8b-a9a1-4122-9d2f-880de717d071@github.com> References: <5bzaYlM6HXfUNJITjTSIaGgcJ_51OQf6XWr07w__wUw=.d0a9ac8b-a9a1-4122-9d2f-880de717d071@github.com> Message-ID: <9WvL-zpi-ekddOKD2iqHtRAmqFJwwtL1gwKxnsLtA7A=.60564e75-e261-4ed0-a89f-8179c7ffdaa5@github.com> On Thu, 9 Mar 2023 09:29:41 GMT, Markus Grönlund wrote: >> src/hotspot/share/runtime/threads.cpp line 338: >> >>> 336: if (EagerXrunInit && Arguments::init_libraries_at_startup()) { >>> 337: create_vm_init_libraries(); >>> 338: } >> >> Not obvious where this went. Changes to the initialization order can be very problematic. > > Thanks, David. Two calls to launch XRun agents are invoked during startup, and they depend on the EagerXrunInit option. The !EagerXrunInit case is already located in create_vm(), but the EagerXrunInit was located as the first entry in initialize_java_lang_classes(), which I thought was tucked away a bit unnecessarily. > > I hoisted the EagerXrunInit case from initialize_java_lang_classes() up to create_vm(). It's now the call just before initialize_java_lang_classes(). > > This made it clearer, i.e. to have both calls located directly in create_vm(). Thanks for clarifying. That makes sense. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From dholmes at openjdk.org Tue Mar 14 06:03:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Mar 2023 06:03:37 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents.
>> >> #### Design >> >> An Agent is a library that uses any instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command line agents initialize during the VM startup sequence, they add to the overall startup time latency in getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services. >> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initialization = 12:31:15.574 (2023-03-08) >> initializationTime = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initialization = 12:31:31.037 (2023-03-08) >> initializationTime = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar.
>> >> The "dynamic" field denotes if the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true) >> >> "initialization" is the timestamp the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tools Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file. >> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initialization = 12:31:36.142 (2023-03-08) >> initializationTime = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. >> >> #### Implementation >> >> There has not existed a reification of a JavaAgent directly in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. 
At the level of the JVM, the only representation of agents after startup is through JvmtiEnv's, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnv's "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp, and threads.cpp but also jvmtiExport.cpp. >> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > more cleanup I've had a good look through now and have a better sense of the refactoring. Seems good. I'll wait for any tweaks before hitting the approve button though.
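The `initialization` timestamp and `initializationTime` duration reported by the proposed events can be illustrated with a small C++ sketch. `AgentRecord` and `run_initialization` are invented names, not JFR or HotSpot APIs; the sketch only models the measurement idea described in the PR: take a wall-clock timestamp when initialization is invoked, time the callback, and leave the duration at zero when no callback exists.

```cpp
#include <chrono>
#include <functional>
#include <string>

// Hypothetical record holding the fields shown in the jdk.JavaAgent /
// jdk.NativeAgent examples; not a JFR or HotSpot type.
struct AgentRecord {
  std::string name;
  std::string options;
  bool dynamic = false;
  std::chrono::system_clock::time_point initialization{};  // when init was invoked
  std::chrono::nanoseconds initialization_time{0};        // how long init took
};

// Records when initialization starts and how long the callback runs.
// Without a callback the duration stays zero, like the native-agent case
// where no VMInit callback is registered.
void run_initialization(AgentRecord& agent, const std::function<void()>& callback) {
  agent.initialization = std::chrono::system_clock::now();  // wall-clock timestamp
  if (!callback) {
    return;
  }
  const auto start = std::chrono::steady_clock::now();  // monotonic clock for the duration
  callback();
  agent.initialization_time = std::chrono::duration_cast<std::chrono::nanoseconds>(
      std::chrono::steady_clock::now() - start);
}
```

The sketch uses two clocks on purpose: `system_clock` gives a reportable point in time, while `steady_clock` cannot jump backwards, so it is the safer choice for measuring an elapsed duration.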
Thanks ------------- PR: https://git.openjdk.org/jdk/pull/12923 From never at openjdk.org Tue Mar 14 07:12:18 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 14 Mar 2023 07:12:18 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v5] In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 22:59:23 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
> > Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 > - switched to use of lists and maps instead of arrays > - fixed whitespace > - added support for inherited annotations > - Merge branch 'master' into JDK-8303431 > - made AnnotationDataDecoder package-private > - add annotation API to JVMCI The JVMCI changes look ok to me. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2699: > 2697: typeArrayOop ba = typeArrayOop(res); > 2698: int ba_len = ba->length(); > 2699: if (ba_len <= 256) { Is this really necessary? Resource allocation is very cheap. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/Annotated.java line 40: > 38: * annotations of this element. > 39: * > 40: * If this element is a class, then {@link Inherited} annotations are included in set of in the set ------------- Marked as reviewed by never (Reviewer). PR: https://git.openjdk.org/jdk/pull/12810 From jwaters at openjdk.org Tue Mar 14 07:29:11 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 14 Mar 2023 07:29:11 GMT Subject: RFR: 8301244: Tidy up compiler specific warnings files [v11] In-Reply-To: References: Message-ID: > Cleans up some code in compilerWarnings_*.hpp files to be slightly neater Julian Waters has updated the pull request incrementally with one additional commit since the last revision: PRAGMA_NONNULL_IGNORED should match other pragma formats ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12255/files - new: https://git.openjdk.org/jdk/pull/12255/files/a3185f7c..99f8c3a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12255&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12255&range=09-10 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12255.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12255/head:pull/12255 PR: 
https://git.openjdk.org/jdk/pull/12255 From jwaters at openjdk.org Tue Mar 14 07:29:14 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 14 Mar 2023 07:29:14 GMT Subject: Integrated: 8301244: Tidy up compiler specific warnings files In-Reply-To: References: Message-ID: <7s9pbp-4g87b3XFtZ-w0BkA2WA_znDAak3IdWutjgs0=.62a0fa99-e28d-433c-b626-7f6076224d70@github.com> On Fri, 27 Jan 2023 13:40:20 GMT, Julian Waters wrote: > Cleans up some code in compilerWarnings_*.hpp files to be slightly neater This pull request has now been integrated. Changeset: 2bb990ed Author: Julian Waters URL: https://git.openjdk.org/jdk/commit/2bb990edde5c8a08b9a9b209aa1fcdc3c38c3cb8 Stats: 9 lines in 2 files changed: 1 ins; 4 del; 4 mod 8301244: Tidy up compiler specific warnings files Reviewed-by: kbarrett, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/12255 From chagedorn at openjdk.org Tue Mar 14 09:08:12 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Mar 2023 09:08:12 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: <0jlQWouDE0tJ-ysn7WFYArqrJwHFQ-hhsZKKRGdVhmU=.95c4ceb0-9b8e-4c9b-9029-473626fb5a6b@github.com> References: <0jlQWouDE0tJ-ysn7WFYArqrJwHFQ-hhsZKKRGdVhmU=.95c4ceb0-9b8e-4c9b-9029-473626fb5a6b@github.com> Message-ID: On Thu, 9 Mar 2023 06:26:56 GMT, Thomas Stuefe wrote: > Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ? Does any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as memory location? If not, it could get more complicated. I could still try to tackle that but I'm not sure if I have the necessary knowledge in that area. 
------------- PR: https://git.openjdk.org/jdk/pull/12925 From rrich at openjdk.org Tue Mar 14 09:24:08 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 14 Mar 2023 09:24:08 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Thu, 9 Mar 2023 21:18:19 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Interpreter optimization and comments @matias9927 can I ask you to merge master? There seem to be conflicts (at least I see a message "This branch has conflicts that must be resolved"). I'd like to give the change a spin in our CI testing.
This requires that it can be applied on master. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From mbaesken at openjdk.org Tue Mar 14 09:27:39 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 14 Mar 2023 09:27:39 GMT Subject: RFR: JDK-8302989: Add missing INCLUDE_CDS checks [v4] In-Reply-To: References: Message-ID: On Fri, 3 Mar 2023 08:55:07 GMT, Matthias Baesken wrote: >> The CDS-only code in HotSpot is usually guarded with the INCLUDE_CDS macro so that it can be removed at compile time when the appropriate configure flags are set. >> However, in some places INCLUDE_CDS is missing and should be added. >> >> One question: should the DumpSharedSpaces code sections (in addition to the UseSharedSpaces code section) also be guarded with INCLUDE_CDS macros? > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust arguments handling I noticed that the -Xshare handling IS already an issue on AIX: CDS is not supported there, and `-Xshare:off` does not work. So tests like the ones I mentioned in my list above, e.g. java/math/BigInteger/largeMemory/DivisionOverflow.java, all fail on AIX (as on other platforms without CDS). ------------- PR: https://git.openjdk.org/jdk/pull/12691 From kevinw at openjdk.org Tue Mar 14 09:49:54 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Tue, 14 Mar 2023 09:49:54 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 01:19:50 GMT, David Holmes wrote: > When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class.
For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. > > Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. > > A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. > > In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. > > Testing: tiers 1-4 > > Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. > > Thanks. src/hotspot/share/services/diagnosticFramework.hpp line 1: > 1: /* Should we update this comment: 279 // - For subclasses of DCmdWithParser, it's calculated by DCmdParser::num_arguments(). 
------------- PR: https://git.openjdk.org/jdk/pull/12994 From duke at openjdk.org Tue Mar 14 09:50:17 2023 From: duke at openjdk.org (Afshin Zafari) Date: Tue, 14 Mar 2023 09:50:17 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v6] In-Reply-To: References: Message-ID: > The inline and not-inline versions of the method are tested to compare the performance difference. > ### Test > `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8292059: Do not inline InstanceKlass::allocate_instance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12782/files - new: https://git.openjdk.org/jdk/pull/12782/files/0ef3159a..4165cab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=04-05 Stats: 9 lines in 3 files changed: 8 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12782.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12782/head:pull/12782 PR: https://git.openjdk.org/jdk/pull/12782 From rkennke at openjdk.org Tue Mar 14 10:19:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 10:19:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v23] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 20:02:45 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding overloading the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word.
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking.
Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!)
a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> X86 parts

Thank you all for your review comments! I will address them today. Yesterday I pushed a rather significant change that probably warrants some explanation. Previously, the lock-stack could be grown when its capacity was no longer sufficient. However, that meant we needed to maintain three pointers: the stack base, the current stack pointer and the limit. Also, checking for room on the lock-stack involved loading two of these pointers (current and limit) and comparing them. This was tricky because it requires two registers on some platforms.
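The following is a minimal sketch of a per-thread lock-stack with a fixed capacity of 8, matching the fixed-size redesign discussed in this thread. Everything here is hypothetical: the class name, the use of `const void*` for object references, and the index-based top. HotSpot keeps a byte offset relative to JavaThread, so that compiled code can compare it against a constant end offset. Ownership checks are a linear scan. A full stack, or a recursive lock (which fast-locking does not support), sends the caller to the inflation path.

```cpp
#include <cassert>

// Hypothetical sketch of a per-thread lock-stack with a fixed capacity.
// Object references are modeled as const void*; this is not HotSpot's class.
class LockStack {
 public:
  static constexpr int CAPACITY = 8;

  bool is_full() const { return _top == CAPACITY; }
  int size() const { return _top; }

  // "Does the current thread own obj?" is answered by a linear scan,
  // which is cheap because the stack is tiny.
  bool contains(const void* obj) const {
    for (int i = 0; i < _top; i++) {
      if (_base[i] == obj) return true;
    }
    return false;
  }

  // Fast-lock attempt. Returns false if the stack is full or obj is already
  // on it (no recursive fast-locking); in both cases the caller is expected
  // to inflate the lock to a full monitor instead.
  bool try_push(const void* obj) {
    if (is_full() || contains(obj)) return false;
    _base[_top++] = obj;
    return true;
  }

  // Unlock: obj is normally on top, so search from the top down and close
  // the gap if it is not (locks released out of order).
  void remove(const void* obj) {
    for (int i = _top - 1; i >= 0; i--) {
      if (_base[i] == obj) {
        for (int j = i; j < _top - 1; j++) _base[j] = _base[j + 1];
        _top--;
        return;
      }
    }
  }

 private:
  const void* _base[CAPACITY] = {};
  int _top = 0;  // number of entries currently on the stack
};
```

With only eight slots, the ownership scan costs at most a handful of compares. The overflow check becomes a comparison against a constant, which is the simplification the redesign aims for.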
The insight that leads to the improved implementation is that the lock-stack is usually very shallow: I ran an experiment with several workloads yesterday, and it never exceeded a depth of 5. I have now made the lock-stack fixed-size, with 8 elements. If the lock-stack is ever full, we have to bite the bullet and inflate the monitor, but this should be very rare. On the upside, the check for room on the lock-stack is now much simpler: we only need to load the current stack offset and compare it to the end offset, which is a constant and can be encoded as an immediate. Also, the current 'pointer' is now an offset relative to the beginning of the JavaThread structure, so the lock-stack can be addressed using indirect addressing on rthread. Additionally, I eliminated the code that checks for enough lock-stack space upon method entry. This was not very useful and often led to excessive lock-stack growth. @RealFYang You may want to update the RISCV code to reflect these latest changes; otherwise it will now be broken. I will now address your comments and also change the implementation of SA. Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Tue Mar 14 10:49:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 10:49:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v24] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding overloading the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable.
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed.
Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Use -w instructions in new locking code stubs (aarch64) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/75db4f0a..87b95bf7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=22-23 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From mgronlun at openjdk.org Tue Mar 14 12:25:54 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 14 Mar 2023 12:25:54 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID:
<5uEVRUEr0vSBiTHqKpKVwG1k-v5UrFr9RVAip3K8NSg=.a7bf35b5-b372-4ba6-b217-642c6ad4e2a8@github.com> On Mon, 13 Mar 2023 06:22:21 GMT, David Holmes wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> more cleanup > > src/hotspot/share/prims/agent.cpp line 34: > >> 32: } >> 33: >> 34: static const char* allocate_copy(const char* str) { > > Why not just use `os::strdup`? Better alternative, thanks David. > src/hotspot/share/prims/agentList.cpp line 227: > >> 225: * store data in their JvmtiEnv local storage. >> 226: * >> 227: * Please see JPLISAgent.c in module java.instrument, see JPLISAgent.h and JPLISAgent.c. > > No need to mention the .c file twice. Good point. > src/hotspot/share/prims/agentList.cpp line 419: > >> 417: const jint err = (*on_load_entry)(&main_vm, const_cast(agent->options()), NULL); >> 418: if (err != JNI_OK) { >> 419: vm_exit_during_initialization("-Xrun library failed to init", agent->name()); > > Do you need to be back in `_thread_in_vm` before exiting? Hmm. This was ported as is. I will double-check. > src/hotspot/share/prims/agentList.cpp line 542: > >> 540: >> 541: // Invoke the Agent_OnAttach function >> 542: JavaThread* THREAD = JavaThread::current(); // For exception macros. > > Nit: just use `current` rather than `THREAD` and don't use the exception macros. Ported as is but good point, will update.
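David's point is that the hand-rolled `allocate_copy()` duplicates what a strdup-style helper already does. That helper allocates `length + 1` bytes, calls `strncpy`, then re-checks the result with `strncmp`. The sketch below is standalone: `std::malloc` stands in for HotSpot's `AllocateHeap`/`os::strdup`, and the name `duplicate_string()` is hypothetical. It shows how little the helper needs. Inside HotSpot, the call site would simply become a single `os::strdup()` call.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical standalone analogue of allocate_copy(); HotSpot code would use
// os::strdup() (which also records the allocation for NMT) instead.
char* duplicate_string(const char* str) {
  assert(str != nullptr);
  const std::size_t length = std::strlen(str);
  char* copy = static_cast<char*>(std::malloc(length + 1));
  if (copy == nullptr) return nullptr;
  // Copy the characters plus the terminating NUL in one step; a follow-up
  // strncmp() "invariant" check adds nothing.
  std::memcpy(copy, str, length + 1);
  return copy;
}
```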
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:26:00 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 14 Mar 2023 12:26:00 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v9] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 06:57:46 GMT, David Holmes wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> handle multiple envs with same VMInit callback > > src/hotspot/share/prims/agent.cpp line 41: > >> 39: char* copy = AllocateHeap(length + 1, mtInternal); >> 40: strncpy(copy, str, length + 1); >> 41: assert(strncmp(copy, str, length + 1) == 0, "invariant"); > > Unclear what you are checking here. Don't you trust strncpy? Maybe a bit paranoid, yes. I can clean up. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:26:03 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 14 Mar 2023 12:26:03 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: <-OvxuwPKbYU514MyCXdcC5-0Nt1ftlipuUFueCe3DGc=.3b0140b3-ef41-4ad6-9515-ec6e9ef40250@github.com> On Mon, 13 Mar 2023 09:49:39 GMT, Andrew Dinn wrote: >> src/hotspot/share/prims/agentList.cpp line 64: >> >>> 62: void AgentList::add_xrun(const char* name, char* options, bool absolute_path) { >>> 63: Agent* agent = new Agent(name, options, absolute_path); >>> 64: agent->_is_xrun = true; >> >> Why direct access of private field instead of having a setter like other parts of the Agent API? > > n.b. that also applies for accesses/updates to field _next. I wanted all accesses to use the iterator. The only access is given to the iterator and AgentList by way of being friends. No need to expose more.
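Markus describes an encapsulation choice, and it can be illustrated with simplified, hypothetical classes. The real Agent, AgentList and iterator in src/hotspot/share/prims differ in their members and are not reproduced here. The link field and the xrun flag stay private without public setters. Only the list and its iterator, declared as friends, may touch them.

```cpp
#include <cassert>

// Hypothetical, simplified Agent: the list link and the xrun flag are private
// and have no public setters.
class Agent {
  friend class AgentList;
  friend class AgentIterator;

  const char* _name;
  Agent* _next = nullptr;
  bool _is_xrun = false;

 public:
  explicit Agent(const char* name) : _name(name) {}
  const char* name() const { return _name; }
  bool is_xrun() const { return _is_xrun; }
};

// Traversal goes through the iterator, the only non-list code that follows _next.
class AgentIterator {
  Agent* _current;

 public:
  explicit AgentIterator(Agent* head) : _current(head) {}
  bool has_next() const { return _current != nullptr; }
  Agent* next() {
    Agent* agent = _current;
    _current = _current->_next;  // friend access to the private link
    return agent;
  }
};

// Owns the agents and is the only place that links them or marks them as -Xrun.
class AgentList {
  Agent* _head = nullptr;
  Agent* _tail = nullptr;

 public:
  AgentList() = default;
  AgentList(const AgentList&) = delete;
  AgentList& operator=(const AgentList&) = delete;
  ~AgentList() {
    while (_head != nullptr) {
      Agent* next = _head->_next;
      delete _head;
      _head = next;
    }
  }

  void add(Agent* agent) {  // append at the tail
    if (_tail == nullptr) {
      _head = agent;
    } else {
      _tail->_next = agent;
    }
    _tail = agent;
  }

  void add_xrun(const char* name) {
    Agent* agent = new Agent(name);
    agent->_is_xrun = true;  // direct write, permitted because AgentList is a friend
    add(agent);
  }

  AgentIterator iterator() const { return AgentIterator(_head); }
};
```

The trade-off is the one debated in the review. Friendship keeps the mutable state out of the public API, at the cost of the list reaching directly into Agent's private fields.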
------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:29:10 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 14 Mar 2023 12:29:10 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 06:01:05 GMT, David Holmes wrote: > I've had a good look through now and have a better sense of the refactoring. Seems good. > > > > I'll wait for any tweaks before hitting the approve button though. > > > > Thanks Thanks so much for taking a look. I realized that implementation details of loading should probably reside in agent.cpp, not agentList.cpp. I am currently off on vacation and will update when back. Thanks also to Andrew Dinn for comments. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From mgronlun at openjdk.org Tue Mar 14 12:29:14 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Tue, 14 Mar 2023 12:29:14 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 09:46:04 GMT, Andrew Dinn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> more cleanup > > src/hotspot/share/jfr/metadata/metadata.xml line 1182: > >> 1180: >> 1181: >> 1182: > > @mgronlun A somewhat drive-by comment. It might be clearer if you renamed these event fields and accessors, plus also the corresponding fields and accessors in class Agent, as `initializationTime` and `initializationDuration`. Makes sense. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From duke at openjdk.org Tue Mar 14 12:45:42 2023 From: duke at openjdk.org (Jan Kratochvil) Date: Tue, 14 Mar 2023 12:45:42 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v13] In-Reply-To: References: Message-ID: > I have OCA already processed/approved.
I am not an Author yet, but my Author request is being processed (sent to Rob McKenna). > I ran regression tests on x86_64 OpenJDK-8. I will leave other regression testing to GHA. > The patch (and the former GCC performance regression) affects only x86_64+i686. Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - Merge branch 'master' into modulo - Fix missing SharedRuntime::frem and SharedRuntime::drem on aarch64. - bugreported by sviswa7. - Merge branch 'master' into modulo - Fix #endif comment - found by dholmes-ora. - Merge branch 'master' into modulo - Fix win32 broken build. - Merge remote-tracking branch 'origin/master' into modulo - Always include the _WIN64 workaround - a review by dholmes-ora. - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. - Uppercase L - a review by turbanoff. - ...
and 4 more: https://git.openjdk.org/jdk/compare/e171ac47...65af58da ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12508/files - new: https://git.openjdk.org/jdk/pull/12508/files/f03d4cdf..65af58da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12508&range=11-12 Stats: 71276 lines in 811 files changed: 58833 ins; 4868 del; 7575 mod Patch: https://git.openjdk.org/jdk/pull/12508.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12508/head:pull/12508 PR: https://git.openjdk.org/jdk/pull/12508 From fparain at openjdk.org Tue Mar 14 13:15:54 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 13:15:54 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: On Mon, 13 Mar 2023 21:53:37 GMT, Doug Simon wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaField.java line 48: > >> 46: * Returns VM internal flags associated with this field >> 47: */ >> 48: int getInternalModifiers(); > > We've never exposed the internal modifiers before in public JVMCI API and we should refrain from doing so until there's a good reason to do so. Please remove this method. Access to internal modifiers is needed in `HotSpotResolvedJavaFieldTest.testEquivalenceForInternalFields()`. I moved the declaration of the method to `HotSpotResolvedJavaField`. Does this change work for you? 
------------- PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Tue Mar 14 13:19:37 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 13:19:37 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: On Mon, 13 Mar 2023 21:44:59 GMT, Doug Simon wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 416: > >> 414: declare_constant(FieldInfo::FieldFlags::_ff_injected) \ >> 415: declare_constant(FieldInfo::FieldFlags::_ff_stable) \ >> 416: declare_constant(FieldInfo::FieldFlags::_ff_generic) \ > > I don't think `_ff_generic` is used in the JVMCI Java code so this entry can be deleted. Please double check the other entries. _ff_generic removed. _ff_stable is used in `HotSpotResolvedJavaFieldImpl.isStable()`. 
_ff_injected is used in `HotSpotResolvedJavaFieldImpl.isInternal()` ------------- PR: https://git.openjdk.org/jdk/pull/12855 From coleenp at openjdk.org Tue Mar 14 13:21:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 14 Mar 2023 13:21:55 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> Message-ID: On Mon, 13 Mar 2023 21:05:11 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Interpreter optimization and comments > > src/hotspot/cpu/x86/templateTable_x86.cpp line 2798: > >> 2796: bool is_invokevirtual, >> 2797: bool is_invokevfinal, /*unused*/ >> 2798: bool is_invokedynamic /*unused*/) { > > Can you remove the parameter since the s390 port is here? Ok, never mind, I saw s390 port but it doesn't seem to be in these changes (?) ------------- PR: https://git.openjdk.org/jdk/pull/12778 From coleenp at openjdk.org Tue Mar 14 13:28:29 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 14 Mar 2023 13:28:29 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL Message-ID: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. 
Tested with tier1-4 ------------- Commit messages: - Logging Changes: https://git.openjdk.org/jdk/pull/13007/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304089 Stats: 78 lines in 14 files changed: 23 ins; 3 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/13007.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13007/head:pull/13007 PR: https://git.openjdk.org/jdk/pull/13007 From coleenp at openjdk.org Tue Mar 14 13:38:10 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 14 Mar 2023 13:38:10 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v2] In-Reply-To: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: > This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. > Tested with tier1-4 Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge branch 'master' into trace-dependencies-logging - Logging ------------- Changes: https://git.openjdk.org/jdk/pull/13007/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=01 Stats: 79 lines in 14 files changed: 23 ins; 3 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/13007.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13007/head:pull/13007 PR: https://git.openjdk.org/jdk/pull/13007 From fparain at openjdk.org Tue Mar 14 13:40:55 2023 From: fparain at openjdk.org (Frederic Parain) Date: Tue, 14 Mar 2023 13:40:55 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: On Mon, 13 Mar 2023 21:35:17 GMT, Doug Simon wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > > src/hotspot/share/jvmci/jvmciEnv.hpp line 149: > >> 147: }; >> 148: >> 149: extern JNIEXPORT jobjectArray c2v_getDeclaredFieldsInfo(JNIEnv* env, jobject, jobject, jlong); > > What's the purpose of this declaration? I don't think you need it or the `friend` declaration below since `new_FieldInfo(FieldInfo* fieldinfo, JVMCI_TRAPS)` is public. 
Without this declaration, builds fail on Windows with this error: `error C2375: 'c2v_getDeclaredFieldsInfo': redefinition; different linkage` ------------- PR: https://git.openjdk.org/jdk/pull/12855 From matsaave at openjdk.org Tue Mar 14 13:59:48 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 14 Mar 2023 13:59:48 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
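The description proposes two things: a dedicated array of typed entries instead of overloaded f1/f2 fields, and rewriting the bytecode operand into an index into that array. Both can be sketched as follows. All type and member names here are hypothetical, and the fields are illustrative rather than those of the patch's resolved-indy entry. In this sketch, two instructions that refer to the same constant still get separate entries, since each invokedynamic instruction is its own call site.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: one dedicated, typed entry per invokedynamic call site,
// instead of a general-purpose cache entry with overloaded fields.
struct IndyEntrySketch {
  uint16_t cpool_index;                  // index of the InvokeDynamic constant
  const void* adapter_method = nullptr;  // set when the call site is resolved
  bool is_resolved() const { return adapter_method != nullptr; }
};

class CpCacheSketch {
  std::vector<IndyEntrySketch> _indy_entries;  // the array kept by the cache

 public:
  // Rewriter step: allocate an entry for one invokedynamic instruction and
  // return the index that would replace the bytecode's operand.
  int add_indy_entry(uint16_t cpool_index) {
    _indy_entries.push_back(IndyEntrySketch{cpool_index});
    return static_cast<int>(_indy_entries.size()) - 1;
  }

  // Interpreter/runtime step: the rewritten operand is a direct array index.
  IndyEntrySketch& indy_entry_at(int index) { return _indy_entries.at(index); }
};
```

Because every field has one meaning, readers of an entry no longer need to know which kind of cache entry they hold to interpret it. That is the readability gain the description is after.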
The pull request contains five additional commits since the last revision: - Typo in comment - Merge branch 'master' into resolvedIndyEntry_8301995 - Interpreter optimization and comments - PPC and RISCV port - 8301995: Move invokedynamic resolution information out of the cpCache ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/c2d87e59..a3e7ca02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=01-02 Stats: 92608 lines in 1481 files changed: 72908 ins; 8825 del; 10875 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From stuefe at openjdk.org Tue Mar 14 14:05:14 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 Mar 2023 14:05:14 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: <0jlQWouDE0tJ-ysn7WFYArqrJwHFQ-hhsZKKRGdVhmU=.95c4ceb0-9b8e-4c9b-9029-473626fb5a6b@github.com> Message-ID: <1c5aw52eXF3IwxfP7pk4ZlfMer3eI02C41XyDuZruik=.cd19d3a0-51be-451c-9a0a-bf386b3114b6@github.com> On Tue, 14 Mar 2023 09:04:51 GMT, Christian Hagedorn wrote: > > Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) > > Do any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as memory location? If not, it could get more complicated. I could still try to tackle that but I'm not sure if I have the necessary knowledge in that area. What we usually do for these kinds of problems is to pass the scratch buffer via argument to the processing functions.
And have those functions either use it or, if NULL was passed, allocate their own buffer. Another way to do this would be to add a scratch buffer pointer to Thread. A third way to do this (I had been experimenting with it) would be to pre-allocate a scratch buffer, and once error handling began, to use that inside os::malloc. That is the most involved solution, though, and I'm not particularly fond of it. I know many reviewers would hate it, too :-) ------------- PR: https://git.openjdk.org/jdk/pull/12925 From stuefe at openjdk.org Tue Mar 14 14:22:28 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 Mar 2023 14:22:28 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v5] In-Reply-To: References: Message-ID: > Fatal error handling is subject to several timeouts: > - a global timeout (controlled via ErrorLogTimeout) > - local error reporting step timeouts. > > The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting. This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout. > > There are three situations where atm we suppress the global error timeout: > - if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run. > - if the user specified one or more OnError commands to run, and these did not yet run. These must have a chance to run unmolested. > - if the user (typically developer) specified ShowMessageBoxOnError, and the error box has not yet been shown > > There is a bug though, that also prevents the step timeout from firing if either condition is true. That is plain wrong. > > In addition to that, the test interval WatcherThread uses to check for timeouts should be decreased. It sits at 1 second, which is too coarse-grained.
> > -------- > > Patch: > - reworks `VMError::check_timeout()` to never block step timeouts > - adds clarifying comments > - quadruples timeout check frequency by watcher thread > - adds regression test for timeout handling with OnError > - additionally limits timeout per individual error reporting step to 5 seconds. 5 seconds is usually enough to distinguish a slow error reporting step from one that is endlessly hanging. > > Tested locally on Linux x64. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: remove-stray-at-sign ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12936/files - new: https://git.openjdk.org/jdk/pull/12936/files/52f382db..fcbb198a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12936&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12936.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12936/head:pull/12936 PR: https://git.openjdk.org/jdk/pull/12936 From rrich at openjdk.org Tue Mar 14 14:54:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 14 Mar 2023 14:54:06 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> <3QfQXArLyZzTcdg4r9bSGJKmnoG_YY8OFOJ0eLz2rYY=.e83d9ff2-8c7e-471c-b250-97a92e7db1e5@github.com> Message-ID: <6ungKcriyVh3xBdyFAA7AwOHgNMAO8E1fWeGi1Ap3gA=.fc8c1ee2-66a4-4324-be69-f186360efb5a@github.com> On Tue, 14 Mar 2023 13:18:40 GMT, Coleen Phillimore wrote: > Ok, never mind, I saw s390 port but it doesn't seem to be in these changes (?) It is not in these changes. @offamitkumar is working on s390x. It is not yet finished though. 
(I wasn't aware that putting the URL of this PR into a comment somewhere else adds a comment to this PR) ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dnsimon at openjdk.org Tue Mar 14 15:13:00 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 14 Mar 2023 15:13:00 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> Message-ID: <23n-dTVRiGuVl7imPvKph7q43FuB1k7Hak6-mGNDKeM=.40ca325c-e53f-4950-bece-99b7e4f4d367@github.com> On Tue, 14 Mar 2023 13:12:31 GMT, Frederic Parain wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaField.java line 48: >> >>> 46: * Returns VM internal flags associated with this field >>> 47: */ >>> 48: int getInternalModifiers(); >> >> We've never exposed the internal modifiers before in public JVMCI API and we should refrain from doing so until there's a good reason to do so. Please remove this method. > > Access to internal modifiers is needed in `HotSpotResolvedJavaFieldTest.testEquivalenceForInternalFields()`. I moved the declaration of the method to `HotSpotResolvedJavaField`. Does this change work for you? Just use reflection to read the internal flags (like this test already does for the `index` field). I've attached [review.patch](https://github.com/openjdk/jdk/files/10970245/review.patch) with this change and a few other changes I think should be made for better naming (plus one test cleanup). 
------------- PR: https://git.openjdk.org/jdk/pull/12855 From coleenp at openjdk.org Tue Mar 14 15:40:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 14 Mar 2023 15:40:53 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: > This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. > Tested with tier1-4 Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix merge conflict. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13007/files - new: https://git.openjdk.org/jdk/pull/13007/files/8c0bc890..e5851202 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13007.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13007/head:pull/13007 PR: https://git.openjdk.org/jdk/pull/13007 From rkennke at openjdk.org Tue Mar 14 15:47:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 14 Mar 2023 15:47:29 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: Message-ID: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed.
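The lock-stack scheme described above can be illustrated with a small single-threaded model: locking flips the low header bits and pushes the object onto a per-thread array, "does the current thread own me?" is a linear scan, and an already-locked object (including the recursive case) makes the fast path fail so the caller would inflate to a full monitor. This is a hypothetical sketch only: `Oop`, `Object`, `LockStack`, `fast_lock`, `fast_unlock`, and the lock-bit constants are illustrative names, and the real implementation CASes the mark word rather than storing to a plain field.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, single-threaded model of a thread-local lock-stack.
// Not HotSpot code: the "mark word" is reduced to its two low lock bits,
// and the CAS on the header is modeled as a plain store.
using Oop = const void*;  // stand-in for an object reference

constexpr uintptr_t kUnlocked   = 0b01;  // low header bits: unlocked
constexpr uintptr_t kFastLocked = 0b00;  // low header bits: fast-locked

struct Object {
  uintptr_t lock_bits = kUnlocked;
};

class LockStack {
  std::vector<Oop> elems_;  // grows on demand; usually only a few entries

 public:
  // "Does the current thread own me?" is answered by a linear scan.
  bool contains(Oop o) const {
    return std::find(elems_.begin(), elems_.end(), o) != elems_.end();
  }
  void push(Oop o) { elems_.push_back(o); }
  void remove(Oop o) {
    auto it = std::find(elems_.begin(), elems_.end(), o);
    if (it != elems_.end()) elems_.erase(it);
  }
  std::size_t size() const { return elems_.size(); }
};

// Returns true on success. Returns false if the object is already locked
// (contended or recursive); the caller would then inflate to a full monitor.
bool fast_lock(Object& obj, LockStack& ls) {
  if (obj.lock_bits != kUnlocked) return false;
  obj.lock_bits = kFastLocked;  // the VM would CAS the header here
  ls.push(&obj);
  return true;
}

// Returns false if this thread does not hold a fast-lock on obj.
bool fast_unlock(Object& obj, LockStack& ls) {
  if (!ls.contains(&obj)) return false;
  obj.lock_bits = kUnlocked;
  ls.remove(&obj);
  return true;
}
```

Because recursion is not supported in this model, a second `fast_lock` on the same object simply fails, mirroring the "inflate on recursive locking" behavior described in the email.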
This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. > > One difficulty is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks.
IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: SA fixes related to latest changes in LockStack ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/87b95bf7..2f572097 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=23-24 Stats: 15 lines in 1 file changed: 4 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Tue Mar 14 15:54:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 Mar 2023 15:54:44 GMT Subject: RFR: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others [v2] In-Reply-To: References: Message-ID: <3cSfYO5FRvkmAJHe-jC75wWtNoCVPMV9Lu85pYCsHUg=.9bbd6c43-3a2a-4f65-971f-6df64218d0a9@github.com> On Fri, 10 Mar 2023 06:37:28 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> limit step timeout to 5 seconds max > > Changes seem fine. Thanks for the clear explanation. okay this has been cooking long enough I think. Thanks @dholmes-ora and @rkennke ! ------------- PR: https://git.openjdk.org/jdk/pull/12936 From stuefe at openjdk.org Tue Mar 14 15:54:46 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 Mar 2023 15:54:46 GMT Subject: Integrated: JDK-8303861: Error handling step timeouts should never be blocked by OnError and others In-Reply-To: References: Message-ID: On Thu, 9 Mar 2023 06:12:02 GMT, Thomas Stuefe wrote: > Fatal error handling is subject to several timeouts: > - a global timeout (controlled via ErrorLogTimeout) > - local error reporting step timeouts.
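The two timeouts listed above, combined with the behavior this change introduces (step timeouts are never suppressed and are capped at 5 seconds, while the global timeout is still held off until pending OnError commands have run), can be modeled roughly as follows. This is a minimal sketch, not VMError's actual code: the struct, its field names, and the assumption that a step gets a quarter of ErrorLogTimeout before the cap applies are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the two error-reporting timeouts.
// Times are in milliseconds; all names are illustrative only.
struct ErrorReportClock {
  int64_t error_log_timeout_ms;  // global budget (cf. -XX:ErrorLogTimeout)
  int64_t reporting_start_ms;    // when error reporting began
  int64_t step_start_ms;         // when the current reporting step began
  bool on_error_pending;         // OnError commands (or similar) not yet run

  // Assumption: a step gets a fraction of the global budget, capped at 5 s.
  int64_t step_timeout_ms() const {
    return std::min<int64_t>(error_log_timeout_ms / 4, 5000);
  }

  // Step timeout: never suppressed, so a hanging step is always interrupted.
  bool step_timed_out(int64_t now_ms) const {
    return now_ms - step_start_ms > step_timeout_ms();
  }

  // Global timeout: suppressed while OnError commands still have to run.
  bool global_timed_out(int64_t now_ms) const {
    if (on_error_pending) return false;
    return now_ms - reporting_start_ms > error_log_timeout_ms;
  }
};
```

The key point of the fix is that `step_timed_out` does not consult `on_error_pending`; only the global check does.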
> > The latter aims to "give the JVM a kick" if it gets stuck in one particular place during error reporting. This prevents one error reporting step from hogging all the time allotted to error reporting under ErrorLogTimeout. > > There are three situations where atm we suppress the global error timeout: > - if the JVM is embedded and the launcher has its abort hook installed. Obviously, that must be allowed to run. > - if the user specified one or more OnError commands to run, and these did not yet run. These must have a chance to run unmolested. > - if the user (typically developer) specified ShowMessageBoxOnError, and the error box has not yet been shown > > There is a bug though, that also prevents the step timeout from firing if either condition is true. That is plain wrong. > > In addition to that, the test interval WatcherThread uses to check for timeouts should be decreased. It sits at 1 second, which is too coarse-grained. > > -------- > > Patch: > - reworks `VMError::check_timeout()` to never block step timeouts > - adds clarifying comments > - quadruples timeout check frequency by watcher thread > - adds regression test for timeout handling with OnError > - additionally limits timeout per individual error reporting step to 5 seconds. 5 seconds is usually enough to distinguish a slow error reporting step from one that is endlessly hanging. > > Tested locally on Linux x64. This pull request has now been integrated. 
Changeset: a00f5d24 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/a00f5d24d3824e3ab84208401a967efe0e7bf88e Stats: 93 lines in 3 files changed: 57 ins; 1 del; 35 mod 8303861: Error handling step timeouts should never be blocked by OnError and others Reviewed-by: dholmes, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/12936 From dnsimon at openjdk.org Tue Mar 14 15:56:48 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 14 Mar 2023 15:56:48 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v5] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 06:28:20 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Merge remote-tracking branch 'openjdk-jdk/master' into JDK-8303431 >> - switched to use of lists and maps instead of arrays >> - fixed whitespace >> - added support for inherited annotations >> - Merge branch 'master' into JDK-8303431 >> - made AnnotationDataDecoder package-private >> - add annotation API to JVMCI > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2699: > >> 2697: typeArrayOop ba = typeArrayOop(res); >> 2698: int ba_len = ba->length(); >> 2699: if (ba_len <= 256) { > > Is this really necessary? Resource allocation is very cheap. Ok, good point. I'll remove the optimization. ------------- PR: https://git.openjdk.org/jdk/pull/12810 From dnsimon at openjdk.org Tue Mar 14 16:06:06 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 14 Mar 2023 16:06:06 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v6] In-Reply-To: References: Message-ID: <0hYs21V1ZWB8o92CfvkEW3i0dZKkeW8kYGQu0p6xvtM=.e76da2cd-dbe5-4da2-a6cb-775f081b9a6a@github.com> > This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. 
The main differences from `java.lang.reflect.AnnotatedElement` are: > * Each `Annotated` method explicitly specifies the annotation type(s) for which it wants annotation data. That is, there is no direct equivalent of `AnnotatedElement.getAnnotations()`. > * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. > > To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements):
>
>     ResolvedJavaMethod method = ...;
>     ExplodeLoop a = method.getAnnotation(ExplodeLoop.class);
>     return switch (a.kind()) {
>         case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL;
>         case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     };
>
> The same code using the new API:
>
>     ResolvedJavaMethod method = ...;
>     ResolvedJavaType explodeLoopType = ...;
>     AnnotationData a = method.getAnnotationDataFor(explodeLoopType);
>     return switch (a.getEnum("kind").getName()) {
>         case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL;
>         case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN;
>         ...
>     };
>
> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap.
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: addressed review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12810/files - new: https://git.openjdk.org/jdk/pull/12810/files/a85fa13a..abaf2375 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=04-05 Stats: 10 lines in 2 files changed: 0 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From gcao at openjdk.org Tue Mar 14 16:18:21 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 14 Mar 2023 16:18:21 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 13:59:48 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
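To make the proposed layout concrete, here is a minimal sketch of the idea described above: one entry per invokedynamic call site in a dedicated array, with the rewriter replacing the bytecode operand by the entry's index in that array. The type `IndyEntry`, its fields, and `rewrite_indy_operand` are hypothetical simplifications for illustration, not the actual `ResolvedIndyEntry` class or rewriter code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model: one entry per invokedynamic call site,
// stored in its own array instead of a generic cache entry with
// ambiguous f1/f2 fields. Field names are illustrative.
struct IndyEntry {
  uint16_t cpool_index;          // original constant pool index of the call site
  const void* method = nullptr;  // resolved target, filled in at resolution time
  bool is_resolved() const { return method != nullptr; }
};

// Rewriter step: for each invokedynamic, append an entry and return the
// new bytecode operand, i.e. the index into the entries array.
int rewrite_indy_operand(std::vector<IndyEntry>& entries, uint16_t cp_index) {
  entries.push_back(IndyEntry{cp_index});
  return static_cast<int>(entries.size()) - 1;
}
```

With this layout, the interpreter can go from the rewritten operand straight to a typed entry whose fields have a single meaning, which is the readability benefit the description is after.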
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Typo in comment > - Merge branch 'master' into resolvedIndyEntry_8301995 > - Interpreter optimization and comments > - PPC and RISCV port > - 8301995: Move invokedynamic resolution information out of the cpCache Hi, I have updated the RISCV-related code based on the latest aarch64-related changes; please help me update it. https://github.com/zifeihan/jdk/commit/ca9f110ca4eb066f828442265f43ed0d9311a9cc (on this branch: https://github.com/zifeihan/jdk/commits/follow_12778) @RealFYang @DingliZhang Please help review the RISCV port code. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From stuefe at openjdk.org Tue Mar 14 16:45:48 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 14 Mar 2023 16:45:48 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 18:11:09 GMT, Alexey Pavlyutkin wrote: >> The patch fixes error reporting to check the timeout in the case when a user specifies an OnError handler. Before, VMError::check_timeout() ignored the timeout in this case, and so didn't break the malloc() deadlock.
>> >> Verification (amd64/20.04LTS): the idea of the test is to crash a JVM running with an error handler of 3 successive `sleep` commands for 1s, 10s, and 60s, with and without a specified timeout >> >> >> 16:52:17 at alex@alex-VirtualBox>( echo " >> public class C { >> public static void main(String[] args) throws Throwable { >>> while (true) Thread.sleep(1000); >>> } >>> } >>> " >> C.java ) >> 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & >> [2] 179574 >> 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 >> 17:00:27 at alex@alex-VirtualBox># >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 >> # >> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 >> # >> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) >> # >> # An error report file with more information is saved as: >> # /home/alex/jdk/hs_err_pid179574.log >> # >> # If you would like to submit a bug report, please visit: >> # https://bugreport.java.com/bugreport/crash.jsp >> # >> # >> # -XX:OnError="sleep 1;sleep 10;sleep 60" >> # Executing /bin/sh -c "sleep 1" ... >> # Executing /bin/sh -c "sleep 10" ... >> # Executing /bin/sh -c "sleep 60" ...
>> >> [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java >> 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & >> [2] 179602 >> 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 >> 17:02:41 at alex@alex-VirtualBox># >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 >> # >> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 >> # >> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) >> # >> # An error report file with more information is saved as: >> # /home/alex/jdk/hs_err_pid179602.log >> # >> # If you would like to submit a bug report, please visit: >> # https://bugreport.java.com/bugreport/crash.jsp >> # >> # >> # -XX:OnError="sleep 1;sleep 10;sleep 60" >> # Executing /bin/sh -c "sleep 1" ... >> # Executing /bin/sh -c "sleep 10" ... >> >> ------ Timeout during error reporting after 11 s. ------ >> >> 17:02:54 at alex@alex-VirtualBox> >> >> >> Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` > >> Thinking this through some more, I'm starting to doubt we do the right thing here. 
It is certainly convoluted: we _are_ the reporting thread here, so what happens is that: >> >> * after fork_and_exec, we call check_timeout >> * check_timeout will signal ourselves >> * we receive the signal >> * we enter the secondary signal handler recursively >> * we then re-enter VMError::report_and_die >> * we then note that this is a timeout, print an error message and call os::die. >> >> That is too complicated for my taste, and I don't know if there are any hidden issues with VMError::check_timeout(). Now we may call it from two threads: the reporting thread and the watcher thread. That function was not intended for concurrent usage. >> >> And before thinking about the correct behavior, we need to clarify whether the protection we grant an OnError invocation extends to the whole chain of error scripts. Right now we say OnError scripts should not be interrupted. Okay, but what about the next OnError script? If the user specifies several OnError scripts, should they all get a chance to run to completion? >> >> Because denying the follow-up error scripts a chance to run feels weirdly arbitrary. Either those scripts are essential, or they aren't. If they are, all should run. If they are not, it should be okay to kill the JVM _while it is waiting for the child process to finish_ - this would make the patch simpler, and people argue that this would be the correct behavior anyway. >> >> Personally, I think that killing the JVM while it is in waitpid waiting for the child is probably benign. The child would be reparented, possibly zombified on badly set up systems, but that's it. It would probably run to completion. > > ok, let's replace check_timeout() with something like > > jlong expiration = get_reporting_start_time() + ( jlong )ErrorLogTimeout * TIMESTAMP_TO_SECONDS_FACTOR; > if ( get_current_timestamp() > expiration ) break; @apavlyutkin @jsolomon8080 https://bugs.openjdk.org/browse/JDK-8303861 is in.
Would be interesting to see if this already helps the customer.

-------------

PR: https://git.openjdk.org/jdk/pull/12925

From matsaave at openjdk.org Tue Mar 14 17:04:48 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Tue, 14 Mar 2023 17:04:48 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2]
In-Reply-To:
References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com>
Message-ID:

On Tue, 14 Mar 2023 09:20:54 GMT, Richard Reingruber wrote:

> @matias9927 can I ask you to merge master? There seem to be conflicts (at least I see a message "This branch has conflicts that must be resolved"). I'd like to give the change a spin in our CI testing. This requires that it can be applied on master.

I saw that merge error, but nothing came up when I tried to merge locally. The branch is updated nonetheless, so you should be able to test it now, @reinrich!

-------------

PR: https://git.openjdk.org/jdk/pull/12778

From jwaters at openjdk.org Tue Mar 14 18:00:30 2023
From: jwaters at openjdk.org (Julian Waters)
Date: Tue, 14 Mar 2023 18:00:30 GMT
Subject: RFR: 8301308: Remove version conditionalization for gcc/clang PRAGMA_DIAG_PUSH/POP
Message-ID: <9BP1enfxaU7HVyJEL27oCu9kMC45tWMkeUcqg1Xr6zE=.4a774982-1c1a-44ed-8862-2c300e79f458@github.com>

As of now, we require at least clang 3.5 and gcc 6 to compile the Java Platform. The version checks here test whether clang is version 4 or above or has a version higher than 3.1, and whether gcc has a major version higher than 4 or a version above 4.6. With the current minimum compilers these checks always pass, so they can be removed.
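A minimal sketch of what unconditional definitions can look like once the version guard is gone, assuming gcc or clang (both accept `GCC diagnostic` pragmas, and clang also defines `__GNUC__`). The `PRAGMA_DISABLE_GCC_WARNING` helper and the example function are hypothetical illustrations, not the actual contents of the HotSpot header:

```cpp
#include <cassert>

// Sketch: with gcc >= 6 and clang >= 3.5 guaranteed, push/pop need no version check.
#if defined(__GNUC__)  // also defined by clang
#define PRAGMA_DIAG_PUSH _Pragma("GCC diagnostic push")
#define PRAGMA_DIAG_POP  _Pragma("GCC diagnostic pop")
// Hypothetical helper: stringize the whole pragma so the quoted option survives _Pragma.
#define PRAGMA_DISABLE_GCC_WARNING_AUX(x) _Pragma(#x)
#define PRAGMA_DISABLE_GCC_WARNING(option) \
  PRAGMA_DISABLE_GCC_WARNING_AUX(GCC diagnostic ignored option)
#else
#define PRAGMA_DIAG_PUSH
#define PRAGMA_DIAG_POP
#define PRAGMA_DISABLE_GCC_WARNING(option)
#endif

// Example use: silence one warning for a single region, then restore the previous state.
PRAGMA_DIAG_PUSH
PRAGMA_DISABLE_GCC_WARNING("-Wunused-variable")
static int sum_with_unused_local(int a, int b) {
  int unused = 0;  // would normally trigger -Wunused-variable
  return a + b;
}
PRAGMA_DIAG_POP
```

Routing the pragma through an auxiliary macro keeps the option string's quotes intact, because `_Pragma` takes a single string literal.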
Also moves the macro definitions to match the Visual C++ layout and to read more cleanly.

-------------

Commit messages:
- Remove version conditionalization for gcc/clang PRAGMA_DIAG_PUSH/POP

Changes: https://git.openjdk.org/jdk/pull/13025/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13025&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8301308
Stats: 13 lines in 1 file changed: 3 ins; 10 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/13025.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/13025/head:pull/13025

PR: https://git.openjdk.org/jdk/pull/13025

From duke at openjdk.org Tue Mar 14 18:01:57 2023
From: duke at openjdk.org (Alexey Pavlyutkin)
Date: Tue, 14 Mar 2023 18:01:57 GMT
Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die()
In-Reply-To:
References:
Message-ID:

On Wed, 8 Mar 2023 18:11:09 GMT, Alexey Pavlyutkin wrote:

>> The patch fixes error reporting so that it checks the timeout when a user specifies an OnError handler. Previously, VMError::check_timeout() ignored the timeout in this case, and so did not break the malloc() deadlock.
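The timeout logic discussed in this thread can be sketched as one global deadline that is checked between OnError commands, instead of being disabled whenever OnError is set. This is a hypothetical model, not HotSpot code: `run_on_error_commands` and `OnErrorResult` are invented names, time is simulated in whole seconds, and a command that is already running is never interrupted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of running OnError commands under one global deadline.
struct OnErrorResult {
  int commands_run;         // commands that were started (and ran to completion)
  std::int64_t elapsed_s;   // simulated seconds since error reporting began
  bool timed_out;           // true if the deadline stopped the chain early
};

// durations_s: simulated run time of each command, e.g. {1, 10, 60} for
//              -XX:OnError='sleep 1;sleep 10;sleep 60'
// timeout_s:   analogue of -XX:ErrorLogTimeout; 0 disables the check in this sketch
OnErrorResult run_on_error_commands(const std::vector<std::int64_t>& durations_s,
                                    std::int64_t timeout_s) {
  const std::int64_t start_s = 0;  // stands in for the reporting start timestamp
  const std::int64_t expiration_s = start_s + timeout_s;
  std::int64_t now_s = start_s;
  int run = 0;
  for (std::int64_t d : durations_s) {
    // Check the deadline before starting the next command.
    if (timeout_s > 0 && now_s > expiration_s) {
      return {run, now_s - start_s, true};
    }
    now_s += d;  // the command runs to completion
    ++run;
  }
  return {run, now_s - start_s, false};
}
```

With durations {1, 10, 60} and a 5 s timeout, the first two commands run and the chain stops 11 s in, the same shape as the `-XX:ErrorLogTimeout=5` run quoted in this thread; with a 120 s timeout all three commands run.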
>>
>> Verification (amd64/20.04LTS): the idea of the test is to crash a JVM running with an OnError handler of 3 successive `sleep` commands (1 s, 10 s, and 60 s), with and without a specified timeout.
>>
>>
>> 16:52:17 at alex@alex-VirtualBox>( echo "
>> public class C {
>>     public static void main(String[] args) throws Throwable {
>>         while (true) Thread.sleep(1000);
>>     }
>> }
>> " >> C.java )
>> 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java &
>> [2] 179574
>> 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574
>> 17:00:27 at alex@alex-VirtualBox>#
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574
>> #
>> # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # Problematic frame:
>> # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255
>> #
>> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574)
>> #
>> # An error report file with more information is saved as:
>> # /home/alex/jdk/hs_err_pid179574.log
>> #
>> # If you would like to submit a bug report, please visit:
>> # https://bugreport.java.com/bugreport/crash.jsp
>> #
>> #
>> # -XX:OnError="sleep 1;sleep 10;sleep 60"
>> # Executing /bin/sh -c "sleep 1" ...
>> # Executing /bin/sh -c "sleep 10" ...
>> # Executing /bin/sh -c "sleep 60" ...
>> [...]
>
> @apavlyutkin @jsolomon8080 https://bugs.openjdk.org/browse/JDK-8303861 is in.
> Would be interesting to see if this already helps the customer.

Sure, I'll try to backport the changes to zulu11. Thanks a lot!

-------------

PR: https://git.openjdk.org/jdk/pull/12925

From vlivanov at openjdk.org Tue Mar 14 18:08:46 2023
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 14 Mar 2023 18:08:46 GMT
Subject: RFR: 8304089: Convert TraceDependencies to UL [v3]
In-Reply-To:
References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com>
Message-ID:

On Tue, 14 Mar 2023 15:40:53 GMT, Coleen Phillimore wrote:

>> This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags that trace only certain parts of dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug.
>> Tested with tier1-4
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix merge conflict.

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.org/jdk/pull/13007

From chagedorn at openjdk.org Tue Mar 14 18:18:37 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 14 Mar 2023 18:18:37 GMT
Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die()
In-Reply-To:
References: <0jlQWouDE0tJ-ysn7WFYArqrJwHFQ-hhsZKKRGdVhmU=.95c4ceb0-9b8e-4c9b-9029-473626fb5a6b@github.com>
Message-ID:

On Tue, 14 Mar 2023 09:04:51 GMT, Christian Hagedorn wrote:

>> @dholmes-ora My proposal would be to be pragmatic and continue the discussion here. This has been going on too long. Switching channels again would be more confusing.
>>
>> I would prefer JBS or ML discussion as well, but not everyone (e.g. @jsolomon8080) has write access to JBS.
>>
>> ----
>>
>> I thought this over some more.
>>
>> A big part of this problem is a plain bug in VMError::check_timeout(): it should never block *step timeouts*. We added step timeouts back in 2016 to deal with these kinds of problems: if the JVM hangs in error reporting, give it a kick to get it going again. Do so repeatedly. This is related to the global timeout, but also works independently of it (and therefore of the question of when to honor the global timeout).
>>
>> I opened https://bugs.openjdk.org/browse/JDK-8303861 to deal with this, see PR: https://github.com/openjdk/jdk/pull/12936
>>
>> This may already be a big help for cases like these. In fact, it may already be enough, and we could maybe close this issue.
>>
>> However, if we have a recursive malloc situation, we may hang repeatedly. The JVM will kick itself back to life every time (with my patch), but this is still annoying. The root problem here is that we should not use malloc during error handling. That cannot always be avoided, but at least we should minimize malloc use.
>>
>> Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ? Otherwise Azul may also chip in some bug fixing.
>>
>> ----
>>
>> All these are unrelated to the question of whether OnError should be blocked or not. I realize now that if we decide to (continue to) protect OnError from timeouts, we must never act on the global timeout until all OnError steps have run. Since these run right before VM exit, the original implementation that just blocked the global timeout altogether was actually right.
>>
>> So, there's that. Maybe JDK-8303861 is already enough for cases like this. At least JDK-8303861 does not introduce any backward compatibility issues.
>
>> Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ?
>
> Does any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as a memory location? If not, it could get more complicated. I could still try to tackle that, but I'm not sure if I have the necessary knowledge in that area.

> > [...]
>
> What we usually do for these kinds of problems is to pass the scratch buffer as an argument to the processing functions, and have them use it or, if NULL was passed, allocate their own buffer.
>
> Another way to do this would be to add a scratch buffer pointer to Thread.
>
> A third way to do this (I had been experimenting with it) would be to pre-allocate a scratch buffer and, once error handling began, to use that inside os::malloc. That is the most involved solution, though, and I'm not particularly fond of it. I know many reviewers would hate it, too :-)

Thanks for the summary. Could we also use placement new to construct objects directly in the scratch buffer? For example, when creating a decoder, we could use something like

    decoder = new(scratch_buffer) ElfDecoder();

here: https://github.com/openjdk/jdk/blob/4e631fa43fd821846c12ae2177360c44cf770766/src/hotspot/share/utilities/decoder.cpp#L62-L70

Then we can still utilize polymorphism. But then `AbstractDecoder` needs to be something other than a `CHeapObj`.
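The placement-new idea can be sketched as follows, including a check that the object actually fits in the buffer after alignment. This is an illustration under assumptions: `AbstractDecoderSketch`, `ElfDecoderSketch`, `create_in_scratch` and `destroy_in_scratch` are hypothetical stand-ins rather than HotSpot's `AbstractDecoder`/`ElfDecoder`, and the base class is a plain polymorphic C++ class rather than a `CHeapObj`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-ins for AbstractDecoder/ElfDecoder (not CHeapObj-derived).
class AbstractDecoderSketch {
 public:
  virtual ~AbstractDecoderSketch() = default;
  virtual const char* kind() const = 0;
};

class ElfDecoderSketch : public AbstractDecoderSketch {
 public:
  const char* kind() const override { return "elf"; }
  int symbols_seen() const { return symbols_seen_; }

 private:
  int symbols_seen_ = 0;  // placeholder for per-decoder state
};

// Construct T inside a caller-provided buffer (e.g. a preallocated
// error-reporting scratch buffer). Returns nullptr if T does not fit after
// alignment, so the caller can bail out instead of calling malloc.
template <typename T>
T* create_in_scratch(void* buffer, std::size_t buffer_size) {
  void* p = buffer;
  std::size_t space = buffer_size;
  if (std::align(alignof(T), sizeof(T), p, space) == nullptr) {
    return nullptr;
  }
  return new (p) T();  // placement new: no heap allocation
}

// Objects created with placement new must be destroyed explicitly, never deleted.
inline void destroy_in_scratch(AbstractDecoderSketch* d) {
  if (d != nullptr) {
    d->~AbstractDecoderSketch();  // virtual destructor dispatches to the subclass
  }
}
```

Polymorphism is preserved because the object is used through a base-class pointer as usual; keeping several objects alive in one buffer would additionally need a bump-pointer offset, which this sketch omits.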
And of course, this approach also needs careful checking that a new object actually fits into the provided scratch buffer. On top of that, it gets more complicated when we want to keep multiple objects alive in the same scratch buffer. So I'm not sure this approach is feasible.

-------------

PR: https://git.openjdk.org/jdk/pull/12925

From dcubed at openjdk.org Tue Mar 14 18:22:28 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Tue, 14 Mar 2023 18:22:28 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25]
In-Reply-To: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
Message-ID:

On Tue, 14 Mar 2023 15:47:29 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check for and deal with this situation whenever we access the mark word. And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the identity hash provides a glimpse into the complexity.)
>>
>> What the original stack-locking does is basically to push a stack-lock, which consists only of the displaced header, onto the stack, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: it still CASes the lowest two header bits to 00 to indicate 'fast-locked', but does *not* overload the upper bits with a stack pointer. Instead, it pushes the object reference onto a thread-local lock-stack. This is a new structure, basically a small array of oops associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered very quickly by scanning the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is OK to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
>> What we do instead is to record a special marker, ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, sets the owner to itself, and then properly exits the monitor, thereby handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. the workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer), and 2. the workload churns such locks. In other words, uncontended use of Vector, StringBuffer, etc. as such is fine, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can have a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
>> This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> SA fixes related to latest changes in LockStack

I kicked off a round of Mach5 Tier1 testing last night. I got 133 SA test failures that are probably fixed by v24. runtime/logging/MonitorInflationTest.java also failed on all 5 configs tested in Tier1:

    java.lang.RuntimeException: 'inflate(has_locker):' missing from stdout/stderr
        at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221)
        at MonitorInflationTest.analyzeOutputOn(MonitorInflationTest.java:41)
        at MonitorInflationTest.main(MonitorInflationTest.java:56)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:578)
        at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
        at java.base/java.lang.Thread.run(Thread.java:1623)

I suspect that the failing condition is one that I added to the test a long time ago, so I'll be taking a look.
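The fast-locking scheme in the quoted PR description can be modelled in a few dozen lines. This is a simplified sketch, not the HotSpot implementation: the header is a plain `std::atomic<std::uintptr_t>`, the bit values (01 unlocked, 00 fast-locked) are assumptions for illustration, and `Object`, `LockStack`, `fast_lock` and `fast_unlock` are hypothetical names. Inflation, ANONYMOUS_OWNER handling and GC root scanning are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed lock-bit encoding (low two header bits): 0b01 unlocked, 0b00 fast-locked.
constexpr std::uintptr_t kLockMask   = 0b11;
constexpr std::uintptr_t kUnlocked   = 0b01;
constexpr std::uintptr_t kFastLocked = 0b00;

struct Object {
  std::atomic<std::uintptr_t> mark{kUnlocked};  // simplified object header
};

// Per-thread lock-stack: a small array of object references, grown on demand.
class LockStack {
 public:
  void push(Object* o) { elems_.push_back(o); }  // std::vector grows when full

  void remove(Object* o) {
    for (std::size_t i = elems_.size(); i > 0; --i) {  // search from the top
      if (elems_[i - 1] == o) {
        elems_.erase(elems_.begin() + static_cast<std::ptrdiff_t>(i - 1));
        return;
      }
    }
  }

  // "Does the current thread own this lock?" is a linear scan of a tiny array.
  bool contains(const Object* o) const {
    for (const Object* e : elems_) {
      if (e == o) return true;
    }
    return false;
  }

  std::size_t size() const { return elems_.size(); }

 private:
  std::vector<Object*> elems_;
};

enum class LockResult { kFastLocked, kNeedsInflation };

// monitorenter fast path: CAS the low bits 01 -> 00 without touching the upper
// header bits, then record the object on this thread's lock-stack.
LockResult fast_lock(Object* o, LockStack& ls) {
  std::uintptr_t old_mark = o->mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) == kUnlocked) {
    std::uintptr_t new_mark = (old_mark & ~kLockMask) | kFastLocked;
    if (o->mark.compare_exchange_strong(old_mark, new_mark,
                                        std::memory_order_acquire)) {
      ls.push(o);
      return LockResult::kFastLocked;
    }
  }
  // Already locked (recursively by us, or by another thread) or lost the race:
  // there is no recursive fast-locking, so the caller inflates to a monitor.
  return LockResult::kNeedsInflation;
}

// monitorexit fast path: only valid if this thread fast-locked the object.
bool fast_unlock(Object* o, LockStack& ls) {
  std::uintptr_t old_mark = o->mark.load(std::memory_order_relaxed);
  if ((old_mark & kLockMask) != kFastLocked || !ls.contains(o)) {
    return false;  // not fast-locked by this thread: use the slow path
  }
  std::uintptr_t new_mark = (old_mark & ~kLockMask) | kUnlocked;
  if (!o->mark.compare_exchange_strong(old_mark, new_mark,
                                       std::memory_order_release)) {
    return false;  // header changed concurrently (e.g. inflated): slow path
  }
  ls.remove(o);
  return true;
}
```

A recursive `fast_lock` on an object the thread already holds finds the low bits at 00 and reports that inflation is needed, mirroring the statement that recursive fast-locking is not supported.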
-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Tue Mar 14 18:22:30 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 14 Mar 2023 18:22:30 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25]
In-Reply-To:
References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
Message-ID:

On Tue, 14 Mar 2023 18:16:36 GMT, Daniel D. Daugherty wrote:

> I kicked off a round of Mach5 Tier1 testing last night. I got 133 SA test failures that are probably fixed by v24. runtime/logging/MonitorInflationTest.java also failed on all 5 configs tested in Tier1:
>
> ```
> java.lang.RuntimeException: 'inflate(has_locker):' missing from stdout/stderr
>     at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:221)
>     at MonitorInflationTest.analyzeOutputOn(MonitorInflationTest.java:41)
>     at MonitorInflationTest.main(MonitorInflationTest.java:56)
>     at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:578)
>     at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
>     at java.base/java.lang.Thread.run(Thread.java:1623)
> ```
>
> I suspect that the failing condition is one that I added to the test a long time ago so I'll be taking a look.

Aww, that is bad timing. I pushed a change yesterday that broke SA, and I only pushed a fix 2 hours ago. It should be good now, in case you want to try it again.

Thank you for your effort to review and test this change!

Roman

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Tue Mar 14 18:44:56 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Tue, 14 Mar 2023 18:44:56 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25]
In-Reply-To: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
Message-ID:

On Tue, 14 Mar 2023 15:47:29 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> SA fixes related to latest changes in LockStack

I've reviewed the changes in v23 and v24.

Trying another Mach5 Tier1 job set.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Tue Mar 14 18:52:39 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 14 Mar 2023 18:52:39 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check for and deal with this situation whenever we access the mark word. And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the identity hash provides a glimpse into the complexity.)
>
> What the original stack-locking does is basically to push a stack-lock, which consists only of the displaced header, onto the stack, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: it still CASes the lowest two header bits to 00 to indicate 'fast-locked', but does *not* overload the upper bits with a stack pointer. Instead, it pushes the object reference onto a thread-local lock-stack.
> This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered very quickly by a short scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is OK to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When overflow is possible, locking fast paths call into the runtime to grow the stack and handle the locking. Compiled fast paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When recursive locking happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>
> One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
> When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer), and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can have a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
>   This would be implemented in a separate PR.
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, so far without any problems
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Change log message when inflating fast-locked object

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/10907/files
 - new: https://git.openjdk.org/jdk/pull/10907/files/2f572097..b834f0ca

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=25
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=24-25

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Tue Mar 14 18:52:42 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 14 Mar 2023 18:52:42 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25]
In-Reply-To:
References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
Message-ID:

On Tue, 14 Mar 2023 18:41:59 GMT, Daniel D. Daugherty wrote:

> I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set.
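The lock-stack protocol from the v26 description can be modeled in plain Java. The model makes the ownership check and the two fallback cases (recursion and contention) concrete. It is a hypothetical model, not HotSpot code. `Obj`, `LockStack`, `tryFastLock` and `tryFastUnlock` are invented names, and the object header is reduced to its two low lock bits (01 unlocked, 00 fast-locked). Inflation, the ANONYMOUS_OWNER hand-off and GC-root scanning are not modeled. A `false` result stands for "leave the fast path and inflate to a full monitor".

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of fast-locking with a per-thread lock-stack (not HotSpot code).
public class LockStackSketch {

    // Only the two low "lock bits" of an object header are modeled.
    static final int UNLOCKED = 0b01;
    static final int FAST_LOCKED = 0b00;

    static final class Obj {
        final AtomicInteger lockBits = new AtomicInteger(UNLOCKED);
    }

    // Per-thread lock-stack: a small, growable array of locked objects.
    static final class LockStack {
        private Obj[] elems = new Obj[4];
        private int top = 0;

        void push(Obj o) {
            if (top == elems.length) {
                // Overflow: grow the array (a runtime call in the described scheme).
                elems = Arrays.copyOf(elems, elems.length * 2);
            }
            elems[top++] = o;
        }

        void pop(Obj o) {
            // Simplification: assumes properly nested (LIFO) unlocking.
            if (top == 0 || elems[top - 1] != o) {
                throw new IllegalMonitorStateException("not the most recent fast-lock");
            }
            elems[--top] = null;
        }

        // "Does the current thread own o?" is answered by a linear scan.
        boolean contains(Obj o) {
            for (int i = 0; i < top; i++) {
                if (elems[i] == o) {
                    return true;
                }
            }
            return false;
        }

        int size() {
            return top;
        }
    }

    static final ThreadLocal<LockStack> LOCK_STACK = ThreadLocal.withInitial(LockStack::new);

    // Returns false when the caller would have to inflate (recursion or contention).
    static boolean tryFastLock(Obj o) {
        LockStack ls = LOCK_STACK.get();
        if (ls.contains(o)) {
            return false; // recursive fast-locking is not supported
        }
        // CAS the low bits 01 -> 00; no stack pointer is written into the header.
        if (o.lockBits.compareAndSet(UNLOCKED, FAST_LOCKED)) {
            ls.push(o);
            return true;
        }
        return false; // fast-locked by another thread: contention
    }

    // Returns false when the header is no longer in the fast-locked state.
    static boolean tryFastUnlock(Obj o) {
        LOCK_STACK.get().pop(o);
        return o.lockBits.compareAndSet(FAST_LOCKED, UNLOCKED);
    }

    public static void main(String[] args) {
        Obj a = new Obj();
        Obj b = new Obj();
        System.out.println(tryFastLock(a));               // true
        System.out.println(tryFastLock(b));               // true
        System.out.println(tryFastLock(a));               // false: recursive, would inflate
        System.out.println(LOCK_STACK.get().contains(a)); // true
        System.out.println(tryFastUnlock(b));             // true
        System.out.println(tryFastUnlock(a));             // true
        System.out.println(LOCK_STACK.get().size());      // 0
    }
}
```

Because only the lock bits live in the header, ownership is found by scanning the per-thread array instead of decoding a stack address from the mark word. The scan is linear, and the description expects it to stay cheap because the array typically holds only 3-5 elements.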
I just now pushed a simple change that changes the log message 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy.

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Tue Mar 14 18:52:45 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Tue, 14 Mar 2023 18:52:45 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25]
In-Reply-To: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com>
Message-ID:

On Tue, 14 Mar 2023 15:47:29 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   SA fixes related to latest changes in LockStack

And it looks like you just pushed a fix for:
runtime/logging/MonitorInflationTest.java.
I killed my Mach5 Tier1 and I'll resync again...

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From kbarrett at openjdk.org Tue Mar 14 18:55:36 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 14 Mar 2023 18:55:36 GMT
Subject: RFR: 8301308: Remove version conditionalization for gcc/clang PRAGMA_DIAG_PUSH/POP
In-Reply-To: <9BP1enfxaU7HVyJEL27oCu9kMC45tWMkeUcqg1Xr6zE=.4a774982-1c1a-44ed-8862-2c300e79f458@github.com>
References: <9BP1enfxaU7HVyJEL27oCu9kMC45tWMkeUcqg1Xr6zE=.4a774982-1c1a-44ed-8862-2c300e79f458@github.com>
Message-ID:

On Tue, 14 Mar 2023 17:51:12 GMT, Julian Waters wrote:

> As of now we at minimum require clang 3.5 and gcc 6 to compile the Java Platform, the version checks for gcc/clang here are for whether clang is either version 4 and above, or has a minor version higher than 3.1, and for gcc either a major version higher than 4 or minor version above 4.6.
> Now these will always pass, so they can be removed. Also changes the macro definition location to match Visual C++ and look neater.

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR: https://git.openjdk.org/jdk/pull/13025

From coleenp at openjdk.org Tue Mar 14 19:34:55 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 14 Mar 2023 19:34:55 GMT
Subject: RFR: 8304089: Convert TraceDependencies to UL [v3]
In-Reply-To:
References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com>
Message-ID: <73p3Jef69fNhq6evZtlnN7a-Aoy0hxJzXDpMBIzsz0c=.1f599ec1-577c-4dbe-926c-03b61bc71138@github.com>

On Tue, 14 Mar 2023 15:40:53 GMT, Coleen Phillimore wrote:

>> This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to trace only certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug.
>> Tested with tier1-4
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix merge conflict.

Thank you, Vladimir.

-------------

PR: https://git.openjdk.org/jdk/pull/13007

From dnsimon at openjdk.org Tue Mar 14 19:49:48 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 14 Mar 2023 19:49:48 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4]
In-Reply-To:
References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com>
Message-ID:

On Tue, 14 Mar 2023 13:37:23 GMT, Frederic Parain wrote:

>> src/hotspot/share/jvmci/jvmciEnv.hpp line 149:
>>
>>> 147: };
>>> 148:
>>> 149: extern JNIEXPORT jobjectArray c2v_getDeclaredFieldsInfo(JNIEnv* env, jobject, jobject, jlong);
>>
>> What's the purpose of this declaration?
>> I don't think you need it or the `friend` declaration below since `new_FieldInfo(FieldInfo* fieldinfo, JVMCI_TRAPS)` is public.
>
> Without this declaration, builds fail on Windows with this error:
> `error C2375: 'c2v_getDeclaredFieldsInfo': redefinition; different linkage`

Strange - that's not needed for other `JVMCIEnv` methods called from `jvmciCompilerToVM.cpp`. There must be some way to avoid this.

-------------

PR: https://git.openjdk.org/jdk/pull/12855

From fparain at openjdk.org Tue Mar 14 19:49:49 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Tue, 14 Mar 2023 19:49:49 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4]
In-Reply-To:
References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com>
Message-ID: <0SZEwskweBz1Ri6krqHq9rGWvmwiQ9fkgfRcUEKtTuo=.67dbf4d3-4432-4187-a39d-f16ef76cc2ce@github.com>

On Tue, 14 Mar 2023 15:11:36 GMT, Doug Simon wrote:

>> Without this declaration, builds fail on Windows with this error:
>> `error C2375: 'c2v_getDeclaredFieldsInfo': redefinition; different linkage`
>
> Strange - that's not needed for other `JVMCIEnv` methods called from `jvmciCompilerToVM.cpp`. There must be some way to avoid this.

The issue was caused by the `friend` declaration below (I cannot remember why I added it in the first place), which seems to add an implicit declaration of the method that conflicted with its original declaration. Once the `friend` declaration is removed, builds on Windows no longer need the `extern` declaration.
-------------

PR: https://git.openjdk.org/jdk/pull/12855

From matsaave at openjdk.org Tue Mar 14 20:20:41 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Tue, 14 Mar 2023 20:20:41 GMT
Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4]
In-Reply-To:
References:
Message-ID:

> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry.
>
> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array.
>
> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests.
>
> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>
> This change supports the following platforms: x86, aarch64, PPC, and RISCV.

Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:

  RISCV port update

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12778/files
 - new: https://git.openjdk.org/jdk/pull/12778/files/a3e7ca02..db892223

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=02-03

  Stats: 23 lines in 2 files changed: 5 ins; 12 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/12778.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778

PR: https://git.openjdk.org/jdk/pull/12778

From fparain at openjdk.org Tue Mar 14 20:32:53 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Tue, 14 Mar 2023 20:32:53 GMT
Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4]
In-Reply-To: <23n-dTVRiGuVl7imPvKph7q43FuB1k7Hak6-mGNDKeM=.40ca325c-e53f-4950-bece-99b7e4f4d367@github.com>
References: <5apLgAhjSwiK-sv6Xrl9yZctZTAe_GahWGyk8rUYgvk=.1af11917-7547-48ba-b958-01c6ef6f9f18@github.com> <23n-dTVRiGuVl7imPvKph7q43FuB1k7Hak6-mGNDKeM=.40ca325c-e53f-4950-bece-99b7e4f4d367@github.com>
Message-ID:

On Tue, 14 Mar 2023 15:10:04 GMT, Doug Simon wrote:

>> Access to internal modifiers is needed in `HotSpotResolvedJavaFieldTest.testEquivalenceForInternalFields()`. I moved the declaration of the method to `HotSpotResolvedJavaField`. Does this change work for you?
>
> Just use reflection to read the internal flags (like this test already does for the `index` field).
>
> I've attached [review.patch](https://github.com/openjdk/jdk/files/10970245/review.patch) with this change and a few other changes I think should be made for better naming (plus one test cleanup).

Thank you for the patch; it will be included in the next commit.
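Doug suggests reading the internal flags via reflection instead of declaring a new method on `HotSpotResolvedJavaField`. That follows the standard `java.lang.reflect.Field` pattern, sketched below with a hypothetical stand-in class. `FieldHolder` and `internalFlags` are invented names, not the JVMCI ones. If the target class lives in another module, such as `jdk.internal.vm.ci`, its package must also be opened to the test (for example with `--add-opens`). Otherwise `setAccessible` throws `InaccessibleObjectException`.

```java
import java.lang.reflect.Field;

public class ReflectiveFlagsRead {

    // Hypothetical stand-in for a class whose flags have no public getter.
    static final class FieldHolder {
        private final int internalFlags;

        FieldHolder(int internalFlags) {
            this.internalFlags = internalFlags;
        }
    }

    // Reads a private int field by name via reflection.
    static int readIntField(Object target, String name) throws ReflectiveOperationException {
        Field f = target.getClass().getDeclaredField(name);
        f.setAccessible(true); // throws InaccessibleObjectException if the package is not opened to the caller
        return f.getInt(target);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(readIntField(new FieldHolder(0x20), "internalFlags")); // 32
    }
}
```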
-------------

PR: https://git.openjdk.org/jdk/pull/12855

From mdoerr at openjdk.org Tue Mar 14 20:33:34 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 14 Mar 2023 20:33:34 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v15]
In-Reply-To:
References:
Message-ID:

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but not completely. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>
> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed both in an FP reg and in a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`.
>    This is needed for the following reasons:
>    - PPC64 ABI requires integer types to get extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that.
>    - Floating point load / store instructions need the correct width to select the correct IEEE 754 format. On PPC64, single-precision values in FP registers are always represented in IEEE 754 double precision.
>    - Big Endian also needs the precise size. Storing 8 bytes and loading 4 bytes yields a different value than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit.

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Allow TestHFA to run on musl. Add Upcalls.
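The two size-related reasons above can be illustrated in Java. The sketch uses a `java.nio.ByteBuffer` as a stand-in for an 8-byte argument slot. That stand-in is an assumption for illustration only; it is not how the FFM backend stores arguments. On Little Endian, storing 8 bytes and loading the first 4 returns the value, but on Big Endian it returns the high half. Writing only 4 bytes of an int into a slot that is read as 64 bits leaves stale upper bits unless the value is widened first. The method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SlotWidthDemo {

    // Store a 64-bit value into an 8-byte slot, then load only its first 4 bytes.
    static int storeLongLoadInt(long value, ByteOrder order) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(order);
        slot.putLong(0, value);
        return slot.getInt(0);
    }

    // Write only 4 bytes of an int into a slot holding stale data, then load all 8 bytes.
    static long storeIntLoadLong(int value, ByteOrder order) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(order);
        slot.putLong(0, 0xDEADBEEF_00000000L); // stale contents of the slot
        slot.putInt(0, value);
        return slot.getLong(0);
    }

    // Same slot, but the int is widened to 64 bits first (a Java cast sign-extends).
    static long storeExtendedLoadLong(int value, ByteOrder order) {
        ByteBuffer slot = ByteBuffer.allocate(8).order(order);
        slot.putLong(0, 0xDEADBEEF_00000000L);
        slot.putLong(0, (long) value);
        return slot.getLong(0);
    }

    public static void main(String[] args) {
        System.out.println(storeLongLoadInt(42L, ByteOrder.LITTLE_ENDIAN)); // 42
        System.out.println(storeLongLoadInt(42L, ByteOrder.BIG_ENDIAN));    // 0
        System.out.println(Long.toHexString(storeIntLoadLong(42, ByteOrder.LITTLE_ENDIAN))); // deadbeef0000002a
        System.out.println(storeExtendedLoadLong(-1, ByteOrder.LITTLE_ENDIAN)); // -1
    }
}
```

The sketch does not model the ABI's exact per-type extension rules. It only shows why the backend needs the type or bit width of each value to store and load it correctly.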
-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/12708/files
 - new: https://git.openjdk.org/jdk/pull/12708/files/2e4e269e..364a84ed

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=13-14

  Stats: 208 lines in 2 files changed: 204 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/12708.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708

PR: https://git.openjdk.org/jdk/pull/12708

From mdoerr at openjdk.org Tue Mar 14 20:33:40 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 14 Mar 2023 20:33:40 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v14]
In-Reply-To:
References:
Message-ID:

On Mon, 13 Mar 2023 14:31:19 GMT, Jorn Vernee wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Introduce ABIv2CallArranger for linux ppc64le.
>
> test/jdk/java/foreign/TestHFA.java line 32:
>
>> 30:  * @enablePreview
>> 31:  * @requires ((os.arch == "amd64" | os.arch == "x86_64") & sun.arch.data.model == "64") | os.arch == "aarch64" | os.arch == "ppc64le" | os.arch == "riscv64"
>> 32:  * @requires !vm.musl
>
> Not sure if this test should be disabled on musl?

Changed with https://github.com/openjdk/jdk/pull/12708/commits/364a84edc416abd5f1318f78057c92720fe96990. Plus added Upcalls.

-------------

PR: https://git.openjdk.org/jdk/pull/12708

From mdoerr at openjdk.org Tue Mar 14 22:30:22 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 14 Mar 2023 22:30:22 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v16]
In-Reply-To:
References:
Message-ID:

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. 
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/364a84ed..9173af20 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=14-15 Stats: 14 lines in 2 files changed: 5 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From duke at openjdk.org Tue Mar 14 22:33:52 2023 From: duke at openjdk.org (jsolomon8080) Date: Tue, 14 Mar 2023 22:33:52 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 17:58:28 GMT, Alexey Pavlyutkin wrote: > > @apavlyutkin @jsolomon8080 https://bugs.openjdk.org/browse/JDK-8303861 is in. Would be interesting to see if this already helps the customer. > > Sure, I'll try to backport the changes to zulu11. Thanks a lot! I look forward to receiving a version of zulu11 with this change and testing it out!
------------- PR: https://git.openjdk.org/jdk/pull/12925 From mdoerr at openjdk.org Tue Mar 14 22:34:47 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Mar 2023 22:34:47 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v16] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:30:22 GMT, Martin Doerr wrote: > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. Thanks for the hint. Using `jshell` is a good idea.
{
  callerMethodType: (long,int,int,int,int,int,int,int,int)long
  calleeMethodType: (MemorySegment,byte,byte,byte,byte,byte,byte,byte,byte)MemorySegment
  FunctionDescriptor: ([[8:b8]]b8b8b8b8b8b8b8b8)[[8:b8]]
  Argument Bindings:
    0: [Allocate[size=8, alignment=1], Dup[], VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=3, debugName=r3], type=long], BufferStore[offset=0, type=long, byteWidth=8]]
    1: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=4, debugName=r4], type=int], INT_TO_BYTE]
    2: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=5, debugName=r5], type=int], INT_TO_BYTE]
    3: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=6, debugName=r6], type=int], INT_TO_BYTE]
    4: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=7, debugName=r7], type=int], INT_TO_BYTE]
    5: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=8, debugName=r8], type=int], INT_TO_BYTE]
    6: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=9, debugName=r9], type=int], INT_TO_BYTE]
    7: [VMLoad[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=10, debugName=r10], type=int], INT_TO_BYTE]
    8: [VMLoad[storage=VMStorage[type=2, segmentMaskOrSize=8, indexOrOffset=64, debugName=Stack@64], type=int], INT_TO_BYTE]
  Return: [BufferLoad[offset=0, type=long, byteWidth=8], VMStore[storage=VMStorage[type=0, segmentMaskOrSize=3, indexOrOffset=3, debugName=r3], type=long]]
}

I have fixed the issue and enabled the test. My code still contained an old assumption which doesn't hold for the cases I had implemented more recently. So, this PR is probably complete. I'll rerun tests and wait for 2 reviews.
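The PR description quoted earlier notes that on Big Endian, storing 8 bytes and then loading 4 bytes from the same address gives a different value than on Little Endian, which is one reason the precise access width has to be known. The following minimal Java sketch illustrates that effect using only `java.nio.ByteBuffer` to simulate the two byte orders; the class and method names are hypothetical and are not part of the FFM implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianWidthDemo {
    // Store an 8-byte long at offset 0, then load only 4 bytes from the same offset.
    static int storeLongLoadInt(long value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).order(order);
        buf.putLong(0, value); // 8-byte store
        return buf.getInt(0);  // 4-byte load from the same address
    }

    public static void main(String[] args) {
        // Little Endian: the first 4 bytes hold the low-order half of the long.
        System.out.println(storeLongLoadInt(42L, ByteOrder.LITTLE_ENDIAN)); // 42
        // Big Endian: the first 4 bytes hold the high-order half, which is 0 here.
        System.out.println(storeLongLoadInt(42L, ByteOrder.BIG_ENDIAN));    // 0
    }
}
```

On Little Endian the narrow load happens to see the low-order half of the value, so an 8-byte store followed by a 4-byte load "works" by accident; on Big Endian it sees the high-order half instead.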
------------- PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Tue Mar 14 22:55:51 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Mar 2023 22:55:51 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v16] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:30:22 GMT, Martin Doerr wrote: > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. Btw., the new cases in which we use int and short accesses when byteWidth is not a power of 2 are never unaligned AFAICS. I guess the _UNALIGNED variants (JAVA_INT_UNALIGNED and JAVA_SHORT_UNALIGNED) are unnecessary there: these accesses are always aligned with respect to their size. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From ccheung at openjdk.org Tue Mar 14 23:37:46 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 14 Mar 2023 23:37:46 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 20:20:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > RISCV port update Looks good. Just a few minor comments. src/hotspot/share/interpreter/bootstrapInfo.cpp line 218: > 216: _indy_index, > 217: pool()->tag_at(_bss_index), > 218: CHECK_false); Please indent lines 216-218 like before. src/hotspot/share/interpreter/bootstrapInfo.cpp line 234: > 232: if (_indy_index > -1) { > 233: os::snprintf_checked(what, sizeof(what), "indy#%d", _indy_index); > 234: } Since the `else` case doesn't have braces, maybe omit the braces for this case as well? src/hotspot/share/oops/cpCache.cpp line 618: > 616: indy_resolution_failed(), parameter_size()); > 617: if ((bytecode_1() == Bytecodes::_invokehandle)) { > 618: constantPoolHandle cph(Thread::current(), cache->constant_pool()); There is another `cph` defined at line 601. Could that one be used?
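The design in the quoted 8301995 description (one dedicated entry per invokedynamic call site, kept in an array in the constant pool cache, with the bytecode operand rewritten to an index into that array) can be modeled roughly as below. This is a hypothetical Java sketch for illustration only: the class names, the `String` stand-in for the resolved method, and the use of a `List` are assumptions, whereas HotSpot keeps the entries in a contiguous native array (the aarch64 interpreter code discussed later in this thread scales the index by `sizeof(ResolvedIndyEntry)`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; these are not HotSpot's C++ types.
public class IndyCacheSketch {
    // One entry per invokedynamic call site, with named fields
    // instead of overloaded f1/f2 slots.
    static final class ResolvedIndyEntry {
        final int cpIndex;        // constant pool index of the call site's InvokeDynamic constant
        String resolvedMethod;    // placeholder for the resolved method; null until resolved
        int parameterSize;

        ResolvedIndyEntry(int cpIndex) {
            this.cpIndex = cpIndex;
        }

        boolean isResolved() {
            return resolvedMethod != null;
        }
    }

    private final List<ResolvedIndyEntry> entries = new ArrayList<>();

    // Rewriting: allocate an entry for a call site and return its index,
    // which replaces the constant pool index as the bytecode operand.
    int rewriteOperand(int cpIndex) {
        entries.add(new ResolvedIndyEntry(cpIndex));
        return entries.size() - 1;
    }

    ResolvedIndyEntry entryAt(int indyIndex) {
        return entries.get(indyIndex);
    }

    // Resolution fills in the entry in place; the rewritten operand keeps pointing at it.
    void resolve(int indyIndex, String method, int parameterSize) {
        ResolvedIndyEntry e = entries.get(indyIndex);
        e.resolvedMethod = method;
        e.parameterSize = parameterSize;
    }

    public static void main(String[] args) {
        IndyCacheSketch cache = new IndyCacheSketch();
        int first = cache.rewriteOperand(17);  // call site whose cp index is 17
        int second = cache.rewriteOperand(42); // call site whose cp index is 42
        cache.resolve(second, "adapter", 2);
        System.out.println(first + " " + second);              // 0 1
        System.out.println(cache.entryAt(second).cpIndex);     // 42
        System.out.println(cache.entryAt(first).isResolved()); // false
    }
}
```

The point of the sketch is that every field has one meaning, so readers of an entry never need to know which kind of bytecode it belongs to, unlike the shared f1/f2 slots.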
src/hotspot/share/oops/cpCache.cpp line 652: > 650: int size = ConstantPoolCache::size(length); > 651: > 652: // Initialize resolvedinvokedynamicinfo array with available data Maybe break up the long word `resolvedinvokedynamicinfo`? src/hotspot/share/oops/cpCache.cpp line 653: > 651: > 652: // Initialize resolvedinvokedynamicinfo array with available data > 653: Array* array; Suggestion: rename `array` to `resolved_indy_entries`. src/hotspot/share/oops/cpCache.cpp line 664: > 662: > 663: return new (loader_data, size, MetaspaceObj::ConstantPoolCacheType, THREAD) > 664: ConstantPoolCache(length, index_map, invokedynamic_map, array); I think it reads better if this line is indented to just after the opening parenthesis. src/hotspot/share/prims/methodComparator.cpp line 119: > 117: if ((old_cp->name_ref_at(index_old) != new_cp->name_ref_at(index_new)) || > 118: (old_cp->signature_ref_at(index_old) != new_cp->signature_ref_at(index_new))) > 119: return false; Please adjust the indentation of lines 118 and 119 to be the same as lines 124 and 125. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/interpreter/BytecodeWithCPIndex.java line 61: > 59: } else { > 60: return cpCache.getEntryAt((int) (0xFFFF & cpCacheIndex)).getConstantPoolIndex(); > 61: } Maybe align all `return` statements with line 56? src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyArray.java line 38: > 36: public class ResolvedIndyArray extends GenericArray { > 37: static { > 38: VM.registerVMInitializedObserver(new Observer() { Indentation for Java code should be 4 spaces. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyEntry.java line 38: > 36: private static long size; > 37: private static long baseOffset; > 38: private static CIntegerField cpIndex; Indentation for Java code should be 4 spaces. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dcubed at openjdk.org Wed Mar 15 01:28:32 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Wed, 15 Mar 2023 01:28:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 18:52:39 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding overloading the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?'
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code.
The combination of two conditions leads to regressions without stack- or fast-locking: (1) the workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and (2) the workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Change log message when inflating fast-locked object

I did Mach5 Tier{1,2,3} on v25. Please see the bug report for the gory details:

Tier1 - 1 known, unrelated failure
Tier2 - 4 closed, unknown, related test failures
Tier3 - 8 closed, unknown, related test failures; 2 open, known, unrelated test failures; 16 open, unknown, related test failures

I'm pausing my Mach5 testing at this point. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dholmes at openjdk.org Wed Mar 15 01:54:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 01:54:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 18:46:47 GMT, Roman Kennke wrote: >> I've reviewed the changes in v23 and v24. Trying another >> Mach5 Tier1 job set. > >> I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set. > > I just now pushed a simple change that changes the log message 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy. @rkennke this still seems to be very much a work-in-progress rather than actual PR review at this stage. Perhaps it should move back to draft until you actually have something you think is ready for integration?
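The thread-local lock-stack described in the quoted 8291555 description (push the object reference instead of writing a stack pointer into the mark word, answer "does the current thread own me?" with a quick scan, grow when full, and inflate on recursive locking) can be illustrated with the single-threaded Java model below. This is only a sketch under stated assumptions: the class and method names are hypothetical, and it deliberately leaves out the mark-word CAS, monitor inflation, and GC root scanning that the real HotSpot implementation has to handle.

```java
import java.util.Arrays;

// Hypothetical, single-threaded model of a per-thread lock-stack.
// It does not model the mark-word CAS, ObjectMonitor inflation, or GC root scanning.
public class LockStackSketch {
    private Object[] elems = new Object[4]; // small initial capacity; grown on demand
    private int top = 0;

    // "Does the current thread own o?" answered by a linear identity scan.
    boolean contains(Object o) {
        for (int i = 0; i < top; i++) {
            if (elems[i] == o) return true;
        }
        return false;
    }

    // Fast-lock attempt. Returns false for a recursive lock, which in the
    // described scheme would be handled by inflating to a full monitor instead.
    boolean tryPush(Object o) {
        if (contains(o)) return false;
        if (top == elems.length) {
            elems = Arrays.copyOf(elems, elems.length * 2); // grow instead of overflowing
        }
        elems[top++] = o;
        return true;
    }

    // Unlock: remove o (normally the top element), compacting the array.
    boolean remove(Object o) {
        for (int i = top - 1; i >= 0; i--) {
            if (elems[i] == o) {
                System.arraycopy(elems, i + 1, elems, i, top - i - 1);
                elems[--top] = null; // clear the stale reference
                return true;
            }
        }
        return false;
    }

    int size() {
        return top;
    }

    public static void main(String[] args) {
        LockStackSketch ls = new LockStackSketch();
        Object a = new Object();
        Object b = new Object();
        System.out.println(ls.tryPush(a));  // true: fast-locked
        System.out.println(ls.tryPush(b));  // true
        System.out.println(ls.tryPush(a));  // false: recursive, would inflate
        System.out.println(ls.contains(b)); // true
        ls.remove(b);
        System.out.println(ls.size());      // 1
    }
}
```

A linear scan is reasonable here because, per the description, the array typically holds only 3-5 elements, so the ownership check stays cheap without any per-object bookkeeping.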
------------- PR: https://git.openjdk.org/jdk/pull/10907 From gcao at openjdk.org Wed Mar 15 02:07:25 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 15 Mar 2023 02:07:25 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 20:20:41 GMT, Matias Saavedra Silva wrote: > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > RISCV port update Changes requested by gcao (Author).
------------- PR: https://git.openjdk.org/jdk/pull/12778 From gcao at openjdk.org Wed Mar 15 02:07:29 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 15 Mar 2023 02:07:29 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 13:59:48 GMT, Matias Saavedra Silva wrote: > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: > > - Typo in comment > - Merge branch 'master' into resolvedIndyEntry_8301995 > - Interpreter optimization and comments > - PPC and RISCV port > - 8301995: Move invokedynamic resolution information out of the cpCache src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1843: > 1841: ldr(cache, Address(rcpool, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); > 1842: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) > 1843: mov(tmp, sizeof(ResolvedIndyEntry)); The tmp register is not used here; is it redundant? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dholmes at openjdk.org Wed Mar 15 02:08:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 02:08:20 GMT Subject: RFR: 8301308: Remove version conditionalization for gcc/clang PRAGMA_DIAG_PUSH/POP In-Reply-To: <9BP1enfxaU7HVyJEL27oCu9kMC45tWMkeUcqg1Xr6zE=.4a774982-1c1a-44ed-8862-2c300e79f458@github.com> References: <9BP1enfxaU7HVyJEL27oCu9kMC45tWMkeUcqg1Xr6zE=.4a774982-1c1a-44ed-8862-2c300e79f458@github.com> Message-ID: On Tue, 14 Mar 2023 17:51:12 GMT, Julian Waters wrote: > As of now we require at minimum clang 3.5 and gcc 6 to compile the Java Platform; the version checks for gcc/clang here test whether clang is either version 4 and above, or has a minor version higher than 3.1, and for gcc whether it has either a major version higher than 4 or a minor version above 4.6. These checks will now always pass, so they can be removed. Also changes the macro definition location to match Visual C++ and look neater. LGTM. Thanks ------------- Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.org/jdk/pull/13025 From dholmes at openjdk.org Wed Mar 15 02:36:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 02:36:06 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v2] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 09:46:12 GMT, Kevin Walls wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment update - Kevin's feedback > > src/hotspot/share/services/diagnosticFramework.hpp line 1: > >> 1: /* > > Should we update this comment: > 279 // - For subclasses of DCmdWithParser, it's calculated by DCmdParser::num_arguments(). Fixed - thanks. ------------- PR: https://git.openjdk.org/jdk/pull/12994 From dholmes at openjdk.org Wed Mar 15 02:36:03 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 02:36:03 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v2] In-Reply-To: References: Message-ID: > When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. > > Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. 
To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. > > A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. > > In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. > > Testing: tiers 1-4 > > Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Comment update - Kevin's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12994/files - new: https://git.openjdk.org/jdk/pull/12994/files/5ef9724d..cfe345d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12994&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12994&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12994.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12994/head:pull/12994 PR: https://git.openjdk.org/jdk/pull/12994 From dholmes at openjdk.org Wed Mar 15 02:57:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 02:57:20 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> On Tue, 
14 Mar 2023 15:40:53 GMT, Coleen Phillimore wrote: >> This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. >> Tested with tier1-4 > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix merge conflict. Overall looks good. Just a couple of small tweaks needed. Thanks. src/hotspot/share/runtime/arguments.cpp line 3589: > 3587: PrintCompilation || PrintInlining || PrintDependencies || PrintNativeNMethods || > 3588: PrintDebugInfo || PrintRelocations || PrintNMethods || PrintExceptionHandlers || > 3589: PrintAssembly || TraceDeoptimization || log_is_enabled(Debug, dependencies) || Now TraceDependencies is converted to UL I think it should just be deleted from this function. We don't need to enable LogVMOutput in that case. src/hotspot/share/runtime/arguments.cpp line 4004: > 4002: bool trace_dependencies = log_is_enabled(Debug, dependencies); > 4003: if (trace_dependencies && VerifyDependencies) { > 4004: if (trace_dependencies) { This inner if is not needed. ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/13007 From dholmes at openjdk.org Wed Mar 15 04:59:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 04:59:33 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v13] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 12:45:42 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. 
>> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into modulo > - Fix missing SharedRuntime::frem and SharedRuntime::drem on aarch64. > - bugreported by sviswa7. > - Merge branch 'master' into modulo > - Fix #endif comment - found by dholmes-ora. > - Merge branch 'master' into modulo > - Fix win32 broken build. > - Merge remote-tracking branch 'origin/master' into modulo > - Always include the _WIN64 workaround - a review by dholmes-ora. > - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. > - Uppercase L - a review by turbanoff. > - ... and 4 more: https://git.openjdk.org/jdk/compare/e315d7df...65af58da Functional CI testing in tiers 1-4 is good. I'm also running some benchmarks on Linux-x64 ------------- PR: https://git.openjdk.org/jdk/pull/12508 From stuefe at openjdk.org Wed Mar 15 06:03:22 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Mar 2023 06:03:22 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: <0jlQWouDE0tJ-ysn7WFYArqrJwHFQ-hhsZKKRGdVhmU=.95c4ceb0-9b8e-4c9b-9029-473626fb5a6b@github.com> Message-ID: On Tue, 14 Mar 2023 18:13:11 GMT, Christian Hagedorn wrote: > > > > Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ? > > > > > > > > > Does any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as memory location? If not, it could get more complicated. 
I could still try to tackle that but I'm not sure if I have the necessary knowledge in that area. > > > > > > What we usually do for these kinds of problems is to pass the scratch buffer as an argument to the processing functions, and have them use it or, if NULL was passed, allocate their own. > > Another way to do this would be to add a scratch buffer pointer to Thread. > > A third way to do this (I had been experimenting with it) would be to pre-allocate a scratch buffer, and once error handling began, to use that inside os::malloc. That is the most involved solution, though, and I'm not particularly fond of it. I know many reviewers would hate it, too :-) > > Thanks for the summary. Could we also use "placement new" to allocate and construct objects in the scratch buffer? For example, when creating a decoder, to use something like > > ``` > decoder = new(scratch_buffer) ElfDecoder(); > ``` > > here: > > https://github.com/openjdk/jdk/blob/4e631fa43fd821846c12ae2177360c44cf770766/src/hotspot/share/utilities/decoder.cpp#L62-L70 > > Then we can still utilize polymorphism. But then `AbstractDecoder` needs to be something other than `CHeapObj`. And of course, this approach also needs careful checking that a new object actually fits into the provided scratch buffer. On top of that, it gets more complicated when we want to keep multiple objects alive in the same scratch buffer. So, I'm not sure if this approach is feasible, though. It's possible to use placement new, and also to place several objects like this; but I realize now the scratch buffer might be too small. I need to think this through a bit. Maybe providing a pre-allocated buffer for os::malloc would be the right thing to do.
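The following standalone sketch shows the placement-new idea from the exchange above. It is not HotSpot code: it assumes a hypothetical `AbstractDecoder`/`ElfDecoder` pair and a caller-owned, suitably aligned scratch buffer. It includes the fit check Christian mentions. `std::align` reports whether the object fits before anything is constructed, and an object built this way must be destroyed explicitly rather than deleted. Keeping several objects alive in one buffer would need extra bookkeeping, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>  // std::align
#include <new>     // placement new

// Hypothetical stand-ins for the decoder hierarchy (not HotSpot's classes).
struct AbstractDecoder {
  virtual ~AbstractDecoder() = default;
  virtual const char* kind() const = 0;
};

struct ElfDecoder : AbstractDecoder {
  const char* kind() const override { return "elf"; }
};

// Constructs T inside a caller-supplied buffer instead of on the C heap.
// Returns nullptr if T does not fit after alignment, so the caller can
// fall back (for example, skip symbol decoding) rather than overrun.
template <typename T>
T* construct_in_buffer(void* buf, std::size_t buf_size) {
  void* p = buf;
  std::size_t space = buf_size;
  if (std::align(alignof(T), sizeof(T), p, space) == nullptr) {
    return nullptr;
  }
  return ::new (p) T();  // placement new: no allocation happens here
}

// Objects created with placement new are destroyed explicitly, never deleted.
inline void destroy_in_buffer(AbstractDecoder* d) {
  assert(d != nullptr);
  d->~AbstractDecoder();  // virtual destructor, so ~ElfDecoder runs
}
```

The caller keeps polymorphism (`d->kind()` dispatches virtually) and never touches malloc. The cost is that every construction site must handle a `nullptr` result when the buffer is too small.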
------------- PR: https://git.openjdk.org/jdk/pull/12925 From fyang at openjdk.org Wed Mar 15 07:43:32 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 15 Mar 2023 07:43:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v25] In-Reply-To: References: <_3eNnc8JNcoPdK8IZHTUGkSMqnERJmfab9ry323jVKI=.1faf478c-af7f-4d82-a35b-0252b2d068cc@github.com> Message-ID: On Tue, 14 Mar 2023 18:46:47 GMT, Roman Kennke wrote: >> I've reviewed the changes in v23 and v24. Trying another >> Mach5 Tier1 job set. > >> I've reviewed the changes in v23 and v24. Trying another Mach5 Tier1 job set. > > I just now pushed a simple change that changes the log message 'inflate(fast-locked)' to 'inflate(has_locker)' to make those tests happy. @rkennke : Hi, I have prepared some extra changes for RISC-V to make it work. See attachment. BTW: You might also want to use -w instructions in MacroAssembler::fast_unlock for aarch64. [more-riscv-changes.txt](https://github.com/openjdk/jdk/files/10977109/more-riscv-changes.txt) ------------- PR: https://git.openjdk.org/jdk/pull/10907 From aboldtch at openjdk.org Wed Mar 15 08:15:20 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Mar 2023 08:15:20 GMT Subject: RFR: 8304149: Avoid walking the CodeCache in DeoptimizationScope::deoptimize_marked Message-ID: Change DeoptimizationScope to keep track of the marked CompiledMethods in a list to avoid having to walk the CodeCache to find them again when deoptimizing. This adds a linked list to DeoptimizationScope which tracks the marked CompiledMethods for the active deoptimization generation. Then, when deoptimize_marked is called, the committing caller claims the list and uses it to deoptimize the linked (and marked) CompiledMethods instead of iterating over the CodeCache to find them again.
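Here is a hedged sketch of the list-tracking idea described above. It uses hypothetical `CompiledMethodModel` and `DeoptScopeModel` types, not the real `CompiledMethod` or `DeoptimizationScope`. Marking links a method into an intrusive list at most once. The committing caller then claims the whole list with a single atomic exchange and walks only the marked entries, instead of scanning every method. The patch's actual synchronization and generation handling are not reproduced here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a compiled method; HotSpot's CompiledMethod differs.
struct CompiledMethodModel {
  int id;
  std::atomic<bool> marked{false};
  bool deoptimized = false;
  CompiledMethodModel* next_marked = nullptr;  // intrusive list link
  explicit CompiledMethodModel(int i) : id(i) {}
};

// Keeps marked methods in a list so deoptimization does not need to
// rescan every method in a (hypothetical) code cache.
class DeoptScopeModel {
  std::atomic<CompiledMethodModel*> head_{nullptr};

 public:
  // Marks m and links it in; a method that is already marked is not linked twice.
  void mark(CompiledMethodModel* m) {
    if (m->marked.exchange(true)) {
      return;
    }
    CompiledMethodModel* old = head_.load();
    do {
      m->next_marked = old;
    } while (!head_.compare_exchange_weak(old, m));
  }

  // The committing caller claims the whole list, then walks only that list.
  int deoptimize_marked() {
    CompiledMethodModel* m = head_.exchange(nullptr);
    int count = 0;
    while (m != nullptr) {
      assert(m->marked.load());
      CompiledMethodModel* next = m->next_marked;
      m->deoptimized = true;
      m->next_marked = nullptr;
      ++count;
      m = next;
    }
    return count;
  }
};
```

Because the list is claimed in one step, a second `deoptimize_marked()` call finds it empty and does no work.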
Testing: Oracle platforms tier 1-7 ------------- Commit messages: - 8304149: Avoid walking the CodeCache in DeoptimizationScope::deoptimize_marked Changes: https://git.openjdk.org/jdk/pull/13036/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13036&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304149 Stats: 139 lines in 6 files changed: 94 ins; 40 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13036.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13036/head:pull/13036 PR: https://git.openjdk.org/jdk/pull/13036 From stuefe at openjdk.org Wed Mar 15 08:49:24 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Mar 2023 08:49:24 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v2] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 02:36:03 GMT, David Holmes wrote: >> When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. >> >> Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. 
The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. >> >> A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. >> >> In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. >> >> Testing: tiers 1-4 >> >> Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Comment update - Kevin's feedback Thanks a lot for removing that blemish, David! That has bugged me (and others) repeatedly. Question inline, but looks good all in all. src/hotspot/share/services/diagnosticFramework.hpp line 446: > 444: > 445: template ::value)> > 446: static int get_parsed_num_arguments() { I don't understand why we need the dynamic allocation here. Should we not be able to call your new static versions of num_arguments for DCmdWithParser children, like we do for DCmd children above? And if so, could we then not unify those two functions? ------------- PR: https://git.openjdk.org/jdk/pull/12994 From stefank at openjdk.org Wed Mar 15 08:53:29 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 15 Mar 2023 08:53:29 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v6] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 09:50:17 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method are tested to compare the performance difference.
>> ### Test >> `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() I don't oppose this change. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.org/jdk/pull/12782 From rkennke at openjdk.org Wed Mar 15 09:41:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 15 Mar 2023 09:41:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered very quickly by a scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is OK to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
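The lock-stack and the anonymous-owner hand-off can be modelled in a few lines. This is a simplified sketch under stated assumptions:

- `Obj`, `ThreadModel`, and `MonitorModel` are hypothetical stand-ins.
- There are no mark-word bits and no CAS operations.
- When the owner claims an anonymous monitor, removing the object from its lock-stack is a choice made for this sketch.

It shows why "does the current thread own this lock?" reduces to a short linear scan, and how an exiting owner recognises an anonymous monitor as its own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Obj { int id; };  // hypothetical stand-in for an oop

// Per-thread stack of objects this thread has fast-locked.
class LockStackModel {
  std::vector<Obj*> elems_;  // grows on demand

 public:
  void push(Obj* o) {
    // Recursive fast-locking is not modelled; the real scheme inflates instead.
    assert(!contains(o) && "recursive fast-lock");
    elems_.push_back(o);
  }
  void pop(Obj* o) {
    assert(!elems_.empty() && elems_.back() == o);
    (void)o;
    elems_.pop_back();
  }
  void remove(Obj* o) {
    elems_.erase(std::remove(elems_.begin(), elems_.end(), o), elems_.end());
  }
  // "Does the current thread own o?" is a linear scan of a short array.
  bool contains(const Obj* o) const {
    return std::find(elems_.begin(), elems_.end(), o) != elems_.end();
  }
  std::size_t size() const { return elems_.size(); }
};

struct ThreadModel { LockStackModel lock_stack; };
struct MonitorModel { ThreadModel* owner = nullptr; };

// Sentinel recorded by a contender that inflated a fast-locked object
// without knowing which thread holds it.
inline ThreadModel* anonymous_owner() {
  static ThreadModel marker;
  return &marker;
}

// Monitor exit: an anonymous owner plus obj on this thread's lock-stack
// means this thread is the real owner, so it claims ownership first.
inline bool exit_monitor(ThreadModel* self, MonitorModel* m, Obj* obj) {
  if (m->owner == anonymous_owner() && self->lock_stack.contains(obj)) {
    self->lock_stack.remove(obj);
    m->owner = self;
  }
  if (m->owner != self) {
    return false;  // not ours to release
  }
  m->owner = nullptr;  // release; a waiting contender may now acquire
  return true;
}
```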
When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner field accordingly, and then properly exits the monitor, thus handing it over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR > > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | diff | aarch64 stack-locking | aarch64 fast-locking | diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - More RISCV changes (by Fei Yang) - Use -w instructions in fast_unlock() - Increase stub size of C2HandleAnonOwnerStub to 18 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/b834f0ca..0ad01c1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=25-26 Stats: 136 lines in 7 files changed: 62 ins; 28 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Wed Mar 15 09:41:30 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 15 Mar 2023 09:41:30 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 01:25:33 GMT, Daniel D. Daugherty wrote: > I did Mach5 Tier{1,2,3} on v25.
Please see the bug report for the gory details: > > Tier1 - 1 known, unrelated failure Tier2 - 4 closed, unknown, related test failures Tier3 - 8 closed, unknown, related test failures; 2 open, known, unrelated test failures; 16 open, unknown, related test failures > > I'm pausing my Mach5 testing at this point. Hi Daniel, I could not reproduce any of the test failures, neither on x86_64 nor on aarch64. I have blindly fixed the code stub size issue, it seems rather trivial. Would it be possible to open/send me the failing test that triggers vframeArray assert or extract a reproducer that you could publish? I looked at it but could not figure out what could be going on there. Thanks, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10907 From kevinw at openjdk.org Wed Mar 15 10:39:20 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Wed, 15 Mar 2023 10:39:20 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v2] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 02:36:03 GMT, David Holmes wrote: >> When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. >> >> Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. 
To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. >> >> A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. >> >> In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. >> >> Testing: tiers 1-4 >> >> Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Comment update - Kevin's feedback Marked as reviewed by kevinw (Committer). Took me a bit of time, as I was thinking if we compare a hard-coded value with a parsed argument count, how does that cope with optional arguments? Maybe "get_parsed_num_arguments()" suggested to me it was a live command being parsed, but DCmdParser's _arguments_list is a list of known possible arguments. Should have read your "number of arguments taken by the DCmd" at the top more carefully. 
------------- PR: https://git.openjdk.org/jdk/pull/12994 From mdoerr at openjdk.org Wed Mar 15 10:51:20 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Mar 2023 10:51:20 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. Impressive that we can get rid of so much complicated code! Looks correct to me. Is the check fast enough to do it on every resolution? ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.org/jdk/pull/12802 From jsjolen at openjdk.org Wed Mar 15 11:09:25 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 15 Mar 2023 11:09:25 GMT Subject: RFR: JDK-8301498: Replace NULL with nullptr in cpu/x86 [v3] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. 
I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros that had their NULL changed to nullptr; these are added to the script when I find them. They should stay NULL. > 3. nullptr in comments and logs. We try to use lowercase "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Fix vnkozlov's suggestions - Merge remote-tracking branch 'origin/master' into JDK-8301498 - Some more fixes - Fixes - Replace NULL with nullptr in cpu/x86 ------------- Changes: https://git.openjdk.org/jdk/pull/12326/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12326&range=02 Stats: 665 lines in 54 files changed: 0 ins; 0 del; 665 mod Patch: https://git.openjdk.org/jdk/pull/12326.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12326/head:pull/12326 PR: https://git.openjdk.org/jdk/pull/12326 From jsjolen at openjdk.org Wed Mar 15 11:09:27 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 15 Mar 2023 11:09:27 GMT Subject: RFR: JDK-8301498: Replace NULL with nullptr in cpu/x86 [v2] In-Reply-To: References: Message-ID: On Mon, 13 Feb 2023 09:26:07 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86.
Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros that had their NULL changed to nullptr; these are added to the script when I find them. They should stay NULL. >> 3. nullptr in comments and logs. We try to use lowercase "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Some more fixes Hi Vladimir! I applied your changes, thanks for the review. ------------- PR: https://git.openjdk.org/jdk/pull/12326 From dholmes at openjdk.org Wed Mar 15 12:30:24 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 12:30:24 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v2] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 08:45:17 GMT, Thomas Stuefe wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment update - Kevin's feedback > > src/hotspot/share/services/diagnosticFramework.hpp line 446: > >> 444: >> 445: template ::value)> >> 446: static int get_parsed_num_arguments() { > > I don't understand why we need the dynamic allocation here.
Should we not be able to call your new static versions of num_arguments for DCmdWithParser children, like we do for DCmd children above? And if so, could we then not unify those two functions? Thanks for looking at this @tstuefe. The whole point of the parsed version is to ensure the static version doesn't get forgotten or out-of-date. Just realised the `get_parsed_num_arguments` should also be `ifdef ASSERT`. ------------- PR: https://git.openjdk.org/jdk/pull/12994 From dholmes at openjdk.org Wed Mar 15 12:30:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 15 Mar 2023 12:30:22 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v2] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 10:36:14 GMT, Kevin Walls wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment update - Kevin's feedback > > Took me a bit of time, as I was thinking if we compare a hard-coded value with a parsed argument count, how does that cope with optional arguments? Maybe "get_parsed_num_arguments()" suggested to me it was a live command being parsed, but DCmdParser's _arguments_list is a list of known possible arguments. Should have read your "number of arguments taken by the DCmd" at the top more carefully. Thanks for the review @kevinjwalls . 
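Here is a standalone sketch of the pattern under discussion. It uses simplified hypothetical types (`DCmdParserModel`, `DCmdWithParserModel`, `ExampleCmd`) rather than HotSpot's DCmd classes. Registration reads the static `num_arguments()`. Only debug builds instantiate the command, and they do so only to compare the static count with the number of options the parser registers (the options a command accepts, not the ones a user passes). Two details are assumptions:

- The template arguments of the patch's `ENABLE_IF` are not visible in this archive, so `std::is_convertible<T*, DCmd*>` stands in for them.
- The standard `NDEBUG` macro stands in for HotSpot's `ASSERT`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

struct DCmd {
  virtual ~DCmd() = default;
};

// Records the options a command accepts (not the ones a user passes).
struct DCmdParserModel {
  std::vector<std::string> known_args;
  void add(const std::string& name) { known_args.push_back(name); }
  int num_arguments() const { return static_cast<int>(known_args.size()); }
};

struct DCmdWithParserModel : DCmd {
  DCmdParserModel parser;
};

// Example command: registers two options and also states that count statically.
struct ExampleCmd : DCmdWithParserModel {
  ExampleCmd() {
    parser.add("filename");
    parser.add("all");
  }
  static int num_arguments() { return 2; }
};

// Registration path: release builds use only the static count.
template <typename T,
          typename std::enable_if<std::is_convertible<T*, DCmd*>::value, int>::type = 0>
int get_num_arguments() {
#ifndef NDEBUG
  // Debug-only cross-check, so the static count cannot silently drift.
  T instance;
  assert(T::num_arguments() == instance.parser.num_arguments() &&
         "static num_arguments() is out of sync with the parser");
#endif
  return T::num_arguments();
}
```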
------------- PR: https://git.openjdk.org/jdk/pull/12994 From coleenp at openjdk.org Wed Mar 15 12:54:30 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 Mar 2023 12:54:30 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v6] In-Reply-To: References: <9ZXi9uNa5ETIhldKLCDAYojtXTGEg-5EexLwHNE2zhI=.026815b3-3117-4184-be4c-5fdf42c2655f@github.com> <1W22QvJKaoJVRI5Wrx6DZEizGOxHsiSAEoLkHsCauQY=.6e934949-d854-49f8-9e1e-5786c0d04c8f@github.com> Message-ID: On Fri, 3 Mar 2023 16:51:32 GMT, Coleen Phillimore wrote: >> All the #include headers are removed except the `utilities/devirtualizer.inline.hpp`. The .cpp files that need it should include it before `oops/instanceKlass.inline.hpp` which violates the _sorted header filename_ in the coding style rules. > > Good. I think devirtualizer.inline.hpp is used by the GC inlined functions so should remain here. Sorry for being confusing before. My comment was that you should remove #includes that aren't directly referenced by this .inline.hpp file. I see that you've done this now, but do you need classfile/vmSymbols.hpp here? I don't see it. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From coleenp at openjdk.org Wed Mar 15 13:09:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 Mar 2023 13:09:05 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v4] In-Reply-To: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: > This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. 
> Tested with tier1-4 Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix redundant 'if' ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13007/files - new: https://git.openjdk.org/jdk/pull/13007/files/e5851202..421b4ee3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13007.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13007/head:pull/13007 PR: https://git.openjdk.org/jdk/pull/13007 From coleenp at openjdk.org Wed Mar 15 13:09:14 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 Mar 2023 13:09:14 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> Message-ID: On Wed, 15 Mar 2023 02:49:54 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix merge conflict. > > src/hotspot/share/runtime/arguments.cpp line 3589: > >> 3587: PrintCompilation || PrintInlining || PrintDependencies || PrintNativeNMethods || >> 3588: PrintDebugInfo || PrintRelocations || PrintNMethods || PrintExceptionHandlers || >> 3589: PrintAssembly || TraceDeoptimization || log_is_enabled(Debug, dependencies) || > > Now TraceDependencies is converted to UL I think it should just be deleted from this function. We don't need to enable LogVMOutput in that case. No, this is the right thing to do. If -Xlog:dependencies is enabled, the compiler group also expects the dependencies to be printed to the compiler log file.
The logging is a separate mechanism, but should be enabled with -Xlog:dependencies. > src/hotspot/share/runtime/arguments.cpp line 4004: > >> 4002: bool trace_dependencies = log_is_enabled(Debug, dependencies); >> 4003: if (trace_dependencies && VerifyDependencies) { >> 4004: if (trace_dependencies) { > > This inner if is not needed. Fixed, thanks. ------------- PR: https://git.openjdk.org/jdk/pull/13007 From jsjolen at openjdk.org Wed Mar 15 13:13:23 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 15 Mar 2023 13:13:23 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc Message-ID: Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. Here are some typical things to look out for: 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). 2. Macros having their NULL changed to nullptr; these are added to the script when I find them. They should be NULL. 3. nullptr in comments and logs. We try to use lowercase "null" in these cases as it reads better. An exception is made when code expressions are in a comment. An example of this: ```c++ // This function returns null void* ret_null(); // This function returns true if *x == nullptr bool is_nullptr(void** x); ``` Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. Thanks!
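The thread does not show the conversion script itself. The sketch below is a hedged illustration of why some cases still need manual review. It performs a whole-token NULL to nullptr substitution with `std::regex`. `replace_null_tokens` is a hypothetical helper, not part of the JDK tooling.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical sketch of one NULL -> nullptr rewriting step. In ECMAScript
// regexes '_' is a word character, so \bNULL\b does not match inside
// identifiers such as do_NULL_check.
std::string replace_null_tokens(const std::string& line) {
  static const std::regex null_token(R"(\bNULL\b)");
  return std::regex_replace(line, null_token, "nullptr");
}
```

A purely lexical rule like this leaves identifiers such as `do_NULL_check` untouched. It still rewrites NULL inside comments to `nullptr`, though. That is why comments must be revisited by hand to follow the lowercase "null" convention, and why macros that must keep NULL need an explicit exclusion list.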
------------- Commit messages: - Merge remote-tracking branch 'origin/master' into JDK-8301495 - Revert change in file - Fixes - Merge remote-tracking branch 'origin/master' into JDK-8301495 - reinrich suggestions - Replace NULL with nullptr in cpu/ppc Changes: https://git.openjdk.org/jdk/pull/12323/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12323&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8301495 Stats: 384 lines in 51 files changed: 0 ins; 0 del; 384 mod Patch: https://git.openjdk.org/jdk/pull/12323.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12323/head:pull/12323 PR: https://git.openjdk.org/jdk/pull/12323 From rrich at openjdk.org Wed Mar 15 13:13:37 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Mar 2023 13:13:37 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:39:48 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks!
Looks good, just a few changes I'd like to suggest. Thanks, Richard. src/hotspot/cpu/ppc/abstractInterpreter_ppc.cpp line 97: > 95: // Parameters: > 96: // > 97: // interpreter_frame != null: Suggestion: // interpreter_frame != nullptr: src/hotspot/cpu/ppc/frame_ppc.cpp line 114: > 112: > 113: // At this point, there still is a chance that fp_safe is false. > 114: // In particular, (fp == null) might be true. So let's check and Suggestion: // In particular, (fp == nullptr) might be true. So let's check and src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 920: > 918: // } else if (THREAD->is_lock_owned((address)displaced_header)) > 919: // // Simple recursive case. > 920: // monitor->lock()->set_displaced_header(null); Suggestion: // monitor->lock()->set_displaced_header(nullptr); src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 981: > 979: // } else if (THREAD->is_lock_owned((address)displaced_header)) > 980: // // Simple recursive case. > 981: // monitor->lock()->set_displaced_header(null); Suggestion: // monitor->lock()->set_displaced_header(nullptr); src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 1031: > 1029: // template code: > 1030: // > 1031: // if ((displaced_header = monitor->displaced_header()) == null) { Suggestion: // if ((displaced_header = monitor->displaced_header()) == nullptr) { src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 1033: > 1031: // if ((displaced_header = monitor->displaced_header()) == null) { > 1032: // // Recursive unlock. Mark the monitor unlocked by setting the object field to null. > 1033: // monitor->set_obj(null); Suggestion: // monitor->set_obj(nullptr); src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 1036: > 1034: // } else if (Atomic::cmpxchg(obj->mark_addr(), monitor, displaced_header) == monitor) { > 1035: // // We swapped the unlocked mark in displaced_header into the object's mark word. 
> 1036: // monitor->set_obj(null); Suggestion: // monitor->set_obj(nullptr); src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 1062: > 1060: // } else if (Atomic::cmpxchg(obj->mark_addr(), monitor, displaced_header) == monitor) { > 1061: // // We swapped the unlocked mark in displaced_header into the object's mark word. > 1062: // monitor->set_obj(null); Suggestion: // monitor->set_obj(nullptr); src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 1097: > 1095: b(done); // Monitor register may be overwritten! Runtime has already freed the slot. > 1096: > 1097: // Exchange worked, do monitor->set_obj(null); Suggestion: // Exchange worked, do monitor->set_obj(nullptr); src/hotspot/cpu/ppc/interpreterRT_ppc.cpp line 102: > 100: > 101: // The handle for a receiver will never be null. > 102: bool do_nullptr_check = offset() != 0 || is_static(); Suggestion: bool do_null_check = offset() != 0 || is_static(); src/hotspot/cpu/ppc/interpreterRT_ppc.cpp line 105: > 103: > 104: Label do_null; > 105: if (do_nullptr_check) { Suggestion: if (do_null_check) { src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 2421: > 2419: BLOCK_COMMENT("arraycopy initial argument checks"); > 2420: > 2421: __ cmpdi(CCR1, src, 0); // if (src == null) return -1; Suggestion: __ cmpdi(CCR1, src, 0); // if (src == nullptr) return -1; src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 2423: > 2421: __ cmpdi(CCR1, src, 0); // if (src == null) return -1; > 2422: __ extsw_(src_pos, src_pos); // if (src_pos < 0) return -1; > 2423: __ cmpdi(CCR5, dst, 0); // if (dst == null) return -1; Suggestion: __ cmpdi(CCR5, dst, 0); // if (dst == nullptr) return -1; src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp line 525: > 523: __ ld(R3_RET, Interpreter::stackElementSize, R15_esp); // get receiver > 524: > 525: // Check if receiver == null and go the slow path. Suggestion: // Check if receiver == nullptr and go the slow path. ------------- Changes requested by rrich (Reviewer). 
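The template comments quoted in this review describe a stack-locking unlock sequence. The following is a deliberately simplified C++ model of that sequence using `std::atomic`. `ToyObject`, `LockRecord` and `toy_unlock` are hypothetical, and the "unlocked" mark value 0x1 is an arbitrary assumption. The real interpreter emits machine instructions instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model (hypothetical, not HotSpot code) of the unlock path described in
// the interp_masm_ppc_64.cpp template comments.
struct ToyObject {
  // Mark word: either an assumed "unlocked" header value or the address of
  // the lock record that currently owns the object.
  std::atomic<std::uintptr_t> mark{0x1};
};

struct LockRecord {
  std::uintptr_t displaced_header = 0;  // 0 here means "recursive lock"
  ToyObject* obj = nullptr;
};

enum class UnlockResult { Recursive, Fast, SlowPath };

UnlockResult toy_unlock(LockRecord* monitor) {
  assert(monitor->obj != nullptr);
  std::uintptr_t displaced = monitor->displaced_header;
  if (displaced == 0) {
    // Recursive unlock: mark the record free by clearing its object field.
    monitor->obj = nullptr;
    return UnlockResult::Recursive;
  }
  // Swap the saved (unlocked) header back into the object's mark word,
  // expecting the mark to still point at this lock record.
  std::uintptr_t expected = reinterpret_cast<std::uintptr_t>(monitor);
  if (monitor->obj->mark.compare_exchange_strong(expected, displaced)) {
    monitor->obj = nullptr;
    return UnlockResult::Fast;
  }
  // The mark word changed (for example, the lock was inflated): a real VM
  // would call into the runtime here.
  return UnlockResult::SlowPath;
}
```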
PR: https://git.openjdk.org/jdk/pull/12323 From jsjolen at openjdk.org Wed Mar 15 13:13:42 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 15 Mar 2023 13:13:42 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:39:48 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Found a few minor issues. src/hotspot/cpu/ppc/frame_ppc.cpp line 114: > 112: > 113: // At this point, there still is a chance that fp_safe is false. > 114: // In particular, (fp == nullptr) might be true. So let's check and "fp might be null" src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1819: > 1817: } > 1818: > 1819: // for (scan = klass->itable(); scan->interface() != null; scan += scan_step) { nullptr src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2710: > 2708: // Handle existing monitor.
> 2709: bind(object_has_monitor); > 2710: // The object's monitor m is unlocked iff m->owner == null, is null src/hotspot/os/posix/os_posix.cpp line 293: > 291: ((MemTracker ::tracking_level() == NMT_detail) > 292: ? NativeCallStack(1) > 293: : NativeCallStack(NativeCallStack ::FakeMarker ::its_fake))); Accidentally committed this, will revert to original. ------------- PR: https://git.openjdk.org/jdk/pull/12323 From jsjolen at openjdk.org Wed Mar 15 13:13:43 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 15 Mar 2023 13:13:43 GMT Subject: RFR: JDK-8301495: Replace NULL with nullptr in cpu/ppc In-Reply-To: References: Message-ID: On Wed, 1 Feb 2023 08:37:33 GMT, Richard Reingruber wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/ppc. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Looks good, just a few changes I'd like to suggest. > > Thanks, Richard. Thanks @reinrich for your review.
I've fixed the issues you found, plus some that I found myself; we will see what GHA says about this commit. ------------- PR: https://git.openjdk.org/jdk/pull/12323 From sjohanss at openjdk.org Wed Mar 15 13:23:25 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 15 Mar 2023 13:23:25 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects [v2] In-Reply-To: References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: On Mon, 6 Mar 2023 10:22:27 GMT, Ivan Walulya wrote: >> Hi All, >> >> Please review this change to move humongous regions during the last-ditch full GC (on `do_maximal_compaction`). This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. >> >> Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces region fragmentation and avoids running into OOM errors. >> >> We have enabled for G1 the jtreg test that was previously used only for Shenandoah to test such workloads. >> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor resetting humongous metadata > - Thomas review Overall it looks good, just a couple of small comments below. Many thanks for taking on this change.
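The compaction step described in the PR can be illustrated with a toy model. In it, each heap region holds either 0 (free after the regular objects were compacted into the dense prefix) or an occupant id. The helper is named after the reviewed `find_contiguous_before` but is otherwise hypothetical. It follows the reviewer's suggestion in this thread of signalling "no range found" with a `UINT_MAX` sentinel instead of returning a `Pair`. Uncommitted and non-compacting regions are not modelled.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Toy model (hypothetical, not G1's data structures): regions[i] == 0 means
// region i is free once regular objects have been compacted into the dense
// prefix; any other value identifies the region's occupant.
constexpr unsigned kNoRange = UINT_MAX;  // sentinel instead of a Pair

// Lowest start index of num_regions consecutive free regions that all lie
// below `start`, or kNoRange if no such run exists.
unsigned find_contiguous_before(const std::vector<int>& regions,
                                unsigned start, unsigned num_regions) {
  assert(num_regions >= 1 && start <= regions.size());
  unsigned run = 0;
  for (unsigned i = 0; i < start; i++) {
    run = (regions[i] == 0) ? run + 1 : 0;
    if (run == num_regions) return i + 1 - num_regions;
  }
  return kNoRange;
}

// Slides a humongous object occupying [start, start + num_regions) down to the
// lowest free run that fits; returns its (possibly unchanged) start index.
unsigned move_humongous_down(std::vector<int>& regions,
                             unsigned start, unsigned num_regions) {
  assert(start + num_regions <= regions.size());
  unsigned dest = find_contiguous_before(regions, start, num_regions);
  if (dest == kNoRange) return start;
  // dest + num_regions <= start, so source and destination never overlap.
  for (unsigned k = 0; k < num_regions; k++) {
    regions[dest + k] = regions[start + k];
    regions[start + k] = 0;
  }
  return dest;
}
```

Using a sentinel keeps the return type a plain index. This matches the review point that the second `Pair` element only signalled whether a range was found.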
src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 609: > 607: inline void register_humongous_candidate_region_with_region_attr(uint index); > 608: > 609: void reset_humongous_metadata(HeapRegion* first_hr, I would go with just `set_humongous_metadata(...)` src/hotspot/share/gc/g1/g1FullGCCompactionPoint.hpp line 47: > 45: void switch_region(); > 46: HeapRegion* next_region(); > 47: Pair find_contiguous_before(HeapRegion* hr, uint num_regions); I think we can skip returning a `Pair` here. The second value is only used to check if we found any range and instead we could use something like `UINT_MAX` or the max number of regions to mark that. ------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.org/jdk/pull/12830 From jwaters at openjdk.org Wed Mar 15 13:38:45 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 15 Mar 2023 13:38:45 GMT Subject: Integrated: 8301308: Remove version conditionalization for gcc/clang PRAGMA_DIAG_PUSH/POP In-Reply-To: <9BP1enfxaU7HVyJEL27oCu9kMC45tWMkeUcqg1Xr6zE=.4a774982-1c1a-44ed-8862-2c300e79f458@github.com> References: <9BP1enfxaU7HVyJEL27oCu9kMC45tWMkeUcqg1Xr6zE=.4a774982-1c1a-44ed-8862-2c300e79f458@github.com> Message-ID: <2ypkTuVbu1byxtgFpO2Yb8xmYMn_aHcgKnYD_XBdyN0=.ff452911-03b1-4396-adfd-51ac5e5e8f62@github.com> On Tue, 14 Mar 2023 17:51:12 GMT, Julian Waters wrote: > As of now we at minimum require clang 3.5 and gcc 6 to compile the Java Platform. The version checks for gcc/clang here are for whether clang is either version 4 and above, or has a minor version higher than 3.1, and for gcc either a major version higher than 4 or minor version above 4.6. Now these will always pass, so they can be removed. Also changes the macro definition location to match Visual C++ and look neater. This pull request has now been integrated.
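As context for the integrated change, here is a minimal sketch of push/pop diagnostic macros without compiler-version guards. It assumes gcc and clang both accept `_Pragma("GCC diagnostic ...")` and that MSVC uses `__pragma(warning(...))`. The macro bodies are illustrative rather than copied from HotSpot, and `PRAGMA_DISABLE_UNUSED_PARAMETER` is a hypothetical helper added only for this example.

```cpp
#include <cassert>

// Sketch: no version checks, since every supported gcc and clang accepts
// these pragmas. PRAGMA_DISABLE_UNUSED_PARAMETER is hypothetical.
#if defined(__GNUC__)  // gcc and clang
#define PRAGMA_DIAG_PUSH _Pragma("GCC diagnostic push")
#define PRAGMA_DIAG_POP  _Pragma("GCC diagnostic pop")
#define PRAGMA_DISABLE_UNUSED_PARAMETER \
  _Pragma("GCC diagnostic ignored \"-Wunused-parameter\"")
#elif defined(_MSC_VER)
#define PRAGMA_DIAG_PUSH __pragma(warning(push))
#define PRAGMA_DIAG_POP  __pragma(warning(pop))
#define PRAGMA_DISABLE_UNUSED_PARAMETER __pragma(warning(disable : 4100))
#else
#define PRAGMA_DIAG_PUSH
#define PRAGMA_DIAG_POP
#define PRAGMA_DISABLE_UNUSED_PARAMETER
#endif

PRAGMA_DIAG_PUSH
PRAGMA_DISABLE_UNUSED_PARAMETER
// The unused-parameter warning is suppressed only between PUSH and POP.
inline int first_of(int a, int b) { return a; }
PRAGMA_DIAG_POP
```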
Changeset: 3d77e217 Author: Julian Waters URL: https://git.openjdk.org/jdk/commit/3d77e217b2b97d2c290c50c4dc55987ecc13eb79 Stats: 13 lines in 1 file changed: 3 ins; 10 del; 0 mod 8301308: Remove version conditionalization for gcc/clang PRAGMA_DIAG_PUSH/POP Reviewed-by: kbarrett, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13025 From duke at openjdk.org Wed Mar 15 13:39:10 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 15 Mar 2023 13:39:10 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v7] In-Reply-To: References: Message-ID: > The inlined and non-inlined versions of the method are tested to compare the performance difference. > ### Test > `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8292059: Do not inline InstanceKlass::allocate_instance() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12782/files - new: https://git.openjdk.org/jdk/pull/12782/files/4165cab8..da1a1331 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12782&range=05-06 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12782.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12782/head:pull/12782 PR: https://git.openjdk.org/jdk/pull/12782 From fparain at openjdk.org Wed Mar 15 13:45:46 2023 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 15 Mar 2023 13:45:46 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 01:25:01 GMT, Chris Plummer wrote: >> Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixes includes and style > >
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Field.java line 75: > >> 73: int initialValueIndex; >> 74: int genericSignatureIndex; >> 75: int contendedGroup; > > It seems that these should all be shorts. All the getter methods are casting them to short. Indexes in the constant pool are unsigned shorts, but Java shorts are signed, so using ints is the simplest way to store those indexes. > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 108: > >> 106: CLASS_STATE_INITIALIZATION_ERROR = db.lookupIntConstant("InstanceKlass::initialization_error").intValue(); >> 107: // We need a new fieldsCache each time we attach. >> 108: fieldsCache = new HashMap(); > > This should probably be a WeakHashMap. I tried it and it seems to work (or at least didn't cause any problems). However, when doing a heap dump I didn't notice the table being any smaller on exit when it was made weak, even though there were numerous GCs while dumping the heap. > > The `` is the Address of the hotspot InstanceKlass instance, and this Address is referenced by the SA InstanceKlass mirror. So theoretically when the reference to the mirror goes away, then the cache entry can be cleared. I've changed the map to a WeakHashMap and didn't see any issue during testing. But I didn't measure footprint. > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 325: > >> 323: >> 324: public int getFieldOffset(int index) { >> 325: return (int)getField(index).getOffset(); > > Cast to int is not needed Other APIs (like MetadataField) are using longs to pass offsets; doing a cast here is less disruptive than changing all the other APIs.
------------- PR: https://git.openjdk.org/jdk/pull/12855 From duke at openjdk.org Wed Mar 15 13:46:42 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 15 Mar 2023 13:46:42 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v7] In-Reply-To: References: <9ZXi9uNa5ETIhldKLCDAYojtXTGEg-5EexLwHNE2zhI=.026815b3-3117-4184-be4c-5fdf42c2655f@github.com> <1W22QvJKaoJVRI5Wrx6DZEizGOxHsiSAEoLkHsCauQY=.6e934949-d854-49f8-9e1e-5786c0d04c8f@github.com> Message-ID: On Wed, 15 Mar 2023 12:51:53 GMT, Coleen Phillimore wrote: >> Good. I think devirtualizer.inline.hpp is used by the GC inlined functions so should remain here. > > Sorry for being confusing before. My comment was that you should remove #includes that aren't directly referenced by this .inline.hpp file. I see that you've done this now, but do you need classfile/vmSymbols.hpp here? I don't see it. I misinterpreted the required changes. It is not needed here, and I moved it to `jvmtiEnvBase.cpp`. ------------- PR: https://git.openjdk.org/jdk/pull/12782 From kvn at openjdk.org Wed Mar 15 13:51:30 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 Mar 2023 13:51:30 GMT Subject: RFR: JDK-8301498: Replace NULL with nullptr in cpu/x86 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 11:09:25 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs.
We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fix vnkozlov's suggestions > - Merge remote-tracking branch 'origin/master' into JDK-8301498 > - Some more fixes > - Fixes > - Replace NULL with nullptr in cpu/x86 Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.org/jdk/pull/12326 From coleenp at openjdk.org Wed Mar 15 14:12:32 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 Mar 2023 14:12:32 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v7] In-Reply-To: References: Message-ID: <4R7Gp-JUViFk-Pu3b7DwQ_hAhZavttNs1cBt1CnGYsQ=.62a967e0-1626-4083-b6b0-73e61e0eade1@github.com> On Wed, 15 Mar 2023 13:39:10 GMT, Afshin Zafari wrote: >> The inlined and non-inlined versions of the method are tested to compare the performance difference. >> ### Test >> `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() Great. This looks good. We'll wait for GHA and integrate. ------------- Marked as reviewed by coleenp (Reviewer).
PR: https://git.openjdk.org/jdk/pull/12782 From rrich at openjdk.org Wed Mar 15 14:15:20 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Mar 2023 14:15:20 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 10:48:58 GMT, Martin Doerr wrote: > Impressive that we can get rid of so much complicated code! Looks correct to me. Thanks for reviewing, Martin. > Is the check fast enough to do it on every resolution? I thought so. The easy cases where the callee is from a permanent loader or where caller and callee have the same holder oop should be common. I'm not sure how to quantify the performance effect though. I wouldn't object to turning it into an assertion if requested. ------------- PR: https://git.openjdk.org/jdk/pull/12802 From mdoerr at openjdk.org Wed Mar 15 15:23:42 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Mar 2023 15:23:42 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: <-3WO0jPxKC1CTVEB1ioqIgLFt5MRFE7zPf4pJw5Erls=.0d4c243c-eefc-4450-9df2-64a927e509eb@github.com> On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64.
> > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. I guess that the benchmarks don't show any impact. If so, I'm ok with your version. If there are concerns, it would be possible to make it switchable by a diagnostic flag. But I'd still keep it enabled by default to get maximum test coverage. ------------- PR: https://git.openjdk.org/jdk/pull/12802 From kvn at openjdk.org Wed Mar 15 15:38:00 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 15 Mar 2023 15:38:00 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. Let me test it. I think `check_path_to_callee` should be done only in debug VM. It is verification code. But for **pre-integration** testing it is fine to have it as guarantee. 
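The check described for `check_path_to_callee()`, with its cheap early exits and the suggestion to run it only in debug builds, can be modelled as below. Everything here is a hypothetical simplification, not HotSpot code: `Holder`, `callee_holder_reachable`, `verify_static_call`, and an oop pool represented as a set of pointers.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical model, not HotSpot code: a class "holder" that may or may not
// be unloadable, and a caller whose oop pool is modelled as a set of holders.
struct Holder {
  bool permanent;  // e.g. defined by a loader that is never unloaded
};

using OopPool = std::unordered_set<const Holder*>;

// True if the callee's holder cannot be unloaded while the caller is alive.
bool callee_holder_reachable(const Holder& caller, const Holder& callee,
                             const OopPool& caller_oops) {
  if (callee.permanent) return true;    // cheap, expected-common case
  if (&callee == &caller) return true;  // same holder
  return caller_oops.count(&callee) != 0;
}

// Debug-only verification at call resolution; NDEBUG stands in for the
// absence of HotSpot's ASSERT here.
inline void verify_static_call(const Holder& caller, const Holder& callee,
                               const OopPool& caller_oops) {
#ifndef NDEBUG
  assert(callee_holder_reachable(caller, callee, caller_oops));
#else
  (void)caller; (void)callee; (void)caller_oops;
#endif
}
```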
------------- PR: https://git.openjdk.org/jdk/pull/12802 From matsaave at openjdk.org Wed Mar 15 15:38:10 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 15:38:10 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v4] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 23:29:17 GMT, Calvin Cheung wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> RISCV port update > > src/hotspot/share/interpreter/bootstrapInfo.cpp line 234: > >> 232: if (_indy_index > -1) { >> 233: os::snprintf_checked(what, sizeof(what), "indy#%d", _indy_index); >> 234: } > > Since the `else` case doesn't have braces, maybe omit the braces for this case as well? The if statements below use braces so I think it would be better to add braces to the else case. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Wed Mar 15 15:41:13 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Mar 2023 15:41:13 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 15:34:48 GMT, Vladimir Kozlov wrote: > Let me test it. > > I think `check_path_to_callee` should be done only in debug VM. It is verification code. But for **pre-integration** testing it is fine to have it as guarantee. Thanks for testing! I'll make the test dependent on ASSERT afterwards. ------------- PR: https://git.openjdk.org/jdk/pull/12802 From fparain at openjdk.org Wed Mar 15 15:41:17 2023 From: fparain at openjdk.org (Frederic Parain) Date: Wed, 15 Mar 2023 15:41:17 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure.
> > The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. > > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you.
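For readers unfamiliar with UNSIGNED5, here is a hedged C++ sketch of an UNSIGNED5-style variable-length encoding of 32-bit values. The constants are assumptions modelled on descriptions of HotSpot's scheme: byte 0 is never emitted, 191 "low" byte values end a value, 64 "high" byte values each carry 6 more bits, and a value takes at most 5 bytes. Consult HotSpot's own UNSIGNED5 implementation for the authoritative definition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed constants for an UNSIGNED5-style scheme: byte 0 is never emitted,
// bytes 1..191 are "low" (final) codes, bytes 192..255 are "high"
// (continuation) codes carrying 6 bits each.
constexpr uint32_t kLgH = 6;
constexpr uint32_t kH = 1u << kLgH;     // 64 high codes
constexpr uint32_t kX = 1;              // one excluded byte value (0)
constexpr uint32_t kL = 256 - kX - kH;  // 191 low codes
constexpr int kMaxLength = 5;

void write_unsigned5(uint32_t value, std::vector<uint8_t>& out) {
  uint32_t sum = value;
  for (int i = 0; ; i++) {
    if (sum < kL || i == kMaxLength - 1) {
      out.push_back(static_cast<uint8_t>(kX + sum));  // final byte
      return;
    }
    sum -= kL;
    out.push_back(static_cast<uint8_t>(kX + kL + (sum % kH)));  // high code
    sum >>= kLgH;
  }
}

// Decodes one value starting at pos and advances pos past it.
uint32_t read_unsigned5(const std::vector<uint8_t>& in, std::size_t& pos) {
  assert(pos < in.size());
  uint32_t b = in[pos++];
  uint32_t sum = b - kX;
  if (sum < kL) return sum;  // single low byte
  uint32_t shift = kLgH;
  for (int i = 1; ; i++) {
    assert(pos < in.size());
    b = in[pos++];
    sum += (b - kX) << shift;
    if (b < kX + kL || i == kMaxLength - 1) return sum;  // value complete
    shift += kLgH;
  }
}
```

With these assumed constants, values below 191 take a single byte, which is where much of the density gain over fixed-width fields would come from for small indexes and flags.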
Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: SA and JVMCI fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/12b4f1b4..f81337f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=03-04 Stats: 130 lines in 13 files changed: 14 ins; 46 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From matsaave at openjdk.org Wed Mar 15 16:08:21 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 16:08:21 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v3] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 15:39:39 GMT, Gui Cao wrote: >> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Typo in comment >> - Merge branch 'master' into resolvedIndyEntry_8301995 >> - Interpreter optimization and comments >> - PPC and RISCV port >> - 8301995: Move invokedynamic resolution information out of the cpCache > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1843: > >> 1841: ldr(cache, Address(rcpool, in_bytes(ConstantPoolCache::invokedynamic_entries_offset()))); >> 1842: // Scale the index to be the entry index * sizeof(ResolvedInvokeDynamicInfo) >> 1843: mov(tmp, sizeof(ResolvedIndyEntry)); > > The tmp register is not used here, is it redundant? Right, the tmp register is not needed anymore thanks to the mul to shift optimization. 
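The mul-to-shift optimization mentioned here replaces `index * sizeof(entry)` with a left shift. The two are only equivalent when the entry size is a power of two, which is the reason given below for the 32-bit limitation. The helpers are a hypothetical illustration of that condition. The sizes 16 and 12 used with them are illustrative and are not the actual `sizeof(ResolvedIndyEntry)` on any platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers: index * size can be replaced by
// index << log2(size) only when size is a power of two.
constexpr bool is_power_of_2(std::size_t x) { return x != 0 && (x & (x - 1)) == 0; }

constexpr unsigned exact_log2(std::size_t x) {
  unsigned n = 0;
  while (x > 1) { x >>= 1; n++; }
  return n;
}

std::size_t entry_offset(std::size_t index, std::size_t entry_size) {
  assert(entry_size != 0);
  if (is_power_of_2(entry_size)) {
    return index << exact_log2(entry_size);  // single shift suffices
  }
  return index * entry_size;  // general case: a multiply is required
}
```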
Note that shifting will not be possible on 32-bit systems due to the size of ResolvedIndyEntry not being a power of two. This optimization only works on 64-bit builds. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From matsaave at openjdk.org Wed Mar 15 16:35:22 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 16:35:22 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v5] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed indentation and other comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/db892223..415e7116 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=03-04 Stats: 71 lines in 9 files changed: 1 ins; 3 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From cjplummer at openjdk.org Wed Mar 15 16:41:16 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 15 Mar 2023 16:41:16 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: <_thEXXKYB00W5Mmg8hGRKMLqo6vog84sjtp2Mqf2wqk=.8f5a76a6-30c5-4a05-ae7a-753e1d70ddee@github.com> On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extensibility, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> These changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have also been updated to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > SA and JVMCI fixes SA changes look good. Thanks for taking care of this! ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.org/jdk/pull/12855 From chagedorn at openjdk.org Wed Mar 15 16:47:06 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 15 Mar 2023 16:47:06 GMT Subject: RFR: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: <0jlQWouDE0tJ-ysn7WFYArqrJwHFQ-hhsZKKRGdVhmU=.95c4ceb0-9b8e-4c9b-9029-473626fb5a6b@github.com> Message-ID: <9a2KV9P9iP8EOEDuqpxePREsXK-02Rw6p_QHmfm86bg=.5ac9b48f-e3c6-461c-b66c-437b21f34a16@github.com> On Tue, 14 Mar 2023 18:13:11 GMT, Christian Hagedorn wrote: >>> Decoder, in particular, should not use malloc.
Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. I have the hope that maybe @chhagedorn can :-) ? >> > >> > >> > Does any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as memory location? If not, it could get more complicated. I could still try to tackle that but I'm not sure if I have the necessary knowledge in that area. >> >> What we usually do for these kind of problems is to pass the scratch buffer via argument to the processing functions. And make that use either it or, if NULL was passes, allocate its own thing. >> >> Another way to do this would be to add a scratch buffer pointer to Thread. >> >> A third way to do this (I had been experimenting with it) would be to pre-allocate a scratch buffer, and once error handling began, to use that inside os::malloc. That is the most involved solution, though, and I'm not particularly fond of it. I know many reviewers would hate it, too :-) > > Thanks for the summary. Could we also use "placement new" to use the scratch buffer to allocate and create objects to? For example, when creating a decoder, to use something like > > decoder = new(scratch_buffer) ElfDecoder(); > > here: > https://github.com/openjdk/jdk/blob/4e631fa43fd821846c12ae2177360c44cf770766/src/hotspot/share/utilities/decoder.cpp#L62-L70 > Then we can still utilize polymorphism. But then `AbstractDecoder` needs to be something else than `CHeapObj`. And of course, this approach also needs careful checking that a new object actually fits into the provided scratch buffer. On top of that, it gets more complicated when we want to keep multiple objects alive in the same scratch buffer. So, I'm not sure if this approach is feasible, though. > > > > > Decoder, in particular, should not use malloc. Therefore I also opened https://bugs.openjdk.org/browse/JDK-8303862 to track that. I won't have time to work on that. 
I have the hope that maybe @chhagedorn can :-) ? > > > Do any of our current allocation strategies allow the usage of a custom scratch buffer (i.e. the error reporting scratch buffer) as memory location? If not, it could get more complicated. I could still try to tackle that but I'm not sure if I have the necessary knowledge in that area. > > > What we usually do for these kinds of problems is to pass the scratch buffer via argument to the processing functions, and make them either use it or, if NULL was passed, allocate their own. > > > Another way to do this would be to add a scratch buffer pointer to Thread. > > > A third way to do this (I had been experimenting with it) would be to pre-allocate a scratch buffer, and once error handling began, to use that inside os::malloc. That is the most involved solution, though, and I'm not particularly fond of it. I know many reviewers would hate it, too :-) > > Thanks for the summary. Could we also use "placement new" to allocate and create objects in the scratch buffer? For example, when creating a decoder, to use something like > > ``` > > decoder = new(scratch_buffer) ElfDecoder(); > > ``` > > here: > > https://github.com/openjdk/jdk/blob/4e631fa43fd821846c12ae2177360c44cf770766/src/hotspot/share/utilities/decoder.cpp#L62-L70 > > Then we can still utilize polymorphism. But then `AbstractDecoder` needs to be something other than `CHeapObj`. And of course, this approach also needs careful checking that a new object actually fits into the provided scratch buffer. On top of that, it gets more complicated when we want to keep multiple objects alive in the same scratch buffer. So, I'm not sure if this approach is feasible, though. > > It's possible to use placement new, and also to place several objects like this; but I realize now the scratch buffer might be too small. Yes, that might be the case.
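The placement-new idea discussed in this thread (constructing a decoder inside a caller-supplied scratch buffer, checking that it fits, and possibly keeping several objects in the same buffer) could look roughly like the following C++ sketch. `ScratchArena`, `DecoderSketch`, and `ElfDecoderSketch` are hypothetical names; they do not model HotSpot's `AbstractDecoder`, `ElfDecoder`, or `CHeapObj`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>

// Hypothetical bump allocator over a caller-supplied scratch buffer. It checks
// that each object fits and keeps several objects alive side by side; when the
// buffer is exhausted it returns nullptr instead of falling back to malloc.
class ScratchArena {
  char* _base;
  size_t _size;
  size_t _used = 0;

 public:
  ScratchArena(void* base, size_t size) : _base(static_cast<char*>(base)), _size(size) {}

  void* allocate(size_t bytes, size_t align) {
    uintptr_t cur = reinterpret_cast<uintptr_t>(_base) + _used;
    uintptr_t aligned = (cur + align - 1) & ~static_cast<uintptr_t>(align - 1);
    size_t pad = static_cast<size_t>(aligned - cur);
    if (pad > _size - _used || bytes > _size - _used - pad) {
      return nullptr;  // would not fit into the remaining space
    }
    _used += pad + bytes;
    return reinterpret_cast<void*>(aligned);
  }

  // Constructs a T inside the buffer via placement new, or returns nullptr.
  template <typename T, typename... Args>
  T* create(Args&&... args) {
    void* mem = allocate(sizeof(T), alignof(T));
    return mem == nullptr ? nullptr : new (mem) T(std::forward<Args>(args)...);
  }

  size_t used() const { return _used; }
};

// Stand-ins for a polymorphic decoder hierarchy; names are made up for this sketch.
struct DecoderSketch {
  virtual ~DecoderSketch() = default;
  virtual int format_id() const = 0;
};

struct ElfDecoderSketch : DecoderSketch {
  int format_id() const override { return 1; }
};
```

Objects built this way must be destroyed by an explicit destructor call (for example `d->~DecoderSketch()`), never with `delete`, and memory is only reclaimed as a whole when the buffer is reused. Because the capacity is fixed, `create` reports failure with `nullptr` rather than overflowing a scratch buffer that turns out to be too small.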
> > I need to think this through a bit. Maybe providing a pre-allocated buffer for os::malloc would be the right thing to do. That might be better and it sounds easier to do instead of going with a separate allocation strategy with placement new using the scratch buffer. ------------- PR: https://git.openjdk.org/jdk/pull/12925 From rrich at openjdk.org Wed Mar 15 16:53:40 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 15 Mar 2023 16:53:40 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v2] In-Reply-To: References: <-Kj1YJ_nRa4nJtaxg3UR8uWhde6vIG1Jl-FFakGnHy4=.a41c6149-912b-4a66-8b1e-634bd27cdebb@github.com> Message-ID: On Tue, 14 Mar 2023 17:01:20 GMT, Matias Saavedra Silva wrote: > > @matias9927 can I ask you to merge master? There seem to be conflicts (at least I see a message "This branch has conflicts that must be resolved"). I'd like to give the change a spin in our CI testing. This requires that it can be applied on master. > > I saw that merge error but nothing came up when I tried to merge locally. The branch is updated nonetheless, so you should be able to test it now @reinrich ! Thanks. The testing didn't reveal anything. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From iwalulya at openjdk.org Wed Mar 15 17:00:38 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 15 Mar 2023 17:00:38 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects [v3] In-Reply-To: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: > Hi All, > > Please review this change to move humongous regions during the Last-Ditch full gc ( on `do_maximal_compaction`). 
This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. > > Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces the region fragmentation and avoids running into OOM errors. > > We have enabled, for G1, the Jtreg test that was previously used only for Shenandoah to test such workloads. > > Testing: Tier 1-3 Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: StefanJ review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12830/files - new: https://git.openjdk.org/jdk/pull/12830/files/6f8dd514..ac6bb065 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12830&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12830&range=01-02 Stats: 23 lines in 5 files changed: 0 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/12830.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12830/head:pull/12830 PR: https://git.openjdk.org/jdk/pull/12830 From jvernee at openjdk.org Wed Mar 15 17:17:51 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Mar 2023 17:17:51 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v16] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:30:22 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture
64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to be extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select the correct IEEE 754 format. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size.
Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/TypeClass.java line 2: > 1: /* > 2: * Copyright (c) 2022, 2023 Oracle and/or its affiliates. All rights reserved. The copyright header here is missing a comma after the second year: Suggestion: * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From cslucas at openjdk.org Wed Mar 15 17:20:47 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 15 Mar 2023 17:20:47 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v2] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <8d3LAUIIFVAJIXyrV2YafqAtAe6yiSPUS5THd2VynTk=.006e4cf8-90fe-43ea-8bb3-bbda4d3244f9@github.com> > Can I please get reviews for this PR to add support for the rematerialization of scalar-replaced objects that participate in allocation merges? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. 
The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges that are used *only* as debug information in SafePointNode and its subclasses. Although there is a performance benefit in doing scalar replacement in this scenario only, the goal of this PR is mainly to add infrastructure to support the rematerialization of SR objects participating in merges. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP, Load+AddP, primarily) subsequently. > > The approach I used is pretty straightforward. It consists basically in: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in allocation merges used only as debug information. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression that might be related. I also tested with several applications and didn't see any failure. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge master - Add support for rematerializing scalar replaced objects participating in allocation merges ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=01 Stats: 1803 lines in 18 files changed: 1653 ins; 9 del; 141 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From jvernee at openjdk.org Wed Mar 15 17:27:32 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Mar 2023 17:27:32 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v16] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:52:44 GMT, Martin Doerr wrote: > Btw. the new cases in which we use int and short accesses when byteWidth is not a power of 2 are never unaligned AFAICS. I guess _UNALIGNED is unnecessary in the JAVA_INT_UNALIGNED and JAVA_SHORT_UNALIGNED. They are always aligned wrt. their size. They are not necessarily aligned, for instance if the struct we are given is not aligned itself (at least in the reading case). Though we currently reject that case in AbstractLinker, it is something we might want to allow, and we have been looking at that as of late, for instance to pass packed structs to native code. FWIW, using the unaligned variant effectively just turns off the alignment checks we do. So we have less safety, in theory. There should be no performance difference, though, just from using the _UNALIGNED layout.
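The PPC64 port description in this thread notes that on Big Endian, storing 8 bytes and then loading 4 bytes from the same address yields a different value than on Little Endian. The following small C++ sketch shows that effect independently of the host machine: byte order is modeled explicitly, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: a 64-bit store followed by a 32-bit load at the
// same address yields the low half on little-endian but the high half on
// big-endian. Byte order is modeled explicitly, so the host does not matter.
enum class Endian { Little, Big };

inline void store_u64(uint8_t* mem, uint64_t v, Endian e) {
  for (int i = 0; i < 8; i++) {
    int shift = (e == Endian::Little) ? 8 * i : 8 * (7 - i);
    mem[i] = static_cast<uint8_t>(v >> shift);
  }
}

inline uint32_t load_u32(const uint8_t* mem, Endian e) {
  uint32_t v = 0;
  for (int i = 0; i < 4; i++) {
    int shift = (e == Endian::Little) ? 8 * i : 8 * (3 - i);
    v |= static_cast<uint32_t>(mem[i]) << shift;
  }
  return v;
}
```

For the value `0x1122334455667788`, a 4-byte load from the start of the slot returns `0x55667788` under little-endian order but `0x11223344` under big-endian order. This is why the size of each access has to be known precisely.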
------------- PR: https://git.openjdk.org/jdk/pull/12708 From dnsimon at openjdk.org Wed Mar 15 18:03:33 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 15 Mar 2023 18:03:33 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v6] In-Reply-To: <0hYs21V1ZWB8o92CfvkEW3i0dZKkeW8kYGQu0p6xvtM=.e76da2cd-dbe5-4da2-a6cb-775f081b9a6a@github.com> References: <0hYs21V1ZWB8o92CfvkEW3i0dZKkeW8kYGQu0p6xvtM=.e76da2cd-dbe5-4da2-a6cb-775f081b9a6a@github.com> Message-ID: On Tue, 14 Mar 2023 16:06:06 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. >> >> To demonstrate the new API, here's an example in terms `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... 
>> } >> >> >> The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > addressed review feedback @jddarcy would you be able to help review this PR? Based on git log, you are knowledgeable in `sun.reflect.annotation`. If not, please suggest who else I can reach out to for a review. ------------- PR: https://git.openjdk.org/jdk/pull/12810 From cslucas at openjdk.org Wed Mar 15 18:06:58 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 15 Mar 2023 18:06:58 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v3] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR to add support for the rematerialization of scalar-replaced objects that participate in allocation merges? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges that are used *only* as debug information in SafePointNode and its subclasses. Although there is a performance benefit in doing scalar replacement in this scenario only, the goal of this PR is mainly to add infrastructure to support the rematerialization of SR objects participating in merges. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (primarily CmpP and Load+AddP). > > The approach I used is pretty straightforward. It basically consists of: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new Class to support rematerialization of SR objects that are part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in allocation merges used only as debug information. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions that might be related. I also tested with several applications and didn't see any failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix some typos and do some small refactorings.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/ea67a304..3b492d2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=01-02 Stats: 72 lines in 8 files changed: 1 ins; 7 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From matsaave at openjdk.org Wed Mar 15 18:45:00 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Mar 2023 18:45:00 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed aarch64 interpreter mistake ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/415e7116..9a3a63ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From mdoerr at openjdk.org Wed Mar 15 18:53:08 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Mar 2023 18:53:08 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v17] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.)
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to be extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select the correct IEEE 754 format. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix Copyright format.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/9173af20..5320f895 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From dcubed at openjdk.org Wed Mar 15 18:56:35 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Mar 2023 18:56:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: <3lU8tL9eqZfwn3qgslK3WcjAInuVFsuY2X1vpukzbJI=.53255494-3359-46fb-a311-30cd59091b7b@github.com> On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
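The legacy stack-locking scheme described above can be modeled in a few lines of C++. This is a simplified, hypothetical sketch (one mark word, no hashing, no inflation), not HotSpot's `BasicLock` or `markWord` code. The low two bits 01 mean unlocked. A stack-locked mark word is simply the address of an 8-byte-aligned lock record on the owner's stack, so its low bits are 00.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of legacy stack-locking. Low two mark-word bits:
// 01 = unlocked, 00 = stack-locked (the word is a pointer to a lock record).
struct BasicLockSketch {
  alignas(8) uintptr_t displaced_header;  // saved copy of the unlocked mark word
};

struct ObjectSketch {
  std::atomic<uintptr_t> mark{0x1};  // starts unlocked (low bits 01)
};

constexpr uintptr_t kLockMask = 0x3;
constexpr uintptr_t kUnlocked = 0x1;

// Fast path: save the header, then CAS the lock record's address into it.
inline bool stack_lock(ObjectSketch& obj, BasicLockSketch& lock) {
  uintptr_t mark = obj.mark.load();
  if ((mark & kLockMask) != kUnlocked) return false;  // already locked: slow path
  lock.displaced_header = mark;
  return obj.mark.compare_exchange_strong(mark, reinterpret_cast<uintptr_t>(&lock));
}

// Ownership test: a stack-locked mark word pointing into the given stack range.
inline bool owned_by(const ObjectSketch& obj, const void* stack_lo, const void* stack_hi) {
  uintptr_t mark = obj.mark.load();
  return (mark & kLockMask) == 0 &&
         mark >= reinterpret_cast<uintptr_t>(stack_lo) &&
         mark < reinterpret_cast<uintptr_t>(stack_hi);
}

// Fast unlock: restore the displaced header if the mark still points at us.
inline bool stack_unlock(ObjectSketch& obj, BasicLockSketch& lock) {
  uintptr_t expected = reinterpret_cast<uintptr_t>(&lock);
  return obj.mark.compare_exchange_strong(expected, lock.displaced_header);
}
```

This overloading of the mark word with a stack pointer is exactly what the PR's lock-stack design avoids.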
>> >> This change basically reverses stack-locking: it still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack pointer. Instead, it pushes the object reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered very quickly by scanning the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When overflow is possible, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
What we do instead is record a special marker, ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, sets the owner to itself, and then properly exits the monitor, thus handing it over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. the workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. the workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision:
>
> - More RISCV changes (by Fei Yang)
> - Use -w instructions in fast_unlock()
> - Increase stub size of C2HandleAnonOwnerStub to 18

Reviewed the v26 changes except for the riscv files. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From mdoerr at openjdk.org Wed Mar 15 19:07:34 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 15 Mar 2023 19:07:34 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend.
Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. >> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 interpreter mistake

src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398:

> 3396: const Bytecodes::Code code = bytecode();
> 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface;
> 3398: const bool is_invokedynamic = code == false; // should not reach here with invokedynamic

This looks strange! I guess you wanted to delete more? ------------- PR: https://git.openjdk.org/jdk/pull/12778 From jvernee at openjdk.org Wed Mar 15 19:12:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Mar 2023 19:12:27 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v17] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 18:53:08 GMT, Martin Doerr wrote: >> [...] > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix Copyright format. Marked as reviewed by jvernee (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12708 From dcubed at openjdk.org Wed Mar 15 19:17:38 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Mar 2023 19:17:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:36:25 GMT, Roman Kennke wrote: > Would it be possible to open/send me the failing test that triggers vframeArray assert > or extract a reproducer that you could publish? I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From darcy at openjdk.org Wed Mar 15 19:26:21 2023 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 15 Mar 2023 19:26:21 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v6] In-Reply-To: References: <0hYs21V1ZWB8o92CfvkEW3i0dZKkeW8kYGQu0p6xvtM=.e76da2cd-dbe5-4da2-a6cb-775f081b9a6a@github.com> Message-ID: On Wed, 15 Mar 2023 18:00:49 GMT, Doug Simon wrote: > @jddarcy would you be able to help review this PR? Based on git log, you are knowledgeable in `sun.reflect.annotation`. If not, please suggest who else I can reach out to for a review. I can take a look at this, but it will be at least a few days before I can swap it in. I've also worked on the annotation-reading API in javax.lang.model. I assume https://bugs.openjdk.org/browse/JDK-8303431 recounts the motivation behind this change?
------------- PR: https://git.openjdk.org/jdk/pull/12810 From sspitsyn at openjdk.org Wed Mar 15 19:35:25 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 15 Mar 2023 19:35:25 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 12:26:16 GMT, Markus Grönlund wrote: >> I've had a good look through now and have a better sense of the refactoring. Seems good. >> >> I'll wait for any tweaks before hitting the approve button though. >> >> Thanks > >> I've had a good look through now and have a better sense of the refactoring. Seems good. >> >> >> >> I'll wait for any tweaks before hitting the approve button though. >> >> >> >> Thanks > > Thanks so much for taking a look. I realized that implementation details of loading should probably reside in agent.cpp, not agentList.cpp. > > I am currently off on vacation and will update when back. Thanks also to Andrew Dinn for comments. @mgronlun I'm looking at the fixes but it will take some time. ------------- PR: https://git.openjdk.org/jdk/pull/12923 From rkennke at openjdk.org Wed Mar 15 19:43:34 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 15 Mar 2023 19:43:34 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 19:14:09 GMT, Daniel D. Daugherty wrote: > > Would it be possible to open/send me the failing test that triggers vframeArray assert > > or extract a reproducer that you could publish? > > I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. Thank you! Regarding moving this PR back to draft, I am not sure. I can do that, yes. But really, the fundamental algorithm and implementation have basically been fixed for half a year already. I have re-worked it into a fresh PR based on the request to put it behind a flag.
The recent change to a fixed-size lock-stack has probably invalidated part of your previous reviews, and I am sorry for that. On the upside, it removed a lot of complexity in the JIT compilers and assembly code generators. What else do I expect to happen? Thomas is working on an ARM(32) port, but this is quite separate and could even land after this PR is done. I still don't quite like the naming. Fast-locking doesn't really say anything, and it's not (meant to be) faster than the previous stack-locking. It is an alternative (and less racy, on the object header) way to implement a thin-locking layer before inflating monitors, that is all. So maybe -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is that when we eventually turn Lilliput on by default, we would remove stack-locking entirely. I would also add some code in arguments.cpp to keep this new thin locking turned off on platforms that don't yet support it. Besides that, from my POV, it is pretty much done. What do you think? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dnsimon at openjdk.org Wed Mar 15 20:37:22 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 15 Mar 2023 20:37:22 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v6] In-Reply-To: References: <0hYs21V1ZWB8o92CfvkEW3i0dZKkeW8kYGQu0p6xvtM=.e76da2cd-dbe5-4da2-a6cb-775f081b9a6a@github.com> Message-ID: On Wed, 15 Mar 2023 19:23:52 GMT, Joe Darcy wrote: > I assume https://bugs.openjdk.org/browse/JDK-8303431 recounts the motivation behind this change? Yes, it does. Thanks in advance.
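Roman's reply above mentions the move to a fixed-size lock-stack. As a rough mental model of the lock-stack and ANONYMOUS_OWNER ideas from the 8291555 PR description quoted earlier, here is a single-threaded Java toy. It is not HotSpot code: the real implementation CASes the low mark-word bits, runs concurrently, and lives in C++ and generated assembly. The class names, the capacity parameter, and the use of plain `Object`s as thread identities are all assumptions made for illustration.

```java
/** Toy, single-threaded model of a fixed-capacity per-thread lock-stack (hypothetical names). */
final class LockStack {
    private final Object[] elems;
    private int top; // number of entries in use

    LockStack(int capacity) {
        elems = new Object[capacity];
    }

    boolean isFull() {
        return top == elems.length;
    }

    int size() {
        return top;
    }

    /** "Does the current thread own obj?": a linear scan of a (typically tiny) array. */
    boolean contains(Object obj) {
        for (int i = 0; i < top; i++) {
            if (elems[i] == obj) {
                return true;
            }
        }
        return false;
    }

    /** Fast-lock attempt. Returns false on a full stack or recursion, i.e. when the caller must inflate. */
    boolean tryPush(Object obj) {
        if (isFull() || contains(obj)) {
            return false;
        }
        elems[top++] = obj;
        return true;
    }

    /** Unlock: remove obj, shifting later entries down (unlocking is usually LIFO). */
    boolean remove(Object obj) {
        for (int i = top - 1; i >= 0; i--) {
            if (elems[i] == obj) {
                System.arraycopy(elems, i + 1, elems, i, top - i - 1);
                elems[--top] = null;
                return true;
            }
        }
        return false;
    }
}

/** Toy inflated monitor whose owner may be ANONYMOUS_OWNER after a contender inflated a fast-lock. */
final class ToyMonitor {
    static final Object ANONYMOUS_OWNER = new Object();
    Object owner; // a (hypothetical) thread identity, ANONYMOUS_OWNER, or null

    /** monitorexit by `self`. An anonymous owner is fixed up to `self` before releasing. */
    void exit(Object self, LockStack selfStack, Object obj) {
        // The description says the exiting thread knows it must be the owner; the contains()
        // check is an extra sanity check in this toy. Dropping obj from the lock-stack here
        // is also an assumption of the model.
        if (owner == ANONYMOUS_OWNER && selfStack.contains(obj)) {
            selfStack.remove(obj);
            owner = self;
        }
        if (owner != self) {
            throw new IllegalMonitorStateException("current thread is not the owner");
        }
        owner = null; // hand over to a contending thread (waking it up is not modelled)
    }
}
```

In this model, a failed `tryPush` corresponds to the cases the description sends to the slow path: a full lock-stack or a recursive lock, both of which end in inflation. Because the header does not record the owner, only the thread holding the object on its own lock-stack can resolve the anonymous owner at exit.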
------------- PR: https://git.openjdk.org/jdk/pull/12810 From jvernee at openjdk.org Wed Mar 15 20:55:31 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Mar 2023 20:55:31 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v16] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:52:44 GMT, Martin Doerr wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. > > Btw. the new cases in which we use int and short accesses when byteWidth is not a power of 2 are never unaligned AFAICS. I guess _UNALIGNED is unnecessary in JAVA_INT_UNALIGNED and JAVA_SHORT_UNALIGNED. They are always aligned wrt. their size. @TheRealMDoerr I've approved the PR. For the second reviewer, I suggest trying to get someone who knows the PPC port. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From dcubed at openjdk.org Wed Mar 15 20:58:38 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Mar 2023 20:58:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: <14eqGd_d9yW5aXDqAYrCnohm4cB7tACgAKc_qDsTJGA=.33c510e9-63f0-4a5f-aebe-bc77f0ffbefc@github.com> On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: > [...] Personally, I'm fine with leaving this PR in the non-draft/ready-to-review state. However, that's because I'm very much in sync (no pun intended) with where this code is currently. I catch up on the latest changes every day and I've started doing Mach5 test cycles every day. When I return to Orlando, I will start doing stress testing runs in my lab. Other folks who started reviewing earlier than I did may have a different POV. I do see the change to a fixed-size lock-stack as a wonderful improvement because it got rid of so many changes. I think the project went from 74 changed and new files down to 51 changed and new files. As for naming, I don't have any suggestions at the moment.
I believe @dholmes-ora has commented in other reviews that "Naming is hard..." ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Wed Mar 15 21:32:36 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Mar 2023 21:32:36 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. 
Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly lots) of such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. 
However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 % | aarch64 stack-locking | aarch64 fast-locking | aarch64 %
>> -- | -- | -- | -- | -- | -- | --
>> AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69%
>> Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21%
>> Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59%
>> ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15%
>> GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11%
>> LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97%
>> MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01%
>> NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54%
>> PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11%
>> FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40%
>> FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12%
>> ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34%
>> Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40%
>> RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74%
>> Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24%
>> ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58%
>> ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53%
>> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37%
>> Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97%
>> FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39%
>> FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04%
>
> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision:
>
>  - More RISCV changes (by Fei Yang)
>  - Use -w instructions in fast_unlock()
>  - Increase stub size of C2HandleAnonOwnerStub to 18

I proposed NewStyleThinLocks or ThinLocks2. Anything, really, that's easy to grep for and to distinguish from the old stack-based locks. I can live with RomanStyleLocks :) If things work out, this is a transient state anyway. I also think Roman should just decide and name it.

Getting rid of the growing part of LockStack is a relief. The missing ports are no showstoppers; we can do them later (I'll work on ARM, but I'm swamped at the moment, so this may take a week or so).

-------------

PR: https://git.openjdk.org/jdk/pull/10907

From kvn at openjdk.org  Thu Mar 16 00:20:20 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 16 Mar 2023 00:20:20 GMT
Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl
In-Reply-To:
References:
Message-ID: <8C06DfuEkBnOaaP1WFv7Y6TRftP9XMitffS3UbOaf8c=.abb8c6e3-7a3b-4412-89b3-8819bd93900e@github.com>

On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote:

> This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading.
>
> The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved.
>
> C2i entry barriers can be removed for the same reason.
>
> Testing:
>
> Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64.
>
> I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC.
>
> I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h.

I hit a guarantee in tier7 running our closed tests with JRuby and `-Xcomp -ea -esa -XX:CompileThreshold=100` flags:

# Internal Error (/workspace/open/src/hotspot/share/runtime/sharedRuntime.cpp:1411), pid=1150904, tid=1150907
# guarantee(false) failed: Missing dependency resolving optimized virtual (invokeinterface) call to jnr.enxio.channels.Native$LibC$jnr$ffi$2::read

Unfortunately, I can't share the tests.

-------------

PR: https://git.openjdk.org/jdk/pull/12802

From kvn at openjdk.org  Thu Mar 16 00:28:20 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 16 Mar 2023 00:28:20 GMT
Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl
In-Reply-To:
References:
Message-ID:
It was a linux-x64-debug VM:

# JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2023-03-15-1549029.vladimir.kozlov.jdkgit2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 21-internal-LTS-2023-03-15-1549029.vladimir.kozlov.jdkgit2, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x18cca1f] SharedRuntime::resolve_sub_helper(bool, bool, JavaThread*)+0x96f

-------------

PR: https://git.openjdk.org/jdk/pull/12802

From fjiang at openjdk.org  Thu Mar 16 03:53:16 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Thu, 16 Mar 2023 03:53:16 GMT
Subject: RFR: 8304293: RISC-V: JDK-8276799 missed atomic intrinsic support for C1
Message-ID: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com>

The following intrinsics in C1 are controlled by supports_atomic_xxx, but they are not set properly on RISC-V:

- _getAndAddInt
- _getAndAddLong
- _getAndSetInt
- _getAndSetLong
- _getAndSetReference

RISC-V provides a set of atomic instructions [1], so these intrinsics could be enabled by default.

Here is the HIR output of C1:

before:

B18 (V) [189, 196] -> B20 pred: B8 B17
empty stack
inlining depth 0
__bci__use__tid____instr____________________________________
0 0 a251
3 0 a252 null
4 0 l254 274954985816L
7 0 l255 1L
. 8 0 l256 a251.invokespecial(a252, l254, l255) jdk/internal/misc/Unsafe.getAndAddLong(Ljava/lang/Object;JJ)J
. 193 0 l258 a42._24 := l256 (J) tid
. 196 0 259 goto B20

after:

B18 (V) [189, 196] -> B20 pred: B8 B17
empty stack
inlining depth 0
__bci__use__tid____instr____________________________________
0 0 a251
3 0 a252 null
4 0 l254 274954985816L
7 0 l255 1L
. 8 0 l256 UnsafeGetAndSet (add)(a252, l254, value l255)
. 193 0 l258 a42._24 := l256 (J) tid
. 196 0 259 goto B20

1. https://github.com/riscv/riscv-isa-manual/blob/8b9047d8d20ef548f7996efee1550760d7bc1279/src/a.tex#L416-L422

Testing:

- [ ] tier1 on Unmatched board (release build)

-------------

Commit messages:
 - missed atomic intrinsic support for c1

Changes: https://git.openjdk.org/jdk/pull/13053/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13053&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8304293
Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/13053.diff
Fetch: git fetch https://git.openjdk.org/jdk pull/13053/head:pull/13053

PR: https://git.openjdk.org/jdk/pull/13053

From kvn at openjdk.org  Thu Mar 16 04:10:18 2023
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 16 Mar 2023 04:10:18 GMT
Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl
In-Reply-To:
References:
Message-ID:
The top of call stack look like this: V [libjvm.so+0x18cca1f] SharedRuntime::resolve_sub_helper(bool, bool, JavaThread*)+0x96f (sharedRuntime.cpp:1411) V [libjvm.so+0x18ccdb9] SharedRuntime::resolve_helper(bool, bool, JavaThread*)+0x39 (sharedRuntime.cpp:1246) V [libjvm.so+0x18cdaee] SharedRuntime::resolve_opt_virtual_call_C(JavaThread*)+0x15e (sharedRuntime.cpp:1690) v ~RuntimeStub::resolve_opt_virtual_call 0x00007fdab0ddb89b J 17706 c2 jnr.enxio.channels.Native.read(ILjava/nio/ByteBuffer;)I org.jruby.dist (80 bytes) @ 0x00007fdab1317d9c [0x00007fdab1317d20+0x000000000000007c] J 17705 c2 jnr.enxio.channels.NativeDeviceChannel.read(Ljava/nio/ByteBuffer;)I org.jruby.dist (91 bytes) @ 0x00007fdab2037704 [0x00007fdab20376c0+0x0000000000000044] J 17384 c2 org.jruby.util.io.PosixShim.read(Lorg/jruby/util/io/ChannelFD;[BIIZ)I org.jruby.dist (192 bytes) @ 0x00007fdab190984c [0x00007fdab1909720+0x000000000000012c] J 17379 c2 org.jruby.util.io.OpenFile$2.run(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/util/io/OpenFile;[BII)I org.jruby.dist (82 bytes) @ 0x00007fdab1fff4e8 [0x00007fdab1fff400+0x00000000000000e8] ------------- PR: https://git.openjdk.org/jdk/pull/12802 From dchuyko at openjdk.org Thu Mar 16 04:55:25 2023 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Thu, 16 Mar 2023 04:55:25 GMT Subject: RFR: 8300669: AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic [v7] In-Reply-To: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com> References: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com> Message-ID: <5JHSnU7J6tzmoU1yKRYGOA1T8WDi2NHeXWPfQCOYKhU=.7f8023de-9f58-49a3-8c9b-a8e46c7430e9@github.com> On Thu, 9 Mar 2023 00:18:05 GMT, Dmitry Chuyko wrote: >> This is a new AArch64 implementation of existing (1-4-byte element) stubs that are called in C2-compiled code for array fill patterns and Arrays.fill(). 
>>
>> Main variant of the existing algorithm:
>>
>>
>> [Short arrays (< 8 bytes): fill by element and exit];
>> // ...
>> [align base to 8 bytes];
>> // ...
>> // fill_words
>> head_len = (cnt & 14) / 2;
>> switch (head_len) {
>>   do {
>>     cnt -= 16;
>>     stp;
>>   case 7:
>>     stp;
>>   case 6:
>>     stp;
>>   // ...
>>   case 1:
>>     stp;
>>   case 0:
>>     base += 8*16;
>>   } while (cnt);
>> }
>> [(over)write a tail < 8 bytes];
>>
>>
>> Even in the good case, only 16-byte GPR (STP) stores are used, and there is a jump for every 8 stores. Misaligned targets always need extra work, which especially hurts small to medium lengths.
>>
>> The new implementation generates a fill routine for every length up to a certain threshold (160 bytes). These routines form a table that is jumped into when the remaining target length is small enough.
>>
>> Each table entry (target length) is branch-free and uses as many of the widest stores as possible, choosing the store type that best fits the detected CPU model. Currently this is SIMD STPQ for Neoverse N2 and GPR STP for the rest. The choice was made after benchmarking and is controlled by the new AArch64 flag UseSIMDForArrayFill.
>>
>> Main variant of the new algorithm (see the more detailed description in the comments):
>>
>>
>> [align data at 16 bytes];
>> while(cnt_bytes > 128) {
>>   [store 128 bytes];
>>   cnt_bytes -= 128;
>> }
>> [store tail of 0..127 bytes];
>>
>>
>>
>> Both the existing and the proposed implementations handle the zero-fill case specially (see the comments about ZVA). The new implementation contains a path for very small arrays. It was added to avoid regressions, and removing it would further improve the more generic case.
>>
>> The check added in https://bugs.openjdk.org/browse/JDK-8298720 in StubGenerator is removed, as this is generated stub code. For the selected threshold, the code size grows by at most 8 KB.
>>
>> A new test, TestArraysFill, is added to the intrinsics jtreg tests. It calls the optimized versions of the 2-arg and 4-arg Arrays.fill() for different data types, lengths and patterns. It checks that the target data is filled with the required value and that the surrounding data is intact.
>>
>> The existing test/micro/org/openjdk/bench/java/util/ArraysFill.java benchmark was used only initially. There are many cases and data lengths to cover. A modified version of the benchmark is attached [1] to the RFE, but it is not included in the change because running all relevant variants takes too long.
>>
>> The resulting performance data are listed in the spreadsheet [2] attached to the RFE. Target processors were Graviton 3, Graviton 2, TaiShan, A72 and A53. The latest data from Altra is not included, but the picture there was similar to Graviton 2 in all experiments. The improvement varies across the range of target lengths. The interesting lengths are those within and close to the table implementation threshold (stepped), small lengths (all) and long lengths (1 point, they look similar). Over this selection:
>>
>> - No major regressions were found.
>> - Geomean improvement: 11-33%
>> - Median improvement: 10-48%
>>
>> Testing: tier1, tier2 and the new test on fastdebug aarch64 and x86.
>>
>> [1] https://bugs.openjdk.org/secure/attachment/102426/ArraysFill.java
>> [2] https://bugs.openjdk.org/secure/attachment/102427/arrays-fill.ods
>
> Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
>  - Merge branch 'openjdk:master' into JDK-8300669
>  - Var in test
>  - Merge branch 'openjdk:master' into JDK-8300669
>  - Wording about alignment
>  - Fixed compilation on win/mac
>  - Merge branch 'openjdk:master' into JDK-8300669
>  - Table based arrays_fill stub implementation for aarch64

The
functionality in question applies to Arrays.fill() and to fill loops as well. The first is easier to study: collect (start, fill length, array length) and the call weight in profiles. As an example, consider the parameter distribution in the OpenJDK jtreg tests.

![02-jtreg-lengths](https://user-images.githubusercontent.com/31855791/225517679-1df9db71-03cd-4ed6-9b79-be19c229d73c.png)

Here most fills start at 0, and the fill length is below 512 elements.

-------------

PR: https://git.openjdk.org/jdk/pull/12222

From dchuyko at openjdk.org  Thu Mar 16 05:07:21 2023
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Thu, 16 Mar 2023 05:07:21 GMT
Subject: RFR: 8300669: AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic [v7]
In-Reply-To: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com>
References: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com>
Message-ID:

In micro-benchmarks we can compare the proposed implementation ("patch"), the existing one ("clean") and also a variant where the stubs are not called ("off"). For instance, here is a plot for aligned int fills on Graviton 3.

![03-fill-512-3](https://user-images.githubusercontent.com/31855791/225518139-11da0cb8-277c-4405-bc63-a5f459f038ce.png)

The existing stub is slower than the no-stub variant at many lengths and shows leaping results for neighboring lengths. The patched version is the same 30% to the current version as that is to the no-stub version for <90 elements. The patched version is the same as or better than the no-stub version for longer lengths, and it "leaps" much less.
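The table-based scheme described in the PR text quoted in this thread works in three steps: align to 16 bytes, store 128-byte blocks in a loop, then finish the 0..127-byte tail with one straight-line routine picked from a table. The portable C++ sketch below illustrates that scheme. It is hypothetical and simplified, not the HotSpot stub. It fills bytes only; for int fills the 16-byte pattern would replicate the element. It uses `std::memcpy` of a 16-byte pattern where the stub emits STP/STPQ stores, handles the alignment head with byte stores, and uses a 128-entry tail table instead of the PR's 160-byte threshold. The names `table_fill`, `store_tail` and `kTailTable` are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical sketch of a table-based fill; not the actual HotSpot stub.
namespace {

// Straight-line store of exactly N bytes of the 16-byte pattern. N is a
// compile-time constant, so the compiler can unroll this into N / 16 wide
// stores plus at most one narrower store, with no runtime loop branch.
template <std::size_t N>
void store_tail(std::uint8_t* dst, const std::uint8_t (&pattern)[16]) {
  for (std::size_t off = 0; off + 16 <= N; off += 16) {
    std::memcpy(dst + off, pattern, 16);
  }
  if (N % 16 != 0) {
    std::memcpy(dst + (N - N % 16), pattern, N % 16);
  }
}

using TailFn = void (*)(std::uint8_t*, const std::uint8_t (&)[16]);

// One entry per tail length 0..127, standing in for the stub's jump table.
template <std::size_t... Ns>
constexpr std::array<TailFn, sizeof...(Ns)> make_tail_table(std::index_sequence<Ns...>) {
  return {{&store_tail<Ns>...}};
}

constexpr auto kTailTable = make_tail_table(std::make_index_sequence<128>{});

}  // namespace

// Fill `count` bytes starting at `dst` with `value`.
void table_fill(std::uint8_t* dst, std::size_t count, std::uint8_t value) {
  std::uint8_t pattern[16];
  std::memset(pattern, value, sizeof pattern);

  // Head: byte stores until dst is 16-byte aligned (or nothing is left).
  while (count > 0 && reinterpret_cast<std::uintptr_t>(dst) % 16 != 0) {
    *dst++ = value;
    --count;
  }

  // Body: 128 bytes per iteration as eight 16-byte stores. Using >= here
  // guarantees the table only ever sees 0..127 remaining bytes.
  while (count >= 128) {
    for (std::size_t off = 0; off < 128; off += 16) {
      std::memcpy(dst + off, pattern, 16);
    }
    dst += 128;
    count -= 128;
  }

  // Tail: a single indirect call to the routine for exactly `count` bytes.
  assert(count < kTailTable.size());
  kTailTable[count](dst, pattern);
}
```

This design trades code size for branches. Each tail length gets its own branch-free routine, so the tail costs one indirect jump instead of a loop with a branch per iteration. The price is code that grows with the table size, which is the instruction-cache concern raised in this thread.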
-------------

PR: https://git.openjdk.org/jdk/pull/12222

From dchuyko at openjdk.org  Thu Mar 16 05:17:22 2023
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Thu, 16 Mar 2023 05:17:22 GMT
Subject: RFR: 8300669: AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic [v7]
In-Reply-To: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com>
References: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com>
Message-ID:

In non-dedicated benchmarks, there is some improvement in message digest operations for small lengths, where the cleaning operation has a noticeable weight. E.g. SHA-224 for size 16 is 4.7% faster (MD5, SHA-256 and SHA3-256 show 1-4%). SPECjvm serial is 0.14% faster (the weight of fill() there is ~0.28%).

-------------

PR: https://git.openjdk.org/jdk/pull/12222

From dchuyko at openjdk.org  Thu Mar 16 05:21:21 2023
From: dchuyko at openjdk.org (Dmitry Chuyko)
Date: Thu, 16 Mar 2023 05:21:21 GMT
Subject: RFR: 8300669: AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic [v7]
In-Reply-To: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com>
References: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com>
Message-ID:

Icache pressure is not easy to see in benchmarks, but all branching penalties show up quickly. The current patch allows tuning the table size. It can actually be a mask that allows only certain lengths.
So for any known application where icache pressure is significant, it is possible to try different table volumes.

-------------

PR: https://git.openjdk.org/jdk/pull/12222

From sspitsyn at openjdk.org  Thu Mar 16 05:24:59 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Thu, 16 Mar 2023 05:24:59 GMT
Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics
Message-ID: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com>

This is needed for performance improvements in support of virtual threads.
The update includes the following:

1. Refactored the `VirtualThread` native methods:
   `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount`
   `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount`
2. Moved the still-useful implementation of the old native methods from `jvm.cpp` to `jvmtiThreadState.cpp`:
   `JVM_VirtualThreadMountStart` => `VTMS_mount_begin`
   `JVM_VirtualThreadMountEnd` => `VTMS_mount_end`
   `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin`
   `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end`
3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`.
4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`.
5. Added the static boolean state variable `_VTMS_notify_jvmti_events` to the `JvmtiVTMSTransitionDisabler` class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable.

Implementing the same methods as C1 intrinsics may be needed in the future, but is a low priority for now.

Testing:
- Ran mach5 tiers 1-6. No regressions were found.
------------- Commit messages: - fix traling spaces in a couple of files - minor update for VTMS_notify_jvmti_events variable - 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics Changes: https://git.openjdk.org/jdk/pull/13054/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304303 Stats: 438 lines in 20 files changed: 265 ins; 125 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054 From dholmes at openjdk.org Thu Mar 16 05:48:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 05:48:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 19:40:33 GMT, Roman Kennke wrote: >>> Would it be possible to open/send me the failing test that triggers vframeArray assert or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. > >> > Would it be possible to open/send me the failing test that triggers vframeArray assert or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. > > Thank you! > > Regarding moving this PR back to draft, I am not sure. I can do that, yes. But really, the fundamental algorithm and implementation have basically been fixed for half a year already. I have reworked it into a fresh PR based on the request to put it behind a flag. The recent change to a fixed-size lock-stack has probably invalidated part of your previous reviews, and I am sorry for that. On the upside, it removed a lot of complexity in the JIT compilers and assembly code generators.
> > What else do I expect to happen? > > Thomas is working on an ARM(32) port, but this is quite separate and could even land after this PR is done. > > I still don't quite like the naming. Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. It is an alternative (and less racy, on the object header) way to implement a thin-locking layer before inflating monitors, that is all. So maybe -XX:+UseNewThinLocking? It is somewhat temporary anyway. At least my hope is that when we eventually switch to Lilliput turned on by default, we would entirely remove stack-locking. > > I would also add some code in arguments.cpp to keep this new thin locking turned off on platforms that don't yet support it. > > Besides that, from my POV, it is pretty much done. > > What do you think? @rkennke The change to a fixed-size lock stack was a significant change, as you note, and that suggested to me that the design was still in flux. So I have to wonder whether everything is in fact now stable (or as stable as one should expect for an experimental new feature). > Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 06:08:34 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 06:08:34 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> References: Message-ID: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> On Thu, 16 Mar 2023 05:45:29 GMT, David Holmes wrote: > Agreed.
But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks They don't use the stack anymore; would this not be us using a wrong name just for history's sake? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dholmes at openjdk.org Thu Mar 16 06:26:59 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 06:26:59 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v3] In-Reply-To: References: Message-ID: > When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. > > Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. > > A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. 
> > In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. > > Testing: tiers 1-4 > > Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Make get_parsed_num_arguments() ASSERT only code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12994/files - new: https://git.openjdk.org/jdk/pull/12994/files/cfe345d8..fc46d253 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12994&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12994&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/12994.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12994/head:pull/12994 PR: https://git.openjdk.org/jdk/pull/12994 From stuefe at openjdk.org Thu Mar 16 06:26:59 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 06:26:59 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v3] In-Reply-To: References: Message-ID: <24zSArtcqklQwi5GcF6x0D3j948ZShoGaOq4eXhGQ2g=.5a045f65-8ef4-43df-8e60-11456450deb7@github.com> On Thu, 16 Mar 2023 06:22:48 GMT, David Holmes wrote: >> When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments. 
>> >> Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary, the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments. To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. >> >> A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. >> >> In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. >> >> Testing: tiers 1-4 >> >> Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Make get_parsed_num_arguments() ASSERT only code @dholmes-ora thanks for the explanation and the added ASSERT. LGTM ------------- Marked as reviewed by stuefe (Reviewer). 
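A minimal sketch of the registration scheme discussed in this thread, assuming simplified stand-ins for the HotSpot classes: each command exposes a static `num_arguments()`, and only debug builds construct the command to cross-check that value against the parser's dynamic count. `DCmdParser`, `add_dcmd_argument()`, `TwoArgDCmd` and `NoParserDCmd` are hypothetical here, `std::enable_if_t` stands in for HotSpot's `ENABLE_IF` macro, and `NDEBUG` plays the role of HotSpot's `ASSERT` build flag. This is an illustration, not the code from the PR.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical, simplified stand-ins for HotSpot's DCmd classes.
class DCmd {
 public:
  virtual ~DCmd() = default;
};

class DCmdParser {
  int _count = 0;
 public:
  void add_dcmd_argument() { _count++; }       // each registered argument bumps the count
  int num_arguments() const { return _count; } // dynamic count
};

class DCmdWithParser : public DCmd {
 protected:
  DCmdParser _dcmdparser;
 public:
  const DCmdParser& parser() const { return _dcmdparser; }
};

// Hypothetical command with two parsed arguments; the static count must agree.
class TwoArgDCmd : public DCmdWithParser {
 public:
  TwoArgDCmd() {
    _dcmdparser.add_dcmd_argument();
    _dcmdparser.add_dcmd_argument();
  }
  static int num_arguments() { return 2; }
};

// Hypothetical command without a parser: only the static count exists.
class NoParserDCmd : public DCmd {
 public:
  static int num_arguments() { return 0; }
};

#ifndef NDEBUG
// Debug-only: build an instance and count the arguments its parser registered.
template <typename T>
int get_parsed_num_arguments() {
  T cmd;
  return cmd.parser().num_arguments();
}
#endif

// Registration-time helper, restricted to DCmd subclasses.
template <typename T,
          std::enable_if_t<std::is_convertible<T*, DCmd*>::value, int> = 0>
int get_num_arguments() {
  if constexpr (std::is_convertible<T*, DCmdWithParser*>::value) {
    // Catches a static count that has drifted from the parser's real count.
    assert(T::num_arguments() == get_parsed_num_arguments<T>() &&
           "static num_arguments() out of sync with parser");
  }
  return T::num_arguments();
}
```

With `NDEBUG` defined, `assert` discards its argument, so `get_num_arguments<T>()` never constructs a command; only the debug build pays for the extra object in order to catch a static count that is out of sync.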
PR: https://git.openjdk.org/jdk/pull/12994 From rkennke at openjdk.org Thu Mar 16 06:34:35 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 06:34:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 19:40:33 GMT, Roman Kennke wrote: >>> Would it be possible to open/send me the failing test that triggers vframeArray assert or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. > >> > Would it be possible to open/send me the failing test that triggers vframeArray assert or extract a reproducer that you could publish? >> >> I have started an internal discussion at Oracle to see what it would take to move that test from closed to open. Will keep you posted. > > Thank you! > > Regarding moving this PR back to draft, I am not sure. I can do that, yes. But really, the fundamental algorithm and implementation have basically been fixed for half a year already. I have reworked it into a fresh PR based on the request to put it behind a flag. The recent change to a fixed-size lock-stack has probably invalidated part of your previous reviews, and I am sorry for that. On the upside, it removed a lot of complexity in the JIT compilers and assembly code generators. > > What else do I expect to happen? > > Thomas is working on an ARM(32) port, but this is quite separate and could even land after this PR is done. > > I still don't quite like the naming. Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. It is an alternative (and less racy, on the object header) way to implement a thin-locking layer before inflating monitors, that is all. So maybe -XX:+UseNewThinLocking? It is somewhat temporary anyway.
At least my hope is that when we eventually switch to Lilliput turned on by default, we would entirely remove stack-locking. > > I would also add some code in arguments.cpp to keep this new thin locking turned off on platforms that don't yet support it. > > Besides that, from my POV, it is pretty much done. > > What do you think? > @rkennke The change to a fixed-size lock stack was a significant change, as you note, and that suggested to me that the design was still in flux. So I have to wonder whether everything is in fact now stable (or as stable as one should expect for an experimental new feature). I think it is. Except for the few points that I mentioned earlier and anything that comes up in reviews, I don't expect any major design changes. In fact, if any do come up, I would actively hold them back in order to move this PR across the line at this point. I can't think of any bad spots where I thought 'oh this is ugly - this needs a better approach', though. > > Fast-locking doesn't really say anything and it's not (meant to be) faster than the previous stack-locking. > > > > Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks That's fine by me. Thank you, Roman ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 06:34:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 06:34:36 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> References: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> Message-ID: On Thu, 16 Mar 2023 06:05:38 GMT, Thomas Stuefe wrote: > > > > Agreed.
But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks > > > > They don't use the stack anymore; would this not be us using a wrong name just for history's sake? Well, it's still got the lock-stacks. :-D ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dholmes at openjdk.org Thu Mar 16 06:35:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 06:35:22 GMT Subject: RFR: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration [v3] In-Reply-To: <24zSArtcqklQwi5GcF6x0D3j948ZShoGaOq4eXhGQ2g=.5a045f65-8ef4-43df-8e60-11456450deb7@github.com> References: <24zSArtcqklQwi5GcF6x0D3j948ZShoGaOq4eXhGQ2g=.5a045f65-8ef4-43df-8e60-11456450deb7@github.com> Message-ID: On Thu, 16 Mar 2023 06:21:48 GMT, Thomas Stuefe wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Make get_parsed_num_arguments() ASSERT only code > > @dholmes-ora thanks for the explanation and the added ASSERT. LGTM Thanks for the review @tstuefe . I will wait for some final build checks and then integrate. ------------- PR: https://git.openjdk.org/jdk/pull/12994 From stuefe at openjdk.org Thu Mar 16 06:42:32 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 06:42:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v26] In-Reply-To: References: <4ID9G5P6KGhzXLzlOEc2_lcAlMUo2GppzSA_gazLT2Q=.f3299b50-20f6-4666-9a60-4145edf3c5ee@github.com> Message-ID: On Thu, 16 Mar 2023 06:31:42 GMT, Roman Kennke wrote: > > > Agreed. But I don't think "Thin locks" is an option as that was specifically an IBM locking implementation. 
Historically Hotspot's locking mechanism has internally been referred to as stack-locks, so I would suggest UseNewStackLocks > > > > > > They don't use the stack anymore; would this not be us using a wrong name just for history's sake? > > > > Well, it's still got the lock-stacks. :-D Yes, but we have variables like "is_stack_locked" etc., without an "old" qualifier. Idk, up to you. Better than UseFastLocking I guess. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From sspitsyn at openjdk.org Thu Mar 16 07:11:00 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Mar 2023 07:11:00 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v2] In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> Message-ID: > This is needed for performance improvements in support of virtual threads. > The update includes the following: > > 1. Refactored the `VirtualThread` native methods: > `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` > `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` > 2. Moved the still-useful implementation of the old native methods from `jvm.cpp` to `jvmtiThreadState.cpp`: > `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` > `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` > `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` > `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end` > 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. > 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. > 5. Added the static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable.
> > Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now. > > Testing: > - Ran mach5 tiers 1-6. No regressions were found. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: include jniHandles.hpp into sharedRuntime.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13054/files - new: https://git.openjdk.org/jdk/pull/13054/files/8a379320..397b6337 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054 From dholmes at openjdk.org Thu Mar 16 07:16:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 07:16:20 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> Message-ID: <2IKS8cYMhoU1JRBEalwF1ZeV4Vih78eXffu_GXZ4JkQ=.27d433ac-1f04-47ce-a177-c791a42bc6fe@github.com> On Wed, 15 Mar 2023 12:55:31 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/arguments.cpp line 3589: >> >>> 3587: PrintCompilation || PrintInlining || PrintDependencies || PrintNativeNMethods || >>> 3588: PrintDebugInfo || PrintRelocations || PrintNMethods || PrintExceptionHandlers || >>> 3589: PrintAssembly || TraceDeoptimization || log_is_enabled(Debug, dependencies) || >> >> Now that TraceDependencies is converted to UL, I think it should just be deleted from this function. We don't need to enable LogVMOutput in that case. > > No, this is the right thing to do.
If -Xlog:dependency - the compiler group also expects the dependency printed to the compiler log file. The logging is a separate mechanism, but should be enabled with -Xlog:dependency. Sorry, I don't follow that. `use_vm_log()` only affects non-product builds and forces `LogVMOutput` to true. That in turn will cause `defaultStream::init_log()` to execute, which initializes the log file etc. But I don't see how that would cause UL logging for "dependencies" to also get written to the log file??? ------------- PR: https://git.openjdk.org/jdk/pull/13007 From dholmes at openjdk.org Thu Mar 16 07:26:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 07:26:27 GMT Subject: RFR: JDK-8301498: Replace NULL with nullptr in cpu/x86 [v3] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 11:09:25 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr; these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lowercase "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. >> >> Thanks!
> > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fix vnkozlov's suggestions > - Merge remote-tracking branch 'origin/master' into JDK-8301498 > - Some more fixes > - Fixes > - Replace NULL with nullptr in cpu/x86 Still good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.org/jdk/pull/12326 From dholmes at openjdk.org Thu Mar 16 07:34:36 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 07:34:36 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer.
Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thereby handing it over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff
>> -- | -- | -- | -- | -- | -- | --
>> AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69%
>> Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21%
>> Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59%
>> ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15%
>> GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11%
>> LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97%
>> MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01%
>> NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54%
>> PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11%
>> FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40%
>> FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12%
>> ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34%
>> Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40%
>> RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74%
>> Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24%
>> ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58%
>> ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53%
>> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37%
>> Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97%
>> FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39%
>> FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04%
> > Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: > > - More RISCV changes (by Fei Yang) > - Use -w instructions in fast_unlock() > - Increase stub size of C2HandleAnonOwnerStub to 18 I agree "stack lock" is not ideal even though we do still have lock-stacks. :) But "thin locks" is definitely out, and "fast lock" seems generally agreed to be meaningless. How about just UseNewLocks? Or UseNewMarkWordLocks? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 08:03:38 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 08:03:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking.
When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them used old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact.
But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> | x86_64 | | | | aarch64 | | >> -- | -- | -- | -- | -- | -- | -- | -- >> | stack-locking | fast-locking | | | stack-locking | fast-locking | >> AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% |
| 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: > > - More RISCV changes (by Fei Yang) > - Use -w instructions in fast_unlock() > - Increase stub size of C2HandleAnonOwnerStub to 18 I like UseNewLocks, but I fear that this may conflict with Oracle's plan (?) to move OMs into the heap, which would be another revamp of locking (fat locks in this case) and may come with yet another switch. Other than that, UseNewLocks sounds good and succinct.
Another proposal: UseThreadLockStack or UseLockStack. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rehn at openjdk.org Thu Mar 16 08:15:38 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 16 Mar 2023 08:15:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 08:00:38 GMT, Thomas Stuefe wrote: > I like UseNewLocks, but I fear that this may conflict with Oracle's plan (?) to move OMs into the heap, which would be another revamp of locking (fat locks in this case) and may come with yet another switch. Other than that, UseNewLocks sounds good and succinct. > > Another proposal: UseThreadLockStack or UseLockStack. Just an FYI, at the moment we have: product(ccstr, ObjectSynchronizerMode, "fast", \ "ObjectSynchronizer modes: " \ "legacy: legacy native system; " \ "native: java entry with native monitors; " \ "heavy: java entry with always inflated Java monitors; " \ "fast: java entry with fast-locks and" \ " inflate-on-demand Java monitors; ") \ Personally, I prefer a single option over many. With a command line such as `-XX:-UseLockStack -XX:+UseHeavyMonitors`, it's harder, for me at least, to figure out what is selected and what was intended to be selected. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From mdoerr at openjdk.org Thu Mar 16 08:30:24 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 08:30:24 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool.
This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing, which include most JCK and JTREG tests, the Renaissance benchmark and SAP-specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. Thanks a lot for testing! Can it happen that the `to_holder` is referenced from the instruction stream, but not from the oop constant section on x86 (or any platform with `relocInfo::mustIterateImmediateOopsInCode()`)? ------------- PR: https://git.openjdk.org/jdk/pull/12802 From mdoerr at openjdk.org Thu Mar 16 08:36:39 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 08:36:39 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v16] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 22:52:44 GMT, Martin Doerr wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. > > Btw. the new cases in which we use int and short accesses when byteWidth is not a power of 2 are never unaligned AFAICS. I guess _UNALIGNED is unnecessary in the JAVA_INT_UNALIGNED and JAVA_SHORT_UNALIGNED. They are always aligned with respect to their size. > @TheRealMDoerr I've approved the PR. For the second review, I suggest trying to get someone who knows the PPC port. Thanks for the review! I guess I should also wait for your 2 PRs to be integrated, so I can merge and adapt.
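The claim that the int and short accesses used for non-power-of-two `byteWidth` values are always aligned with respect to their own size can be illustrated with a small sketch. It rests on an assumption that is not taken from the PPC port: a value of 1 to 7 bytes is moved as 4-, 2- and 1-byte pieces, largest piece first, starting at the beginning of a slot that is at least 4-byte aligned. Under that assumption each piece starts at a sum of strictly larger powers of two, so its offset is a multiple of its own size. `Piece`, `split_largest_first` and `naturally_aligned` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one load/store of `size` bytes at byte `offset`
// from the start of a slot assumed to be at least 4-byte aligned.
struct Piece {
  std::size_t offset;
  std::size_t size;
};

// Split a byte_width-byte copy (1..7 bytes) into 4-, 2- and 1-byte pieces,
// largest first. This decomposition is an assumption, not the port's code.
std::vector<Piece> split_largest_first(std::size_t byte_width) {
  assert(byte_width > 0 && byte_width < 8);
  std::vector<Piece> pieces;
  std::size_t offset = 0;
  for (std::size_t size = 4; size >= 1; size /= 2) {
    if (byte_width - offset >= size) {
      pieces.push_back({offset, size});
      offset += size;
    }
  }
  return pieces;
}

// True if every piece starts at a multiple of its own size,
// i.e. each access is naturally aligned within the slot.
bool naturally_aligned(const std::vector<Piece>& pieces) {
  for (const Piece& p : pieces) {
    if (p.offset % p.size != 0) return false;
  }
  return true;
}
```

For example, a 7-byte value becomes pieces of 4, 2 and 1 bytes at offsets 0, 4 and 6. The ordering matters: copying smallest-first instead would put a 2-byte access at offset 1, which is misaligned.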
------------- PR: https://git.openjdk.org/jdk/pull/12708 From rkennke at openjdk.org Thu Mar 16 08:39:39 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 08:39:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 09:41:30 GMT, Roman Kennke wrote: > > Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: > > - More RISCV changes (by Fei Yang) > - Use -w instructions in fast_unlock() > - Increase stub size of C2HandleAnonOwnerStub to 18 I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 08:52:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 08:52:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: >> >> - More RISCV changes (by Fei Yang) >> - Use -w instructions in fast_unlock() >> - Increase stub size of C2HandleAnonOwnerStub to 18 > > I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear. @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top.
------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 09:05:39 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 09:05:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: Message-ID: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com> On Thu, 16 Mar 2023 08:36:45 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: >> >> - More RISCV changes (by Fei Yang) >> - Use -w instructions in fast_unlock() >> - Increase stub size of C2HandleAnonOwnerStub to 18 > > I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear. > @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top. C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block.
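The lock-stack scheme from the PR description, together with the debug check on popping the top entry suggested above, can be sketched as a small single-threaded model. Everything in it is a hypothetical simplification: `Object`, `LockStack`, `fast_lock` and `fast_unlock` are illustrative names, not HotSpot's classes or signatures; the mark word is reduced to an atomic integer whose low two bits are 01 when unlocked and 00 when fast-locked; and stack growth, inflation to an ObjectMonitor and the ANONYMOUS_OWNER hand-off are left out (a `false` return stands for "take the slow path").

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified object header: only the mark word. Low two bits:
// 0b01 = unlocked, 0b00 = fast-locked (hypothetical model).
struct Object {
  std::atomic<std::uintptr_t> mark;
  explicit Object(std::uintptr_t upper_bits = 0) : mark((upper_bits << 2) | 0b01) {}
};

// Per-thread lock-stack: the objects this thread has fast-locked, in order.
class LockStack {
  std::vector<Object*> _elems;  // std::vector grows by itself here

 public:
  void push(Object* o) { _elems.push_back(o); }

  // Compiled code only performs symmetric (structured) locking, so the
  // object being unlocked must be the top entry; checked in debug builds.
  void pop_top(Object* o) {
    assert(!_elems.empty() && _elems.back() == o && "asymmetric unlock");
    _elems.pop_back();
  }

  // "Does the current thread own o?" is answered by a linear scan; the PR
  // reports that the array typically holds only 3-5 entries.
  bool contains(const Object* o) const {
    for (const Object* e : _elems) {
      if (e == o) return true;
    }
    return false;
  }

  std::size_t size() const { return _elems.size(); }
};

// monitorenter fast path: CAS the low bits from 01 to 00, keeping the upper
// bits intact, then record the object on this thread's lock-stack. Returns
// false where the VM would take the slow path, e.g. when the object is
// already locked (recursive locking is inflated in the described scheme).
bool fast_lock(Object* o, LockStack& ls) {
  std::uintptr_t old_mark = o->mark.load();
  if ((old_mark & 0b11) != 0b01) return false;
  std::uintptr_t locked = old_mark & ~std::uintptr_t(0b11);
  if (!o->mark.compare_exchange_strong(old_mark, locked)) return false;
  ls.push(o);
  return true;
}

// monitorexit fast path: CAS the low bits back from 00 to 01 and pop the
// object. Returns false if the header no longer looks fast-locked.
bool fast_unlock(Object* o, LockStack& ls) {
  std::uintptr_t old_mark = o->mark.load();
  if ((old_mark & 0b11) != 0b00) return false;
  if (!o->mark.compare_exchange_strong(old_mark, old_mark | 0b01)) return false;
  ls.pop_top(o);
  return true;
}
```

Locking `a` then `b` and unlocking `b` before `a` passes the `pop_top` check. Unlocking `a` first would trip the assertion, mirroring the symmetric-locking assumption that C1 and C2 rely on.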
------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 09:09:48 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 09:09:48 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com> References: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com> Message-ID: On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke wrote: > > > @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top. > > C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block. Thanks for clarifying. Yes, asserting that would make sense. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rrich at openjdk.org Thu Mar 16 09:10:39 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 09:10:39 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 19:04:41 GMT, Martin Doerr wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed aarch64 interpreter mistake > > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: > >> 3396: const Bytecodes::Code code = bytecode(); >> 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; >> 3398: const bool is_invokedynamic = code == false; // should not reach here with invokedynamic > > This looks strange! I guess you wanted to delete more? Basically I kept the local variable as a name for the (now) constant value passed in the call at L3409.
The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is declared in a shared header. I could replace the variable reference in the call with `false /* is_invokedynamic */` if you like that better. Personally I'd prefer the current version. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From mdoerr at openjdk.org Thu Mar 16 09:29:26 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 09:29:26 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 09:07:27 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: >> >>> 3396: const Bytecodes::Code code = bytecode(); >>> 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; >>> 3398: const bool is_invokedynamic = code == false; // should not reach here with invokedynamic >> >> This looks strange! I guess you wanted to delete more? > > Basically I kept the local variable as a name for the (now) constant value passed in the call at L3409. > > The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is declared in a shared header. > > I could replace the variable reference in the call with `false /* is_invokedynamic */` if you like that better. Personally I'd prefer the current version. I meant `code == false`. That was probably not intended. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Thu Mar 16 09:29:27 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 09:29:27 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 09:21:26 GMT, Martin Doerr wrote: >> Basically I kept the local variable as a name for the (now) constant value passed in the call at L3409. 
>> >> The parameter cannot be eliminated since `load_invoke_cp_cache_entry()` is declared in a shared header. >> >> I could replace the variable reference in the call with `false /* is_invokedynamic */` if you like that better. Personally I'd prefer the current version. > > I meant `code == false`. That was probably not intended. Oh my ... Your are right of course. ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Thu Mar 16 09:29:30 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 09:29:30 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: <4xv3uef5CQ0pArU0jbJjWms_qd3akl8ZdbRs18CW31w=.51b759d7-d479-4174-92da-d4bda0500597@github.com> On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3398: > 3396: const Bytecodes::Code code = bytecode(); > 3397: const bool is_invokeinterface = code == Bytecodes::_invokeinterface; > 3398: const bool is_invokedynamic = code == false; // should not reach here with invokedynamic This is what I meant. Suggestion: const bool is_invokedynamic = false; // should not reach here with invokedynamic Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/12778 From aph at openjdk.org Thu Mar 16 09:35:20 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 16 Mar 2023 09:35:20 GMT Subject: RFR: 8300669: AArch64: Table based tails processing and wider stores for Arrays.fill() intrinsic [v7] In-Reply-To: <5JHSnU7J6tzmoU1yKRYGOA1T8WDi2NHeXWPfQCOYKhU=.7f8023de-9f58-49a3-8c9b-a8e46c7430e9@github.com> References: <0HhfPpk5EIXfhlmdTaT-ik1EQWgYXSKkK7f4fuLKGh0=.9e690153-fc70-49a0-aada-2829747da8cf@github.com> <5JHSnU7J6tzmoU1yKRYGOA1T8WDi2NHeXWPfQCOYKhU=.7f8023de-9f58-49a3-8c9b-a8e46c7430e9@github.com> Message-ID: On Thu, 16 Mar 2023 04:52:57 GMT, Dmitry Chuyko wrote: > As an example consider parameters distribution in OpenJDK jtreg tests. ![02-jtreg-lengths](https://user-images.githubusercontent.com/31855791/225517679-1df9db71-03cd-4ed6-9b79-be19c229d73c.png) Here most fills start with 0 and fill length is below 512 elements. Can you please explain what this means? Are these all non-zero fills? 
------------- PR: https://git.openjdk.org/jdk/pull/12222 From fyang at openjdk.org Thu Mar 16 09:38:18 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 16 Mar 2023 09:38:18 GMT Subject: RFR: 8304293: RISC-V: JDK-8276799 missed atomic intrinsic support for C1 In-Reply-To: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com> References: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com> Message-ID: <6HOmP1tPgq26gezEvauaAs0UYxGKQrXg3iOA7bNC1TQ=.857c8220-ad7b-49f7-bddc-c7c9a807664b@github.com> On Thu, 16 Mar 2023 03:37:10 GMT, Feilong Jiang wrote: > The following intrinsics in C1 are controlled by supports_atomic_xxx, but they are not set properly on RISC-V: > - _getAndAddInt > - _getAndAddLong > - _getAndSetInt > - _getAndSetLong > - _getAndSetReference > > RISC-V provides a set of atomic instructions [1], these intrinsics could be enabled by default. > > Here is the HIR output of C1: > > before: > > > B18 (V) [189, 196] -> B20 pred: B8 B17 > empty stack > inlining depth 0 > __bci__use__tid____instr____________________________________ > 0 0 a251 > 3 0 a252 null > 4 0 l254 274954985816L > 7 0 l255 1L > . 8 0 l256 a251.invokespecial(a252, l254, l255) > jdk/internal/misc/Unsafe.getAndAddLong(Ljava/lang/Object;JJ)J > . 193 0 l258 a42._24 := l256 (J) tid > . 196 0 259 goto B20 > > > after: > > > B18 (V) [189, 196] -> B20 pred: B8 B17 > empty stack > inlining depth 0 > __bci__use__tid____instr____________________________________ > 0 0 a251 > 3 0 a252 null > 4 0 l254 274954985816L > 7 0 l255 1L > . 8 0 l256 UnsafeGetAndSet (add)(a252, l254, value l255) > . 193 0 l258 a42._24 := l256 (J) tid > . 196 0 259 goto B20 > > > 1. https://github.com/riscv/riscv-isa-manual/blob/8b9047d8d20ef548f7996efee1550760d7bc1279/src/a.tex#L416-L422 > > Testing: > > - [ ] tier1 on Unmatched board (release build) Looks good. Thanks for finding this. ------------- Marked as reviewed by fyang (Reviewer). 
PR: https://git.openjdk.org/jdk/pull/13053 From rehn at openjdk.org Thu Mar 16 10:23:41 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 16 Mar 2023 10:23:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com> References: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com> Message-ID: On Thu, 16 Mar 2023 09:02:19 GMT, Roman Kennke wrote: >> I like -XX:+UseNewLocks, too. I wouldn't overcomplicate things: this flag is meant to be transitional, it is not meant to be used by end-users (except the bravest nerds) at all. When it lands, the Lilliput flag (e.g. +UseCompactObjectHeaders) will also control the locking flag. Eventually (e.g. release+1) both flags would become on by default and afterwards (e.g. release+2) would go away entirely, at which point the whole original stack-locking would disappear. > >> @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top. > > C1 and C2 don't allow assymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block. > > > @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top. > > > > > > C1 and C2 don't allow assymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block. > > Thanks for clarifying. Yes, asserting that would make sense. FYI: I'm trying to convince folks that JVMS should be allowed to enforce asymmetric locking. 
We think most people don't know they will be stuck in the interpreter unintentionally. What was discussed most recently was to diagnose and warn about this behavior as a first step. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Thu Mar 16 10:29:31 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Mar 2023 10:29:31 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v27] In-Reply-To: References: <1cMqmQSAilNe5gNdJj0HBvo5OJqz8wvsetLHTZgF9s4=.79f9c9f2-aac9-4cd1-9d41-3201f89c52b3@github.com> Message-ID: On Thu, 16 Mar 2023 10:20:21 GMT, Robbin Ehn wrote: > > > > > @rkennke I must be missing something. In aarch64, why do we handle the non-symmetric-unlock case in interpreter, but not in C1/C2? There, we just seem to pop whatever is on top. > > > > > > > > > C1 and C2 don't allow asymmetric locking. If that ever happens, they would refuse to compile the method. We should probably check that this assumption holds true when popping the top entry in an #ASSERT block. > > > > > > Thanks for clarifying. Yes, asserting that would make sense. > > FYI: I'm trying to convince folks that JVMS should be allowed to enforce asymmetric locking. We think most people don't know they will be stuck in the interpreter unintentionally. What was discussed most recently was to diagnose and warn about this behavior as a first step. Sounds good. Just to be clear, you mean enforce symmetric locking? resp.
forbid asymmetric locking? Yes, sorry, thanks for correcting! :) ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rrich at openjdk.org Thu Mar 16 11:20:19 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 11:20:19 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: <8C06DfuEkBnOaaP1WFv7Y6TRftP9XMitffS3UbOaf8c=.abb8c6e3-7a3b-4412-89b3-8819bd93900e@github.com> References: <8C06DfuEkBnOaaP1WFv7Y6TRftP9XMitffS3UbOaf8c=.abb8c6e3-7a3b-4412-89b3-8819bd93900e@github.com> Message-ID: <-kGuL7H1z4fuW2g7GcmikUOSsGUla7zthnDC-Jc6fxI=.e83dcb81-040a-4fb1-b1be-d70b4626f165@github.com> On Thu, 16 Mar 2023 00:17:27 GMT, Vladimir Kozlov wrote: > ``` > # Internal Error (/workspace/open/src/hotspot/share/runtime/sharedRuntime.cpp:1411), pid=1150904, tid=1150907 > # guarantee(false) failed: Missing dependency resolving optimized virtual (invokeinterface) call to jnr.enxio.channels.Native$LibC$jnr$ffi$2::read > ``` Very interesting indeed :). Thanks again for the testing! Have you seen this also on aarch64? 
------------- PR: https://git.openjdk.org/jdk/pull/12802 From rkennke at openjdk.org Thu Mar 16 12:24:35 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 12:24:35 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> References: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> Message-ID: On Sat, 11 Mar 2023 14:57:19 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use nullptr instead of NULL in touched code (shared) > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234: > >> 6232: >> 6233: // Load (object->mark() | 1) into hdr >> 6234: orr(hdr, hdr, markWord::unlocked_value); > > I wondered why this is needed. Should we not have the header of an unloaded object in hdr? Or is this a safeguard against a misuse of this function (called with the header of an already locked object)? It could be a monitor-locked header. In C2 this is not possible and we *could* save an instruction here, I guess. Not sure if it is worth it, though. 
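The header manipulation and lock-stack behaviour discussed in this thread can be modelled in plain Java. The sketch below is a hypothetical model, not HotSpot code. An `AtomicLong` stands in for the mark word, the low two bits are assumed to follow HotSpot's `markWord` encoding (`01` unlocked, `00` fast-locked, `10` monitor), and an `ArrayDeque` stands in for the per-thread lock-stack. ORing in the unlocked bit before the CAS, as `orr(hdr, hdr, markWord::unlocked_value)` does, makes the CAS fail for a header that is already fast-locked or monitor-locked, so those cases (including recursive locking) fall through to the slow path. Unlocking pops the top entry and rejects any other order, mirroring the `#ASSERT` check proposed above.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of fast-locking; not HotSpot code.
class FastLockModel {
    static final long LOCK_MASK = 0b11; // low two bits of the mark word
    static final long UNLOCKED  = 0b01;
    static final long MONITOR   = 0b10; // fast-locked is 0b00

    static final class Obj {
        // Upper bits stand in for hash/age bits; the object starts unlocked.
        final AtomicLong mark = new AtomicLong(0xABC0L | UNLOCKED);
    }

    // One model instance per thread: the objects this thread has fast-locked.
    private final ArrayDeque<Obj> lockStack = new ArrayDeque<>();

    boolean tryFastLock(Obj o) {
        long expected = o.mark.get() | UNLOCKED; // matches only an unlocked header
        long desired = expected ^ UNLOCKED;      // same upper bits, lock bits 00
        if (!o.mark.compareAndSet(expected, desired)) {
            return false; // already fast-locked (recursive or contended) or a monitor: slow path
        }
        lockStack.push(o);
        return true;
    }

    boolean ownsFastLock(Obj o) {
        // "Does the current thread own me?" is a scan of the small lock-stack.
        for (Obj e : lockStack) {
            if (e == o) {
                return true;
            }
        }
        return false;
    }

    boolean fastUnlock(Obj o) {
        // Compiled code assumes symmetric locking: o must be the top entry.
        if (lockStack.peek() != o) {
            throw new IllegalStateException("asymmetric unlock");
        }
        lockStack.pop();
        long expected = o.mark.get() & ~LOCK_MASK;                  // still fast-locked (00)?
        return o.mark.compareAndSet(expected, expected | UNLOCKED); // false: inflated meanwhile, slow path
    }
}
```

The ownership check is a linear scan, which is cheap only because the lock-stack is expected to stay small. The growth and overflow checks described in the PR are left out of this model.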
------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 12:32:39 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 12:32:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v21] In-Reply-To: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> References: <7265U-aASDjFX1CMrbxDZZCPHrYJkufD1QDFBuB1WSA=.623488a7-9ede-4ec2-b840-1e5601a9b97a@github.com> Message-ID: On Sat, 11 Mar 2023 14:52:54 GMT, Thomas Stuefe wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Use nullptr instead of NULL in touched code (shared) > > src/hotspot/share/runtime/lockStack.hpp line 64: > >> 62: >> 63: // GC support >> 64: inline void oops_do(OopClosure* cl); > > Does this need to be nonconst? Yes, because the OopClosures can (and will) update the inline array elements. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From jvernee at openjdk.org Thu Mar 16 12:37:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 16 Mar 2023 12:37:28 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v17] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 18:53:08 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). 
>> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE).
Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix Copyright format. Ok, I've gone ahead and integrated: https://github.com/openjdk/jdk/pull/12883 & https://github.com/openjdk/jdk/pull/12908 That's all I had at the moment. I think the rest will come with the JEP PR. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From rkennke at openjdk.org Thu Mar 16 12:51:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 12:51:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When the stack would overflow, locking fast-paths call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a thread locks an object it already holds, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread. > > As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Several changes (mostly cosmetic) in response to reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/0ad01c1d..2445a19d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=26-27 Stats: 21 lines in 11 files changed: 7 ins; 5 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From mdoerr at openjdk.org Thu Mar 16 13:15:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 13:15:18 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v18] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into PPC64_Panama - Fix Copyright format. - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. - Allow TestHFA to run on musl. Add Upcalls. - Introduce ABIv2CallArranger for linux ppc64le. - Remove LinuxPPC64CallArranger.java because it doesn't contain anything. - The merge change has messed up some includes in the tests. Revert. - Remove STRUCT_REFERENCE which was incorrectly taken from aarch64. Pass size to bufferLoad/Store. Enable TestNested.java. - Merge remote-tracking branch 'origin' into PPC64_Panama - Handle HFA corner cases with overlapping registers in Java. - ... 
and 8 more: https://git.openjdk.org/jdk/compare/7dbab81d...944742b5 ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=17 Stats: 2434 lines in 61 files changed: 2324 ins; 1 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From coleenp at openjdk.org Thu Mar 16 13:19:21 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 16 Mar 2023 13:19:21 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: <2IKS8cYMhoU1JRBEalwF1ZeV4Vih78eXffu_GXZ4JkQ=.27d433ac-1f04-47ce-a177-c791a42bc6fe@github.com> References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> <2IKS8cYMhoU1JRBEalwF1ZeV4Vih78eXffu_GXZ4JkQ=.27d433ac-1f04-47ce-a177-c791a42bc6fe@github.com> Message-ID: On Thu, 16 Mar 2023 07:13:16 GMT, David Holmes wrote: >> No, this is the right thing to do. If -Xlog:dependency - the compiler group also expects the dependency printed to the compiler log file. The logging is a separate mechanism, but should be enabled with -Xlog:dependency. > > Sorry I don't follow that. `use_vm_log()` only affects non-product builds and forces `LogVMOutput` to true. That in turn will cause `defaultStream::init_log()` to execute which initializes the log file etc. But I don't see how that would cause UL logging for "dependencies" to also get written to the log file??? See the function log_dependency() https://github.com/openjdk/jdk/blob/421b4ee33c652cc7c444fbbf298bbc23d052c2fe/src/hotspot/share/code/dependencies.cpp#L845 called from here (as one place). 
https://github.com/openjdk/jdk/blob/421b4ee33c652cc7c444fbbf298bbc23d052c2fe/src/hotspot/share/code/dependencies.cpp#L2071 When TraceDependencies was true, the log file would be non-null and log_dependency would write to it. Keeping this with -Xlog:dependencies=debug retains what TraceDependencies did. ------------- PR: https://git.openjdk.org/jdk/pull/13007 From mdoerr at openjdk.org Thu Mar 16 13:26:20 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 13:26:20 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v19] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule").
These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Merge branch 'openjdk:master' into PPC64_Panama - Merge branch 'master' into PPC64_Panama - Fix Copyright format. - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. - Allow TestHFA to run on musl. Add Upcalls. - Introduce ABIv2CallArranger for linux ppc64le. - Remove LinuxPPC64CallArranger.java because it doesn't contain anything. - The merge change has messed up some includes in the tests. Revert. - Remove STRUCT_REFERENCE which was incorrectly taken from aarch64. Pass size to bufferLoad/Store. Enable TestNested.java. - Merge remote-tracking branch 'origin' into PPC64_Panama - ...
and 9 more: https://git.openjdk.org/jdk/compare/f6291520...2f2f41be ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=18 Stats: 2434 lines in 61 files changed: 2324 ins; 1 del; 109 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From sjohanss at openjdk.org Thu Mar 16 13:37:21 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 16 Mar 2023 13:37:21 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects [v3] In-Reply-To: References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: On Wed, 15 Mar 2023 17:00:38 GMT, Ivan Walulya wrote: >> Hi All, >> >> Please review this change to move humongous regions during the Last-Ditch full gc ( on `do_maximal_compaction`). This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. >> >> Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces the region fragmentation and avoids running into OOM errors. >> >> We have enabled for G1 the Jtreg test that was previously used only for Shenandoah to test such workload. 
>> >> Testing: Tier 1-3 > > Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: > > StefanJ review Looks good! ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.org/jdk/pull/12830 From kvn at openjdk.org Thu Mar 16 14:02:19 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 16 Mar 2023 14:02:19 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: <-kGuL7H1z4fuW2g7GcmikUOSsGUla7zthnDC-Jc6fxI=.e83dcb81-040a-4fb1-b1be-d70b4626f165@github.com> References: <8C06DfuEkBnOaaP1WFv7Y6TRftP9XMitffS3UbOaf8c=.abb8c6e3-7a3b-4412-89b3-8819bd93900e@github.com> <-kGuL7H1z4fuW2g7GcmikUOSsGUla7zthnDC-Jc6fxI=.e83dcb81-040a-4fb1-b1be-d70b4626f165@github.com> Message-ID: On Thu, 16 Mar 2023 11:17:33 GMT, Richard Reingruber wrote: > Have you seen this also on aarch64? I just ran it on linux-aarch64 (Ampere) with the same flags and got the same failure with almost same call stack (without `jnr.enxio`): V [libjvm.so+0x16c57cc] SharedRuntime::resolve_sub_helper(bool, bool, JavaThread*)+0x9b8 (sharedRuntime.cpp:1411) V [libjvm.so+0x16c595c] SharedRuntime::resolve_helper(bool, bool, JavaThread*)+0x3c (sharedRuntime.cpp:1246) V [libjvm.so+0x16c6644] SharedRuntime::resolve_opt_virtual_call_C(JavaThread*)+0x174 (sharedRuntime.cpp:1690) v ~RuntimeStub::resolve_opt_virtual_call 0x0000ffff742a37d0 J 32008 c2 org.jruby.util.io.OpenFile$2.run(Lorg/jruby/runtime/ThreadContext;Ljava/lang/Object;[BII)I org.jruby.dist (15 bytes) @ 0x0000ffff769502f8 [0x0000ffff76950280+0x0000000000000078] ------------- PR: https://git.openjdk.org/jdk/pull/12802 From kvn at openjdk.org Thu Mar 16 14:06:19 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 16 Mar 2023 14:06:19 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning 
of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. On aarch64 `guarantee` line is the same. I assume `jnr.enxio.channels.Native` methods were inlined that is why they not on call stack: guarantee(false) failed: Missing dependency resolving optimized virtual (invokeinterface) call to jnr.enxio.channels.Native$LibC$jnr$ffi$2::read ------------- PR: https://git.openjdk.org/jdk/pull/12802 From rrich at openjdk.org Thu Mar 16 14:15:25 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 14:15:25 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. 
> > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. So this seems to be independent of `relocInfo::mustIterateImmediateOopsInCode()`, because on aarch64 it is false. Still, this is a good point: the verification code should rather use `nmethod::oops_do()`. ------------- PR: https://git.openjdk.org/jdk/pull/12802 From mdoerr at openjdk.org Thu Mar 16 14:17:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 14:17:12 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v20] In-Reply-To: References: Message-ID: <6zPly5eBW4Y9v5qD0VKBSigBod2OJeqKoymR_TEogO4=.88f854ce-6a35-4022-a363-1e617bbbd823@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4.
(An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bits (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating-point load/store instructions need the correct width to select the correct IEEE 754 format. The in-register representation of single-precision values in FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs the precise size. Storing 8 bytes and loading 4 bytes yields a different value than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as a separate RFE). Update: This issue is resolved by the 2nd commit. Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: - Adaptation for JDK-8303022. - Adaptation for JDK-8303684.
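To make the Big Endian remark in the description concrete (storing 8 bytes and then loading 4 bytes from the same address), here is a minimal, portable C++ sketch. It models memory as a plain byte array instead of using any HotSpot or FFM API; `store64` and `load32` are hypothetical helpers written only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Writes a 64-bit value into an 8-byte slot using the given byte order.
std::array<uint8_t, 8> store64(uint64_t value, bool big_endian) {
  std::array<uint8_t, 8> slot{};
  for (int i = 0; i < 8; i++) {
    // Big Endian puts the most significant byte at the lowest address.
    int shift = big_endian ? 8 * (7 - i) : 8 * i;
    slot[i] = static_cast<uint8_t>(value >> shift);
  }
  return slot;
}

// Reads a 32-bit value from the first 4 bytes (lowest addresses) of the slot.
uint32_t load32(const std::array<uint8_t, 8>& slot, bool big_endian) {
  uint32_t value = 0;
  for (int i = 0; i < 4; i++) {
    int shift = big_endian ? 8 * (3 - i) : 8 * i;
    value |= static_cast<uint32_t>(slot[i]) << shift;
  }
  return value;
}
```

With the value `0x0000000100000002`, the 4-byte load returns the low word (2) on Little Endian but the high word (1) on Big Endian. This is one reason the description gives for passing precise size information along when creating `VMStorage` objects.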
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/2f2f41be..a21f6cfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=18-19 Stats: 23 lines in 5 files changed: 9 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From iwalulya at openjdk.org Thu Mar 16 14:19:36 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 16 Mar 2023 14:19:36 GMT Subject: RFR: 8191565: Last-ditch Full GC should also move humongous objects [v3] In-Reply-To: References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: On Thu, 16 Mar 2023 13:34:55 GMT, Stefan Johansson wrote: >> Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision: >> >> StefanJ review > > Looks good! Thanks @kstefanj and @tschatzl for the reviews! ------------- PR: https://git.openjdk.org/jdk/pull/12830 From iwalulya at openjdk.org Thu Mar 16 14:19:38 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 16 Mar 2023 14:19:38 GMT Subject: Integrated: 8191565: Last-ditch Full GC should also move humongous objects In-Reply-To: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> References: <58l059EvQI6HNQyjUYSGYEWt6x-c1yvtmfX1QWfinH8=.87517ba1-ec81-4b9f-a41b-b05c8d33cf3d@github.com> Message-ID: On Thu, 2 Mar 2023 13:48:10 GMT, Ivan Walulya wrote: > Hi All, > > Please review this change to move humongous regions during the Last-Ditch full gc ( on `do_maximal_compaction`). This change will enable G1 to avoid encountering Out-Of-Memory errors that may occur due to the fragmentation of memory regions caused by the allocation of large memory blocks. 
> > Here's how it works: At the end of `phase2_prepare_compaction`, G1 performs a serial compaction process for regular objects, which results in the heap being divided into two parts. The first part is a densely populated prefix that contains all the regular objects that have been moved. The second part consists of the remaining heap space, which may contain free regions, uncommitted regions, and regions that are not compacting. By moving/compacting the humongous objects in the second part of the heap closer to the dense prefix, G1 reduces the region fragmentation and avoids running into OOM errors. > > We have enabled for G1 the Jtreg test that was previously used only for Shenandoah to test such workload. > > Testing: Tier 1-3 This pull request has now been integrated. Changeset: 96889bf3 Author: Ivan Walulya URL: https://git.openjdk.org/jdk/commit/96889bf3e4f36fa7f9e9b9989a1bc3ac4719bfeb Stats: 401 lines in 15 files changed: 322 ins; 47 del; 32 mod 8191565: Last-ditch Full GC should also move humongous objects Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.org/jdk/pull/12830 From mdoerr at openjdk.org Thu Mar 16 14:42:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 14:42:10 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v21] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. 
structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. 
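The integer-extension requirement in the description (PPC64 wants integer arguments widened to 64 bits, so the type or at least the bit width must be known) can be sketched as follows. This is an illustrative C++ model, not the actual FFM or HotSpot code. The `Carrier` enum and `extend_to_64` are hypothetical names, and the mapping (sign-extend signed types, zero-extend Java's unsigned 16-bit `char`) follows Java's type semantics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical carrier kinds, loosely mirroring Java primitive types.
enum class Carrier { Byte, Short, Char, Int };

// Widens the low bits of 'raw' to a full 64-bit register image according to
// the carrier type: signed types are sign-extended, Java char is zero-extended.
uint64_t extend_to_64(uint64_t raw, Carrier carrier) {
  switch (carrier) {
    case Carrier::Byte:
      return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int8_t>(raw)));
    case Carrier::Short:
      return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int16_t>(raw)));
    case Carrier::Char:
      return static_cast<uint64_t>(static_cast<uint16_t>(raw));
    case Carrier::Int:
      return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(raw)));
  }
  assert(false && "unknown carrier");
  return raw;
}
```

The same 16 bits `0xFFFF` become `0x000000000000FFFF` as a `char` but all ones as a `short`, so a 64-bit register or stack slot cannot be filled correctly without knowing the type.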
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/a21f6cfb..4666aa22 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=19-20 Stats: 81 lines in 3 files changed: 40 ins; 40 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Thu Mar 16 14:42:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 16 Mar 2023 14:42:12 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v20] In-Reply-To: <6zPly5eBW4Y9v5qD0VKBSigBod2OJeqKoymR_TEogO4=.88f854ce-6a35-4022-a363-1e617bbbd823@github.com> References: <6zPly5eBW4Y9v5qD0VKBSigBod2OJeqKoymR_TEogO4=.88f854ce-6a35-4022-a363-1e617bbbd823@github.com> Message-ID: <2ly6-8MdohRIteqaZsBXJd5R6FxhhmsAv4pwdIaZPW4=.77a93182-17bc-4149-99fb-2c6064c297eb@github.com> On Thu, 16 Mar 2023 14:17:12 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. 
structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. 
> > Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: > > - Adaptation for JDK-8303022. > - Adaptation for JDK-8303684. Rebasing + minor cleanup done. Tests are passing. ------------- PR: https://git.openjdk.org/jdk/pull/12708 From dnsimon at openjdk.org Thu Mar 16 15:13:32 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 16 Mar 2023 15:13:32 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extensibility, complex management code (because of a lack of strong typing and semantic overloading), and poor memory efficiency. >> >> The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
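The unsigned5 encoding mentioned above is a variable-length coding that stores small values in fewer bytes. Below is a minimal C++ sketch of the Pack200-style UNSIGNED5 scheme, assuming the parameters B = 5 (at most five bytes) and H = 64. The JVM's own implementation may differ in details (for example, it may reserve certain byte values), so treat the function names and exact byte values as illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed Pack200-style UNSIGNED5 parameters: at most 5 bytes per value,
// H = 64 "high" byte values that signal continuation, L = 256 - H "low" values.
constexpr uint32_t kMaxBytes = 5;
constexpr uint64_t kH = 64;
constexpr uint64_t kL = 256 - kH;  // a byte < kL terminates the value

// Appends the encoding of 'value' to 'out' (1 to 5 bytes).
void encode_unsigned5(uint32_t value, std::vector<uint8_t>& out) {
  uint64_t v = value;  // widened so intermediate arithmetic cannot overflow
  for (uint32_t i = 0; i < kMaxBytes - 1 && v >= kL; i++) {
    // Continuation byte in [kL, 255], carrying the low base-H digit.
    out.push_back(static_cast<uint8_t>(kL + (v - kL) % kH));
    v = (v - kL) / kH;
  }
  assert(v < 256);  // holds for every 32-bit input
  out.push_back(static_cast<uint8_t>(v));  // final byte
}

// Decodes one value starting at 'pos' and advances 'pos' past it.
uint32_t decode_unsigned5(const std::vector<uint8_t>& in, size_t& pos) {
  uint64_t value = 0;
  uint64_t scale = 1;
  for (uint32_t i = 0; i < kMaxBytes; i++) {
    uint8_t b = in.at(pos++);
    value += b * scale;
    if (b < kL) break;  // terminating byte
    scale *= kH;
  }
  return static_cast<uint32_t>(value);
}
```

Values below 192 fit in a single byte, and any 32-bit value needs at most five bytes, which helps density when most stored values (indices, offsets, flags) are small.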
> > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > SA and JVMCI fixes Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.org/jdk/pull/12855 From rrich at openjdk.org Thu Mar 16 16:15:10 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 16:15:10 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
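The side-array design described above can be pictured with the following C++ sketch. All names (`IndyEntry`, `IndyTable`, `Method`) are hypothetical and deliberately simpler than HotSpot's real structures; the point is only that the rewritten operand becomes a plain index into an array of strongly typed entries. The sketch also shows one common way to make a lazily resolved field safe to read concurrently: publish it with a release store and read it with an acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a resolved call target.
struct Method {
  std::string name;
};

// Hypothetical per-call-site entry. It only mirrors the idea of a dedicated,
// strongly typed invokedynamic entry; it is not HotSpot's actual layout.
struct IndyEntry {
  uint16_t cpool_index = 0;                    // constant pool index of the call site
  std::atomic<const Method*> method{nullptr};  // null until resolved

  // Publishes the target: writes that initialized *m happen-before any
  // acquire load that observes the pointer.
  void set_resolved(const Method* m) { method.store(m, std::memory_order_release); }

  // Pairs with set_resolved(); returns nullptr if not yet resolved.
  const Method* resolved_method() const { return method.load(std::memory_order_acquire); }
};

// Hypothetical side array; a rewritten invokedynamic operand would index into it.
class IndyTable {
 public:
  explicit IndyTable(const std::vector<uint16_t>& cpool_indices)
      : entries_(cpool_indices.size()) {
    for (size_t i = 0; i < cpool_indices.size(); i++) {
      entries_[i].cpool_index = cpool_indices[i];
    }
  }
  IndyEntry& entry_at(size_t rewritten_operand) { return entries_.at(rewritten_operand); }
  size_t length() const { return entries_.size(); }

 private:
  std::vector<IndyEntry> entries_;  // sized once; entries are never moved
};
```

Because the table is sized once and never reallocated, entry addresses stay stable, which matters if other threads may read an entry while it is being resolved.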
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335: > 2333: > 2334: __ load_resolved_indy_entry(cache, index); > 2335: __ ldr(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); Should this load have acquire semantics? Like [here in template interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239) and [here for the zero interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)? Call stack for zero interpreter is ConstantPoolCacheEntry::indices_ord() ConstantPoolCacheEntry::bytecode_1() ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code) BytecodeInterpreter::run(interpreterState) ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dcubed at openjdk.org Thu Mar 16 19:51:10 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Mar 2023 19:51:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: <3B1OYwu_wroqQecRIZNNk7Yrrs_X_nk4hrjiC9IeGvk=.d3a9c5ac-7578-4d08-95a6-35b046d2cf26@github.com> On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) >> >> What the original stack-locking does is basically to push a stack-lock, which consists only of the displaced header, onto the stack, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: it still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure, basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered quickly by scanning the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. If an overflow is possible, the locking fast-paths call into the runtime to grow the stack and handle the locking.
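The lock-stack idea described above can be illustrated with a single-threaded toy model in C++. It assumes the header encoding from the description (low bits `01` for unlocked, `00` for fast-locked) and uses hypothetical names (`Object`, `LockStack`, `try_fast_lock`). It omits inflation, GC root scanning and the explicit overflow check, and returns `false` wherever a real VM would take a slow path.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed encoding of the two lowest mark-word bits, following the description.
constexpr uintptr_t kLockMask = 0b11;
constexpr uintptr_t kUnlocked = 0b01;
constexpr uintptr_t kFastLocked = 0b00;

// Toy object: only the mark word matters here.
struct Object {
  std::atomic<uintptr_t> mark{kUnlocked};
};

// Toy per-thread lock stack: a small array of the objects this thread has fast-locked.
class LockStack {
 public:
  // "Does the current thread own this lock?" is a linear scan of a short array.
  bool contains(const Object* obj) const {
    return std::find(stack_.begin(), stack_.end(), obj) != stack_.end();
  }

  // Returns false if the caller must take a slow path instead (for example
  // contention or recursive locking, where a real VM would inflate to a monitor).
  bool try_fast_lock(Object* obj) {
    uintptr_t mark = obj->mark.load(std::memory_order_relaxed);
    if ((mark & kLockMask) != kUnlocked) return false;
    uintptr_t locked = (mark & ~kLockMask) | kFastLocked;
    if (!obj->mark.compare_exchange_strong(mark, locked, std::memory_order_acquire)) return false;
    stack_.push_back(obj);  // std::vector grows on demand, like the growable lock-stack
    return true;
  }

  bool try_fast_unlock(Object* obj) {
    auto it = std::find(stack_.begin(), stack_.end(), obj);
    if (it == stack_.end()) return false;  // not fast-locked by this thread
    uintptr_t mark = obj->mark.load(std::memory_order_relaxed);
    if ((mark & kLockMask) != kFastLocked) return false;  // e.g. inflated by a contender
    uintptr_t unlocked = (mark & ~kLockMask) | kUnlocked;
    if (!obj->mark.compare_exchange_strong(mark, unlocked, std::memory_order_release)) return false;
    stack_.erase(it);
    return true;
  }

 private:
  std::vector<const Object*> stack_;
};
```

A recursive `try_fast_lock` on an object the thread already holds fails, matching the description's point that recursive fast-locking is not supported and the lock gets inflated instead.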
Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a thread locks an object recursively, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread and record it in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker, ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them use old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!)
a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - The hashcode could now always be accessed in the fast path, if it has been installed. Fast-locked headers can be used directly; for monitor-locked objects, we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Several changes (mostly cosmetic) in response to reviews
I did a round of Mach5 Tier[1-4] testing on v26. Please see the bug report for the gory details. There are 12 tests in Tier4 that fail when -Xcheck:jni is used. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 16 19:55:18 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Thu, 16 Mar 2023 19:55:18 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) >> >> What the original stack-locking does is basically to push a stack-lock, which consists only of the displaced header, onto the stack, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: it still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure, basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock-stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered quickly by scanning the array. More complex queries like 'which thread owns X?'
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. If an overflow is possible, the locking fast-paths call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When a thread locks an object recursively, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One complication is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread and record it in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker, ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them use old synchronized Java collections (Vector, Stack), StringBuffer or similar code.
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - The hashcode could now always be accessed in the fast path, if it has been installed. Fast-locked headers can be used directly; for monitor-locked objects, we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput, without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Several changes (mostly cosmetic) in response to reviews Another way to look at the option name question is to invert the sense of the option. The old stack-locking code would be enabled by this new `UseStackLocking` option (which would be on by default for now) and the newer locking code that uses a lock-stack that is embedded in the JavaThread would be the "else" case of the temporary `UseStackLocking` option. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 20:56:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 20:56:15 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: Message-ID: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
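The stack-locking encoding described above can be modelled in a few lines. The following is a minimal C++ sketch, not HotSpot code: it assumes the usual two-bit lock field in the mark word (00 locked, 01 unlocked, 10 inflated monitor), and every name in it (`kLockMask`, `ThreadStack`, `is_stack_locked`, `owned_by`, and so on) is hypothetical. It shows why a word-aligned pointer to an on-stack lock record reads as "stack-locked", and how the owner can be recovered by checking which thread's stack contains that pointer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two low "lock bits" of a mark word (not HotSpot code).
// Assumed encoding: 00 = locked (stack-locked), 01 = unlocked, 10 = inflated monitor.
constexpr std::uintptr_t kLockMask    = 0x3;
constexpr std::uintptr_t kStackLocked = 0x0;
constexpr std::uintptr_t kUnlocked    = 0x1;
constexpr std::uintptr_t kMonitor     = 0x2;

// In this model a lock record is a word-sized stack slot; word alignment
// guarantees that its address ends in binary 00.
static_assert(alignof(std::uintptr_t) >= 4, "lock records must be at least 4-byte aligned");

// Hypothetical description of one thread's stack: addresses in [low, high).
struct ThreadStack {
  std::uintptr_t low;
  std::uintptr_t high;
};

inline bool is_unlocked(std::uintptr_t mark) {
  return (mark & kLockMask) == kUnlocked;
}

// Stack-locked: the header is a pointer to a lock record, so the lock bits read 00.
inline bool is_stack_locked(std::uintptr_t mark) {
  return (mark & kLockMask) == kStackLocked;
}

// Inflated: the header is a pointer to a monitor, tagged with 10.
inline bool is_inflated(std::uintptr_t mark) {
  return (mark & kLockMask) == kMonitor;
}

// Ownership under stack-locking: the owner is the thread whose stack
// contains the lock record that the header points to.
inline bool owned_by(std::uintptr_t mark, const ThreadStack& stack) {
  if (!is_stack_locked(mark)) {
    return false;
  }
  const std::uintptr_t lock_record = mark & ~kLockMask;  // a no-op for 00-tagged words
  return lock_record >= stack.low && lock_record < stack.high;
}
```

Fast-locking keeps the 00 tag but no longer stores a stack pointer in the upper bits, so in that scheme the same ownership question has to be answered from the per-thread lock-stack instead of from the header.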
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question, 'does the current thread own me?', is answered very quickly by scanning the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection), where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information.
What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
> This would be implemented in a separate PR.
>
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:

 - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
 - Set condition flags correctly after fast-lock call on aarch64

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/2445a19d..37f061b0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=28
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=27-28

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907
From rkennke at openjdk.org Thu Mar 16 20:56:19 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 16 Mar 2023 20:56:19 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Several changes (mostly cosmetic) in response to reviews

In my last changes I made a stupid mistake and didn't set the condition flags correctly to force the slow path on aarch64.
This is only relevant when we exceed the lock-stack capacity, which is why it fails so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64.

-------------

PR: https://git.openjdk.org/jdk/pull/10907
From dcubed at openjdk.org Thu Mar 16 20:56:22 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Thu, 16 Mar 2023 20:56:22 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 12:51:10 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. [...]
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Several changes (mostly cosmetic) in response to reviews

src/hotspot/cpu/x86/x86_32.ad line 617:

> 615: int bangsize = C->output()->bang_size_in_bytes();
> 616:
> 617: __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), C->stub_function() != NULL);

Why did this change from `nullptr` -> `NULL`?

src/hotspot/cpu/x86/x86_64.ad line 925:

> 923: }
> 924:
> 925: __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, false, C->stub_function() != NULL);

Why did this change from `nullptr` -> `NULL`?

-------------

PR: https://git.openjdk.org/jdk/pull/10907
From rkennke at openjdk.org Thu Mar 16 21:00:37 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Thu, 16 Mar 2023 21:00:37 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Mar 2023 20:50:12 GMT, Daniel D.
Daugherty wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Several changes (mostly cosmetic) in response to reviews
>
> src/hotspot/cpu/x86/x86_32.ad line 617:
>
>> 615: int bangsize = C->output()->bang_size_in_bytes();
>> 616:
>> 617: __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), C->stub_function() != NULL);
>
> Why did this change from `nullptr` -> `NULL`?

I reverted that part back to upstream state (at least what is in JDK-21+13).

-------------

PR: https://git.openjdk.org/jdk/pull/10907
From dholmes at openjdk.org Thu Mar 16 21:04:17 2023
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 16 Mar 2023 21:04:17 GMT
Subject: Integrated: 8303150: DCmd framework unnecessarily creates a DCmd instance on registration
In-Reply-To:
References:
Message-ID: <3HkwSgvBIzeNYJLuTC5-IMKvZPmiBKyZ2WtI25U5dcA=.5df6c36c-fce6-4629-b74f-10b6e49d23d4@github.com>

On Mon, 13 Mar 2023 01:19:50 GMT, David Holmes wrote:

> When DCmd factories are registered, the factory is passed the number of arguments taken by the DCmd - using a template method `get_num_arguments`. For DCmds that don't extend DCmdWithParser there has to be a static `num_arguments()` method in that class. For DCmds that do extend DCmdWithParser the logic instantiates an instance of the DCmd, extracts its parser and calls its `num_arguments` method which dynamically counts the number of defined arguments.
>
> Creating an instance of each DCmd and dynamically counting arguments is inefficient and unnecessary; the number of arguments is statically known and easily expressed (in fact many of the JFR DCmds already statically define this). So we add the static `num_arguments()` method to each class that needs it and return the statically counted number of arguments.
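The registration change described above can be illustrated with a small, self-contained C++ sketch. It is not the HotSpot implementation: `DCmd`, `DCmdWithParser`, `PlainDCmd`, `ParserDCmd` and `verify_static_count` are simplified, hypothetical stand-ins (the real classes take constructor arguments and count arguments through a parser object), the `std::is_convertible` constraint is written in an assumed form, and debug builds are modelled with the standard `NDEBUG` macro rather than HotSpot's own build flags. The idea it shows: the registration path reads a static count without creating an instance, while debug builds still construct one parser-based instance to catch a stale count.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical, simplified stand-ins for the DCmd class hierarchy (not HotSpot code).
struct DCmd {
  virtual ~DCmd() = default;
};

// Parser-based commands register their arguments when constructed, so counting
// them dynamically requires building an instance first.
struct DCmdWithParser : DCmd {
  int registered_args = 0;
  void add_argument() { ++registered_args; }
  int dynamic_num_arguments() const { return registered_args; }
};

// A parser-less command, which already had to declare its count statically.
struct PlainDCmd : DCmd {
  static int num_arguments() { return 0; }
};

// A parser-based command that now also declares its count statically.
struct ParserDCmd : DCmdWithParser {
  ParserDCmd() {
    add_argument();  // first option
    add_argument();  // second option
  }
  static int num_arguments() { return 2; }
};

// Debug-only cross-check for parser-based commands: build one instance and
// compare its dynamic count with the static one.
template <typename T>
void verify_static_count(int expected, std::true_type /* has parser */) {
  T probe;
  assert(probe.dynamic_num_arguments() == expected && "static num_arguments() out of sync");
}

// Parser-less commands have nothing to cross-check.
template <typename T>
void verify_static_count(int, std::false_type /* no parser */) {}

// Registration-time query, restricted to DCmd subclasses.
template <typename T,
          typename std::enable_if<std::is_convertible<T*, DCmd*>::value, int>::type = 0>
int get_num_arguments() {
  const int count = T::num_arguments();  // no instance is created here
#ifndef NDEBUG
  verify_static_count<T>(count, std::is_convertible<T*, DCmdWithParser*>{});
#endif
  return count;
}
```

In this sketch, a parser-based command that forgets to declare `num_arguments()` fails to compile, and one whose static count drifts from the registered arguments trips the assert in debug builds only.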
To ensure the static number and actual number don't get out-of-sync, we keep the original dynamic logic for use in debug builds to assert that the static and dynamic counts are the same. The assert will trigger during a debug build if something does get out of sync, for example if a new DCmd (extending DCmdWithParser) were added but didn't define the static `num_arguments()` method. > > A number of DCmd classes were unnecessarily defining their own dynamic version of `num_arguments` and these are now removed. > > In the template method I use `ENABLE_IF(std::is_convertible::value)` to check we only call this on DCmd classes. This may be unnecessary but it seemed consistent with the other template methods. Note that `std::is_base_of` only works for immediate super types. > > Testing: tiers 1-4 > > Performance: in theory we should see some improvement in startup; in practice it is barely noticeable. > > Thanks. This pull request has now been integrated. Changeset: a487a270 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/a487a270dcd6d6a6b5ea49dece515334a0e48efc Stats: 104 lines in 12 files changed: 36 ins; 59 del; 9 mod 8303150: DCmd framework unnecessarily creates a DCmd instance on registration Reviewed-by: fparain, stuefe, kevinw ------------- PR: https://git.openjdk.org/jdk/pull/12994 From dcubed at openjdk.org Thu Mar 16 21:09:18 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Mar 2023 21:09:18 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke wrote: > I pushed a fix for aarch64. Do you think this is the cause for the -Xcheck:jni failures that I ran into in my Tier4 testing? ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 16 21:09:20 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Mar 2023 21:09:20 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 20:57:31 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/x86_32.ad line 617: >> >>> 615: int bangsize = C->output()->bang_size_in_bytes(); >>> 616: >>> 617: __ verified_entry(framesize, C->output()->need_stack_bang(bangsize)?bangsize:0, C->in_24_bit_fp_mode(), C->stub_function() != NULL); >> >> Why did this change from `nullptr` -> `NULL`? > > I reverted that part back to upstream state (at least what is in JDK-21+13). Okay. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu Mar 16 21:19:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 16 Mar 2023 21:19:29 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 21:05:54 GMT, Daniel D. Daugherty wrote: > > I pushed a fix for aarch64. > > > > Do you think this is the cause for the -Xcheck:jni failures that I ran into > > in my Tier4 testing? Yes, and with high probability also for some/all of the other failures. It leads to the situation that when the lock-stack is full, it should take the slow-path, but doesn't (because the flags are not set correctly) and thus leaves the object unlocked. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From dcubed at openjdk.org Thu Mar 16 21:33:32 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Thu, 16 Mar 2023 21:33:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: <-aGsX_dmmSBQPrgTVaCsZVU3gFQg1gs9gqS8RFzkRC4=.21ec8ca5-18c5-4f8a-a286-f0c1a32bdeee@github.com> On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements).
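The lock-stack scheme described here can be modelled in a few lines of C++. This is a simplified sketch, not HotSpot code. `Obj`, `LockStack`, `fast_lock` and `fast_unlock` are hypothetical names. The header is modelled as a `std::atomic<uintptr_t>` whose low two bits are `01` (unlocked) or `00` (fast-locked). The lock stack has a fixed capacity. A full stack, a recursive lock, or a lost CAS simply returns `false`, meaning "take the slow path"; the real implementation would grow the stack or inflate to a monitor at that point.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uintptr_t kLockMask = 0x3;
constexpr uintptr_t kUnlocked = 0x1;    // low bits 01
constexpr uintptr_t kFastLocked = 0x0;  // low bits 00

// Hypothetical object: only the lock bits of the header matter here.
struct Obj {
  std::atomic<uintptr_t> header{kUnlocked};
};

// Per-thread lock stack: a small array of object references.
class LockStack {
  static constexpr size_t kCapacity = 8;
  Obj* _elems[kCapacity];
  size_t _top = 0;

 public:
  bool is_full() const { return _top == kCapacity; }
  void push(Obj* o) {
    assert(!is_full());
    _elems[_top++] = o;
  }
  void remove(Obj* o) {
    for (size_t i = 0; i < _top; i++) {
      if (_elems[i] == o) {
        for (size_t j = i + 1; j < _top; j++) _elems[j - 1] = _elems[j];  // close the gap
        _top--;
        return;
      }
    }
  }
  // "Does the current thread own o?" is a linear scan of a short array.
  bool contains(const Obj* o) const {
    for (size_t i = 0; i < _top; i++) {
      if (_elems[i] == o) return true;
    }
    return false;
  }
};

// Returns false when the caller must take the slow path.
bool fast_lock(Obj* o, LockStack& ls) {
  if (ls.is_full()) return false;
  uintptr_t mark = o->header.load();
  if ((mark & kLockMask) != kUnlocked) return false;  // already locked, incl. recursion
  uintptr_t locked = (mark & ~kLockMask) | kFastLocked;
  if (!o->header.compare_exchange_strong(mark, locked)) return false;
  ls.push(o);
  return true;
}

// Returns false when the caller must take the slow path.
bool fast_unlock(Obj* o, LockStack& ls) {
  uintptr_t mark = o->header.load();
  if ((mark & kLockMask) != kFastLocked || !ls.contains(o)) return false;
  uintptr_t unlocked = (mark & ~kLockMask) | kUnlocked;
  if (!o->header.compare_exchange_strong(mark, unlocked)) return false;
  ls.remove(o);
  return true;
}
```

Unlike stack-locking, the header keeps no pointer to the owner. Ownership is recovered only by scanning the lock stacks, which is cheap for the current thread.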
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors.
In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Set condition flags correctly after fast-lock call on aarch64 I've reviewed the v27 and v28 changes and kicked off yet another round of Mach5 testing. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From matsaave at openjdk.org Thu Mar 16 21:39:57 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 16 Mar 2023 21:39:57 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 16:11:57 GMT, Richard Reingruber wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed aarch64 interpreter mistake > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2335: > >> 2333: >> 2334: __ load_resolved_indy_entry(cache, index); >> 2335: __ ldr(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); > > Should this load have acquire semantics?
> Like [here in template interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp#L239) and [here for the zero interpreter](https://github.com/openjdk/jdk/blob/2f23c80e0de44815d26a7d541701e16c9c1d32bc/src/hotspot/share/oops/cpCache.inline.hpp#L33)? > > Call stack for zero interpreter is > > ConstantPoolCacheEntry::indices_ord() > ConstantPoolCacheEntry::bytecode_1() > ConstantPoolCacheEntry::is_resolved(enum Bytecodes::Code) > BytecodeInterpreter::run(interpreterState) Yes, acquire semantics should be used here. Thank you for pointing this out! ------------- PR: https://git.openjdk.org/jdk/pull/12778 From rrich at openjdk.org Thu Mar 16 22:23:24 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 16 Mar 2023 22:23:24 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 11:07:36 GMT, Richard Reingruber wrote: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. 
Reproducer: `./bin/jruby -J-Xcomp -J-ea -J-esa -J-XX:CompileThreshold=100 -S rake spec:ruby:fast ` ------------- PR: https://git.openjdk.org/jdk/pull/12802 From dholmes at openjdk.org Thu Mar 16 22:38:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Mar 2023 22:38:39 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v5] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 15:41:17 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing field metadata. It has poor extension capabilities, complex management code because of a lack of strong typing and semantic overloading, and poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > SA and JVMCI fixes Nice piece of work Fred - I won't pretend to follow every detail.
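The unsigned5 encoding mentioned in the PR description can be illustrated with a minimal sketch of a pack200-style `UNSIGNED5` coding for 32-bit values, using at most 5 bytes with H = 64 and L = 192. It is illustrative only. HotSpot's own `UNSIGNED5` class may differ in details (for example, it may reserve byte values so that no zero bytes are produced). `unsigned5_write` and `unsigned5_read` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kL = 192;       // bytes below L terminate a value
constexpr uint32_t kH = 64;        // 256 - L: weight of each continuation byte
constexpr uint32_t kLgH = 6;       // log2(H)
constexpr int kMaxLength = 5;      // enough for any 32-bit value

// Append the encoding of `value` to `out`.
void unsigned5_write(uint32_t value, std::vector<uint8_t>& out) {
  uint32_t v = value;
  for (int i = 0; i < kMaxLength - 1 && v >= kL; i++) {
    v -= kL;
    out.push_back(static_cast<uint8_t>(kL + (v % kH)));  // continuation byte (>= L)
    v /= kH;
  }
  assert(v <= 0xFF);
  out.push_back(static_cast<uint8_t>(v));  // final byte
}

// Decode one value starting at `pos` and advance `pos` past it.
// value = b0 + b1*H + b2*H^2 + ... until a byte below L (or the 5th byte) is seen.
uint32_t unsigned5_read(const std::vector<uint8_t>& in, size_t& pos) {
  uint32_t sum = in[pos++];
  if (sum < kL) return sum;
  uint32_t shift = kLgH;
  for (int i = 1; i < kMaxLength; i++) {
    uint32_t b = in[pos++];
    sum += b << shift;
    if (b < kL || i == kMaxLength - 1) break;
    shift += kLgH;
  }
  return sum;
}
```

Values below 192 take a single byte, which is presumably where most of the density gain over a fixed-width record comes from for typical field metadata.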
A few nits on unnecessary alignment (which may match pre-existing style not evident in the diff). Thanks. src/hotspot/share/oops/fieldInfo.inline.hpp line 170: > 168: new_flags = old_flags & ~mask; > 169: witness = Atomic::cmpxchg(&flags, old_flags, new_flags); > 170: } while(witness != old_flags); Just to prove I did read this :) space needed after `while` src/hotspot/share/oops/fieldInfo.inline.hpp line 174: > 172: > 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) { > 174: if (z) atomic_set_bits( _flags, flag_mask(pos)); Nit: extra space before `_flags` src/hotspot/share/oops/fieldInfo.inline.hpp line 175: > 173: inline void FieldStatus::update_flag(FieldStatusBitPosition pos, bool z) { > 174: if (z) atomic_set_bits( _flags, flag_mask(pos)); > 175: else atomic_clear_bits(_flags, flag_mask(pos)); Nit: no need for the extra spaces. If you really want these to align, just place them on new lines. src/hotspot/share/oops/instanceKlass.inline.hpp line 50: > 48: > 49: inline Symbol* InstanceKlass::field_name (int index) const { return field(index).name(constants()); } > 50: inline Symbol* InstanceKlass::field_signature (int index) const { return field(index).signature(constants()); } There should not be spaces between a method name and the opening `(`. I'm really not a fan of this kind of alignment. ------------- Marked as reviewed by dholmes (Reviewer).
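The `atomic_clear_bits` loop quoted in the review is a standard compare-and-swap retry loop. The sketch below shows the same pattern with `std::atomic` rather than HotSpot's `Atomic::cmpxchg`. The function names mirror the review snippet, but the signatures and the 8-bit flag width are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomically OR `mask` into `flags`, retrying if another thread raced in.
inline void atomic_set_bits(std::atomic<uint8_t>& flags, uint8_t mask) {
  uint8_t old_flags = flags.load();
  // On failure, compare_exchange_weak reloads the current value into old_flags.
  while (!flags.compare_exchange_weak(old_flags, static_cast<uint8_t>(old_flags | mask))) {
  }
}

// Atomically clear the bits in `mask`, with the same retry logic.
inline void atomic_clear_bits(std::atomic<uint8_t>& flags, uint8_t mask) {
  uint8_t old_flags = flags.load();
  while (!flags.compare_exchange_weak(old_flags, static_cast<uint8_t>(old_flags & ~mask))) {
  }
}

// Set or clear a single flag bit depending on `z`, as in the reviewed update_flag().
inline void update_flag(std::atomic<uint8_t>& flags, int pos, bool z) {
  assert(pos >= 0 && pos < 8);
  const uint8_t mask = static_cast<uint8_t>(1u << pos);
  if (z) {
    atomic_set_bits(flags, mask);
  } else {
    atomic_clear_bits(flags, mask);
  }
}
```

For integral atomics, `fetch_or` and `fetch_and` would perform the same updates in a single call. The explicit loop is kept here only to mirror the reviewed code.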
PR: https://git.openjdk.org/jdk/pull/12855 From jvernee at openjdk.org Fri Mar 17 00:10:31 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 17 Mar 2023 00:10:31 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v21] In-Reply-To: References: Message-ID: <4-GOVkzGTUqypflffoXoeF9zAlBleE_1zxV3LfcVHow=.afc0426b-2b97-4b02-9ecc-e2d11a1f8a7e@github.com> On Thu, 16 Mar 2023 14:42:10 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Will get addressed separately: [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1.
Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. Marked as reviewed by jvernee (Reviewer). ------------- PR: https://git.openjdk.org/jdk/pull/12708 From sspitsyn at openjdk.org Fri Mar 17 01:36:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Mar 2023 01:36:47 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v3] In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> Message-ID: > This is needed for performance improvements in support of virtual threads. > The update includes the following: > > 1. 
Refactored the `VirtualThread` native methods: > `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` > `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` > 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: > `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` > `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` > `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` > `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end` > 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. > 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. > 5. Added the static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable. > > Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now. > > Testing: > - Ran mach5 tiers 1-6. No regressions were found.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: address pre-review comments from Leonid ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13054/files - new: https://git.openjdk.org/jdk/pull/13054/files/397b6337..f3692263 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=01-02 Stats: 22 lines in 2 files changed: 14 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054 From dholmes at openjdk.org Fri Mar 17 04:56:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 17 Mar 2023 04:56:22 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> <2IKS8cYMhoU1JRBEalwF1ZeV4Vih78eXffu_GXZ4JkQ=.27d433ac-1f04-47ce-a177-c791a42bc6fe@github.com> Message-ID: On Thu, 16 Mar 2023 13:16:42 GMT, Coleen Phillimore wrote: >> Sorry I don't follow that. `use_vm_log()` only affects non-product builds and forces `LogVMOutput` to true. That in turn will cause `defaultStream::init_log()` to execute which initializes the log file etc. But I don't see how that would cause UL logging for "dependencies" to also get written to the log file??? > > See the function log_dependency() > > https://github.com/openjdk/jdk/blob/421b4ee33c652cc7c444fbbf298bbc23d052c2fe/src/hotspot/share/code/dependencies.cpp#L845 > > called from here (as one place). > > https://github.com/openjdk/jdk/blob/421b4ee33c652cc7c444fbbf298bbc23d052c2fe/src/hotspot/share/code/dependencies.cpp#L2071 > > When TraceDependencies was true, the log file would be non-null and log_dependency would write to it. 
Keeping this with -Xlog:dependencies=debug retains what TraceDependencies did. Hmmm this seems broken to me then. We output the general dependency logging to one place based on UL configuration, but then we output `log_dependency` to the log file. If these are meant to be related and always reported together then we have lost that. If they are actually unrelated then this may still need the TraceDependencies flag to control it. @iwanowww can you comment on this please? ------------- PR: https://git.openjdk.org/jdk/pull/13007 From stuefe at openjdk.org Fri Mar 17 06:18:40 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 17 Mar 2023 06:18:40 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 20:47:59 GMT, Roman Kennke wrote: > In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. 
------------- PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri Mar 17 06:36:37 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 17 Mar 2023 06:36:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: References: Message-ID: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> On Fri, 17 Mar 2023 06:15:28 GMT, Thomas Stuefe wrote: > > In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. > > > > I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null register. Tst is better at least on x86 because it encodes smaller. *shrugs* You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? 
------------- PR: https://git.openjdk.org/jdk/pull/10907 From stuefe at openjdk.org Fri Mar 17 06:44:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 17 Mar 2023 06:44:37 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> References: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> Message-ID: On Fri, 17 Mar 2023 06:33:43 GMT, Roman Kennke wrote: > > > > In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. > > > > > > I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. > > Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null register. Sure. I used cmp with an immediate that I knew was not the value. Clunky, I know. As I wrote, tst seems better. > Tst is better at least on x86 because it encodes smaller. _shrugs_ > > You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? I just felt better having it there, at least at the start. I may still move it outside to C2. Let's see.
------------- PR: https://git.openjdk.org/jdk/pull/10907 From amitkumar at openjdk.org Fri Mar 17 08:13:09 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 17 Mar 2023 08:13:09 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Thu, 2 Mar 2023 07:49:05 GMT, Amit Kumar wrote: > This PR cleans up some assert statements and specifies the branch condition at the call site itself. Remaining asm_assert methods are inlined as well. A gentle reminder for reviewing this PR. ------------- PR: https://git.openjdk.org/jdk/pull/12822 From duke at openjdk.org Fri Mar 17 08:51:53 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 17 Mar 2023 08:51:53 GMT Subject: RFR: 8292059: Do not inline InstanceKlass::allocate_instance() [v7] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 13:39:10 GMT, Afshin Zafari wrote: >> The inline and not-inline versions of the method are tested to compare the performance difference. >> ### Test >> `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8292059: Do not inline InstanceKlass::allocate_instance() Thank you all reviewers for your comments on this PR.
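------------- PR: https://git.openjdk.org/jdk/pull/12782 From stuefe at openjdk.org Fri Mar 17 09:04:24 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 17 Mar 2023 09:04:24 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v28] In-Reply-To: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> References: <_ZdPTSdrR4kfE69GsWFm3Y_WqA_g0aa-m1o3J-TJH6I=.ad2bb926-994c-4f3c-9e74-cebb1456c55e@github.com> Message-ID: On Fri, 17 Mar 2023 06:33:43 GMT, Roman Kennke wrote: >>> In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. >> >> I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. > >> > In my last changes I made a stupid mistake and don't set the condition flags correctly to force the slow-path, on aarch64. This is only relevant when we exceed the lock-stack capacity, that is why it's failing so rarely. I don't see a similar problem on x86_64 - have we observed any failures on x86_64? I pushed a fix for aarch64. >> >> >> >> I noticed this too for arm; I used cmp to clear EQ but using tst seems better. I also do it inside fast_lock, to give it a defined exit state wrt EQ|NE, since it saves me from having to think about this on every call site. But at least the fail case may be fiddly without conditional execution. > > Cmp(r,r) would not clear EQ, but set it. Unless you do cmp(r,0) on a non-null register. Tst is better at least on x86 because it encodes smaller. *shrugs* > > You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? @rkennke I was not able to directly use 'JavaThread::lock_stack_offset_offset()' in cmp since it was not encodable as an immediate. You did not hit the same problem on aarch64, right? IIUC that was more by accident, since you should have similar or the same (not sure) restrictions for encoding immediates. But your Thread layout is probably different and the offset may just happen to be encodable. If so, that would make you vulnerable to changes in Thread that change the offset of the LockStack. Anyway, for now I solved this by using the second scratch register as an intermediate. One more instruction though. I am now experimenting with my original idea of placing the Lockstack slots at a known aligned offset and then testing the alignment of the current index/pointer. This should be possible with a simple TST. Let's see how this goes.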
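The idea of detecting a full lock stack with a single `TST` can be illustrated with plain integer arithmetic. This is a hypothetical layout, not what either port actually does. Assume the slots end exactly on a 128-byte-aligned address and that the slot array (8 slots of 8 bytes) is shorter than the alignment. The "top" pointer is then 128-byte aligned only when the stack is full, so the check is one AND with a constant mask and needs no comparison against a `JavaThread` offset that might not be encodable as an immediate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameters, chosen only for illustration.
constexpr uintptr_t kSlotSize = 8;     // one 64-bit reference per slot
constexpr uintptr_t kCapacity = 8;     // number of slots
constexpr uintptr_t kAlignment = 128;  // power of two, larger than the slot array

static_assert((kAlignment & (kAlignment - 1)) == 0, "alignment must be a power of two");
static_assert(kCapacity * kSlotSize < kAlignment, "slot array must be shorter than the alignment");

// One past the last slot: placed exactly on an aligned boundary.
inline uintptr_t slots_end(uintptr_t aligned_base) {
  assert(aligned_base % kAlignment == 0);
  return aligned_base + kAlignment;
}

// First slot: strictly after the previous boundary, so it is never aligned.
inline uintptr_t slots_begin(uintptr_t aligned_base) {
  return slots_end(aligned_base) - kCapacity * kSlotSize;
}

// "Is the stack full?" is a single AND with a constant mask (what TST does in assembly).
inline bool is_full(uintptr_t top) {
  return (top & (kAlignment - 1)) == 0;
}
```

Pushing advances `top` by one slot at a time, and only the final position coincides with the aligned end address.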
*shrugs* > > You can do it in the shared fast_lock() but it's really only needed in C2, that's why I'm doing it there. Maybe I'm too perfectionist when it comes to assembly code? @rkennke I was not able to directly use 'JavaThread::lock_stack_offset_offset()' in cmp since it was not encodable as immediate. You did not hit the same problem on aarch64, right? IIUC that was more out of accident, since you should have similar or the same (not sure) restrictions for encoding immediates. But your Thread layout is probably different and the offset may just happened to be encodable. If so, that would make you vulnerable against changes in Thread that change the offset of the LockStack. Anyway, for now I solved this by using the second scratch register as intermediate. One more instruction though. I am now experimenting with my original idea of placing the Lockstack slots at a known aligned offset and then testing the alignment of the current index/pointer. This should be possible with a simple TST. Lets see how this goes. ------------- PR: https://git.openjdk.org/jdk/pull/10907 From mdoerr at openjdk.org Fri Mar 17 09:31:17 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 17 Mar 2023 09:31:17 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Thu, 2 Mar 2023 07:49:05 GMT, Amit Kumar wrote: > This PR cleanups some assert statements and specifies branch condition at calling site itself. Remaining asm_assert methods are inlined as well. Note that PPC64 still has `asm_assert_eq` and `asm_assert_ne`. But I'm ok with replacing it. Let's hear what other people think. Including `.inline.hpp` files from `.hpp` files is not acceptable. You need to find a different solution. 
------------- PR: https://git.openjdk.org/jdk/pull/12822 From amitkumar at openjdk.org Fri Mar 17 10:24:41 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 17 Mar 2023 10:24:41 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Thu, 2 Mar 2023 07:49:05 GMT, Amit Kumar wrote: > This PR cleans up some assert statements and specifies the branch condition at the call site itself. Remaining asm_assert methods are inlined as well. So does that make sense to not define them inlined at all? Because if I do not include `.inline.hpp` then it doesn't build. ------------- PR: https://git.openjdk.org/jdk/pull/12822 From sspitsyn at openjdk.org Fri Mar 17 10:31:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Mar 2023 10:31:46 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> Message-ID: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> > This is needed for performance improvements in support of virtual threads. > The update includes the following: > > 1. Refactored the `VirtualThread` native methods: > `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` > `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` > 2.
The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: > `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` > `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` > `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` > `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end` > 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. > 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. > 5. Added the static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable. > > Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now. > > Testing: > - Ran mach5 tiers 1-6. No regressions were found. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor tweaks in intrinsics implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13054/files - new: https://git.openjdk.org/jdk/pull/13054/files/f3692263..8233f0ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13054&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13054.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13054/head:pull/13054 PR: https://git.openjdk.org/jdk/pull/13054 From lucy at openjdk.org Fri Mar 17 11:05:10 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 17 Mar 2023 11:05:10 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Fri, 17 Mar 2023 10:21:07 GMT, Amit Kumar wrote: > So does that make sense to not define them inlined at all
? Because if I do not include `.inline.hpp` then it doesn't build. I was going to ask you anyway why you decided to convert the asm_assert stuff to inline methods. In general, you do that for performance reasons. Sometimes, it even helps with code size. Think of simple getter and setter methods, where the call overhead is much larger than the "useful" code. The following rules can guide you in "optimization" decisions: - Optimize for the release build case (execution time and footprint). - Optimize for the hottest code path. - Optimize for the "OK" case. - The path taken if an error occurred does not need to be as efficient as possible. - If an assertion fails, it's the end of the JVM. There is no need to rush. With the inclusion of macroAssembler.inline.hpp in interp_masm.hpp, you are out of luck. As Martin stated, this is not acceptable. *.inline.hpp files are only to be included in *.cpp files. There are 63 *.cpp files including interp_masm.hpp. Any of these may require the definitions from macroAssembler.inline.hpp. How do you find out which ones? Trial and error. ------------- PR: https://git.openjdk.org/jdk/pull/12822 From lucy at openjdk.org Fri Mar 17 11:08:24 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 17 Mar 2023 11:08:24 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Thu, 2 Mar 2023 07:49:05 GMT, Amit Kumar wrote: > This PR cleans up some assert statements and specifies the branch condition at the call site itself. Remaining asm_assert methods are inlined as well. Overall, I like this change. There are some locations you have to revisit, though. Wrt inlining the asm_assert* stuff, see my separate comment.
src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 2985: > 2983: } > 2984: } else { > 2985: __ asm_assert(Assembler::bcondNotEqual, "unexpected null obj", __LINE__); Condition code is from "load and test" instruction. Semantically, that is a comparison against zero. Therefore, please use the semantically correct mask bcondNotZero. Helps others to understand the code. src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 180: > 178: #ifdef ASSERT > 179: __ z_ltgr(Rpre_val, Rpre_val); > 180: __ asm_assert(Assembler::bcondNotEqual, "null oop not allowed (G1 pre)", 0x321); // Checked by caller. Please use bcondNotZero. For reasoning, see above. src/hotspot/cpu/s390/gc/g1/g1BarrierSetAssembler_s390.cpp line 292: > 290: #ifdef ASSERT > 291: __ z_ltgr(Rnew_val, Rnew_val); > 292: __ asm_assert(Assembler::bcondNotEqual, "null oop not allowed (G1 post)", 0x322); // Checked by caller. Please use bcondNotZero. For reasoning, see above. src/hotspot/cpu/s390/macroAssembler_s390.inline.hpp line 352: > 350: } else { > 351: if (tmp != expected_size) { > 352: z_lgr(tmp, expected_size); You could use lgr_if_needed() here. src/hotspot/cpu/s390/macroAssembler_s390.inline.hpp line 353: > 351: if (tmp != expected_size) { > 352: z_lgr(tmp, expected_size); > 353: } Shouldn't the else block close here? As coded, the method will have no effect for (tmp == noreg). src/hotspot/cpu/s390/runtime_s390.cpp line 122: > 120: #ifdef ASSERT > 121: __ z_ltgr(handle_exception, handle_exception); > 122: __ asm_assert(Assembler::bcondNotEqual, "handler must not be NULL", 0x852); Please use bcondNotZero. For reasoning, see above. src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 2479: > 2477: // Make sure that there is at least one entry in the array. > 2478: DEBUG_ONLY(__ z_ltgr(number_of_frames_reg, number_of_frames_reg)); > 2479: __ asm_assert(Assembler::bcondNotEqual, "array_size must be > 0", 0x205); Please use bcondNotZero. For reasoning, see above. 
src/hotspot/cpu/s390/stubGenerator_s390.cpp line 738: > 736: #ifdef ASSERT > 737: __ z_srag(Z_R0, count, 31); // Just leave the sign (must be zero) in Z_R0. > 738: __ asm_assert(Assembler::bcondEqual, "missing zero extend", 0xAFFE); Please use bcondZero. The CC from srag is equivalent to comparing the result against zero. src/hotspot/cpu/s390/stubRoutines_s390.cpp line 58: > 56: __ z_cgr(table, Z_R0); // safety net > 57: __ z_bre(L); > 58: __ z_illtrap(); You should move z_illtrap() after asm_assert(). Not your fault! src/hotspot/cpu/s390/stubRoutines_s390.cpp line 68: > 66: __ z_bre(L); > 67: __ z_l(Z_R0, Address(table, 4)); // Load data from memory, we know the constant we compared against. > 68: __ z_illtrap(); You should move z_illtrap() after asm_assert(). Not your fault! src/hotspot/cpu/s390/stubRoutines_s390.cpp line 103: > 101: __ z_cgr(table, Z_R0); // safety net > 102: __ z_bre(L); > 103: __ z_illtrap(); You should move z_illtrap() after asm_assert(). Not your fault! src/hotspot/cpu/s390/stubRoutines_s390.cpp line 113: > 111: __ z_bre(L); > 112: __ z_lg(Z_R0, Address(table, 8)); // Load data from memory, we know the constant we compared against. > 113: __ z_illtrap(); You should move z_illtrap() after asm_assert(). Not your fault! src/hotspot/share/interpreter/interp_masm.hpp line 28: > 26: #define SHARE_INTERPRETER_INTERP_MASM_HPP > 27: > 28: #include "asm/macroAssembler.inline.hpp" All the CPU_HEADER(interp_masm) files include "asm/macroAssembler.hpp" as well. That is redundant. You could fix it, but probably with a separate PR. ------------- Changes requested by lucy (Reviewer). 
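Lutz's request to write `bcondZero`/`bcondNotZero` after "load and test" and after `z_srag` follows from how those instructions set the condition code: each one compares its 64-bit result with zero. The C++ model below illustrates that reasoning and is not HotSpot code. The function names and the mask constants `kMaskZero`/`kMaskNotZero` are hypothetical; they were chosen for this model and are not copied from `assembler_s390.hpp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the z/Architecture condition code (CC) after
// LTGR (load and test) and SRAG (shift right single arithmetic):
//   CC 0: result == 0,  CC 1: result < 0,  CC 2: result > 0.
int cc_from_result(int64_t result) {
  if (result == 0) return 0;
  return result < 0 ? 1 : 2;
}

// A 4-bit branch mask selects condition codes:
// 8 -> CC 0, 4 -> CC 1, 2 -> CC 2, 1 -> CC 3.
bool mask_selects(unsigned mask, int cc) {
  return (mask & (8u >> cc)) != 0;
}

// Masks named for the intent "result compared with zero" (model only).
constexpr unsigned kMaskZero = 8;         // CC 0
constexpr unsigned kMaskNotZero = 4 | 2;  // CC 1 or CC 2

// z_ltgr(r, r) followed by an assert on "not zero": passes for non-null r.
bool ltgr_not_null(int64_t reg) {
  return mask_selects(kMaskNotZero, cc_from_result(reg));
}

// z_srag(Z_R0, count, 31) followed by an assert on "zero": passes only when
// bits 63..31 of count are clear. (>> on a negative int64_t is arithmetic
// in C++20 and, in practice, on mainstream compilers before that.)
bool srag_zero_extended(int64_t count) {
  return mask_selects(kMaskZero, cc_from_result(count >> 31));
}
```

In this model, the `stubGenerator_s390.cpp` check passes only for a count in [0, 2^31). That matches the intent of the "missing zero extend" assert: the count must be a non-negative 32-bit value with the upper bits cleared.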
PR: https://git.openjdk.org/jdk/pull/12822 From duke at openjdk.org Fri Mar 17 13:26:33 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 17 Mar 2023 13:26:33 GMT Subject: Integrated: 8292059: Do not inline InstanceKlass::allocate_instance() In-Reply-To: References: Message-ID: <4F-EPimrITYjasIFSTTwZHBSC2wg0RjxUA_v5RLFvv0=.2e84d720-33d8-4ace-94e7-6a83bf21938d@github.com> On Tue, 28 Feb 2023 11:11:54 GMT, Afshin Zafari wrote: > The inline and not-inline versions of the method are tested to compare the performance difference. > ### Test > `make test TEST=micro:Capture0.lambda_01 MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" ` This pull request has now been integrated. Changeset: cb4ae192 Author: Afshin Zafari Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/cb4ae1922db7fe3645fd50f301b4a1be965bc79b Stats: 30 lines in 3 files changed: 13 ins; 16 del; 1 mod 8292059: Do not inline InstanceKlass::allocate_instance() Reviewed-by: coleenp, stefank ------------- PR: https://git.openjdk.org/jdk/pull/12782 From fparain at openjdk.org Fri Mar 17 13:51:05 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 13:51:05 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v6] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. > > The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed.
The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: Style fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12855/files - new: https://git.openjdk.org/jdk/pull/12855/files/f81337f7..ab57b03a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=04-05 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From lucy at openjdk.org Fri Mar 17 14:30:54 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 17 Mar 2023 14:30:54 GMT Subject: RFR: 8302328: [s390x] Simplify asm_assert definition In-Reply-To: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> References: <51VAVmniXPGE4sgcdw2AnU1hqZn2fVJRyDEbrt3AyyU=.b70f59dc-8397-4da4-b36b-266766434b19@github.com> Message-ID: On Thu, 2 Mar 2023 07:49:05 GMT, Amit Kumar wrote: > This PR cleans up some assert statements and specifies the branch condition at the call site itself. Remaining asm_assert methods are inlined as well.
Changes requested by lucy (Reviewer). src/hotspot/cpu/s390/macroAssembler_s390.hpp line 870: > 868: > 869: // Assert if CC indicates "not equal" (check_equal==true) or "equal" (check_equal==false). > 870: inline void asm_assert(bool check_equal, const char* msg, int id) { This will not work: the implementation of the called method is not visible here. The implementation needs to move to macroAssembler_s390.inline.hpp. ------------- PR: https://git.openjdk.org/jdk/pull/12822 From aturbanov at openjdk.org Fri Mar 17 14:54:30 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 17 Mar 2023 14:54:30 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v6] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 13:51:05 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). >> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM.
>> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request incrementally with one additional commit since the last revision: > > Style fixes src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/InstanceKlass.java line 268: > 266: > 267: Field getField(int index) { > 268: synchronized(this) { nit Suggestion: synchronized (this) { src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/tools/jcore/ClassWriter.java line 380: > 378: dos.writeShort(accessFlags & (short) JVM_RECOGNIZED_FIELD_MODIFIERS); > 379: > 380: int nameIndex = klass.getFieldNameIndex(index); nit: Suggestion: int nameIndex = klass.getFieldNameIndex(index); ------------- PR: https://git.openjdk.org/jdk/pull/12855 From aturbanov at openjdk.org Fri Mar 17 14:57:44 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 17 Mar 2023 14:57:44 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v6] In-Reply-To: <0hYs21V1ZWB8o92CfvkEW3i0dZKkeW8kYGQu0p6xvtM=.e76da2cd-dbe5-4da2-a6cb-775f081b9a6a@github.com> References: <0hYs21V1ZWB8o92CfvkEW3i0dZKkeW8kYGQu0p6xvtM=.e76da2cd-dbe5-4da2-a6cb-775f081b9a6a@github.com> Message-ID: On Tue, 14 Mar 2023 16:06:06 GMT, Doug Simon wrote: >> This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: >> * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. >> * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. 
>> >> To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): >> >> ResolvedJavaMethod method = ...; >> ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); >> return switch (a.kind()) { >> case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; >> case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> } >> >> >> The same code using the new API: >> >> >> ResolvedJavaMethod method = ...; >> ResolvedJavaType explodeLoopType = ...; >> AnnotationData a = method.getAnnotationDataFor(explodeLoopType); >> return switch (a.getEnum("kind").getName()) { >> case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; >> case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; >> ... >> }
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > addressed review feedback src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 237: > 235: try { > 236: ByteArrayOutputStream baos = new ByteArrayOutputStream(128); > 237: try(DataOutputStream dos = new DataOutputStream(baos)) { nit Suggestion: try (DataOutputStream dos = new DataOutputStream(baos)) { src/java.base/share/classes/jdk/internal/vm/VMSupport.java line 564: > 562: } else if (length <= 127) { > 563: dos.writeByte((byte) (0x80 | length)); > 564: } else { nit Suggestion: } else { ------------- PR: https://git.openjdk.org/jdk/pull/12810 From aturbanov at openjdk.org Fri Mar 17 14:59:43 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 17 Mar 2023 14:59:43 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v6] In-Reply-To: References: Message-ID: On Wed, 15 Mar 2023 18:45:00 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 interpreter mistake src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ResolvedIndyEntry.java line 49: > 47: > 48: private static synchronized void initialize(TypeDataBase db) throws WrongTypeException { > 49: Type type = db.lookupType("ResolvedIndyEntry"); Suggestion: Type type = db.lookupType("ResolvedIndyEntry"); ------------- PR: https://git.openjdk.org/jdk/pull/12778 From dnsimon at openjdk.org Fri Mar 17 15:38:49 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 17 Mar 2023 15:38:49 GMT Subject: RFR: 8303431: [JVMCI] libgraal annotation API [v7] In-Reply-To: References: Message-ID: > This PR extends JVMCI with new API (`jdk.vm.ci.meta.Annotated`) for accessing annotations. The main differences from `java.lang.reflect.AnnotatedElement` are: > * All methods in the `Annotated` interface explicitly specify requested annotation type(s). That is, there is no equivalent of `AnnotatedElement.getAnnotations()`. > * Annotation data is returned in a map-like object (of type `jdk.vm.ci.meta.AnnotationData`) instead of in an `Annotation` object. This works better for libgraal as it avoids the need for annotation types to be loaded and included in libgraal. > > To demonstrate the new API, here's an example in terms of `java.lang.reflect.AnnotatedElement` (which `ResolvedJavaType` implements): > > ResolvedJavaMethod method = ...; > ExplodeLoop a = method.getAnnotation(ExplodeLoop.class); > return switch (a.kind()) { > case FULL_UNROLL -> LoopExplosionKind.FULL_UNROLL; > case FULL_UNROLL_UNTIL_RETURN -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ...
> } > > > The same code using the new API: > > > ResolvedJavaMethod method = ...; > ResolvedJavaType explodeLoopType = ...; > AnnotationData a = method.getAnnotationDataFor(explodeLoopType); > return switch (a.getEnum("kind").getName()) { > case "FULL_UNROLL" -> LoopExplosionKind.FULL_UNROLL; > case "FULL_UNROLL_UNTIL_RETURN" -> LoopExplosionKind.FULL_UNROLL_UNTIL_RETURN; > ... > } > > > The implementation relies on new methods in `jdk.internal.vm.VMSupport` for parsing annotations and serializing/deserializing to/from a byte array. This allows the annotation data to be passed from the HotSpot heap to the libgraal heap. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: [skip ci] formatting fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12810/files - new: https://git.openjdk.org/jdk/pull/12810/files/abaf2375..32131796 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12810&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12810.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12810/head:pull/12810 PR: https://git.openjdk.org/jdk/pull/12810 From fparain at openjdk.org Fri Mar 17 15:58:35 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 15:58:35 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v7] In-Reply-To: References: Message-ID: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. 
> > The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. Frederic Parain has updated the pull request with a new target base due to a merge or a rebase.
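The unsigned5 encoding mentioned in the PR description is a variable-length code for 32-bit values: one byte for small values, at most five bytes for any value. The C++ sketch below assumes the pack200-style parameters H = 64 and L = 256 - H = 192. The namespace and function names are hypothetical. HotSpot's own `unsigned5.hpp` may choose different parameters (for example, it may reserve some byte values), so read this as an illustration of the technique rather than as the JVM's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of an UNSIGNED5-style encoding (pack200 parameters).
namespace unsigned5_sketch {

constexpr uint32_t lg_H = 6;
constexpr uint32_t H = 1u << lg_H;  // 64: payload range of a "high" byte
constexpr uint32_t L = 256 - H;     // 192: byte values below L end a code
constexpr int MAX_LENGTH = 5;       // at most 5 bytes per 32-bit value

// Appends the encoding of `value` (1 to 5 bytes) to `out`.
void encode(uint32_t value, std::vector<uint8_t>& out) {
  uint32_t sum = value;
  for (int i = 0; ; i++) {
    if (sum < L || i == MAX_LENGTH - 1) {
      out.push_back(static_cast<uint8_t>(sum));  // low byte (or 5th byte): last
      return;
    }
    sum -= L;
    out.push_back(static_cast<uint8_t>(L + (sum % H)));  // high byte: more follow
    sum >>= lg_H;
  }
}

// Decodes one value starting at `pos` and advances `pos` past it.
uint32_t decode(const std::vector<uint8_t>& in, std::size_t& pos) {
  uint32_t b0 = in[pos++];
  if (b0 < L) return b0;  // single-byte code
  uint32_t sum = b0;
  for (int i = 1; i < MAX_LENGTH; i++) {
    uint32_t bi = in[pos++];
    sum += bi << (lg_H * i);  // byte i is weighted by H^i
    if (bi < L || i == MAX_LENGTH - 1) return sum;
  }
  return sum;  // not reached for well-formed input
}

}  // namespace unsigned5_sketch
```

Values below 192 take a single byte, so this kind of stream is compact when most of the encoded numbers are small. A byte of 192 or more carries 6 payload bits and signals that another byte follows.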
The pull request now contains ten commits: - Style fixes - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 - Style fixes - SA and JVMCI fixes - Fixes includes and style - SA additional caching from Chris Plummer - Addressing comments from first reviews - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 - Reimplementation of FieldInfo as an unsigned5 stream ------------- Changes: https://git.openjdk.org/jdk/pull/12855/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12855&range=06 Stats: 1790 lines in 54 files changed: 927 ins; 483 del; 380 mod Patch: https://git.openjdk.org/jdk/pull/12855.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12855/head:pull/12855 PR: https://git.openjdk.org/jdk/pull/12855 From vlivanov at openjdk.org Fri Mar 17 17:36:09 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 17 Mar 2023 17:36:09 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for performance improvements in support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. 
The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end` >> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added the static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrinsics implementation Overall, compiler changes look good. Any performance numbers to justify the intrinsification? ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.org/jdk/pull/13054 From sspitsyn at openjdk.org Fri Mar 17 18:36:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Mar 2023 18:36:18 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 17:33:46 GMT, Vladimir Ivanov wrote: > Overall, compiler changes look good. > Any performance numbers to justify the intrinsification? Thank you for the review and your guidance and help with C2 intrinsification!
My goal was to move the notifyJvmtiEvents checks from the Java side to the VM side without a performance penalty. I do not observe any performance degradation with a customized Skynet benchmark executing 5 million virtual threads. I used the `time` utility to measure the total execution time (in milliseconds) of 10 runs on an Oracle Linux server: - without intrinsics: 6083, 5405, 5270, 5700, 5004, 5402, 5536, 5031, 4902, 5124 - with intrinsics: 5904, 5287, 5470, 5672, 5298, 5053, 6154, 4992, 6237, 5155 ------------- PR: https://git.openjdk.org/jdk/pull/13054 From matsaave at openjdk.org Fri Mar 17 19:53:28 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 17 Mar 2023 19:53:28 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v7] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Acquire semantics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/9a3a63ae..3dc112b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=05-06 Stats: 10 lines in 4 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From fparain at openjdk.org Fri Mar 17 20:19:39 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 20:19:39 GMT Subject: RFR: 8292818: replace 96-bit representation for field metadata with variable-sized streams [v7] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:58:35 GMT, Frederic Parain wrote: >> Please review this change re-implementing the FieldInfo data structure. >> >> The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency. >> >> The new implementation uses a compressed stream to store those metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information).
>> >> More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 >> >> Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. >> >> The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have been updated too to deal with the new FieldInfo format. >> >> Tested with mach5, tier 1 to 7. >> >> Thank you. > > Frederic Parain has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Style fixes > - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 > - Style fixes > - SA and JVMCI fixes > - Fixes includes and style > - SA additional caching from Chris Plummer > - Addressing comments from first reviews > - Merge remote-tracking branch 'upstream/master' into fieldinfo_unsigned5 > - Reimplementation of FieldInfo as an unsigned5 stream Chris, Doug, thank you for your reviews and your help. Coleen, David, Andrey, thank you for your reviews. ------------- PR: https://git.openjdk.org/jdk/pull/12855 From fparain at openjdk.org Fri Mar 17 20:22:48 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 17 Mar 2023 20:22:48 GMT Subject: Integrated: 8292818: replace 96-bit representation for field metadata with variable-sized streams In-Reply-To: References: Message-ID: <1g-MNikg2bzX02U4IDcsKO4nGqPUWZI-77gTMrmQtlA=.5a102edd-69f2-4cd8-9738-fe5075f02f2c@github.com> On Fri, 3 Mar 2023 14:50:34 GMT, Frederic Parain wrote: > Please review this change re-implementing the FieldInfo data structure. > > The FieldInfo array is an old data structure storing fields metadata. It has poor extension capabilities, a complex management code because of lack of strong typing and semantic overloading, and a poor memory efficiency.
> > The new implementation uses a compressed stream to store this metadata, achieving better memory density and providing flexible extensibility, while exposing a strongly typed set of data when uncompressed. The stream is compressed using the unsigned5 encoding, which is already present in the JDK (because of pack200) and the JVM (because JIT compilers use it to compress debugging information). > > More technical details are available in the CR: https://bugs.openjdk.org/browse/JDK-8292818 > > Those changes include a re-organisation of fields' flags, splitting the previous heterogeneous AccessFlags field into three distinct flag categories: immutable flags from the class file, immutable flags defined by the JVM, and finally mutable flags defined by the JVM. > > The SA, CI, and JVMCI, which all used to access the old FieldInfo array, have also been updated to deal with the new FieldInfo format. > > Tested with mach5, tier 1 to 7. > > Thank you. This pull request has now been integrated. Changeset: bfb812a8 Author: Frederic Parain URL: https://git.openjdk.org/jdk/commit/bfb812a8ff8bca70aed7695c73f019ae66ac6f33 Stats: 1790 lines in 54 files changed: 927 ins; 483 del; 380 mod 8292818: replace 96-bit representation for field metadata with variable-sized streams Co-authored-by: John R Rose Co-authored-by: Chris Plummer Reviewed-by: dholmes, coleenp, cjplummer, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/12855 From matsaave at openjdk.org Fri Mar 17 22:08:23 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 17 Mar 2023 22:08:23 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v8] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. > > This change supports the following platforms: x86, aarch64, PPC, and RISCV Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed aarch64 and added load-acquire for resolution check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/3dc112b2..6600e6dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=06-07 Stats: 7 lines in 2 files changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From stuefe at openjdk.org Sat Mar 18 06:34:22 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 18 Mar 2023 06:34:22 GMT Subject: RFR: JDK-8294266: Add a way to pre-touch java thread stacks [v4] In-Reply-To: References: Message-ID: On Mon, 14 Nov 2022 10:16:58 GMT, Thomas Stuefe wrote: >> When doing performance- and footprint analysis, `AlwaysPreTouch` option is very handy for reducing noise. 
It would be good to have a similar option for pre-touching thread stacks. In addition to reducing noise, it can serve as worst-case test for thread costs, as well as a test for NMT regressions. >> >> Patch adds a new diagnostic switch, `AlwaysPreTouchStacks`, as a companion switch to `AlwaysPreTouch`. Touching is super-simple using `alloca()`. Also, regression test. >> >> Examples: >> >> NMT, thread stacks, 10000 Threads, default: >> >> >> - Thread (reserved=10332400KB, committed=331828KB) >> (thread #10021) >> (stack: reserved=10301560KB, committed=300988KB) >> (malloc=19101KB #60755) >> (arena=11739KB #20037) >> >> >> NMT, thread stacks, 10000 Threads, +AlwaysPreTouchStacks: >> >> >> - Thread (reserved=10332400KB, committed=10284360KB) >> (thread #10021) >> (stack: reserved=10301560KB, committed=10253520KB) >> (malloc=19101KB #60755) >> (arena=11739KB #20037) > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > test changes, comment change not yet bot ------------- PR: https://git.openjdk.org/jdk/pull/10403 From yzhu at openjdk.org Sat Mar 18 08:50:18 2023 From: yzhu at openjdk.org (Yanhong Zhu) Date: Sat, 18 Mar 2023 08:50:18 GMT Subject: RFR: 8304293: RISC-V: JDK-8276799 missed atomic intrinsic support for C1 In-Reply-To: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com> References: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com> Message-ID: On Thu, 16 Mar 2023 03:37:10 GMT, Feilong Jiang wrote: > The following intrinsics in C1 are controlled by supports_atomic_xxx, but they are not set properly on RISC-V: > - _getAndAddInt > - _getAndAddLong > - _getAndSetInt > - _getAndSetLong > - _getAndSetReference > > RISC-V provides a set of atomic instructions [1], these intrinsics could be enabled by default. 
> > Here is the HIR output of C1: > > before: > > > B18 (V) [189, 196] -> B20 pred: B8 B17 > empty stack > inlining depth 0 > __bci__use__tid____instr____________________________________ > 0 0 a251 > 3 0 a252 null > 4 0 l254 274954985816L > 7 0 l255 1L > . 8 0 l256 a251.invokespecial(a252, l254, l255) > jdk/internal/misc/Unsafe.getAndAddLong(Ljava/lang/Object;JJ)J > . 193 0 l258 a42._24 := l256 (J) tid > . 196 0 259 goto B20 > > > after: > > > B18 (V) [189, 196] -> B20 pred: B8 B17 > empty stack > inlining depth 0 > __bci__use__tid____instr____________________________________ > 0 0 a251 > 3 0 a252 null > 4 0 l254 274954985816L > 7 0 l255 1L > . 8 0 l256 UnsafeGetAndSet (add)(a252, l254, value l255) > . 193 0 l258 a42._24 := l256 (J) tid > . 196 0 259 goto B20 > > > 1. https://github.com/riscv/riscv-isa-manual/blob/8b9047d8d20ef548f7996efee1550760d7bc1279/src/a.tex#L416-L422 > > Testing: > > - [x] `hotspot_tier1` & `jdk_tier1` on Unmatched board (release build) Looks good to me. ------------- Marked as reviewed by yzhu (Author). PR: https://git.openjdk.org/jdk/pull/13053 From alanb at openjdk.org Sat Mar 18 11:27:20 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 18 Mar 2023 11:27:20 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: <-OJbhkKU3EtSS8E31eEd62h3-x5Szpl_Hk0apm1a6aQ=.687c660f-bc13-41cd-bc63-c59ca60300f0@github.com> On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for future performance/scalability improvements in JVMTI support of virtual threads. >> The update includes the following: >> >> 1. 
Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_mount_end` >> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added a static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrinsics implementation The most important case is when there is no JVMTI env. If I read the changes correctly, the overhead for park/continue changes from one volatile-read (notifyJvmtiEvents) to two plain-writes (JavaThread::_is_in_VTMS_transition). If a JVMTI env has been created, then there is no benefit for the moment, as there is still a call into the runtime to interact with JvmtiVTMSTransitionDisabler. So I think you are saying that is for follow-on PRs.
------------- PR: https://git.openjdk.org/jdk/pull/13054 From qamai at openjdk.org Sun Mar 19 13:10:09 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 19 Mar 2023 13:10:09 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation Message-ID: Hi, This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
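As a rough illustration of point 4, the scalar C++ model below stores shuffle indices as integer lanes of the same width as the data. With that layout the range check is a single signed integer comparison per lane, and the permute uses the indices directly with no expansion step. It assumes the Vector API convention that an out-of-range index is "partially wrapped" into the negative, exceptional range [-N, 0). `to_shuffle` and `rearrange` are hypothetical names, and the code models the semantics only; it is not the patch's Java or C2 code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical model: shuffle indices are kept as int32_t lanes, like the data.
// Assumed convention: valid indices lie in [0, N); invalid ones are mapped
// ("partially wrapped") into [-N, 0) so they remain distinguishable.
template <size_t N>
std::array<int32_t, N> to_shuffle(const std::array<int32_t, N>& lanes) {
  const int32_t n = static_cast<int32_t>(N);
  std::array<int32_t, N> idx{};
  for (size_t i = 0; i < N; ++i) {
    const int32_t v = lanes[i];
    const int32_t wrapped = ((v % n) + n) % n;  // floor modulo into [0, N)
    idx[i] = (v == wrapped) ? v : wrapped - n;  // out of range -> negative
  }
  return idx;
}

// Permutes `src` by `shuffle`, throwing if any index is exceptional.
template <size_t N>
std::array<int32_t, N> rearrange(const std::array<int32_t, N>& src,
                                 const std::array<int32_t, N>& shuffle) {
  // Range check: one signed integer compare per lane, no FP conversion needed.
  for (int32_t i : shuffle) {
    if (i < 0) throw std::out_of_range("exceptional shuffle index");
  }
  std::array<int32_t, N> dst{};
  for (size_t lane = 0; lane < N; ++lane) {
    const size_t from = static_cast<size_t>(shuffle[lane]);
    assert(from < N);  // guaranteed by to_shuffle()
    dst[lane] = src[from];
  }
  return dst;
}
```

Because valid and exceptional indices differ only in sign, a vectorized version of the check reduces to comparing against zero and testing the resulting mask.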
Upon these changes, a `rearrange` can emit more efficient code:

    var species = IntVector.SPECIES_128;
    var v1 = IntVector.fromArray(species, SRC1, 0);
    var v2 = IntVector.fromArray(species, SRC2, 0);
    v1.rearrange(v2.toShuffle()).intoArray(DST, 0);

Before:

    movabs $0x751589fa8,%r10            ;   {oop([I{0x0000000751589fa8})}
    vmovdqu 0x10(%r10),%xmm2
    movabs $0x7515a0d08,%r10            ;   {oop([I{0x00000007515a0d08})}
    vmovdqu 0x10(%r10),%xmm1
    movabs $0x75158afb8,%r10            ;   {oop([I{0x000000075158afb8})}
    vmovdqu 0x10(%r10),%xmm0
    vpand -0xddc12(%rip),%xmm0,%xmm0    #   Stub::vector_int_to_byte_mask
                                        ;   {external_word}
    vpackusdw %xmm0,%xmm0,%xmm0
    vpackuswb %xmm0,%xmm0,%xmm0
    vpmovsxbd %xmm0,%xmm3
    vpcmpgtd %xmm3,%xmm1,%xmm3
    vtestps %xmm3,%xmm3
    jne 0x00007fc2acb4e0d8
    vpmovzxbd %xmm0,%xmm0
    vpermd %ymm2,%ymm0,%ymm0
    movabs $0x751588f98,%r10            ;   {oop([I{0x0000000751588f98})}
    vmovdqu %xmm0,0x10(%r10)

After:

    movabs $0x751589c78,%r10            ;   {oop([I{0x0000000751589c78})}
    vmovdqu 0x10(%r10),%xmm1
    movabs $0x75158ac88,%r10            ;   {oop([I{0x000000075158ac88})}
    vmovdqu 0x10(%r10),%xmm2
    vpxor %xmm0,%xmm0,%xmm0
    vpcmpgtd %xmm2,%xmm0,%xmm3
    vtestps %xmm3,%xmm3
    jne 0x00007fa818b27cb1
    vpermd %ymm1,%ymm2,%ymm0
    movabs $0x751588c68,%r10            ;   {oop([I{0x0000000751588c68})}
    vmovdqu %xmm0,0x10(%r10)

Please take a look and leave reviews. Thanks a lot.
------------- Commit messages: - fix internal types, clean up - optimise laneIsValid - Merge branch 'master' into shufflerefactor - small beautifications - other architecture - fix mismatched fp vector payload types - draft Changes: https://git.openjdk.org/jdk/pull/13093/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304450 Stats: 4440 lines in 62 files changed: 2567 ins; 651 del; 1222 mod Patch: https://git.openjdk.org/jdk/pull/13093.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13093/head:pull/13093 PR: https://git.openjdk.org/jdk/pull/13093 From lmesnik at openjdk.org Sun Mar 19 16:52:20 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 19 Mar 2023 16:52:20 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for future performance/scalability improvements in JVMTI support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. The still-useful implementation of the old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_mount_end` >> 3.
Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added a static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrinsics implementation I haven't reviewed the C2 changes; all other changes look good to me. ------------- Marked as reviewed by lmesnik (Reviewer). PR: https://git.openjdk.org/jdk/pull/13054 From qamai at openjdk.org Sun Mar 19 19:38:04 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 19 Mar 2023 19:38:04 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > 4.
Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {external_word} > vpackusdw %xmm0,%xmm0,%xmm0 > vpackuswb %xmm0,%xmm0,%xmm0 > vpmovsxbd %xmm0,%xmm3 > vpcmpgtd %xmm3,%xmm1,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fc2acb4e0d8 > vpmovzxbd %xmm0,%xmm0 > vpermd %ymm2,%ymm0,%ymm0 > movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} > vmovdqu %xmm0,0x10(%r10) > > After: > movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} > vmovdqu 0x10(%r10),%xmm2 > vpxor %xmm0,%xmm0,%xmm0 > vpcmpgtd %xmm2,%xmm0,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fa818b27cb1 > vpermd %ymm1,%ymm2,%ymm0 > movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} > vmovdqu %xmm0,0x10(%r10) > > Please take a look and leave reviews. Thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix Matcher::vector_needs_load_shuffle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13093/files - new: https://git.openjdk.org/jdk/pull/13093/files/7acf928d..060554a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=00-01 Stats: 9 lines in 1 file changed: 4 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13093.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13093/head:pull/13093 PR: https://git.openjdk.org/jdk/pull/13093 From kbarrett at openjdk.org Sun Mar 19 22:06:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 19 Mar 2023 22:06:53 GMT Subject: RFR: 8304016: Add BitMap find_last suite of functions [v2] In-Reply-To: References: Message-ID: > Please review this change that adds functions to BitMap for finding the last > set/clear bit in a range. > > Testing: > mach5 tier1, including new gtesting for the new functions. 
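To make the range semantics concrete, here is a simplified C++ sketch of a "find last set bit in [beg, end)" search over 64-bit words. It is not HotSpot's `BitMap` code. The function names are hypothetical, returning `end` when no bit is found is an assumed convention, and a portable loop stands in for a count-leading-zeros intrinsic. It follows the overall shape described in the review: mask the partial word at the high end of the range, then scan words downward and locate the bit in the first non-zero word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified sketch (not HotSpot's BitMap implementation).
using bm_word_t = uint64_t;
constexpr size_t kBitsPerWord = 64;

// Index of the most significant set bit; `w` must be non-zero.
inline size_t msb_index(bm_word_t w) {
  assert(w != 0);
  size_t r = 0;
  while (w >>= 1) ++r;
  return r;
}

// Returns the index of the last set bit in [beg, end), or `end` if none
// (assumed convention).
size_t find_last_set_bit(const std::vector<bm_word_t>& map, size_t beg, size_t end) {
  assert(beg <= end && end <= map.size() * kBitsPerWord);
  if (beg == end) return end;
  const size_t last = end - 1;         // highest bit inside the range
  size_t idx = last / kBitsPerWord;    // word holding that bit
  // Clear the bits above `last` in the highest word.
  const size_t shift = kBitsPerWord - 1 - last % kBitsPerWord;
  bm_word_t cword = (map[idx] << shift) >> shift;
  const size_t beg_idx = beg / kBitsPerWord;
  while (true) {
    if (cword != 0) {
      const size_t bit = idx * kBitsPerWord + msb_index(cword);
      return bit >= beg ? bit : end;   // in the lowest word it may lie below beg
    }
    if (idx == beg_idx) return end;
    cword = map[--idx];
  }
}
```

A find-last-clear variant could reuse the same loop by complementing each word before testing it.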
Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: - shrink find_first_bit_impl - improve find_last_bit_impl ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12988/files - new: https://git.openjdk.org/jdk/pull/12988/files/c3a75dd1..b3d2aed1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12988&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12988&range=00-01 Stats: 49 lines in 1 file changed: 16 ins; 12 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/12988.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12988/head:pull/12988 PR: https://git.openjdk.org/jdk/pull/12988 From kbarrett at openjdk.org Sun Mar 19 22:09:18 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 19 Mar 2023 22:09:18 GMT Subject: RFR: 8304016: Add BitMap find_last suite of functions In-Reply-To: References: Message-ID: On Sat, 11 Mar 2023 16:46:44 GMT, Kim Barrett wrote: > Please review this change that adds functions to BitMap for finding the last > set/clear bit in a range. > > Testing: > mach5 tier1, including new gtesting for the new functions. I've improved find_first/last_bit_impl since the original PR commit. (1) In find_last_bit_impl, testing the last bit and the last word has been improved, shaving off an instruction or two (depending on platform). The previous version incorrectly expected shifts to set condition codes. (2) The previous version of find_last_bit_impl expected the compiler to CSE merge the nearly (and now actually, because of the change above) identical code for handling found bits in cword. But gcc (at least) doesn't seem to do that, so we do it manually. This doesn't have any performance cost, just saves 20-30% of the (inline) code size. (3) Applied to find_first_bit_impl a similar technique to (2). The manual tail merging is a little bit indirect.
------------- PR: https://git.openjdk.org/jdk/pull/12988 From fjiang at openjdk.org Mon Mar 20 00:52:29 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 20 Mar 2023 00:52:29 GMT Subject: RFR: 8304293: RISC-V: JDK-8276799 missed atomic intrinsic support for C1 In-Reply-To: <6HOmP1tPgq26gezEvauaAs0UYxGKQrXg3iOA7bNC1TQ=.857c8220-ad7b-49f7-bddc-c7c9a807664b@github.com> References: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com> <6HOmP1tPgq26gezEvauaAs0UYxGKQrXg3iOA7bNC1TQ=.857c8220-ad7b-49f7-bddc-c7c9a807664b@github.com> Message-ID: On Thu, 16 Mar 2023 09:35:10 GMT, Fei Yang wrote: >> The following intrinsics in C1 are controlled by supports_atomic_xxx, but they are not set properly on RISC-V: >> - _getAndAddInt >> - _getAndAddLong >> - _getAndSetInt >> - _getAndSetLong >> - _getAndSetReference >> >> RISC-V provides a set of atomic instructions [1], these intrinsics could be enabled by default. >> >> Here is the HIR output of C1: >> >> before: >> >> >> B18 (V) [189, 196] -> B20 pred: B8 B17 >> empty stack >> inlining depth 0 >> __bci__use__tid____instr____________________________________ >> 0 0 a251 >> 3 0 a252 null >> 4 0 l254 274954985816L >> 7 0 l255 1L >> . 8 0 l256 a251.invokespecial(a252, l254, l255) >> jdk/internal/misc/Unsafe.getAndAddLong(Ljava/lang/Object;JJ)J >> . 193 0 l258 a42._24 := l256 (J) tid >> . 196 0 259 goto B20 >> >> >> after: >> >> >> B18 (V) [189, 196] -> B20 pred: B8 B17 >> empty stack >> inlining depth 0 >> __bci__use__tid____instr____________________________________ >> 0 0 a251 >> 3 0 a252 null >> 4 0 l254 274954985816L >> 7 0 l255 1L >> . 8 0 l256 UnsafeGetAndSet (add)(a252, l254, value l255) >> . 193 0 l258 a42._24 := l256 (J) tid >> . 196 0 259 goto B20 >> >> >> 1. https://github.com/riscv/riscv-isa-manual/blob/8b9047d8d20ef548f7996efee1550760d7bc1279/src/a.tex#L416-L422 >> >> Testing: >> >> - [x] `hotspot_tier1` & `jdk_tier1` on Unmatched board (release build) > > Looks good. 
Thanks for finding this. @RealFYang @yhzhu20 -- Thanks! ------------- PR: https://git.openjdk.org/jdk/pull/13053 From fjiang at openjdk.org Mon Mar 20 00:57:27 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 20 Mar 2023 00:57:27 GMT Subject: Integrated: 8304293: RISC-V: JDK-8276799 missed atomic intrinsic support for C1 In-Reply-To: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com> References: <9U2safmMfGsfIIlyBZW3E3dbm68mKcBAzemr5yqoPEw=.c261f575-2cbc-402e-b742-7a6bdc8c2450@github.com> Message-ID: On Thu, 16 Mar 2023 03:37:10 GMT, Feilong Jiang wrote: > The following intrinsics in C1 are controlled by supports_atomic_xxx, but they are not set properly on RISC-V: > - _getAndAddInt > - _getAndAddLong > - _getAndSetInt > - _getAndSetLong > - _getAndSetReference > > RISC-V provides a set of atomic instructions [1], these intrinsics could be enabled by default. > > Here is the HIR output of C1: > > before: > > > B18 (V) [189, 196] -> B20 pred: B8 B17 > empty stack > inlining depth 0 > __bci__use__tid____instr____________________________________ > 0 0 a251 > 3 0 a252 null > 4 0 l254 274954985816L > 7 0 l255 1L > . 8 0 l256 a251.invokespecial(a252, l254, l255) > jdk/internal/misc/Unsafe.getAndAddLong(Ljava/lang/Object;JJ)J > . 193 0 l258 a42._24 := l256 (J) tid > . 196 0 259 goto B20 > > > after: > > > B18 (V) [189, 196] -> B20 pred: B8 B17 > empty stack > inlining depth 0 > __bci__use__tid____instr____________________________________ > 0 0 a251 > 3 0 a252 null > 4 0 l254 274954985816L > 7 0 l255 1L > . 8 0 l256 UnsafeGetAndSet (add)(a252, l254, value l255) > . 193 0 l258 a42._24 := l256 (J) tid > . 196 0 259 goto B20 > > > 1. https://github.com/riscv/riscv-isa-manual/blob/8b9047d8d20ef548f7996efee1550760d7bc1279/src/a.tex#L416-L422 > > Testing: > > - [x] `hotspot_tier1` & `jdk_tier1` on Unmatched board (release build) This pull request has now been integrated. 
Changeset: c09f83ec Author: Feilong Jiang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/c09f83ec25749af349fb5609e3641b5bb6d34072 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8304293: RISC-V: JDK-8276799 missed atomic intrinsic support for C1 Reviewed-by: fyang, yzhu ------------- PR: https://git.openjdk.org/jdk/pull/13053 From gcao at openjdk.org Mon Mar 20 01:47:34 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 20 Mar 2023 01:47:34 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v8] In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 22:08:23 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan. 
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed aarch64 and added load-acquire for resolution check Hi, I have updated the RISC-V code to follow the latest AArch64 changes; please help me update the PR with it. I ran hotspot and jdk tier1, and no new errors were introduced: https://github.com/zifeihan/jdk/commit/9c17c5b4953eebdebc6eb84b90a2ff9ca97c78c5 (on this branch: https://github.com/zifeihan/jdk/commits/12778_riscv_port) ------------- PR: https://git.openjdk.org/jdk/pull/12778 From sspitsyn at openjdk.org Mon Mar 20 07:18:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 Mar 2023 07:18:24 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-OJbhkKU3EtSS8E31eEd62h3-x5Szpl_Hk0apm1a6aQ=.687c660f-bc13-41cd-bc63-c59ca60300f0@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> <-OJbhkKU3EtSS8E31eEd62h3-x5Szpl_Hk0apm1a6aQ=.687c660f-bc13-41cd-bc63-c59ca60300f0@github.com> Message-ID: On Sat, 18 Mar 2023 11:24:47 GMT, Alan Bateman wrote: > The most important case is when there is no JVMTI env. If I read the changes correctly, the overhead for park/continue changes from one volatile-read (notifyJvmtiEvents) to two plain-writes (JavaThread::_is_in_VTMS_transition). > > If a JVMTI env has been created then there is no benefit for the moment as there is still a call into the runtime to interact with JvmtiVTMSTransitionDisabler. So I think you are saying that is for follow-on PRs. @AlanBateman Yes, your conclusion is correct.
------------- PR: https://git.openjdk.org/jdk/pull/13054 From sspitsyn at openjdk.org Mon Mar 20 07:18:27 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 Mar 2023 07:18:27 GMT Subject: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4] In-Reply-To: <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> <-kZ3wf7zOt0zABMfgibzmuT5VHuROnTA92lkqbhitbE=.fd934229-b4a6-469a-9c4b-ac9f26efd80f@github.com> Message-ID: On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for future performance/scalability improvements in JVMTI support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` >> `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` >> 2. Still useful implementation of old native methods is moved from `jvm.cpp` to `jvmtiThreadState.cpp`: >> `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` >> `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` >> `JVM_VirtualThreadUnmountStart` = > `VTMS_unmount_begin` >> `JVM_VirtualThreadUnmountEnd` => `VTMS_mount_end` >> 3. Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. >> 4. Removed the`VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. >> 5. Added static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement of the `VirtualThread` `notifyJvmtiEvents` variable. >> >> Implementing the same methods as C1 intrinsics can be needed in the future but is a low priority for now. >> >> Testing: >> - Ran mach5 tiers 1-6. No regressions were found. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweaks in intrinsics implementation Thank you for the review, Leonid! ------------- PR: https://git.openjdk.org/jdk/pull/13054 From stefank at openjdk.org Mon Mar 20 09:14:24 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 20 Mar 2023 09:14:24 GMT Subject: RFR: 8304016: Add BitMap find_last suite of functions [v2] In-Reply-To: References: Message-ID: On Sun, 19 Mar 2023 22:06:53 GMT, Kim Barrett wrote: >> Please review this change that adds functions to BitMap for finding the last >> set/clear bit in a range. >> >> Testing: >> mach5 tier1, including new gtesting for the new functions. > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - shrink find_first_bit_impl > - improve find_last_bit_impl Thanks for creating this PR. As we have discussed offline, this is not my preferred code style for these functions, but I'm happy to see this functionality being upstreamed, so consider this reviewed. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.org/jdk/pull/12988 From aboldtch at openjdk.org Mon Mar 20 09:35:28 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 20 Mar 2023 09:35:28 GMT Subject: RFR: 8304016: Add BitMap find_last suite of functions [v2] In-Reply-To: References: Message-ID: On Sun, 19 Mar 2023 22:06:53 GMT, Kim Barrett wrote: >> Please review this change that adds functions to BitMap for finding the last >> set/clear bit in a range. >> >> Testing: >> mach5 tier1, including new gtesting for the new functions. > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - shrink find_first_bit_impl > - improve find_last_bit_impl The rewrite made the algorithm more readable, imo.
At some point I would like to type up the `beg` and `end` so it is clear that it is a range with `[beg, end)` ------------- Marked as reviewed by aboldtch (Committer). PR: https://git.openjdk.org/jdk/pull/12988 From matsaave at openjdk.org Mon Mar 20 14:29:35 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 20 Mar 2023 14:29:35 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fix riscv interpreter mistake and acquire semantics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/6600e6dc..8607f62a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=07-08 Stats: 18 lines in 4 files changed: 4 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From kvn at openjdk.org Mon Mar 20 15:46:11 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Mar 2023 15:46:11 GMT Subject: RFR: 8231349: Move intrinsic stubs generation to compiler runtime initialization code Message-ID: Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code. The result is within 1% of not generating these stubs at all. We will not win on 1-core VMs, but it is a simpler and safer solution, I think. It also automatically (no new code needed) skips generating these stubs if C2 is not used (-Xint or a low TieredStopAtLevel). On-demand stub generation requires synchronization between threads while the application runs, which may introduce some instability and possibly other issues. But it could be beneficial for the Interpreter and C1 if we want them to use more intrinsic stubs (they only use CRC32 now). I filed a separate RFE [8304422](https://bugs.openjdk.org/browse/JDK-8304422). Changes: - Added a new platform-specific diagnostic flag `-XX:+MoveIntrinsicStubsGen`. It is ON by default if the VM is built with the C2 or JVMCI compilers, except for Zero and 32-bit Arm VMs, which have no or few intrinsics. - Split the `StubGenerator::generate_all()` method into two: `generate_final_stubs()` and `generate_compiler_stubs()`. Moved only the C2 (and JVMCI) intrinsic stub generation to the new method. - I renamed methods and stub buffer sizes to match the new code. Now we have 4 separate **named** stub buffers and corresponding methods: _Initial, Continuation, Compiler, Final_. - I added new UL output to determine the new buffer sizes and adjusted them on `aarch64` and `x86`. On other platforms I used the same value as before for both `compiler_stubs` and `final_stubs`: > java -Xlog:stubs -XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:-VerifyStackAtCalls -version [0.006s][info][stubs] StubRoutines (initial stubs) [0x00007f94900fcc00, 0x00007f9490101b60] used: 16152, free: 4168 [0.026s][info][stubs] StubRoutines (continuation stubs) [0x00007f9490102580, 0x00007f9490102e90] used: 741, free: 1579 [0.051s][info][stubs] StubRoutines (final stubs) [0x00007f9490155600, 0x00007f949015cc70] used: 26484, free: 3836 [0.090s][info][stubs] StubRoutines (compiler stubs) [0x00007f94904ccc00, 0x00007f94904d9bd0] used: 46988, free: 6212 java version "21-internal" 2023-09-19 LTS -Xlog:stubs=debug will print size information for each stub: [0.005s][debug][stubs] ICache::flush_icache_stub [0x00007fb2d3828080, 0x00007fb2d382809d] (29 bytes) [0.005s][debug][stubs] VM_Version::get_cpu_info_stub [0x00007fb2d3828380, 0x00007fb2d3828714] (916 bytes) [0.005s][debug][stubs] VM_Version::detect_virt_stub [0x00007fb2d3828714, 0x00007fb2d382872e] (26 bytes) [0.005s][debug][stubs] StubRoutines::forward exception [0x00007fb2d3828c00, 0x00007fb2d3828c92] (146 bytes) Testing: tier1-7, Xcomp, stress on x64 and aarch64. I have changes for all platforms. Please test it on the platforms you support. ------------- Commit messages: - remove trailing white space - Copyright year update.
Small corrections. - 8231349: Move intrinsic stubs generation to compiler runtime initialization code Changes: https://git.openjdk.org/jdk/pull/13096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13096&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8231349 Stats: 738 lines in 38 files changed: 386 ins; 125 del; 227 mod Patch: https://git.openjdk.org/jdk/pull/13096.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13096/head:pull/13096 PR: https://git.openjdk.org/jdk/pull/13096 From redestad at openjdk.org Mon Mar 20 15:46:16 2023 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 20 Mar 2023 15:46:16 GMT Subject: RFR: 8231349: Move intrinsic stubs generation to compiler runtime initialization code In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 07:05:23 GMT, Vladimir Kozlov wrote: > Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code. > > ... FWIW this looks good to me!
Perhaps there are some improvements that can be made (see inline comments regarding the `count_positives` stub), but it might be prudent not to spend more time than necessary on this if anyone will be looking at https://bugs.openjdk.org/browse/JDK-8304422 soon enough. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8093: > 8091: > 8092: // countPositives stub for large arrays. > 8093: StubRoutines::aarch64::_count_positives = generate_count_positives(StubRoutines::aarch64::_count_positives_long); A small detail, but I am pretty certain this stub is only used by C2 and could be moved to `generate_compiler_stubs`. But it raises the question of whether there are more stubs that look like they are shared but are really only used by C2. For historical reasons this intrinsic was implemented with a macro+stub on aarch64 but not on x64 et al. When doing so, the macro was defined in MacroAssembler and not C2_MacroAssembler, but it is effectively only used from aarch64.ad. It might be interesting to make C1 (and possibly the interpreter) use this stub when available, but if/when that happens, moving it back to `generate_final_stubs` is relatively straightforward. src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp line 42: > 40: _continuation_stubs_code_size = 2000, > 41: _compiler_stubs_code_size = 30000, > 42: _final_stubs_code_size = 20000 The tricky part when updating these is knowing which set of CPU features and VM flags will generate the largest possible stubs, but it looks like you've added ample free space with these estimates. src/hotspot/share/runtime/stubRoutines.cpp line 413: > 411: void compiler_stubs_init(bool in_compiler_thread) { > 412: if (in_compiler_thread && MoveIntrinsicStubsGen) { > 413: // Temporare revert state of stubs generation because "Temporarily" ------------- Marked as reviewed by redestad (Reviewer).
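In the `-Xlog:stubs` output quoted in the PR description, each buffer's `used` plus `free` adds up to the size of its address range. For example, the final stubs buffer spans 0x7f949015cc70 - 0x7f9490155600 = 30320 bytes = 26484 + 3836. Headroom can therefore be read directly from a log line. The helper below is a hypothetical sketch for making that check on a captured line. It assumes the exact line format shown in this thread and is not part of HotSpot.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of HotSpot) for the tail of an
// "-Xlog:stubs" info line such as
//   "... StubRoutines (final stubs) [0x00007f9490155600, 0x00007f949015cc70] used: 26484, free: 3836"
struct StubBufferUsage {
  std::uint64_t begin = 0;
  std::uint64_t end = 0;
  std::uint64_t used = 0;
  std::uint64_t free_bytes = 0;

  // The reported numbers should account for the whole buffer.
  bool consistent() const { return end - begin == used + free_bytes; }
  // Fraction of the buffer left unused, i.e. the safety margin.
  double headroom() const { return double(free_bytes) / double(used + free_bytes); }
};

bool parse_stub_usage(const std::string& line, StubBufferUsage* out) {
  std::size_t used_pos = line.find(" used:");
  if (used_pos == std::string::npos) return false;
  // The address range is the last "[...]" group before " used:".
  std::size_t open = line.rfind('[', used_pos);
  if (open == std::string::npos) return false;
  // %x accepts the optional "0x" prefix printed in the log.
  int n = std::sscanf(line.c_str() + open,
                      "[%" SCNx64 ", %" SCNx64 "] used: %" SCNu64 ", free: %" SCNu64,
                      &out->begin, &out->end, &out->used, &out->free_bytes);
  return n == 4;
}
```

The same arithmetic applied to the compiler stubs line gives 46988 + 6212 = 53200 bytes, matching its range 0x7f94904d9bd0 - 0x7f94904ccc00.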
PR: https://git.openjdk.org/jdk/pull/13096 From jcking at openjdk.org Mon Mar 20 16:29:40 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 16:29:40 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros,count_trailing_zeros,population_count}.hpp Message-ID: As the title says, clean up the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC, since it is really just Clang now. ------------- Commit messages: - Merge remote-tracking branch 'upstream/master' into popcount - Update copyright - Cleanup based on byteswap.hpp - Refactor count_leading_zeros, count_trailing_zeros, and population_count Changes: https://git.openjdk.org/jdk/pull/13103/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304539 Stats: 413 lines in 3 files changed: 184 ins; 155 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From kvn at openjdk.org Mon Mar 20 17:04:48 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Mar 2023 17:04:48 GMT Subject: RFR: 8231349: Move intrinsic stubs generation to compiler runtime initialization code In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 14:42:02 GMT, Claes Redestad wrote: >> Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code. >> >> ... > src/hotspot/cpu/aarch64/stubRoutines_aarch64.hpp line 42: > >> 40: _continuation_stubs_code_size = 2000, >> 41: _compiler_stubs_code_size = 30000, >> 42: _final_stubs_code_size = 20000 > > The tricky part when updating these is knowing which set of CPU features and VM flags will generate the largest possible stubs, but it looks like you've added ample of free space with these estimates. Yes, that is why I ran with the `-XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:-VerifyStackAtCalls` flags, which increase the generated code size. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13096#discussion_r1142437121 From kvn at openjdk.org Mon Mar 20 17:17:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Mar 2023 17:17:13 GMT Subject: RFR: 8231349: Move intrinsic stubs generation to compiler runtime initialization code [v2] In-Reply-To: References: Message-ID: > Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code. > > ... Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Address Claes comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13096/files - new: https://git.openjdk.org/jdk/pull/13096/files/ccbc8b56..fe04169d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13096&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13096&range=00-01 Stats: 7 lines in 2 files changed: 3 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13096/head:pull/13096 PR: https://git.openjdk.org/jdk/pull/13096 From stuart.monteith at arm.com Mon Mar 20 17:32:50 2023 From: stuart.monteith at arm.com (Stuart Monteith) Date: Mon, 20 Mar 2023 17:32:50 +0000 Subject: ASAN and slowdebug Message-ID: Hello, While looking at ASAN to try to reproduce another issue with ASAN, I tried building OpenJDK configured with: --enable-asan --with-debug-level=slowdebug However, there is a compilation problem that persists up to and including GCC 12.2.0: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80959 As slowdebug builds are built with -O0, the code inserted by the LEAVE() and CHECK_NH+CHECK_ macros is not optimised away.
This causes the following two warnings/errors: * For target hotspot_variant-server_libjvm_objs_memoryService.o: jdk/src/hotspot/share/services/memoryService.cpp: In static member function 'static Handle MemoryService::create_MemoryUsage_obj(MemoryUsage, JavaThread*)': jdk/src/hotspot/share/services/memoryService.cpp:219:1: error: control reaches end of non-void function [-Werror=return-type] * For target support_native_java.base_libjli_java.o: jdk/src/java.base/share/native/libjli/java.c: In function 'JavaMain': jdk/src/java.base/share/native/libjli/java.c:556:1: error: control reaches end of non-void function [-Werror=return-type] 556 | } | ^ There are a couple of ways of working around this. One is to change create_MemoryUsage_obj and libjli/java.c to place a return statement at the end of the function block, after the macros. The changes are straightforward, but nothing guarantees that the problem won't be reintroduced elsewhere. Another is to pass "--disable-warnings-as-errors", or more specifically "--with-extra-cflags=-Wno-return-type --with-extra-cxxflags=-Wno-return-type", to configure. This would be a manual process unless the makefiles were changed to handle it, and it does remove the return type checks. Part of the reason for sending this is to at least document the issue publicly. Are there any opinions on what should be done next? Stuart From bkilambi at openjdk.org Mon Mar 20 17:37:59 2023 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 20 Mar 2023 17:37:59 GMT Subject: RFR: 8301012: [vectorapi]: Intrinsify CompressBitsV/ExpandBitsV and add the AArch64 SVE backend implementation [v2] In-Reply-To: References: Message-ID: > This patch adds mid-end compiler vector IR nodes for the scalar CompressBits and ExpandBits nodes (CompressBitsV and ExpandBitsV) and also adds aarch64 backend support for these nodes using SVE2 instructions (included in the svebitperm feature).
As there are direct instructions in SVE2 that map to these operations, a huge speedup can be observed, and it might significantly benefit workloads that run these operations extensively on machines supporting SVE2 (with the svebitperm feature). > > All the JTREG tests under "test/jdk/jdk/incubator/vector" pass with this patch on an SVE2 machine. > The JMH tests COMPRESS_BITS and EXPAND_BITS from [1] and [2] were run on an aarch64 machine with a 128-bit vector length that supports SVE2 and svebitperm. The following gains were observed with this patch:
>
> Benchmark                    (length)  Mode   Cnt  Gain
> IntMaxVector.COMPRESS_BITS   1024      thrpt  15   81.68x
> IntMaxVector.EXPAND_BITS     1024      thrpt  15   85.65x
> LongMaxVector.COMPRESS_BITS  1024      thrpt  15   70.78x
> LongMaxVector.EXPAND_BITS    1024      thrpt  15   76.31x
>
> The "Gain" column is the ratio between the throughput of benchmark runs with this patch and that of benchmark runs without this patch. This patch does not change the performance of these operations on machines that do not support these instructions, or on other architectures. > With this patch, vectorization of CompressBits and ExpandBits operations happens only through the Vector API on aarch64. Auto-vectorization does not take place, as the current JDK source does not contain an aarch64 backend implementation for scalar CompressBits and ExpandBits. However, PR https://github.com/openjdk/jdk/pull/10537 adds an aarch64 backend implementation for CompressBits and ExpandBits and may eventually lead to auto-vectorization of these nodes as well; this PR is standalone and does not depend on the scalar implementation.
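For readers unfamiliar with the operations, here is what they compute per lane. CompressBits gathers the bits of a value selected by a mask into the low-order bits of the result. ExpandBits scatters low-order bits back to the mask positions. These are the semantics of Java's `Long.compress`/`Long.expand`, and of the SVE2 BitPerm instructions BEXT/BDEP. The loop-based scalar reference below is only a sketch for 64-bit lanes. The function names are made up, and it is not how the intrinsic is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference for one 64-bit lane of CompressBitsV: bits of `value` at
// positions where `mask` is 1 are packed, in order, into the low bits.
std::uint64_t compress_bits(std::uint64_t value, std::uint64_t mask) {
  std::uint64_t result = 0;
  unsigned dst = 0;  // next destination bit
  for (unsigned i = 0; i < 64; ++i) {
    std::uint64_t bit = std::uint64_t{1} << i;
    if (mask & bit) {
      if (value & bit) result |= std::uint64_t{1} << dst;
      ++dst;
    }
  }
  return result;
}

// Scalar reference for one 64-bit lane of ExpandBitsV: the low bits of
// `value` are deposited, in order, at the positions where `mask` is 1.
std::uint64_t expand_bits(std::uint64_t value, std::uint64_t mask) {
  std::uint64_t result = 0;
  unsigned src = 0;  // next source bit
  for (unsigned i = 0; i < 64; ++i) {
    std::uint64_t bit = std::uint64_t{1} << i;
    if (mask & bit) {
      if (value & (std::uint64_t{1} << src)) result |= bit;
      ++src;
    }
  }
  return result;
}

// Lane-wise application, as a vector operation does for each element.
void compress_bits_lanes(const std::uint64_t* values, const std::uint64_t* masks,
                         std::uint64_t* out, std::size_t lanes) {
  for (std::size_t i = 0; i < lanes; ++i) out[i] = compress_bits(values[i], masks[i]);
}
```

For example, `compress_bits(0xB2, 0xF0)` yields `0xB` (bits 4 to 7 of 0xB2), and expanding that back with the same mask yields `0xB0`, i.e. `value & mask`.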
> > [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/IntMaxVector.java > [2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/LongMaxVector.java Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge master and fix conflicts - 8301012: [vectorapi]: Intrinsify CompressBitsV/ExpandBitsV and add the AArch64 SVE backend implementation ------------- Changes: https://git.openjdk.org/jdk/pull/12446/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12446&range=01 Stats: 259 lines in 9 files changed: 254 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12446/head:pull/12446 PR: https://git.openjdk.org/jdk/pull/12446 From jcking at google.com Mon Mar 20 17:45:40 2023 From: jcking at google.com (Justin King) Date: Mon, 20 Mar 2023 10:45:40 -0700 Subject: ASAN and slowdebug In-Reply-To: References: Message-ID: Ew. So for `MemoryService::create_MemoryUsage_obj` it's effectively adding stuff after the return statement, causing the compiler to complain. I think for `MemoryService::create_MemoryUsage_obj` that last CHECK should probably be changed to THREAD since it's not useful and the code added after the return is unreachable. And LEAVE should probably be changed to not have the return statement wrapped in an arbitrary if statement with JNI_TRUE as the expression, unless there is a reason for that.
On Mon, Mar 20, 2023 at 10:34 AM Stuart Monteith wrote: > Hello, > While looking at ASAN to try to reproduce another issue with ASAN, I > tried building OpenJDK configured with: > --enable-asan --with-debug-level=slowdebug > > ... -- Justin King Software Engineer jcking at google.com From rrich at openjdk.org Mon Mar 20 17:47:49 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 20 Mar 2023 17:47:49 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v2] In-Reply-To: References: Message-ID: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains ten additional commits since the last revision: - Adding TestStaticallyBoundTargetIsReachable.java - Path to target exists also if the receiver is a constant of the caller - Use nmethod::oops_do() to search for to_holder in from_nm - Merge branch 'master' - Remove MacroAssembler::resolve_weak_handle() - Remove keep_alive_offset() and holder_offset() from CLD - Remove MacroAssembler::load_method_holder_cld() - Remove c2i entry barrier - Check dependency for statically bound call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12802/files - new: https://git.openjdk.org/jdk/pull/12802/files/8f27bd37..876e55dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12802&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12802&range=00-01 Stats: 166452 lines in 2166 files changed: 119994 ins; 27292 del; 19166 mod Patch: https://git.openjdk.org/jdk/pull/12802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12802/head:pull/12802 PR: https://git.openjdk.org/jdk/pull/12802 From duke at openjdk.org Mon Mar 20 18:12:45 2023 From: duke at openjdk.org (ExE Boss) Date: Mon, 20 Mar 2023 18:12:45 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html src/java.base/share/classes/java/lang/foreign/AddressLayout.java line 59: > 57: */ > 58: @PreviewFeature(feature = PreviewFeature.Feature.FOREIGN) > 59: sealed public interface AddressLayout extends ValueLayout permits ValueLayouts.OfAddressImpl { This should match other sealed interfaces: Suggestion: public sealed interface AddressLayout extends ValueLayout permits ValueLayouts.OfAddressImpl {
src/java.base/share/classes/java/lang/foreign/Linker.java line 578: > 576: * Execution state is captured by a downcall method handle on invocation, by writing it > 577: * to a native segment provided by the user to the downcall method handle. > 578: * For this purpose, a downcall method handle linked with the this Suggestion: * For this purpose, a downcall method handle linked with this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1140677384 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1140679387 From pminborg at openjdk.org Mon Mar 20 18:12:39 2023 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 20 Mar 2023 18:12:39 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) Message-ID: API changes for the FFM API (third preview) Specdiff: https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html Javadoc: https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html ------------- Commit messages: - Update after first round of comments - Remove extra line - Remove Panama-specific content - Generate delta Changes: https://git.openjdk.org/jdk/pull/13079/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304265 Stats: 13601 lines in 269 files changed: 5644 ins; 5999 del; 1958 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From mcimadamore at openjdk.org Mon Mar 20 18:12:41 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 20 Mar 2023 18:12:41 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Here are the main API changes introduced in this round: * `SegmentScope` has been simplified into a pure lifetime abstraction and moved into a nested class `MemorySegment.Scope`. All segments have a scope (i.e. the segment lifetime), which is usually the scope of some `Arena`. All the factory methods in `SegmentScope` that create non-closeable lifetimes have been moved to `Arena` (e.g. `Arena.ofAuto` and `Arena.global`). This leads to a simplified API, which still allows building custom arenas via simple delegation, but at the same time allows clients to use arenas with minimal indirection (e.g. `arena.scope()` is no longer needed in many places). Some factory names in `Arena` were also updated (e.g. from `openConfined` to `ofConfined`).
https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Here are the main API changes introduced in this round: * `SegmentScope` has been simplified into a pure lifetime abstraction and moved into a nested class `MemorySegment.Scope`. All segments have a scope (i.e. the segment lifetime), which is usually the scope of some `Arena`. All the factory methods in `SegmentScope` to create non-closeable lifetimes have been moved to `Arena` (e.g. `Arena.ofAuto` and `Arena.global`). This leads to a simplified API, which still allows clients to build custom arenas via simple delegation, but, at the same time, allows clients to use arenas with minimal indirections (e.g. `arena.scope()` is no longer needed in many places). Some factory names in `Arena` were also updated (e.g. from `openConfined` to `ofConfined`). * `ValueLayout::OfAddress` has been moved to a top-level class `AddressLayout`. Also, the method to obtain an address layout of unbounded size (`OfAddress::asUnbounded`) has been changed, so that it now takes the layout of the target region of memory pointed to by the address (`AddressLayout::withTargetLayout`). * A new *layout path* is provided to dereference an address layout. This allows memory segment var handles to deal with complex dereference expressions like `*(a[10].x).y`. * A new linker implementation, namely the *fallback linker*, has been added. This linker is based on `libffi` and provides a very easy way to add support for the `Linker` API, even on platforms with limited functionality (such as [zero](https://openjdk.org/projects/zero/)). * The `VaList` interface has been dropped. Unfortunately, the behavior of `va_list` is hopelessly platform-specific, and we could also never make full use of it in the `jextract` tool, given that parsing support for `va_list` is very limited in `libclang`.
* The API for unsafely attaching spatial/temporal bounds to an unsafe memory segment has been improved and streamlined. The `MemorySegment::ofAddress` method is now a single, unrestricted method which turns a long address into a native memory segment whose base address is the provided address. The returned segment has a scope that is always alive, and has zero length. To resize an existing segment, or to add new temporal bounds to it, clients can use the new `MemorySegment::reinterpret` methods. The logic for attaching a cleanup action to a memory segment has also been updated: now the cleanup action will take as input a shallow copy of the memory segment, in a scope that is always alive, so that clients can pass that copy to other functions in order to perform custom cleanup. * We have made some changes and simplifications to the way in which runtime values such as `errno` are preserved. The `CapturedCallState` interface has been removed. Instead, there is a way to obtain a group layout of all the values that can be saved, given the platform on which the linker runs. Clients can query the layout, e.g. obtaining names for the values to be saved, and then create a linker option which lists the names of the values to be saved. * We have added support for *trivial* (or *leaf*) calls, that is, native calls whose execution completes very quickly. This option might be useful when calling functions whose total execution time is comparable to the overhead of the thread-state transition from Java to native (in JNI, such calls are handled using *critical JNI*).
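The arena, `ofAddress`/`reinterpret`, and `withTargetLayout` changes listed above can be sketched as follows. This is a minimal illustration, not code from the PR: it assumes the API shape described in this round as it later shipped in JDK 22 (on JDK 21 it needs `--enable-preview`), and the class and method names are hypothetical. `reinterpret` and `withTargetLayout` are restricted methods, so the JVM may print a native-access warning unless `--enable-native-access` is given.

```java
import java.lang.foreign.AddressLayout;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical helper class illustrating the FFM changes described above.
// Assumes JDK 22+ (or JDK 21 with --enable-preview).
class ArenaSketch {

    // Writes an int into arena-managed memory, then reads it back through a
    // segment rebuilt from the raw address with ofAddress + reinterpret.
    static int roundTrip(int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(ValueLayout.JAVA_INT);
            seg.set(ValueLayout.JAVA_INT, 0, value);

            // ofAddress: zero-length segment whose scope is always alive.
            MemorySegment raw = MemorySegment.ofAddress(seg.address());
            // reinterpret (restricted): same address, caller-chosen size.
            MemorySegment view = raw.reinterpret(ValueLayout.JAVA_INT.byteSize());
            return view.get(ValueLayout.JAVA_INT, 0);
        }
    }

    // Size of a segment created from a raw address (expected to be zero).
    static long rawSegmentSize(long address) {
        return MemorySegment.ofAddress(address).byteSize();
    }

    // Stores a pointer to an int, then loads it through an address layout
    // whose target layout is JAVA_INT, so the loaded segment is 4 bytes long.
    static int derefPointer(int value) {
        AddressLayout intPtr = ValueLayout.ADDRESS.withTargetLayout(ValueLayout.JAVA_INT);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment target = arena.allocate(ValueLayout.JAVA_INT);
            target.set(ValueLayout.JAVA_INT, 0, value);
            MemorySegment ptr = arena.allocate(intPtr);
            ptr.set(intPtr, 0, target);
            MemorySegment loaded = ptr.get(intPtr, 0);
            return loaded.get(ValueLayout.JAVA_INT, 0);
        }
    }

    // Returns true if the segment's scope is dead after the arena is closed
    // and any further access is rejected with IllegalStateException.
    static boolean accessAfterCloseFails() {
        MemorySegment seg;
        try (Arena arena = Arena.ofConfined()) {
            seg = arena.allocate(ValueLayout.JAVA_INT);
        }
        if (seg.scope().isAlive()) {
            return false;
        }
        try {
            seg.get(ValueLayout.JAVA_INT, 0);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

Note that the segment returned by `ofAddress` is never freed by anyone; the memory it views stays valid only while the owning arena is open, which is why both reads happen inside the `try` block.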
------------- PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1476648707 From jcking at openjdk.org Mon Mar 20 18:17:12 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 18:17:12 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v2] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now. Justin King has updated the pull request incrementally with one additional commit since the last revision: Fix ambiguous call Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/52859de3..729b0c7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=00-01 Stats: 26 lines in 3 files changed: 12 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From rrich at openjdk.org Mon Mar 20 18:20:03 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 20 Mar 2023 18:20:03 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl In-Reply-To: <8C06DfuEkBnOaaP1WFv7Y6TRftP9XMitffS3UbOaf8c=.abb8c6e3-7a3b-4412-89b3-8819bd93900e@github.com> References: <8C06DfuEkBnOaaP1WFv7Y6TRftP9XMitffS3UbOaf8c=.abb8c6e3-7a3b-4412-89b3-8819bd93900e@github.com> Message-ID: On Thu, 16 Mar 2023 00:17:27 GMT, Vladimir Kozlov wrote: > I hit guarantee in tier7 running our closed tests with JRuby and `-Xcomp -ea -esa -XX:CompileThreshold=100` flags: > > ``` > # Internal Error (/workspace/open/src/hotspot/share/runtime/sharedRuntime.cpp:1411), pid=1150904, tid=1150907 > # guarantee(false) failed: Missing dependency resolving 
optimized virtual (invokeinterface) call to jnr.enxio.channels.Native$LibC$jnr$ffi$2::read > ``` > > Unfortunately I can't share tests. Thanks again for the testing. The guarantee fails for the read call at https://github.com/jnr/jnr-enxio/blob/4e21f3a341c2cb7e8007b7bebfb964182d36318d/src/main/java/jnr/enxio/channels/Native.java#L120. It is an invokeinterface with a [constant receiver](https://github.com/jnr/jnr-enxio/blob/4e21f3a341c2cb7e8007b7bebfb964182d36318d/src/main/java/jnr/enxio/channels/Native.java#L67). I've implemented and added a test. The guarantee fails when CHA-based optimization is not possible. In that case no dependencies are emitted, but the call is still optimized based on the constness of the receiver. Btw, wouldn't it be better to skip the CHA attempt in order to avoid emitting dependencies? It might even be possible that the caller gets deoptimized because of them. I've adapted the verification code. If the receiver is found among the constants, then we know that a permanent path to the target exists even if the receiver is not constant, as the target does not depend on the receiver. My reproducer `./bin/jruby -J-Xcomp -J-ea -J-esa -J-XX:CompileThreshold=100 -S rake spec:ruby:fast` no longer hits the guarantee with these changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12802#issuecomment-1476722228 From jcking at openjdk.org Mon Mar 20 18:26:00 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 18:26:00 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v3] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now.
Justin King has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary templating from count_trailing_zeros Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/729b0c7a..e494027f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=01-02 Stats: 11 lines in 1 file changed: 0 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From jcking at openjdk.org Mon Mar 20 18:33:42 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 18:33:42 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v4] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now. 
Justin King has updated the pull request incrementally with one additional commit since the last revision: Redo non-templated approach Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/e494027f..c1f4bd9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=02-03 Stats: 46 lines in 1 file changed: 6 ins; 6 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From coleenp at openjdk.org Mon Mar 20 19:04:18 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 20 Mar 2023 19:04:18 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v2] In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 17:47:49 GMT, Richard Reingruber wrote: >> This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. >> >> The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. >> >> C2i entry barriers can be removed for the same reason. >> >> Testing: >> >> Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. >> >> I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. >> >> I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. 
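The failing case Richard Reingruber describes earlier in this thread (an `invokeinterface` whose receiver is a constant of the caller, where class hierarchy analysis cannot bind the call) has roughly the shape below. This is a hypothetical reduction, not the jnr-enxio code or the PR's test: the names `LibC`, `ImplA`, `ImplB`, and `NativeHolder` are invented, and whether C2 actually binds the call statically depends on the JIT and its flags.

```java
// Hypothetical reduction of "invokeinterface with a constant receiver".
interface LibC {
    int read(int fd);
}

final class ImplA implements LibC {
    public int read(int fd) { return fd + 1; }
}

// A second implementor, so class hierarchy analysis (CHA) alone cannot
// prove which read() an arbitrary LibC receiver dispatches to.
final class ImplB implements LibC {
    public int read(int fd) { return fd + 2; }
}

final class NativeHolder {
    // A static final field is treated as a constant by C2 once the class is
    // initialized, so the receiver's exact type is known at compile time.
    static final LibC LIBC = new ImplA();

    static int read(int fd) {
        // invokeinterface LibC.read: may be compiled as an optimized
        // (statically bound) call to ImplA.read, with no CHA dependency
        // recorded for it.
        return LIBC.read(fd);
    }
}
```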
> > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Adding TestStaticallyBoundTargetIsReachable.java > - Path to target exists also if the receiver is a constant of the caller > - Use nmethod::oops_do() to search for to_holder in from_nm > - Merge branch 'master' > - Remove MacroAssembler::resolve_weak_handle() > - Remove keep_alive_offset() and holder_offset() from CLD > - Remove MacroAssembler::load_method_holder_cld() > - Remove c2i entry barrier > - Check dependency for statically bound call I don't really know if this is right, but here are some comments anyway. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4308: > 4306: void MacroAssembler::load_method_holder_cld(Register rresult, Register rmethod) { > 4307: load_method_holder(rresult, rmethod); > 4308: ldr(rresult, Address(rresult, InstanceKlass::class_loader_data_offset())); Can you remove InstanceKlass::class_loader_data_offset() also? src/hotspot/share/classfile/classLoaderData.cpp line 791: > 789: > 790: bool ClassLoaderData::handles_contain(oop obj) { > 791: return _handles.contains(obj); This might need to be protected by a metaspace_lock. src/hotspot/share/runtime/sharedRuntime.cpp line 1354: > 1352: } > 1353: > 1354: class Search2OopsClosure : public OopClosure { Should this all be under #ifdef ASSERT? src/hotspot/share/runtime/sharedRuntime.cpp line 1418: > 1416: return; // `to` is reachable by iterating parents of `from` > 1417: } > 1418: } I'd be happier if this part were a function in ClassLoaderData or refactored from record_dependency (along with the contains function), since it's similar code.
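The parent-chain shortcut in the last snippet ("`to` is reachable by iterating parents of `from`") relies on a class loader keeping its parent loaders reachable, so a dependency on a class defined by an ancestor loader (or by the boot loader) needs no extra bookkeeping. The Java-level sketch below illustrates only that idea using `ClassLoader.getParent()`; it is not the HotSpot implementation, which works on `ClassLoaderData` and oops, and the class and method names are hypothetical.

```java
// Hypothetical sketch: is the defining loader of `to` the defining loader of
// `from` or one of its ancestors? If so, `from`'s loader keeps it reachable.
final class LoaderReachability {

    static boolean reachableViaParents(Class<?> from, Class<?> to) {
        ClassLoader toLoader = to.getClassLoader();
        if (toLoader == null) {
            // The bootstrap loader (represented as null) is never unloaded.
            return true;
        }
        // Walk from `from`'s loader up the parent chain.
        for (ClassLoader cur = from.getClassLoader(); cur != null; cur = cur.getParent()) {
            if (cur == toLoader) {
                return true;
            }
        }
        return false;
    }
}
```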
------------- PR Review: https://git.openjdk.org/jdk/pull/12802#pullrequestreview-1349193798 PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1142542383 PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1142553804 PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1142560411 PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1142563560 From jcking at openjdk.org Mon Mar 20 19:05:59 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 19:05:59 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v5] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now. Justin King has updated the pull request incrementally with one additional commit since the last revision: Remove intrinsic specifier for CountOneBits Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/c1f4bd9a..024a6ef7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From jcking at openjdk.org Mon Mar 20 19:15:46 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 19:15:46 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v6] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers. 
This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now. Justin King has updated the pull request incrementally with one additional commit since the last revision: Go back to templating for count_trailing_zeros for consistency Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/024a6ef7..1ee4120a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=04-05 Stats: 71 lines in 1 file changed: 29 ins; 15 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From cslucas at openjdk.org Mon Mar 20 19:23:34 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 20 Mar 2023 19:23:34 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v4] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <6NDwZSpjSrokmglncPRp4tM7_Hiq4b26dXukhXODpKo=.8ba7efd0-bc44-4f1e-beb8-c1c68bc33515@github.com> > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks; similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user?
I.e., if the same node type uses a Phi N times, the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times, the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges that are used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward. It basically consists of: 1) Extend SafePointScalarObjectNode to represent multiple SR objects; 2) Add a new class to support rematerialization of SR objects that are part of merges; 3) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 4) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge, which renders the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regressions. I also tested with several applications and didn't see any failures. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.
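The kind of Java code that produces an allocation merge used as the base of a field load looks roughly like the hypothetical method below: two allocations flow into one variable (a Phi in C2's IR), and the merged value is then dereferenced. This only shows the shape; whether C2 scalar-replaces these allocations depends on the compiler version, flags, and inlining decisions, and the names are invented for illustration.

```java
// Hypothetical example of an allocation merge used as the base of a field load.
final class MergeSketch {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int pickX(boolean cond, int a, int b) {
        Point p;
        if (cond) {
            p = new Point(a, b);  // allocation 1
        } else {
            p = new Point(b, a);  // allocation 2
        }
        // In C2's IR, p is a Phi of the two allocations, and this read is an
        // AddP->Load user of that Phi. Splitting the load through the Phi
        // turns it into a Phi of two loads, one per allocation, so the merged
        // reference is no longer needed and each allocation can be
        // considered for scalar replacement.
        return p.x;
    }
}
```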
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Add support for SR'ing some inputs of merges used for field loads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/3b492d2e..a158ae66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=02-03 Stats: 481 lines in 9 files changed: 292 ins; 117 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From kbarrett at openjdk.org Mon Mar 20 19:28:34 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 20 Mar 2023 19:28:34 GMT Subject: RFR: 8304016: Add BitMap find_last suite of functions [v3] In-Reply-To: References: Message-ID: > Please review this change that adds functions to BitMap for finding the last > set/clear bit in a range. > > Testing: > mach5 tier1, including new gtesting for the new functions. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into find-last - shrink find_first_bit_impl - improve find_last_bit_impl - find_last_set/clear_bit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12988/files - new: https://git.openjdk.org/jdk/pull/12988/files/b3d2aed1..fcfebb2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12988&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12988&range=01-02 Stats: 78648 lines in 898 files changed: 48978 ins; 20413 del; 9257 mod Patch: https://git.openjdk.org/jdk/pull/12988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12988/head:pull/12988 PR: https://git.openjdk.org/jdk/pull/12988 From kbarrett at openjdk.org Mon Mar 20 19:28:36 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 20 Mar 2023 19:28:36 GMT Subject: RFR: 8304016: Add BitMap find_last suite of functions [v2] In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 09:11:57 GMT, Stefan Karlsson wrote: >> Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: >> >> - shrink find_first_bit_impl >> - improve find_last_bit_impl > > Thanks for creating this PR. As we have discussed offline, this is not my preferred code style for these functions, but I'm happy to see this functionality being upstreamed, so consider this reviewed. Thanks for reviews @stefank and @xmas92 ------------- PR Comment: https://git.openjdk.org/jdk/pull/12988#issuecomment-1476810717 From kbarrett at openjdk.org Mon Mar 20 19:28:37 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 20 Mar 2023 19:28:37 GMT Subject: Integrated: 8304016: Add BitMap find_last suite of functions In-Reply-To: References: Message-ID: On Sat, 11 Mar 2023 16:46:44 GMT, Kim Barrett wrote: > Please review this change that adds functions to BitMap for finding the last > set/clear bit in a range. 
> > Testing: > mach5 tier1, including new gtesting for the new functions. This pull request has now been integrated. Changeset: 2d0d057d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/2d0d057d6691d4abe4ca1ef44b29f03043323b67 Stats: 232 lines in 3 files changed: 168 ins; 28 del; 36 mod 8304016: Add BitMap find_last suite of functions Reviewed-by: stefank, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/12988 From erikj at openjdk.org Mon Mar 20 19:32:31 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 20 Mar 2023 19:32:31 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Build changes look ok. ------------- Marked as reviewed by erikj (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13079#pullrequestreview-1349265135 From jcking at openjdk.org Mon Mar 20 19:40:23 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 19:40:23 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v7] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now. 
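The `find_last` functions from 8304016, integrated above, search a range backwards for the highest set (or clear) bit; a word-at-a-time version naturally builds on a count-leading-zeros primitive, the kind of helper JDK-8304539 is cleaning up. The following Java model is a hypothetical sketch, not the HotSpot code: it stores bits in a `long[]`, uses `Long.numberOfLeadingZeros` in place of `count_leading_zeros`, and assumes the convention that `end` is returned when no matching bit exists in `[beg, end)`.

```java
// Hypothetical Java model of BitMap find_last_set_bit / find_last_clear_bit.
// Bit i lives in words[i / 64] at position i % 64.
final class FindLastSketch {

    static int findLastSetBit(long[] words, int beg, int end) {
        return findLast(words, beg, end, true);
    }

    static int findLastClearBit(long[] words, int beg, int end) {
        return findLast(words, beg, end, false);
    }

    // Returns the highest index in [beg, end) whose bit equals `set`,
    // or `end` if there is no such index.
    private static int findLast(long[] words, int beg, int end, boolean set) {
        if (beg >= end) {
            return end;
        }
        int last = end - 1;
        int begWord = beg >>> 6;
        int wi = last >>> 6;
        // Invert for a clear-bit search, then drop bits above `last`.
        long word = load(words, wi, set) & (-1L >>> (63 - (last & 63)));
        while (true) {
            if (wi == begWord) {
                // Drop bits below `beg` in the lowest word of the range.
                word &= -1L << (beg & 63);
            }
            if (word != 0) {
                // Highest remaining 1 bit in this word.
                return (wi << 6) + (63 - Long.numberOfLeadingZeros(word));
            }
            if (wi == begWord) {
                return end;  // nothing found in [beg, end)
            }
            wi--;
            word = load(words, wi, set);
        }
    }

    private static long load(long[] words, int index, boolean set) {
        return set ? words[index] : ~words[index];
    }
}
```

Masking the boundary words first keeps the inner loop down to one zero test per word, so the cost is proportional to the number of words scanned rather than the number of bits.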
Justin King has updated the pull request incrementally with one additional commit since the last revision: Add missing #endif Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/1ee4120a..8b36d016 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=05-06 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From sspitsyn at openjdk.org Mon Mar 20 19:58:51 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 Mar 2023 19:58:51 GMT Subject: Integrated: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics In-Reply-To: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> References: <-Pt3zLSu1Y2GYeM8XEivglUyDVXlAqMIA42-_zEnHlo=.7dd40f19-160a-4f11-8702-99c69a9b9923@github.com> Message-ID: <2QSQ5C7cdI-KgoFNa3aLqdYQQbLMY7P3qrKvVMsN86I=.503493f9-d9d0-4bff-b8a0-f95f82d412bb@github.com> On Thu, 16 Mar 2023 05:03:51 GMT, Serguei Spitsyn wrote: > This is needed for future performance/scalability improvements in JVMTI support of virtual threads. > The update includes the following: > > 1. Refactored the `VirtualThread` native methods: > `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` => `notifyJvmtiMount` > `notifyJvmtiUnmountBegin` and `notifyJvmtiUnmountEnd` => `notifyJvmtiUnmount` > 2. The still-useful implementation of the old native methods has been moved from `jvm.cpp` to `jvmtiThreadState.cpp`: > `JVM_VirtualThreadMountStart` => `VTMS_mount_begin` > `JVM_VirtualThreadMountEnd` => `VTMS_mount_end` > `JVM_VirtualThreadUnmountStart` => `VTMS_unmount_begin` > `JVM_VirtualThreadUnmountEnd` => `VTMS_unmount_end` > 3.
Intrinsified the `VirtualThread` native methods: `notifyJvmtiMount`, `notifyJvmtiUnmount`, `notifyJvmtiHideFrames`. > 4. Removed the `VirtualThread` static boolean state variable `notifyJvmtiEvents` and its support in `javaClasses`. > 5. Added the static boolean state variable `_VTMS_notify_jvmti_events` to the jvmtiVTMSTransitionDisabler class as a replacement for the `VirtualThread` `notifyJvmtiEvents` variable. > > Implementing the same methods as C1 intrinsics may be needed in the future but is a low priority for now. > > Testing: > - Ran mach5 tiers 1-6. No regressions were found. This pull request has now been integrated. Changeset: bc0ed730 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/bc0ed730f2c9dad55d0046b4fe8c9cd623b6dbf8 Stats: 449 lines in 20 files changed: 276 ins; 125 del; 48 mod 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics Reviewed-by: vlivanov, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/13054 From jcking at openjdk.org Mon Mar 20 20:55:21 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 20:55:21 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v8] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers.
Justin King has updated the pull request incrementally with one additional commit since the last revision: Add default template implementation Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/8b36d016..01ffca24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=06-07 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From jcking at openjdk.org Mon Mar 20 21:01:30 2023 From: jcking at openjdk.org (Justin King) Date: Mon, 20 Mar 2023 21:01:30 GMT Subject: RFR: JDK-8304539: Cleanup utilities/{count_leading_zeros, count_trailing_zeros, population_count}.hpp [v9] In-Reply-To: References: Message-ID: > As the title says, cleanup the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now. 
Justin King has updated the pull request incrementally with one additional commit since the last revision: Move template declaration Signed-off-by: Justin King ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13103/files - new: https://git.openjdk.org/jdk/pull/13103/files/01ffca24..fc392637 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13103&range=07-08 Stats: 30 lines in 1 file changed: 15 ins; 15 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13103.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13103/head:pull/13103 PR: https://git.openjdk.org/jdk/pull/13103 From psandoz at openjdk.org Mon Mar 20 23:33:41 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 20 Mar 2023 23:33:41 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html src/java.base/share/classes/java/lang/foreign/AddressLayout.java line 93: > 91: * @apiNote > 92: * This method can also be used to create an address layout which, when used, creates native memory > 93: * segments with maximal size (e.g. {@linkplain Long#MAX_VALUE}. This can be done by using a target sequence Suggestion: * segments with maximal size (e.g. {@linkplain Long#MAX_VALUE}). 
This can be done by using a target sequence ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1142769618 From psandoz at openjdk.org Tue Mar 21 00:38:49 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Mar 2023 00:38:49 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html src/java.base/share/classes/java/lang/foreign/Linker.java line 479: > 477: * Otherwise, the invocation throws {@link WrongThreadException}; and > 478: *
  • {@code A} is kept alive during the invocation. For instance, if {@code A} has been obtained using a > 479: * {@linkplain Arena#ofConfined() confined arena}, any attempt to {@linkplain Arena#close() close} Is that correct? Do you mean a shared arena? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1142798464 From sspitsyn at openjdk.org Tue Mar 21 00:56:51 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 21 Mar 2023 00:56:51 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Fri, 10 Mar 2023 10:43:23 GMT, Markus Grönlund wrote: >> Greetings, >> >> We are adding support to let JFR report on Agents. >> >> #### Design >> >> An Agent is a library that uses instrumentation or profiling APIs. Most agents are started and initialized on the command line, but agents can also be loaded dynamically during runtime. Because command-line agents initialize during the VM startup sequence, they add to the overall startup latency of getting the VM ready. The events will report on the time the agent took to initialize. >> >> A JavaAgent is an agent written in the Java programming language, using the APIs in the package [java.lang.instrument](https://docs.oracle.com/en/java/javase/19/docs/api/java.instrument/java/lang/instrument/package-summary.html). >> >> A JavaAgent is sometimes called a JPLIS agent, where the acronym JPLIS stands for Java Programming Language Instrumentation Services.
>> >> To report on JavaAgents, JFR will add the new event type jdk.JavaAgent, and events will look similar to these two examples: >> >> // Command line >> jdk.JavaAgent { >> startTime = 12:31:19.789 (2023-03-08) >> name = "JavaAgent.jar" >> options = "foo=bar" >> dynamic = false >> initialization = 12:31:15.574 (2023-03-08) >> initializationTime = 172 ms >> } >> >> // Dynamic load >> jdk.JavaAgent { >> startTime = 12:31:31.158 (2023-03-08) >> name = "JavaAgent.jar" >> options = "bar=baz" >> dynamic = true >> initialization = 12:31:31.037 (2023-03-08) >> initializationTime = 64,1 ms >> } >> >> The jdk.JavaAgent event type is a JFR periodic event that iterates over running Java agents. >> >> For a JavaAgent event, the agent's name will be the specific .jar file containing the instrumentation code. The options will be the specific options passed to the .jar file as part of launching the agent, for example, on the command line: -javaagent:JavaAgent.jar=foo=bar. >> >> The "dynamic" field denotes whether the agent was loaded via the command line (dynamic = false) or dynamically (dynamic = true). >> >> "initialization" is the timestamp at which the JVM invoked the initialization method, and "initializationTime" is the duration of executing the initialization method. >> >> "startTime" represents the time the JFR framework issued the periodic event; hence "initialization" will be earlier than "startTime". >> >> An agent can also be written in a native programming language using the [JVM Tool Interface (JVMTI)](https://docs.oracle.com/en/java/javase/19/docs/specs/jvmti.html). This kind of agent, sometimes called a native agent, is a platform-specific binary, sometimes referred to as a library, but here it means a .so or .dll file.
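The split between the `name` and `options` fields follows the documented `-javaagent:<jarpath>[=<options>]` syntax, in which everything after the first `=` belongs to the agent's options. The hypothetical parser below (not the JFR or `instrument` implementation; the class name and the empty-string convention for "no options" are assumptions) shows how `JavaAgent.jar=foo=bar` maps to the values in the first example event.

```java
// Hypothetical sketch: split the value of -javaagent:<jarpath>[=<options>]
// into the "name" and "options" fields shown in the jdk.JavaAgent example.
final class JavaAgentSpec {
    final String name;
    final String options;  // empty string when no options were given

    private JavaAgentSpec(String name, String options) {
        this.name = name;
        this.options = options;
    }

    static JavaAgentSpec parse(String arg) {
        String prefix = "-javaagent:";
        if (!arg.startsWith(prefix)) {
            throw new IllegalArgumentException("not a -javaagent option: " + arg);
        }
        String tail = arg.substring(prefix.length());
        // Only the first '=' separates the jar path from the options.
        int eq = tail.indexOf('=');
        return eq < 0
                ? new JavaAgentSpec(tail, "")
                : new JavaAgentSpec(tail.substring(0, eq), tail.substring(eq + 1));
    }
}
```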
>> >> To report on native agents, JFR will add the new event type jdk.NativeAgent and events will look similar to this example: >> >> jdk.NativeAgent { >> startTime = 12:31:40.398 (2023-03-08) >> name = "jdwp" >> options = "transport=dt_socket,server=y,address=any,onjcmd=y" >> dynamic = false >> initialization = 12:31:36.142 (2023-03-08) >> initializationTime = 0,00184 ms >> path = "c:\ade\github\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\jdwp.dll" >> } >> >> The layout of the event type is very similar to the jdk.JavaAgent event, but here the path to the native library is reported. >> >> The initialization of a native agent is performed by invoking an agent-specified callback routine. The "initialization" is when the JVM sent or would have sent the JVMTI VMInit event to a specified callback. "initializationTime" is the duration to execute that specific callback. If no callback is specified for the JVMTI VMInit event, the "initializationTime" will be 0. >> >> #### Implementation >> >> There has been no direct reification of a JavaAgent in the JVM, as these are built on top of the JDK native library, "instrument", using a many-to-one mapping. At the level of the JVM, the only representation of agents after startup is through JvmtiEnvs, which agents request from the JVM during startup and initialization; as such, mapping which JvmtiEnv belongs to what JavaAgent was not possible before. >> >> Using implementation details of how the JDK native library "instrument" interacts with the JVM, we can build this mapping to track what JvmtiEnvs "belong" to what JavaAgent. This mapping now lets us report the Java-relevant context (name, options) and measure the time it takes for the JavaAgent to initialize. >> >> When implementing this capability, it was necessary to refactor the code used to represent agents, AgentLibrary. The previous implementation was located primarily in arguments.cpp and threads.cpp, but also in jvmtiExport.cpp.
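The timing fields described above can be modeled in a few lines. The following is a minimal C++ sketch under stated assumptions, not the HotSpot code from the PR: `AgentTiming` and `time_initialization` are hypothetical names, and a `std::function` stands in for either a JavaAgent's initialization method or a native agent's JVMTI VMInit callback. It records the `initialization` timestamp, measures `initializationTime` around the call, and leaves the duration at zero when no callback is registered.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical per-agent record mirroring the two timing fields of the events.
struct AgentTiming {
  // Wall-clock time at which the init entry point was (or would have been) invoked.
  std::chrono::system_clock::time_point initialization{};
  // Time spent inside that entry point; stays zero when there is nothing to call.
  std::chrono::nanoseconds initialization_time{0};
};

// Records the timestamp, then times the callback with a monotonic clock.
// An empty callback models a native agent without a JVMTI VMInit callback.
AgentTiming time_initialization(const std::function<void()>& init_callback) {
  AgentTiming timing;
  timing.initialization = std::chrono::system_clock::now();
  if (init_callback) {
    const auto start = std::chrono::steady_clock::now();
    init_callback();
    timing.initialization_time = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
  }
  return timing;
}
```

Taking the timestamp from a wall clock but the duration from a monotonic clock is one reasonable choice here. Because the periodic `jdk.JavaAgent` and `jdk.NativeAgent` events are emitted later, their `startTime` ends up at or after the recorded `initialization`.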
>> >> The refactoring isolates the relevant logic into two new modules, prims/agent.hpp and prims/agentList.hpp. Breaking out this code from their older places will help reduce the sizes of oversized arguments.cpp and threads.cpp. >> >> The previous two lists that maintained "agents" (JVMTI) and "libraries" (Xrun) were not thread-safe for concurrent iterations. A single list that allows for concurrent iterations is therefore introduced. >> >> Testing: jdk_jfr, tier 1 - 6 >> >> Thanks >> Markus > > Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: > > more cleanup src/hotspot/share/prims/jvmtiEnvBase.hpp line 166: > 164: > 165: const void* get_env_local_storage() { return _env_local_storage; } > 166: Why was this change/move necessary? Am I missing anything? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1142804605 From psandoz at openjdk.org Tue Mar 21 00:57:17 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Mar 2023 00:57:17 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html src/java.base/share/classes/java/lang/foreign/Linker.java line 609: > 607: * @see #captureStateLayout() > 608: */ > 609: static Option captureCallState(String... capturedState) { What if a name is not recognized? Throw IAE? src/java.base/share/classes/java/lang/foreign/Linker.java line 621: > 619: * to a downcall handle linked with {@link #captureCallState(String...)}} > 620: * > 621: * @see #captureCallState(String...) How does a caller know what the structure may contain?
Should we document the platform specific structures? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1142804110 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1142804836 From david.holmes at oracle.com Tue Mar 21 01:03:18 2023 From: david.holmes at oracle.com (David Holmes) Date: Tue, 21 Mar 2023 11:03:18 +1000 Subject: ASAN and slowdebug In-Reply-To: References: Message-ID: <89076570-14fd-4d98-d68f-a2125c2a1390@oracle.com> On 21/03/2023 3:45 am, Justin King wrote: > Ew. So for `MemoryService::create_MemoryUsage_obj` it's effectively > adding stuff after the return statement, causing the compiler to > complain. I think for `MemoryService::create_MemoryUsage_obj` that last > CHECK should probably be changed to THREAD since it's not useful. Yep. We have been cleaning these up as we discover them, but there is no easy way to find them. David and the > code added after the return is unreachable. And LEAVE should probably be > changed to not have the return statement wrapped in an arbitrary if > statement with JNI_TRUE as the expression, unless there is a reason for > that. > > On Mon, Mar 20, 2023 at 10:34 AM Stuart Monteith > > wrote: > > Hello, >    While looking at ASAN to try to reproduce another issue with > ASAN, I tried building OpenJDK configured with: >       --enable-asan --with-debug-level=slowdebug > > However, there is a compilation problem that persists up to and > including GCC 12.2.0: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80959 > > > As slowdebug builds are built with -O0, this results in the code > inserted by the macros LEAVE() and CHECK_NH+CHECK_ > not being optimised away. This causes the following two > warnings/errors: > >
  * For target hotspot_variant-server_libjvm_objs_memoryService.o: > jdk/src/hotspot/share/services/memoryService.cpp: In static member > function 'static Handle > MemoryService::create_MemoryUsage_obj(MemoryUsage, JavaThread*)': > jdk/src/hotspot/share/services/memoryService.cpp:219:1: error: > control reaches end of non-void function > [-Werror=return-type] > >   * For target support_native_java.base_libjli_java.o: > jdk/src/java.base/share/native/libjli/java.c: In function 'JavaMain': > jdk/src/java.base/share/native/libjli/java.c:556:1: error: control > reaches end of non-void function [-Werror=return-type] >   556 | } >       | ^ > > > There are a couple of ways of working around this. One is to change > create_MemoryUsage_obj and libjli/java.c to place a > return statement at the end of the function block, after the macros. > The changes are straightforward, but we're > obviously not guaranteed not to reintroduce the problem elsewhere. > > Another is to pass "--disable-warnings-as-errors" or, more > specifically, "--with-extra-cflags=-Wno-return-type > --with-extra-cxxflags=-Wno-return-type" to configure. This would be > a manual process unless the make files were changed > to handle this; however, this does remove the return type checks. > > > Part of the reason for sending this is to at least document the > issue publicly. Are there any opinions on what should > be done next?
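To see why `-Wreturn-type` can fire here, it helps to expand a CHECK-style macro by hand. The following is a simplified, self-contained C++ model, not HotSpot's actual exception macros: `Thread`, `Handle`, `construct_instance`, and the `THREAD`/`CHECK_NH` definitions are hypothetical stand-ins that only mimic the shape discussed in the thread, where the macro closes the callee's argument list and appends an `if (pending) return ...;` statement. In `create_obj_with_check`, the final `return` is followed by that extra, unreachable `if`, which is the kind of tail that GCC's flow analysis can misjudge at `-O0` with sanitizers enabled (GCC bug 80959). `create_obj_with_thread` follows Justin King's suggestion of passing `THREAD` to the last call, since there is nothing left to skip after it.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for HotSpot types.
struct Thread { bool pending_exception = false; };
struct Handle {
  int value = 0;                      // 0 models an empty (null) handle
  Handle() = default;
  explicit Handle(int v) : value(v) {}
  bool is_null() const { return value == 0; }
};

// Simplified CHECK-style macros: THREAD names the current thread, and
// CHECK_NH closes the argument list, then returns an empty Handle if the
// callee left an exception pending.
#define THREAD   current
#define CHECK_NH THREAD); if (current->pending_exception) return Handle(); (void)(0

// A callee that "throws" by setting the pending-exception flag.
Handle construct_instance(int arg, Thread* current) {
  if (arg < 0) { current->pending_exception = true; return Handle(); }
  return Handle(arg);
}

// Expands to:
//   return construct_instance(arg, current);
//   if (current->pending_exception) return Handle(); (void)(0);
// The trailing if-statement is dead code, but it is what the compiler
// sees at the end of the function body.
Handle create_obj_with_check(int arg, Thread* current) {
  return construct_instance(arg, CHECK_NH);
}

// Suggested shape: the last call just forwards THREAD, so the body ends
// with a plain return statement and nothing after it.
Handle create_obj_with_thread(int arg, Thread* current) {
  return construct_instance(arg, THREAD);
}
```

Both versions behave identically at runtime, because an exception pending after the final call is simply propagated to the caller either way; the difference is only in what the compiler has to reason about.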
> > > Stuart > > > > -- > > Justin King > Software Engineer > jcking at google.com > > > From psandoz at openjdk.org Tue Mar 21 01:17:42 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Mar 2023 01:17:42 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) In-Reply-To: References: Message-ID: On Fri, 17 Mar 2023 15:42:56 GMT, Per Minborg wrote: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html src/java.base/share/classes/java/lang/foreign/SymbolLookup.java line 56: > 54: * • It can be passed to an existing {@linkplain Linker#downcallHandle(FunctionDescriptor, Linker.Option...) downcall method handle}, as an argument to the underlying foreign function. > 55: * • It can be {@linkplain MemorySegment#set(AddressLayout, long, MemorySegment) stored} inside another memory segment. > 56: * • It can be used to access the region of memory backing a global variable (this might require Suggestion: *
  • It can be used to access the region of memory backing a global variable (this requires ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1142812300 From fgao at openjdk.org Tue Mar 21 02:33:41 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 21 Mar 2023 02:33:41 GMT Subject: RFR: 8304301: Remove the global option SuperWordMaxVectorSize Message-ID: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> https://github.com/openjdk/jdk/pull/8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines. Currently, `SuperWordMaxVectorSize` behaves differently between x86 and other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32`, but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` will be kept at 64 on other platforms while x86 machine would change `SuperWordMaxVectorSize` to `MaxVectorSize`. Other platforms except x86 miss similar implementations like [2]. Also, `SuperWordMaxVectorSize` limits the max vector size of auto-vectorization as `64`, which is fine for current aarch64 hardware, but SVE architecture supports larger than 512 bits. The patch is to drop the global option and use an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on JVM command line. So my question is: Before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. 
But after the change, if we set `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change? [1] https://github.com/openjdk/jdk/pull/12350#discussion_r1126106213 [2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333 ------------- Commit messages: - 8304301: Remove the global option SuperWordMaxVectorSize Changes: https://git.openjdk.org/jdk/pull/13112/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13112&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304301 Stats: 104 lines in 14 files changed: 50 ins; 38 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/13112.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13112/head:pull/13112 PR: https://git.openjdk.org/jdk/pull/13112 From sviswanathan at openjdk.org Tue Mar 21 04:19:44 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 21 Mar 2023 04:19:44 GMT Subject: RFR: 8304301: Remove the global option SuperWordMaxVectorSize In-Reply-To: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> References: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> Message-ID: On Tue, 21 Mar 2023 02:26:55 GMT, Fei Gao wrote: > https://github.com/openjdk/jdk/pull/8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines. > > Currently, `SuperWordMaxVectorSize` behaves differently between x86 and other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32`, but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` will be kept at 64 on other platforms while x86 machine would change `SuperWordMaxVectorSize` to `MaxVectorSize`. 
Other platforms except x86 miss similar implementations like [2]. > > Also, `SuperWordMaxVectorSize` limits the max vector size of auto-vectorization as `64`, which is fine for current aarch64 hardware, but SVE architecture supports larger than 512 bits. > > The patch is to drop the global option and use an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting > `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on JVM command line. > > So my question is: > > Before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set > `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change? > > [1] https://github.com/openjdk/jdk/pull/12350#discussion_r1126106213 > [2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333 @fg1417 SuperWordMaxVectorSize defines the maximum vector size generated by the auto vectorization. MaxVectorSize defines the vector width supported by the underlying platform. For CascadeLake we are setting SuperWordMaxVectorSize=32 and MaxVectorSize=64 by default. This allows the usage of larger 64 byte width vector instructions in places like Java Vector API and intrinsics. We would like to keep this behavior for x86. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13112#issuecomment-1477259831 From fgao at openjdk.org Tue Mar 21 06:45:50 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 21 Mar 2023 06:45:50 GMT Subject: RFR: 8304301: Remove the global option SuperWordMaxVectorSize In-Reply-To: References: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> Message-ID: On Tue, 21 Mar 2023 04:16:26 GMT, Sandhya Viswanathan wrote: > @fg1417 SuperWordMaxVectorSize defines the maximum vector size generated by the auto vectorization. MaxVectorSize defines the vector width supported by the underlying platform. For CascadeLake we are setting SuperWordMaxVectorSize=32 and MaxVectorSize=64 by default. This allows the usage of larger 64 byte width vector instructions in places like Java Vector API and intrinsics. We would like to keep this behavior for x86. Hi @sviswa7, thanks for your quick response! Yes, the patch keeps the special handling for auto-vectorization on Cascade Lake. For Cascade Lake, even after the patch, we still have 32 bytes for auto-vectorization and larger 64 bytes for Java Vector API and intrinsics. Can it cover your needs? 
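The policy discussed in this exchange can be summarized as a small function. This is a hypothetical C++ sketch of the behavior as described in the thread, not the code in the PR: `VectorConfig`, its fields, and `superword_max_vector_size` are invented names, and the 32-byte Cascade Lake default with its two overrides (`-XX:UseAVX=3` or `-XX:MaxVectorSize=64` set explicitly) is taken from Fei Gao's description. Vector API code and intrinsics would keep using `MaxVectorSize` directly; only auto-vectorization consults this cap.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inputs; in HotSpot these would come from CPU feature
// detection and from the JVM command line.
struct VectorConfig {
  int  max_vector_size;                 // MaxVectorSize in bytes
  bool is_cascade_lake;                 // CPU that prefers 32-byte auto-vectorized code
  bool use_avx_3_set_explicitly;        // user passed -XX:UseAVX=3
  bool max_vector_size_set_explicitly;  // user passed -XX:MaxVectorSize=...
};

// Largest vector (in bytes) that auto-vectorization may emit. The result
// never exceeds MaxVectorSize, so it cannot drift above what the platform
// supports on any architecture.
int superword_max_vector_size(const VectorConfig& cfg) {
  if (cfg.is_cascade_lake && !cfg.use_avx_3_set_explicitly &&
      !cfg.max_vector_size_set_explicitly) {
    return std::min(cfg.max_vector_size, 32);
  }
  return cfg.max_vector_size;
}
```

With this shape, the x86-specific 32-byte preference becomes a platform default rather than a separate global flag, and on other platforms the cap simply follows `MaxVectorSize`, including SVE implementations wider than 512 bits.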
------------- PR Comment: https://git.openjdk.org/jdk/pull/13112#issuecomment-1477349803 From pminborg at openjdk.org Tue Mar 21 07:50:22 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 07:50:22 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v2] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Update src/java.base/share/classes/java/lang/foreign/SymbolLookup.java Co-authored-by: Paul Sandoz - Update src/java.base/share/classes/java/lang/foreign/AddressLayout.java Co-authored-by: Paul Sandoz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/bb2f4438..7a78948a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From pminborg at openjdk.org Tue Mar 21 08:04:25 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 08:04:25 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v3] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with one additional commit 
since the last revision: Remove MemoryInspection classes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/7a78948a..a71518c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=01-02 Stats: 684 lines in 3 files changed: 0 ins; 684 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From pminborg at openjdk.org Tue Mar 21 08:33:29 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 08:33:29 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v4] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 00:52:01 GMT, Paul Sandoz wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve Linker javadocs > > src/java.base/share/classes/java/lang/foreign/Linker.java line 609: > >> 607: * @see #captureStateLayout() >> 608: */ >> 609: static Option captureCallState(String... capturedState) { > > What if a name is not recognized? Throw IAE? 
Added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143024389 From pminborg at openjdk.org Tue Mar 21 08:33:26 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 08:33:26 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v4] In-Reply-To: References: Message-ID: <_HwSyb1bsHNIJpGKdznMZR_HfWIU9VeAl1etn4miKGQ=.8397676f-1e24-4a2e-bcfa-ed44fb8a515b@github.com> > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Improve Linker javadocs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/a71518c4..4626a54e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From pminborg at openjdk.org Tue Mar 21 09:02:29 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 09:02:29 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Add example for Option::captureStateLayout 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/4626a54e..21ef0607 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=03-04 Stats: 12 lines in 1 file changed: 12 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From pminborg at openjdk.org Tue Mar 21 09:02:33 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 09:02:33 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 00:54:10 GMT, Paul Sandoz wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Add example for Option::captureStateLayout > > src/java.base/share/classes/java/lang/foreign/Linker.java line 621: > >> 619: * to a downcall handle linked with {@link #captureCallState(String...)}} >> 620: * >> 621: * @see #captureCallState(String...) > > How does a caller know what the structure may contain? Should we document the platform specific structures? I've added an example of how to print the platform-dependent structure. Should we document anyhow? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143055969 From mcimadamore at openjdk.org Tue Mar 21 09:52:54 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 21 Mar 2023 09:52:54 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 00:35:40 GMT, Paul Sandoz wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Add example for Option::captureStateLayout > > src/java.base/share/classes/java/lang/foreign/Linker.java line 479: > >> 477: * Otherwise, the invocation throws {@link WrongThreadException}; and
>> 478: * • {@code A} is kept alive during the invocation. For instance, if {@code A} has been obtained using a >> 479: * {@linkplain Arena#ofConfined() confined arena}, any attempt to {@linkplain Arena#close() close} > > Is that correct? Do you mean a shared arena? The text is correct (you can still close a confined arena from inside an upcall), but I agree that perhaps using a shared arena might be a bit more intuitive. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143118512 From mcimadamore at openjdk.org Tue Mar 21 09:56:53 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 21 Mar 2023 09:56:53 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 08:57:42 GMT, Per Minborg wrote: >> src/java.base/share/classes/java/lang/foreign/Linker.java line 621: >> >>> 619: * to a downcall handle linked with {@link #captureCallState(String...)}} >>> 620: * >>> 621: * @see #captureCallState(String...) >> >> How does a caller know what the structure may contain? Should we document the platform specific structures? > > I've added an example of how to print the platform-dependent structure. Should we document anyhow? I'm not sure about this. Honestly, the example probably doesn't help much if somebody didn't get the idea of what the layout might be. I think it might be better to say something like, `on Windows/x64 the returned layout might be as follows...` and then you write the code to create the layout instance corresponding to the returned layout. But we have to be careful to make the text non-normative (as the set of values might change).
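The keep-alive rule debated in the `Linker` javadoc (an arena cannot be closed while a downcall that received one of its segments is still running, for example from inside an upcall) can be modeled with an acquire count. This is a conceptual C++ sketch, not the JDK implementation: `ToyArena`, `KeepAlive`, and the use of `std::logic_error` as a stand-in for Java's `IllegalStateException` are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Toy model of an arena whose memory can be kept alive by in-flight calls.
class ToyArena {
 public:
  void acquire() { acquired_.fetch_add(1); }
  void release() { acquired_.fetch_sub(1); }

  // Closing fails while any call still holds the arena alive.
  void close() {
    if (acquired_.load() > 0) {
      throw std::logic_error("arena is kept alive by an in-flight call");
    }
    closed_ = true;
  }
  bool is_closed() const { return closed_; }

 private:
  std::atomic<int> acquired_{0};
  bool closed_ = false;
};

// RAII guard modeling what a downcall does around the native call:
// acquire the argument's arena before the call, release it afterwards.
class KeepAlive {
 public:
  explicit KeepAlive(ToyArena& arena) : arena_(arena) { arena_.acquire(); }
  ~KeepAlive() { arena_.release(); }
  KeepAlive(const KeepAlive&) = delete;
  KeepAlive& operator=(const KeepAlive&) = delete;

 private:
  ToyArena& arena_;
};
```

In this model it does not matter whether the close attempt comes from the owning thread (as with a confined arena closed from an upcall) or from another thread (as with a shared arena); either way the close is refused until the guarded call has returned.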
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143125997 From jsjolen at openjdk.org Tue Mar 21 09:56:53 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 21 Mar 2023 09:56:53 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:39:27 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Not yet.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1477546797 From mcimadamore at openjdk.org Tue Mar 21 10:01:53 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 21 Mar 2023 10:01:53 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 09:02:29 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Add example for Option::captureStateLayout src/java.base/share/classes/java/lang/foreign/Linker.java line 480: > 478: * Otherwise, the invocation throws {@link WrongThreadException}; and
> 479: * • {@code A} is kept alive during the invocation. For instance, if {@code A} has been obtained using a > 480: * {@linkplain Arena#ofShared()} confined arena}, any attempt to {@linkplain Arena#close() close} the description of the link still says "confined" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143131392 From jsjolen at openjdk.org Tue Mar 21 10:04:04 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 21 Mar 2023 10:04:04 GMT Subject: RFR: JDK-8301498: Replace NULL with nullptr in cpu/x86 [v4] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains six commits: - Merge remote-tracking branch 'origin/master' into JDK-8301498 - Fix vnkozlov's suggestions - Merge remote-tracking branch 'origin/master' into JDK-8301498 - Some more fixes - Fixes - Replace NULL with nullptr in cpu/x86 ------------- Changes: https://git.openjdk.org/jdk/pull/12326/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12326&range=03 Stats: 657 lines in 54 files changed: 0 ins; 0 del; 657 mod Patch: https://git.openjdk.org/jdk/pull/12326.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12326/head:pull/12326 PR: https://git.openjdk.org/jdk/pull/12326 From aph at openjdk.org Tue Mar 21 10:53:53 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 21 Mar 2023 10:53:53 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 14:29:35 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics and each of its fields can hold different types of values depending on the entry. >> >> This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. >> >> Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. >> >> The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
>> >> This change supports the following platforms: x86, aarch64, PPC, and RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fix riscv interpreter mistake and acquire semantics src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 491: > 489: } else { > 490: // Pop N words from the stack > 491: __ get_cache_and_index_at_bcp(r1, r2, 1, index_size); This aliasing of `r1` and `cache` is very confusing. Please decide whether to use the name `cache` or `r1` and do so consistently. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2337: > 2335: // Load-acquire the adapter method > 2336: __ lea(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); > 2337: __ ldar(method, method); What reordering are we trying to prevent here? src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2399: > 2397: bool is_invokevirtual, > 2398: bool is_invokevfinal, /*unused*/ > 2399: bool is_invokedynamic /*unused*/) { Suggestion: bool /*is_invokedynamic*/) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143191698 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143193859 PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143195547 From pminborg at openjdk.org Tue Mar 21 12:02:44 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 12:02:44 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 09:54:16 GMT, Maurizio Cimadamore wrote: >> I've added an example of how to print the platform-dependent structure. Should we document anyhow? > > I'm not sure about this. Honestly, the example probably doesn't help much if somebody didn't get the idea of what the layout might be. 
I think it might be better to say something like, `on Windows/x64 the returned layout might be as follows...` and then you write the code to create the layout instance corresponding to the returned layout. But we have to be careful to make the text non-normative (as the set of values might change). What about adding something like this? Here is a list of valid names for some platforms: Linux: "errno" macOS: "errno" Windows: "GetLastError", "WSAGetLastError" and "errno" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143265595 From mcimadamore at openjdk.org Tue Mar 21 12:08:45 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 21 Mar 2023 12:08:45 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: <5VeF6VvOJ94JGz-slAqtwU-mtnhdV4qSk2KsbqVy-x4=.47d9a1f7-b86b-4d07-8ab1-f9abbaf3e5ed@github.com> On Tue, 21 Mar 2023 11:58:28 GMT, Per Minborg wrote: >> I'm not sure about this. Honestly, the example probably doesn't help much if somebody didn't get the idea of what the layout might be. I think it might be better to say something like, `on Windows/x64 the returned layout might be as follows...` and then you write the code to create the layout instance corresponding to the returned layout. But we have to be careful to make the text non-normative (as the set of values might change). > > What about adding something like this? > > > Here is a list of valid names for some platforms: > Linux: > "errno" > macOS: > "errno" > Windows: > "GetLastError", "WSAGetLastError" and "errno" I'd prefer something more informal: "Examples of valid names are "errno" on Linux/x64, or "GetLastError" on Windows/x64". And maybe add that "The precise set of platform-dependent supported names can be queried using the returned group layout". 
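The thread proposes that the captured-state layout be a struct whose members are named value layouts, possibly separated by padding, so that a client locates a captured value purely by name. A small C++ model of that name-to-offset lookup follows. `MemberLayout` and `offset_of` are hypothetical helpers, not part of the FFM API, and the Windows/x64 member order used in the usage note is an assumption based on the list of names given above. It requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one named value member of a struct layout.
struct MemberLayout {
  std::string name;
  std::size_t size;       // in bytes
  std::size_t alignment;  // in bytes, a power of two
};

// Returns the byte offset of the member called `name`, inserting padding
// before each member so that it is aligned, as a C struct would be.
// Returns std::nullopt if the platform cannot capture that value.
std::optional<std::size_t> offset_of(const std::vector<MemberLayout>& members,
                                     const std::string& name) {
  std::size_t offset = 0;
  for (const MemberLayout& m : members) {
    offset = (offset + m.alignment - 1) / m.alignment * m.alignment;  // padding
    if (m.name == name) return offset;
    offset += m.size;
  }
  return std::nullopt;
}
```

For example, with three 4-byte members `GetLastError`, `WSAGetLastError`, and `errno` (in that assumed order), `offset_of(members, "errno")` yields 8, and a name that is not captured on that platform yields no offset.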
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143272764 From mcimadamore at openjdk.org Tue Mar 21 12:17:43 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Tue, 21 Mar 2023 12:17:43 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 00:54:10 GMT, Paul Sandoz wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Add example for Option::captureStateLayout > > src/java.base/share/classes/java/lang/foreign/Linker.java line 621: > >> 619: * to a downcall handle linked with {@link #captureCallState(String...)}} >> 620: * >> 621: * @see #captureCallState(String...) > > How does a caller know what the structure may contain? Should we document the platform specific structures? Back to @PaulSandoz's question - "how does the caller know what the structure contains?". The caller typically doesn't care too much about what the returned struct is. But it might care about which "values" can be captured. That said, the set of such interesting values is not too surprising. As demonstrated in the example in the `Linker.Option.captureCallState` method, once the client knows the name (and "errno" is likely to be the 90% case), everything else follows from there - as the layout can be used to obtain var handles for the required values. But, perhaps, now that I write this, I realize that what @PaulSandoz might _really_ be asking is: how do I know that the returned struct will not contain, e.g., nested structs, sequences, or other nonsense? So we might specify (in a normative way) that the returned layout is a struct layout, whose member layouts are one or more value layouts (possibly with some added padding layouts). The names of the value layouts reflect the names of the values that can be captured.
And _then_ we show an example of the layout we return for Windows/x64 (as that's more interesting). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1143281164 From jwaters at openjdk.org Tue Mar 21 13:03:58 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 21 Mar 2023 13:03:58 GMT Subject: RFR: 8302798: Refactor -XX:+UseOSErrorReporting for noreturn crash reporting [v3] In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 10:25:06 GMT, Kim Barrett wrote: >> Please review this change to the implementation of the Windows-specific option >> UseOSErrorReporting, toward allowing crash reporting functions to be declared >> noreturn. VMError::report_and_die no longer conditionally returns if the >> Windows-only option UseOSErrorReporting is true. >> >> The Windows-only sections of report_and_die now call RaiseFailFastException >> (https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-raisefailfastexception), >> which immediately invokes WER (Windows Error Reporting) if it is enabled, >> without executing any structured exception handlers. If WER is not enabled, it >> just immediately terminates the program. Thus, we no longer return to walk up >> the structured exception handler chain to pop out at the top as unhandled in >> order to invoke WER. >> >> This permits declaring report_and_die as [[noreturn]], once some functions >> from the os class are also so declared. Also adding that attribute as >> appropriate to other functions in the os class. This of course assumes >> the use of [[noreturn]] in HotSpot code is approved (JDK-8302124). >> >> There is a pre-existing bug that I'll be reporting separately. If >> UseOSErrorReporting and CreateCoredumpOnCrash are both true, we create an >> empty .mdmp file. We shouldn't create that file when UseOSErrorReporting is true. >> >> Testing: >> mach5 tier1-3 >> >> Manual testing with the following, to verify desired behavior.
>> >> -XX:ErrorHandlerTest=N >> 1: assertion failure >> 2: guarantee failure >> 14: SIGSEGV >> 15: divide by zero >> path/to/bin/java \ >> -XX:+UnlockDiagnosticVMOptions \ >> -XX:+ErrorLogSecondaryErrorDetails \ >> -XX:+UseOSErrorReporting \ >> -XX:ErrorHandlerTest=1 \ >> TestDebug.java >> >> --- TestDebug.java --- >> import java.lang.String; >> public class TestDebug { >> static private volatile String dummy; >> public static void main(String[] args) throws Exception { >> while (true) { >> dummy = new String("foo bar"); >> } >> } >> } >> --- end TestDebug.java --- >> >> The state of WER can be examined and modified using Power Shell commands >> {Get,Enable,Disable}-WindowsErrorReporting. >> >> The state of reporting WER captured errors can be examined and modified using >> Control Panel > Security and Maintenance > Maintenance : Report Problems [on,off] >> >> With Report Problems off, reports are placed in >> c:\ProgramData\Microsoft\Windows\WER\ReportArchive >> >> I verified that executing the above test with WER enabled adds an entry in >> that directory, but not when it's disabled. Also nothing is added there when >> the test is run with -XX:-UseOSErrorReporting. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
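The PR description and the GCC errors reported by Julian Waters later in this thread both hinge on one rule: a function declared `[[noreturn]]` must never return. If control can reach its closing brace the behavior is undefined, and GCC reports such a path as `'noreturn' function does return`. The sketch below is illustrative only; `fatal_sketch` is a hypothetical name, and the Windows call is mentioned only in a comment.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical fatal-error routine. [[noreturn]] is only correct if no path
// can reach the closing brace; ending every path in a call that is itself
// [[noreturn]] (here std::abort) satisfies that requirement.
[[noreturn]] void fatal_sketch(const char* message, bool use_os_reporter) {
  assert(message != nullptr);
  std::fprintf(stderr, "fatal error: %s\n", message);
  if (use_os_reporter) {
    // A platform hook would go here (the PR calls RaiseFailFastException on
    // Windows). If the compiler cannot see that such a call never returns,
    // control could still fall through, so the abort below must remain.
  }
  std::abort();
}
```

The same reasoning applies to `os::abort`, `os::die`, `os::exit`, and `os::_exit` in the GCC output quoted later: each needs a final call that the compiler knows does not return.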
The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into failfast > - remove failfast cuttoff of secondary errors > - failfast When compiling a Windows HotSpot with gcc (see [8288293](https://bugs.openjdk.org/browse/JDK-8288293?filter=-3)) newer gcc versions have a diagnostic that can catch when noreturn code returns in certain areas, and gcc catches several with this change: src/hotspot/os/windows/os_windows.cpp: In static member function 'static void os::abort(bool, void*, const void*)': src/hotspot/os/windows/os_windows.cpp:1249:1: error: 'noreturn' function does return [-Werror] 1249 | } | ^ src/hotspot/os/windows/os_windows.cpp: In static member function 'static void os::die()': src/hotspot/os/windows/os_windows.cpp:1254:1: error: 'noreturn' function does return [-Werror] 1254 | } | ^ src/hotspot/os/windows/os_windows.cpp: In static member function 'static void os::exit(int)': src/hotspot/os/windows/os_windows.cpp:4792:1: error: 'noreturn' function does return [-Werror] 4792 | } | ^ src/hotspot/os/windows/os_windows.cpp: In static member function 'static void os::_exit(int)': src/hotspot/os/windows/os_windows.cpp:4796:1: error: 'noreturn' function does return [-Werror] 4796 | } I don't know if this is something we should be worried about with our current setup and the regular build that we have for Windows, so I'm leaving this warning here for someone else to read through ------------- PR Comment: https://git.openjdk.org/jdk/pull/12759#issuecomment-1477797624 From coleenp at openjdk.org Tue Mar 21 13:26:44 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 21 Mar 2023 13:26:44 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> 
<2IKS8cYMhoU1JRBEalwF1ZeV4Vih78eXffu_GXZ4JkQ=.27d433ac-1f04-47ce-a177-c791a42bc6fe@github.com> Message-ID: On Fri, 17 Mar 2023 04:53:38 GMT, David Holmes wrote: >> See the function log_dependency() >> >> https://github.com/openjdk/jdk/blob/421b4ee33c652cc7c444fbbf298bbc23d052c2fe/src/hotspot/share/code/dependencies.cpp#L845 >> >> called from here (as one place). >> >> https://github.com/openjdk/jdk/blob/421b4ee33c652cc7c444fbbf298bbc23d052c2fe/src/hotspot/share/code/dependencies.cpp#L2071 >> >> When TraceDependencies was true, the log file would be non-null and log_dependency would write to it. Keeping this with -Xlog:dependencies=debug retains what TraceDependencies did. > > Hmmm this seems broken to me then. We output the general dependency logging to one place based on UL configuration, but then we output `log_dependency` to the log file. If these are meant to be related and always reported together then we have lost that. If they are actually unrelated then this may still need the TraceDependencies flag to control it. > > @iwanowww can you comment on this please? Maybe we don't need this -Xlog test in this place, because this is also what PrintDependencies does. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13007#discussion_r1143368172 From smonteith at openjdk.org Tue Mar 21 13:34:40 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Tue, 21 Mar 2023 13:34:40 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: References: Message-ID: <-ZV05tb2xNWIBcGc7Nj_TZ6qq3BGrsjlKCT48_GTmQU=.6480f4f9-f1a5-47fa-94d9-51d3968ff711@github.com> On Tue, 31 Jan 2023 11:39:27 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we
> > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This looks OK so far. However, is it your intention to also do aarch64.ad? aarch64_ad.m4 and aarch64_vector(.ad|_ad.m4) files look clean. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1477844415 From pminborg at openjdk.org Tue Mar 21 13:39:05 2023 From: pminborg at openjdk.org (Per Minborg) Date: Tue, 21 Mar 2023 13:39:05 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 09:02:29 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Add example for Option::captureStateLayout A review of all the copyright years shall be made in this PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1477849414 From matsaave at openjdk.org Tue Mar 21 14:49:54 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 21 Mar 2023 14:49:54 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 10:51:08 GMT, Andrew Haley wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix riscv interpreter mistake and acquire semantics > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2399: > >> 2397: bool is_invokevirtual, >> 2398: bool is_invokevfinal, /*unused*/ >> 2399: bool is_invokedynamic /*unused*/) { > > Suggestion: > > bool /*is_invokedynamic*/) { This is a temporary band-aid in the same format we see for `is_invokevfinal`, both of which should be cleaned up once all the ports are complete.
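Andrew Haley's earlier question about which reordering the `ldar` in `templateTable_aarch64.cpp` is meant to prevent is not answered in this part of the thread. For context, a load-acquire of a resolved pointer is usually paired with a release store by the resolving thread. A reader that observes the published pointer is then guaranteed to also observe the fields written before it. The sketch below shows that pairing with `std::atomic`, using a hypothetical `EntryModel` type. Whether this is the intent of the HotSpot code is an assumption.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical resolved entry: a plain field published through one atomic pointer.
struct EntryModel {
  int number_of_parameters = 0;              // written before publication
  std::atomic<const int*> method{nullptr};   // published last
};

// Resolver side: fill in the plain field, then publish with a release store.
void resolve(EntryModel& e, const int* method, int params) {
  e.number_of_parameters = params;
  e.method.store(method, std::memory_order_release);
}

// Reader side: the acquire load (typically ldar on AArch64) guarantees that if
// the method pointer is observed, the earlier plain write is observed as well.
int read_params_if_resolved(const EntryModel& e) {
  const int* m = e.method.load(std::memory_order_acquire);
  return m != nullptr ? e.number_of_parameters : -1;
}
```

Without the acquire, a reader on a weakly ordered CPU could see a non-null method but a stale parameter count.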
Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix Matcher::vector_needs_load_shuffle Yes I will try to polish the patch more after finding the cause of the failure in x86_32. 
The failure is strange, though: it does not occur on x86_64 for some reason. > One annoyance in the API which propagates down into the implementation is `VectorShuffle` and `VectorMask` have `E` that is the lane element type. Yes, I agree: a shuffle merely contains the lane indices while a mask is an array of booleans, so it would be a good cleanup to remove `E` from the interface. > However, i don't have a good sense of the implications this has to the current HotSpot implementation and whether it is feasible. Note that generics are erased, so from the VM point of view, a `VectorMask` and a `VectorMask` is indifferent. As a result, removing the type parameter should not have any impact on the VM. Some details may have to change, though: once element types are removed, a mask or shuffle would only be validated according to its length, and we would need to insert a cast at use sites. The cast will be removed if it is actually the same species, so there is little concern regarding the emitted machine code. Thanks a lot. I have moved most of the methods to `AbstractVector` and `AbstractShuffle`, I have to resort to raw types, though, since there seems to be no way to do the same with wild cards, and the generics mechanism is not powerful enough for things like `Vector`. The remaining failure seems to be related to [JDK-8304676](https://bugs.openjdk.org/projects/JDK/issues/JDK-8304676), so I think this patch is ready for review now. > The mask implementation is specialized by the species of vectors it operates on, but does it have to be Apart from the mask implementation, shuffle implementation definitely has to take into consideration the element type. However, this information does not have to be visible to the API: similar to how we currently handle the vector length, we can have `class AbstractMask implements VectorMask`.
As a result, the cast method would be useless and could be removed from the API, but our implementation would still use it internally, for example:

    Vector blend(Vector v, VectorMask w) {
        AbstractMask aw = (AbstractMask) w;
        AbstractMask tw = aw.cast(vspecies());
        return VectorSupport.blend(...);
    }

    Vector rearrange(VectorShuffle s) {
        AbstractShuffle as = (AbstractShuffle) s;
        AbstractShuffle ts = as.cast(vspecies());
        return VectorSupport.rearrangeOp(...);
    }

What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1477581887 PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1478140557 From psandoz at openjdk.org Tue Mar 21 16:16:35 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Mar 2023 16:16:35 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: <5WEC6Qqt9dnLkrD7AlIXjAImDgYtcUgvQlR3acn1oi0=.96d3caf3-a2b9-4491-b2ba-a9d04f658772@github.com> On Tue, 21 Mar 2023 10:18:19 GMT, Quan Anh Mai wrote: > Note that generics are erased, so from the VM point of view, a `VectorMask` and a `VectorMask` is indifferent. Yes, that's the easy bit :-) The mask implementation is specialized by the species of vectors it operates on, but does it have to be? Can we make it independent of the species and bind it to the lane count instead? Then the user does not need to explicitly cast from and to species that have the same lane count, which means we can remove the VectorMask::cast method (since it already throws if the lane counts are not equal).
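Paul's suggestion is that mask compatibility be decided by lane count alone rather than by species. The C++ model below illustrates that idea. `Mask`, `Vec`, and `blend` are hypothetical types and functions, not the Vector API, and the lane count is a template parameter so that a mismatch is rejected at compile time rather than by a runtime cast.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical mask: N booleans with no element type attached.
template <std::size_t N>
struct Mask {
  std::array<bool, N> bits;
};

// Hypothetical vector of N lanes of element type E.
template <typename E, std::size_t N>
struct Vec {
  std::array<E, N> lanes;
};

// A Mask<N> works with any N-lane vector, whatever its element type;
// passing a mask with a different lane count does not compile.
template <typename E, std::size_t N>
Vec<E, N> blend(const Vec<E, N>& a, const Vec<E, N>& b, const Mask<N>& m) {
  Vec<E, N> r;
  for (std::size_t i = 0; i < N; ++i) {
    r.lanes[i] = m.bits[i] ? b.lanes[i] : a.lanes[i];  // take b where the mask is set
  }
  return r;
}
```

The same `Mask<4>` can blend both `Vec<int, 4>` and `Vec<float, 4>` values without any cast.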
------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1478069401 From qamai at openjdk.org Tue Mar 21 16:16:31 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 21 Mar 2023 16:16:31 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v3] In-Reply-To: References: Message-ID: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com> > Hi, > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
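The list above describes a shuffle whose index lanes have the same width as the data lanes. A rearrange can then validate indices with plain integer comparisons and permute directly, without first widening byte indices. A scalar C++ model of that behavior follows. `rearrange_checked` is a hypothetical name, and throwing `std::out_of_range` stands in for whatever exception the Vector API raises for invalid indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical model: for N int32_t data lanes, the shuffle is also N int32_t
// lanes, so no byte-to-int expansion is needed before permuting.
template <std::size_t N>
std::array<int32_t, N> rearrange_checked(const std::array<int32_t, N>& src,
                                         const std::array<int32_t, N>& shuffle) {
  // Range check using integer comparisons only (cf. vpcmpgtd in the listing).
  for (int32_t idx : shuffle) {
    if (idx < 0 || idx >= static_cast<int32_t>(N)) {
      throw std::out_of_range("shuffle index out of range");
    }
  }
  // Lane i of the result is src[shuffle[i]], as vpermd computes it.
  std::array<int32_t, N> dst{};
  for (std::size_t i = 0; i < N; ++i) {
    dst[i] = src[static_cast<std::size_t>(shuffle[i])];
  }
  return dst;
}
```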
> > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {external_word} > vpackusdw %xmm0,%xmm0,%xmm0 > vpackuswb %xmm0,%xmm0,%xmm0 > vpmovsxbd %xmm0,%xmm3 > vpcmpgtd %xmm3,%xmm1,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fc2acb4e0d8 > vpmovzxbd %xmm0,%xmm0 > vpermd %ymm2,%ymm0,%ymm0 > movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} > vmovdqu %xmm0,0x10(%r10) > > After: > movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} > vmovdqu 0x10(%r10),%xmm2 > vpxor %xmm0,%xmm0,%xmm0 > vpcmpgtd %xmm2,%xmm0,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fa818b27cb1 > vpermd %ymm1,%ymm2,%ymm0 > movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} > vmovdqu %xmm0,0x10(%r10) > > Please take a look and leave reviews. Thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - missing casts - clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13093/files - new: https://git.openjdk.org/jdk/pull/13093/files/060554a9..4caa9d10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=01-02 Stats: 1396 lines in 36 files changed: 44 ins; 1339 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13093/head:pull/13093 PR: https://git.openjdk.org/jdk/pull/13093 From psandoz at openjdk.org Tue Mar 21 16:16:33 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Mar 2023 16:16:33 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: <34zcCpLdmLg_1qtY5maOBw5ozxupQlkKPOf3Kzljybc=.d995b7c1-6ad0-4af4-9bc3-bf6cda45d274@github.com> On Sun, 19 Mar 2023 19:38:04 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix Matcher::vector_needs_load_shuffle That looks like a very good simplification and performance enhancement, and removing a limitation, the byte[] representation. This should likely also help with Valhalla integration. IIUC it has the same upper bound limitation for vector lengths greater than the maximum index size that can be represented as a lane element (although in practice there may not be any hardware where this can occur). 
Which is fine; I am not suggesting we try and fix this. Perhaps it may be possible to move some methods on the concrete implementations to the abstract implementations as helper methods or template methods, thereby reducing the amount of generated code? It seems so in some cases, but I did not look very closely. It may require the introduction of an element-type-specific abstract shuffle, and if that's the case it may not be worth it. -- Relatedly, I would be interested in your opinion on the following. One annoyance in the API which propagates down into the implementation is `VectorShuffle` and `VectorMask` have `E` that is the lane element type. But in theory they should not need `E`: any shuffle or mask with the same number of lanes as the vector being operated on should be compatible, and how its state is represented as a hardware register is an implementation detail of the shuffle/mask. However, I don't have a good sense of the implications this has for the current HotSpot implementation and whether it is feasible. ------------- PR Review: https://git.openjdk.org/jdk/pull/13093#pullrequestreview-1349108003 From psandoz at openjdk.org Tue Mar 21 16:17:07 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Mar 2023 16:17:07 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 16:11:50 GMT, Quan Anh Mai wrote: > I have moved most of the methods to `AbstractVector` and `AbstractShuffle`, I have to resort to raw types, though, since there seems to be no way to do the same with wild cards, and the generics mechanism is not powerful enough for things like `Vector`. The remaining failure seems to be related to [JDK-8304676](https://bugs.openjdk.org/projects/JDK/issues/JDK-8304676), so I think this patch is ready for review now. The Java changes look good to me. I need to have another look, but will not be able to do so until next week.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1478148846 From psandoz at openjdk.org Tue Mar 21 16:32:44 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Mar 2023 16:32:44 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 16:11:50 GMT, Quan Anh Mai wrote: > Apart from the mask implementation, shuffle implementation definitely has to take into consideration the element type. Yes, the way you have implemented shuffle is tightly connected; that looks OK. I am wondering if we can make the mask implementation more loosely coupled, modified such that it does not have to take into consideration the element type (or species) of the vector it operates on, so that compatibility is based solely on the lane count. Ideally, the `VectorMask::check` method would just compare the lane counts and not require a cast in the implementation, which I presume requires some deeper changes in C2? What you propose seems a possible interim step towards a preferable API, if the performance is good.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1478175761 From sviswanathan at openjdk.org Tue Mar 21 16:53:49 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 21 Mar 2023 16:53:49 GMT Subject: RFR: 8304301: Remove the global option SuperWordMaxVectorSize In-Reply-To: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> References: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> Message-ID: <00uMDu5j39--5DfRMljtqUgU0DeBNdkBXDhTPgaEGI8=.866f24f0-b871-4fea-b5a7-62a376006054@github.com> On Tue, 21 Mar 2023 02:26:55 GMT, Fei Gao wrote: > https://github.com/openjdk/jdk/pull/8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines. > > Currently, `SuperWordMaxVectorSize` behaves differently on x86 than on other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32` but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` will be kept at 64 on other platforms, while x86 would change `SuperWordMaxVectorSize` to `MaxVectorSize`. Platforms other than x86 lack an implementation similar to [2]. > > Also, `SuperWordMaxVectorSize` limits the max vector size for auto-vectorization to `64`, which is fine for current aarch64 hardware, but the SVE architecture supports vectors larger than 512 bits. > > The patch is to drop the global option and use an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake, and users can override this by setting either > `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on the JVM command line.
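The description states the intended x86 default: auto-vectorization stays limited to 32-byte vectors on Cascade Lake unless the user explicitly sets `-XX:UseAVX=3` or `-XX:MaxVectorSize=64`. That rule can be written as a small function. The sketch below is hypothetical: `VectorConfig` and `superword_max_vector_size` are illustrative names, not the architecture-dependent interface the patch actually adds.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inputs; in HotSpot these would come from VM flags and CPU detection.
struct VectorConfig {
  int max_vector_size;                  // effective MaxVectorSize, in bytes
  bool use_avx_set_explicitly;          // -XX:UseAVX=... given on the command line
  bool max_vector_size_set_explicitly;  // -XX:MaxVectorSize=... given on the command line
  bool is_cascade_lake;
};

// Largest vector size, in bytes, that auto-vectorization may use.
int superword_max_vector_size(const VectorConfig& c) {
  assert(c.max_vector_size > 0);
  bool user_opted_in = c.use_avx_set_explicitly || c.max_vector_size_set_explicitly;
  if (c.is_cascade_lake && !user_opted_in) {
    return std::min(c.max_vector_size, 32);  // default 32-byte cap on Cascade Lake
  }
  return c.max_vector_size;  // otherwise follow MaxVectorSize
}
```

Keeping the cap as a function of `MaxVectorSize` means there is only one user-visible knob, which matches the reasoning in the description.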
> > So my question is: > > Before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set > `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change? > > [1] https://github.com/openjdk/jdk/pull/12350#discussion_r1126106213 > [2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333 Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13112#pullrequestreview-1350975796 From jcking at openjdk.org Tue Mar 21 16:54:37 2023 From: jcking at openjdk.org (Justin King) Date: Tue, 21 Mar 2023 16:54:37 GMT Subject: RFR: JDK-8304683: Memory leak in WB_IsMethodCompatible Message-ID: <4V-_5ZAkbj0Kw1UsgBnat-JrwffKM_mHRNmWZwB3ox8=.94ce5e0d-9552-4a96-9bb1-8a3f26c2dd7f@github.com> Add missing call to `DirectivesStack::release`. 
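JDK-8304683 fixes a leak by adding a missing `DirectivesStack::release` call. A general way to make such acquire/release pairs hard to get wrong is a scope guard whose destructor performs the release on every exit path. The sketch below uses hypothetical stand-ins (`DirectiveModel`, `DirectiveStackModel`, `DirectiveMark`) and does not claim that HotSpot's `DirectivesStack` API looks like this.

```cpp
#include <cassert>

// Hypothetical reference-counted object handed out by a lookup and later released.
struct DirectiveModel {
  int refcount = 0;
};

struct DirectiveStackModel {
  static DirectiveModel* get_matching(DirectiveModel& d) { ++d.refcount; return &d; }
  static void release(DirectiveModel* d) { assert(d->refcount > 0); --d->refcount; }
};

// Scope guard: the release runs in the destructor, on every exit path.
class DirectiveMark {
 public:
  explicit DirectiveMark(DirectiveModel& d) : d_(DirectiveStackModel::get_matching(d)) {}
  ~DirectiveMark() { DirectiveStackModel::release(d_); }
  DirectiveMark(const DirectiveMark&) = delete;
  DirectiveMark& operator=(const DirectiveMark&) = delete;
  DirectiveModel* get() const { return d_; }

 private:
  DirectiveModel* d_;
};

// Example caller with two return paths; neither one leaks a reference.
bool is_compatible_sketch(DirectiveModel& d, bool early_exit) {
  DirectiveMark mark(d);
  if (early_exit) return false;
  return mark.get()->refcount == 1;  // evaluated before the guard releases
}
```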
------------- Commit messages: - Fix memory leak in WB_IsMethodCompatible Changes: https://git.openjdk.org/jdk/pull/13124/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13124&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304683 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13124.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13124/head:pull/13124 PR: https://git.openjdk.org/jdk/pull/13124 From rrich at openjdk.org Tue Mar 21 17:04:56 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Mar 2023 17:04:56 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v3] In-Reply-To: References: Message-ID: > This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading. > > The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved. > > C2i entry barriers can be removed for the same reason. > > Testing: > > Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. > > I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. > > I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. 
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Feedback Coleene ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12802/files - new: https://git.openjdk.org/jdk/pull/12802/files/876e55dc..2c2e49b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12802&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12802&range=01-02 Stats: 60 lines in 4 files changed: 10 ins; 31 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/12802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12802/head:pull/12802 PR: https://git.openjdk.org/jdk/pull/12802 From rrich at openjdk.org Tue Mar 21 17:09:14 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Mar 2023 17:09:14 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v3] In-Reply-To: References: Message-ID: <6ohmVBWgkqo_3T0rc-5WMxp8eTdGAULDUT6kCCr773I=.877fe53a-7eec-48d7-89a8-17974fd2a335@github.com> On Mon, 20 Mar 2023 18:38:22 GMT, Coleen Phillimore wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback Coleene > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4308: > >> 4306: void MacroAssembler::load_method_holder_cld(Register rresult, Register rmethod) { >> 4307: load_method_holder(rresult, rmethod); >> 4308: ldr(rresult, Address(rresult, InstanceKlass::class_loader_data_offset())); > > Can you remove InstanceKlass::klass_loader_data_offset() also? 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1143732838 From rrich at openjdk.org Tue Mar 21 17:09:21 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Mar 2023 17:09:21 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v2] In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 18:57:25 GMT, Coleen Phillimore wrote: >> Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: >> >> - Adding TestStaticallyBoundTargetIsReachable.java >> - Path to target exists also if the receiver is a constant of the caller >> - Use nmethod::oops_do() to search for to_holder in from_nm >> - Merge branch 'master' >> - Remove MacroAssembler::resolve_weak_handle() >> - Remove keep_alive_offset() and holder_offset() from CLD >> - Remove MacroAssembler::load_method_holder_cld() >> - Remove c2i entry barrier >> - Check dependency for statically bound call > > src/hotspot/share/runtime/sharedRuntime.cpp line 1354: > >> 1352: } >> 1353: >> 1354: class Search2OopsClosure : public OopClosure { > > Should this all be under #ifdef ASSERT? Yes. I'll do that before the PR gets integrated. I want to make sure all dependencies are also present in the release build. (E.g., before JDK-8299155 was fixed, the debug VM produced more dependencies.) > src/hotspot/share/runtime/sharedRuntime.cpp line 1418: > >> 1416: return; // `to` is reachable by iterating parents of `from` >> 1417: } >> 1418: } > > I'd be happier if this part was a function in ClassLoaderData or refactored from record_dependency (along with the contains function), since it's similar code. You are right. I've done the refactoring. Please let me know what you think.
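The `Search2OopsClosure` discussed above walks a caller nmethod's oops (via `nmethod::oops_do()`, per the commit list) to verify that a statically bound callee is reachable. Below is a minimal sketch of such a two-target search closure, assuming simplified stand-ins: `oop` is modelled as `const void*`, and `OopClosure` and `MockNMethod` are mocks, not HotSpot's classes. As agreed in the review, the real closure would be compiled only under `#ifdef ASSERT`. That guard is omitted here so the sketch builds in any configuration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: in HotSpot, oop is a heap reference and OopClosure
// is the visitor interface passed to nmethod::oops_do().
using oop = const void*;

class OopClosure {
 public:
  virtual ~OopClosure() = default;
  virtual void do_oop(oop* p) = 0;
};

// Mock of a compiled method's oop table.
struct MockNMethod {
  std::vector<oop> oops;
  void oops_do(OopClosure* cl) {
    for (oop& o : oops) cl->do_oop(&o);  // visit every slot once
  }
};

// Records whether either of two target oops occurs among the visited slots.
class Search2OopsClosure : public OopClosure {
 public:
  Search2OopsClosure(oop a, oop b) : _a(a), _b(b) {}
  void do_oop(oop* p) override {
    if (*p == _a) _found_a = true;
    if (*p == _b) _found_b = true;
  }
  bool found_a() const { return _found_a; }
  bool found_b() const { return _found_b; }
  bool found_any() const { return _found_a || _found_b; }

 private:
  oop _a;
  oop _b;
  bool _found_a = false;
  bool _found_b = false;
};
```

A closure keeps the traversal (owned by the nmethod) separate from the question being asked, which is why the same `oops_do()` walk can serve GC, verification, and debug-only checks like this one.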
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1143738349 PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1143739466 From coleenp at openjdk.org Tue Mar 21 17:12:00 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 21 Mar 2023 17:12:00 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v5] In-Reply-To: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: > This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. > Tested with tier1-4 Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Drop UL check for log file output (still have PrintDependencies). 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13007/files - new: https://git.openjdk.org/jdk/pull/13007/files/421b4ee3..c078451b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13007&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13007.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13007/head:pull/13007 PR: https://git.openjdk.org/jdk/pull/13007 From vlivanov at openjdk.org Tue Mar 21 17:12:03 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 21 Mar 2023 17:12:03 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> <2IKS8cYMhoU1JRBEalwF1ZeV4Vih78eXffu_GXZ4JkQ=.27d433ac-1f04-47ce-a177-c791a42bc6fe@github.com> Message-ID: On Tue, 21 Mar 2023 13:24:03 GMT, Coleen Phillimore wrote: >> Hmmm this seems broken to me then. We output the general dependency logging to one place based on UL configuration, but then we output `log_dependency` to the log file. If these are meant to be related and always reported together then we have lost that. If they are actually unrelated then this may still need the TraceDependencies flag to control it. >> >> @iwanowww can you comment on this please? > > Maybe we don't need this -Xlog test in this place, because this is also what PrintDependencies does. Yes, UL check doesn't help here. Irrespective of the check, UL output isn't automatically placed in VM log. It's convenient for LogCompilation to have VM output along with other VM events present in the log. Having said that, I'm fine with dropping TraceDependencies check here and improve the interaction between UL and LogVMOutput separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13007#discussion_r1143727768 From coleenp at openjdk.org Tue Mar 21 17:12:04 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 21 Mar 2023 17:12:04 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v3] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> <13yGfFhRFEsjHA-ox_6GxPiyU8w_hpQtgjHbsw6Glq0=.c2330c3f-0316-4f4c-aade-b6ad6c8543ee@github.com> <2IKS8cYMhoU1JRBEalwF1ZeV4Vih78eXffu_GXZ4JkQ=.27d433ac-1f04-47ce-a177-c791a42bc6fe@github.com> Message-ID: On Tue, 21 Mar 2023 17:01:21 GMT, Vladimir Ivanov wrote: >> Maybe we don't need this -Xlog test in this place, because this is also what PrintDependencies does. > > Yes, UL check doesn't help here. Irrespective of the check, UL output isn't automatically placed in VM log. > It's convenient for LogCompilation to have VM output along with other VM events present in the log. > Having said that, I'm fine with dropping TraceDependencies check here and improve the interaction between UL and LogVMOutput separately. Okay, thank you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13007#discussion_r1143734101 From rrich at openjdk.org Tue Mar 21 17:16:38 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Mar 2023 17:16:38 GMT Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v2] In-Reply-To: References: Message-ID: On Mon, 20 Mar 2023 19:01:21 GMT, Coleen Phillimore wrote: > I don't really know if this is right but some comments anyway. Thanks for looking at this and for giving feedback. > src/hotspot/share/classfile/classLoaderData.cpp line 791: > >> 789: >> 790: bool ClassLoaderData::handles_contain(oop obj) { >> 791: return _handles.contains(obj); > > This might need to be protected by a metaspace_lock. I thought it was not needed. Looks like `ChunkedHandleList` supports unsynchronized reading.
We might not see all the handles recently added without locking but we are guaranteed to see the handles that were added by resolving the call we are checking. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12802#issuecomment-1478287839 PR Review Comment: https://git.openjdk.org/jdk/pull/12802#discussion_r1143745849 From vlivanov at openjdk.org Tue Mar 21 17:24:50 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 21 Mar 2023 17:24:50 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v5] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: On Tue, 21 Mar 2023 17:12:00 GMT, Coleen Phillimore wrote: >> This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. >> Tested with tier1-4 > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Drop UL check for log file output (still have PrintDependencies). Marked as reviewed by vlivanov (Reviewer). 
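Richard's argument about `ClassLoaderData::handles_contain()` relies on `ChunkedHandleList` allowing lock-free readers. Such a reader may miss handles appended concurrently, but it never sees a partially written one. The sketch below shows that publish-then-count pattern with `std::atomic`, assuming a single writer that is serialized externally (for example, by a mutex). `ChunkedList` is a hypothetical class, not HotSpot's `ChunkedHandleList`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical append-only chunked list with lock-free readers.
// Assumption: calls to add() are serialized by the caller (single writer).
template <typename T, std::size_t N = 4>
class ChunkedList {
  struct Chunk {
    T data[N];
    std::atomic<std::size_t> size{0};  // number of published slots
    Chunk* next;                       // older chunk, immutable once published
    explicit Chunk(Chunk* n) : next(n) {}
  };
  std::atomic<Chunk*> _head{nullptr};

 public:
  ~ChunkedList() {
    Chunk* c = _head.load(std::memory_order_relaxed);
    while (c != nullptr) {
      Chunk* next = c->next;
      delete c;
      c = next;
    }
  }

  // Writer: fill the slot first, then publish it with a release store.
  void add(const T& value) {
    Chunk* c = _head.load(std::memory_order_relaxed);
    if (c == nullptr || c->size.load(std::memory_order_relaxed) == N) {
      c = new Chunk(c);
      _head.store(c, std::memory_order_release);  // publishes c->next too
    }
    std::size_t i = c->size.load(std::memory_order_relaxed);
    c->data[i] = value;
    c->size.store(i + 1, std::memory_order_release);
  }

  // Reader without a lock: each acquire load pairs with a release store above,
  // so every slot below the observed size is fully written.
  bool contains(const T& value) const {
    for (Chunk* c = _head.load(std::memory_order_acquire); c != nullptr; c = c->next) {
      std::size_t n = c->size.load(std::memory_order_acquire);
      for (std::size_t i = 0; i < n; i++) {
        if (c->data[i] == value) return true;
      }
    }
    return false;
  }
};
```

This matches the shape of the argument in the review: a handle added by the resolution that the checking thread itself performed happens-before the check, so it is guaranteed to be visible even though concurrently added handles may not be.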
------------- PR Review: https://git.openjdk.org/jdk/pull/13007#pullrequestreview-1351042944 From rrich at openjdk.org Tue Mar 21 17:43:57 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Mar 2023 17:43:57 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 10:49:32 GMT, Andrew Haley wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix riscv interpreter mistake and acquire semantics > > src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2337: > >> 2335: // Load-acquire the adapter method >> 2336: __ lea(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); >> 2337: __ ldar(method, method); > > What reordering are we trying to prevent here? The loads of the data stored in `ResolvedIndyEntry::fill_in()` must not be reordered with loading `ResolvedIndyEntry::_method`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1143781928 From qamai at openjdk.org Tue Mar 21 18:15:44 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 21 Mar 2023 18:15:44 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 16:29:44 GMT, Paul Sandoz wrote: >> I have moved most of the methods to `AbstractVector` and `AbstractShuffle`, I have to resort to raw types, though, since there seems to be no way to do the same with wild cards, and the generics mechanism is not powerful enough for things like `Vector`. The remaining failure seems to be related to [JDK-8304676](https://bugs.openjdk.org/projects/JDK/issues/JDK-8304676), so I think this patch is ready for review now. 
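The `ldar` in the AArch64 template interpreter is a load-acquire that pairs with a release store in `ResolvedIndyEntry::fill_in()`. Once a reader observes the method pointer, it is guaranteed to also observe every field written before that pointer was stored. The sketch below shows the same pairing portably with C++11 atomics, assuming a simplified, hypothetical entry type. `ResolvedEntrySketch` and its field names are illustrative only, and a single resolving writer is assumed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified resolved entry; field names are illustrative only.
struct ResolvedEntrySketch {
  int num_parameters = 0;
  bool has_appendix = false;
  std::atomic<const void*> method{nullptr};  // written last, read first

  // Writer: plain stores for the payload, then a release store of the
  // pointer that marks the entry as resolved.
  void fill_in(const void* m, int params, bool appendix) {
    assert(m != nullptr);
    num_parameters = params;
    has_appendix = appendix;
    method.store(m, std::memory_order_release);
  }

  // Reader: the acquire load (AArch64 ldar) prevents the payload loads below
  // from being reordered before the pointer load.
  bool try_read(int* params, bool* appendix) const {
    const void* m = method.load(std::memory_order_acquire);
    if (m == nullptr) return false;  // not resolved yet
    *params = num_parameters;
    *appendix = has_appendix;
    return true;
  }
};
```

With a plain (relaxed) load in `try_read`, a weakly ordered CPU such as AArch64 could return a non-null method together with stale payload fields, which is exactly the reordering the `ldar` rules out.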
>> >>> The mask implementation is specialized by the species of vectors it operates on, but does it have to be >> >> Apart from the mask implementation, shuffle implementation definitely has to take into consideration the element type. However, this information does not have to be visible to the API, similar to how we currently handle the vector length, we can have `class AbstractMask implements VectorMask`. As a result, the cast method would be useless and can be removed in the API, but our implementation details would still use it, for example >> >> Vector blend(Vector v, VectorMask w) { >> AbstractMask aw = (AbstractMask) w; >> AbstractMask tw = aw.cast(vspecies()); >> return VectorSupport.blend(...); >> } >> >> Vector rearrange(VectorShuffle s) { >> AbstractShuffle as = (AbstractShuffle) s; >> AbstractShuffle ts = s.cast(vspecies()); >> return VectorSupport.rearrangeOp(...); >> } >> >> What do you think? > >> Apart from the mask implementation, shuffle implementation definitely has to take into consideration the element type. > > Yes, the way you have implemented shuffle is tightly connected, that looks ok. > > I am wondering if we can make the mask implementation more loosely coupled and modified such that it does not have to take into consideration the element type (or species) of the vector it operates on, and instead compatibility is based solely on the lane count. > > Ideally it would be good to change the `VectorMask::check` method to just compare the lanes counts and not require a cast in the implementation, which i presume requires some deeper changes in C2? > > What you propose seems a possible a interim step towards a more preferable API, if the performance is good. @PaulSandoz As some hardware does differentiate masks based on element type, at some point we have to differentiate between them. From a design point of view, they are both implementation details so there might be no consideration regarding the API. 
On the other hand, having more on the Java side seems to be more desirable, as it does illustrate the operations more intuitively compared to the graph management in C2. Another important point I can think of is that having a constant shape for a Java class would help us in implementing the vector calling convention, as we can rely on the class information instead of some side channels. As a result, I think I do prefer the current class hierarchy. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1478374992 From matsaave at openjdk.org Tue Mar 21 20:01:44 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 21 Mar 2023 20:01:44 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v10] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for invokedynamic, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure can hold information for fields, methods, and invokedynamics, and each of its fields can hold different types of values depending on the entry. > > This enhancement proposes a new structure to exclusively contain invokedynamic information in a manner that is easy to interpret and easy to extend. Resolved invokedynamic entries will be stored in an array in the constant pool cache and the operand of the invokedynamic bytecode will be rewritten to be the index into this array. > > Any areas that previously accessed invokedynamic data from ConstantPoolCacheEntry will be replaced with accesses to this new array and structure. Verified with tier1-9 tests. > > The PPC port was provided by @reinrich and the RISCV port was provided by @DingliZhang and @zifeihan.
> > This change supports the following platforms: x86, aarch64, PPC, and RISCV. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Consistent register naming for aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12778/files - new: https://git.openjdk.org/jdk/pull/12778/files/8607f62a..cbe4fdcb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12778&range=08-09 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12778.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12778/head:pull/12778 PR: https://git.openjdk.org/jdk/pull/12778 From smonteith at openjdk.org Tue Mar 21 20:47:43 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Tue, 21 Mar 2023 20:47:43 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:39:27 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr; these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lowercase "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. > > Thanks! src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 270: > 268: virtual void pass_object() { > 269: intptr_t* addr = single_slot_addr(); > 270: intptr_t value = *addr == 0 ? nullptr : (intptr_t)addr; This doesn't compile. Perhaps replace nullptr with zero, unless casting it is more appropriate? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1143970163 From coleenp at openjdk.org Tue Mar 21 21:43:34 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 21 Mar 2023 21:43:34 GMT Subject: RFR: 8304687: Move add_to_hierarchy Message-ID: <8lZkIOj_sW-s5fenIUyqOflXokTscrf8v6Zi6VOic4o=.847bbf8a-430b-46f5-a4c2-335d3355c320@github.com> Moved SystemDictionary::add_to_hierarchy to InstanceKlass::add_to_hierarchy, where it more logically belongs, next to other functions that also care about dependencies. Tested with tier1-4, and tier1 on Linux, Windows, and macOS to check header file changes.
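This concerns the review comment on `interpreterRT_aarch64.cpp` line 270 in the NULL-to-nullptr thread. After the rewrite, `*addr == 0 ? nullptr : (intptr_t)addr` no longer compiles, because `std::nullptr_t` and `intptr_t` have no common type for the conditional operator. With `NULL` it compiled because, on the toolchains HotSpot uses, `NULL` expands to an integer null pointer constant (GCC's `__null`, for example). Below is a minimal standalone sketch of the zero-based fix suggested in the review. `pass_object_value` is a hypothetical free function that stands in for the body of `pass_object()`: the value is the slot's address, or zero when the slot holds a null reference.

```cpp
#include <cassert>
#include <cstdint>

// Does not compile: std::nullptr_t and intptr_t have no common type in ?:
//   intptr_t value = *addr == 0 ? nullptr : (intptr_t)addr;

// Compiles: both branches are integers, so the result type is intptr_t.
std::intptr_t pass_object_value(std::intptr_t* addr) {
  std::intptr_t value = *addr == 0 ? 0 : reinterpret_cast<std::intptr_t>(addr);
  return value;
}
```

Using a literal `0` keeps the intent (an integer-typed "no object" value passed to native code) without mixing pointer and integer types in one expression.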
------------- Commit messages: - 8304687: Move add_to_hierarchy Changes: https://git.openjdk.org/jdk/pull/13129/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13129&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304687 Stats: 98 lines in 7 files changed: 44 ins; 49 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13129/head:pull/13129 PR: https://git.openjdk.org/jdk/pull/13129 From aph at openjdk.org Tue Mar 21 22:24:51 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 21 Mar 2023 22:24:51 GMT Subject: RFR: 8301995: Move invokedynamic resolution information out of ConstantPoolCacheEntry [v9] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 17:40:46 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2337: >> >>> 2335: // Load-acquire the adapter method >>> 2336: __ lea(method, Address(cache, in_bytes(ResolvedIndyEntry::method_offset()))); >>> 2337: __ ldar(method, method); >> >> What reordering are we trying to prevent here? > > The loads of the data stored in `ResolvedIndyEntry::fill_in()` must not be reordered with loading `ResolvedIndyEntry::_method`. Ah, I see. This acquiring load matches the releasing store in `fill_in()`. Please add a comment here to that effect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12778#discussion_r1144047014 From sviswanathan at openjdk.org Tue Mar 21 23:35:48 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 21 Mar 2023 23:35:48 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v13] In-Reply-To: References: Message-ID: <88lAXNroRn3vodO7xcBefm3MumrW5grX4s8yCsV0W5s=.f7c1a32f-0ba3-4b56-b38c-2cd0dba932b1@github.com> On Wed, 15 Mar 2023 04:57:01 GMT, David Holmes wrote: >> Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: >> >> - Merge branch 'master' into modulo >> - Fix missing SharedRuntime::frem and SharedRuntime::drem on aarch64. >> - bugreported by sviswa7. >> - Merge branch 'master' into modulo >> - Fix #endif comment - found by dholmes-ora. >> - Merge branch 'master' into modulo >> - Fix win32 broken build. >> - Merge remote-tracking branch 'origin/master' into modulo >> - Always include the _WIN64 workaround - a review by dholmes-ora. >> - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. >> - Uppercase L - a review by turbanoff. >> - ... and 4 more: https://git.openjdk.org/jdk/compare/eb5e17b6...65af58da > > Functional CI testing in tiers 1-4 is good. > > I'm also running some benchmarks on Linux-x64 @dholmes-ora Were the benchmark runs for float/double modulo ok? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12508#issuecomment-1478726088 From kvn at openjdk.org Wed Mar 22 01:19:03 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Mar 2023 01:19:03 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Thu, 16 Mar 2023 20:56:15 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of its very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity.) >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is answered very quickly by scanning the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is grown when needed. This means that we need to check for potential overflow before attempting locking. When that is the case, locking fast-paths would call into the runtime to grow the stack and handle the locking.
Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check on method entry to avoid (possibly many) such checks at locking sites. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking. >> >> One problem is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!)
a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 > - Set condition flags correctly after fast-lock call on aarch64 First, thank you for putting the new code under a flag. I looked only at the x86 code. It seems fine except for a few places where I added comments. I wish the locking code for AArch64 and RISC-V were moved to c2_MacroAssembler as on x86, but for this review it is better to keep it where it is.
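The central data structure of the fast-locking proposal in this thread is the per-thread lock-stack: a small array of the objects a thread has fast-locked, so that "does the current thread own me?" becomes a short linear scan. Below is a simplified, hypothetical model of it. `LockStackSketch` is not HotSpot's class, and the header CAS, GC-root scanning, and inflation are left out. A `contains()` check before pushing is one way a recursive lock attempt could be detected; per the PR, that case is handled by inflating to a full monitor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a per-thread lock-stack; not HotSpot's implementation.
using oop = const void*;

class LockStackSketch {
 public:
  // Called after the header CAS to "fast-locked" succeeded. std::vector grows
  // on demand; the proposal instead grows the stack via a runtime call.
  void push(oop o) { _elems.push_back(o); }

  // Unlocking usually removes the most recent entry; searching from the top
  // also handles non-LIFO exit order.
  void remove(oop o) {
    for (std::size_t i = _elems.size(); i > 0; i--) {
      if (_elems[i - 1] == o) {
        _elems.erase(_elems.begin() + static_cast<std::ptrdiff_t>(i - 1));
        return;
      }
    }
    assert(false && "object not on lock-stack");
  }

  // "Does the current thread own me?": a linear scan, cheap because the
  // stack typically holds only a handful of entries.
  bool contains(oop o) const {
    for (oop e : _elems) {
      if (e == o) return true;
    }
    return false;
  }

  std::size_t size() const { return _elems.size(); }

 private:
  std::vector<oop> _elems;
};
```

Answering "which thread owns X?" with this layout means scanning every thread's stack, which is acceptable because, as the PR notes, such queries only occur off the hot paths (JVMTI, deadlock detection).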
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 602: > 600: movptr(tmpReg, Address(objReg, oopDesc::mark_offset_in_bytes())); // [FETCH] > 601: testptr(tmpReg, markWord::monitor_value); // inflated vs stack-locked|neutral > 602: jccb(Assembler::notZero, IsInflated); Maybe use a `jcc` long branch here to be safe. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 670: > 668: get_thread (scrReg); // beware: clobbers ICCs > 669: movptr(Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), scrReg); > 670: xorptr(boxReg, boxReg); // set icc.ZFlag = 1 to indicate success Should this be under `if (UseFastLocking)`? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 791: > 789: Compile::current()->output()->add_stub(stub); > 790: jcc(Assembler::notEqual, stub->entry()); > 791: bind(stub->continuation()); Why use a stub here instead of inlining the code? Because the branch is mostly not taken? ------------- PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1351551552 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144107482 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144111315 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144119776 From sspitsyn at openjdk.org Wed Mar 22 02:20:40 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Mar 2023 02:20:40 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 Message-ID: The fix is to enable support for late binding JVMTI agents. The fix includes: - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()` which does enabling JVMTI VTMS transition notifications in case of agent loaded into running VM. This function executes a VM operation counting VTMS transition bits in all `JavaThread`'s to correctly set the static counter `_VTMS_transition_count` needed for VTMS transition protocol. - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()` which is needed for testing.
It is used by the `WhiteBox` API.
- New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update.
- New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`

Testing:
- New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
- The originally failing tests are expected to pass now:
`runtime/vthread/RedefineClass.java`
`runtime/vthread/TestObjectAllocationSampleEvent.java`
- In progress: Run tiers 1-6 to make sure there are no regressions.

-------------

Commit messages:
 - cleanup in vmOperation.hpp
 - restored one incorrectly removed function
 - removed temporary debugging changes
 - 8297286: runtime/vthread tests crashing after JDK-8296324

Changes: https://git.openjdk.org/jdk/pull/13133/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8297286
Stats: 380 lines in 9 files changed: 377 ins; 2 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/13133.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133

PR: https://git.openjdk.org/jdk/pull/13133

From sspitsyn at openjdk.org Wed Mar 22 03:03:18 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 22 Mar 2023 03:03:18 GMT
Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v2]
In-Reply-To: 
References: 
Message-ID: 

> The fix is to enable support for late binding JVMTI agents.
> The fix includes:
> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()`, which enables JVMTI VTMS transition notifications when an agent is loaded into a running VM. This function executes a VM operation that counts the VTMS transition bits in all `JavaThread`s to correctly set the static counter `_VTMS_transition_count` needed for the VTMS transition protocol.
> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()`, which is needed for testing. It is used by the `WhiteBox` API.
> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update.
> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>
> Testing:
> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
> - The originally failing tests are expected to pass now:
> `runtime/vthread/RedefineClass.java`
> `runtime/vthread/TestObjectAllocationSampleEvent.java`
> - In progress: Run tiers 1-6 to make sure there are no regressions.

Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:

  update for missed part in jvmtiExport.cpp

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/13133/files
 - new: https://git.openjdk.org/jdk/pull/13133/files/6e27ee6f..ddac01c3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=00-01

Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/13133.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133

PR: https://git.openjdk.org/jdk/pull/13133

From rrich at openjdk.org Wed Mar 22 05:39:42 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 22 Mar 2023 05:39:42 GMT
Subject: RFR: 8296440: Remove Method* handling from cleanup_inline_caches_impl [v3]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 21 Mar 2023 17:04:56 GMT, Richard Reingruber wrote:

>> This PR replaces cleaning of static stubs in CompiledMethod::cleanup_inline_caches_impl() with a guarantee that it is actually not needed because the holder of the embedded target Method* is alive if the caller nmethod is not unloading.
>>
>> The holder of the target Method* has to be alive because it is reachable from the caller nmethod's oop pool. This is checked by `check_path_to_callee()` when a statically bound call gets resolved.
>> >> C2i entry barriers can be removed for the same reason. >> >> Testing: >> >> Many rounds in our CI testing which includes most JCK and JTREG tests, Renaissance benchmark and SAP specific tests with fastdebug and release builds on the standard platforms plus PPC64. >> >> I've also done tier1 and tier2 tests with -XX:-Inline and tier1 tests with ZGC. >> >> I've started hotspot and jdk tier1 tests with -Xcomp. They were not finished when I stopped them after 24h. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Feedback Coleene Going back to draft while analyzing test errors. Sorry for the noise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12802#issuecomment-1478942316 From fgao at openjdk.org Wed Mar 22 05:57:40 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 22 Mar 2023 05:57:40 GMT Subject: RFR: 8304301: Remove the global option SuperWordMaxVectorSize In-Reply-To: References: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> Message-ID: On Tue, 21 Mar 2023 04:16:26 GMT, Sandhya Viswanathan wrote: >> https://github.com/openjdk/jdk/pull/8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines. >> >> Currently, `SuperWordMaxVectorSize` behaves differently between x86 and other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32`, but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` will be kept at 64 on other platforms while x86 machine would change `SuperWordMaxVectorSize` to `MaxVectorSize`. Other platforms except x86 miss similar implementations like [2]. >> >> Also, `SuperWordMaxVectorSize` limits the max vector size of auto-vectorization as `64`, which is fine for current aarch64 hardware, but SVE architecture supports larger than 512 bits. 
>> >> The patch is to drop the global option and use an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting >> `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on JVM command line. >> >> So my question is: >> >> Before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set >> `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change? >> >> [1] https://github.com/openjdk/jdk/pull/12350#discussion_r1126106213 >> [2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333 > > @fg1417 SuperWordMaxVectorSize defines the maximum vector size generated by the auto vectorization. > MaxVectorSize defines the vector width supported by the underlying platform. > For CascadeLake we are setting SuperWordMaxVectorSize=32 and MaxVectorSize=64 by default. > This allows the usage of larger 64 byte width vector instructions in places like Java Vector API and intrinsics. > We would like to keep this behavior for x86. @sviswa7 thanks for your kind review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13112#issuecomment-1478962379 From sspitsyn at openjdk.org Wed Mar 22 06:06:25 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Mar 2023 06:06:25 GMT Subject: RFR: 8297286: runtime/vthread tests crashing after JDK-8296324 [v3] In-Reply-To: References: Message-ID: > The fix is to enable support for late binding JVMTI agents. 
> The fix includes:
> - New function `JvmtiEnvBase::enable_virtual_threads_notify_jvmti()`, which enables JVMTI VTMS transition notifications when an agent is loaded into a running VM. This function executes a VM operation that counts the VTMS transition bits in all `JavaThread`s to correctly set the static counter `_VTMS_transition_count` needed for the VTMS transition protocol.
> - New function `JvmtiEnvBase::disable_virtual_threads_notify_jvmti()`, which is needed for testing. It is used by the `WhiteBox` API.
> - New WhiteBox function `WB_SetVirtualThreadsNotifyJvmtiMode(JNIEnv* env, jobject wb, jboolean enable)` needed for testing of this update.
> - New regression test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
>
> Testing:
> - New test: `serviceability/jvmti/vthread/ToggleNotifyJvmtiTest`
> - The originally failing tests are expected to pass now:
> `runtime/vthread/RedefineClass.java`
> `runtime/vthread/TestObjectAllocationSampleEvent.java`
> - In progress: Run tiers 1-6 to make sure there are no regressions.
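The description says that the new VM operation counts the VTMS transition bits of all `JavaThread`s in order to set `_VTMS_transition_count`. Below is a simplified model of why seeding the counter this way can matter. It rests on an assumption that is mine, not stated in the PR: a thread that is already inside a transition when notifications are enabled will still decrement the counter when it finishes. Under that assumption, a counter that was not seeded with such in-flight transitions could go negative. `ThreadModel`, `g_transition_count`, `init_transition_count`, and `end_transition` are hypothetical names. The real scan runs inside a VM operation, which is only approximated here by assuming that no other thread runs concurrently.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a JavaThread: only the transition bit matters here.
struct ThreadModel {
  bool in_vtms_transition = false;
};

// Hypothetical stand-in for the static _VTMS_transition_count.
std::atomic<int> g_transition_count{0};

// Seed the counter from the threads' transition bits. In the JVM this kind of
// scan would run in a VM operation; here we simply assume nothing else runs.
void init_transition_count(const std::vector<ThreadModel>& threads) {
  int count = 0;
  for (const ThreadModel& t : threads) {
    if (t.in_vtms_transition) {
      ++count;
    }
  }
  g_transition_count.store(count);
}

// Assumed protocol: finishing a transition decrements the counter, so the
// counter must already include transitions that were in flight at seed time.
void end_transition(ThreadModel& t) {
  assert(t.in_vtms_transition);
  t.in_vtms_transition = false;
  int previous = g_transition_count.fetch_sub(1);
  assert(previous > 0 && "counter was not seeded with in-flight transitions");
  (void)previous;  // silence unused-variable warnings when NDEBUG is set
}
```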
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: add necessary tweak in jvmtiExport.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13133/files - new: https://git.openjdk.org/jdk/pull/13133/files/ddac01c3..51c3f7d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13133&range=01-02 Stats: 4 lines in 2 files changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13133.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13133/head:pull/13133 PR: https://git.openjdk.org/jdk/pull/13133 From fgao at openjdk.org Wed Mar 22 06:11:54 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 22 Mar 2023 06:11:54 GMT Subject: RFR: 8304301: Remove the global option SuperWordMaxVectorSize In-Reply-To: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> References: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> Message-ID: On Tue, 21 Mar 2023 02:26:55 GMT, Fei Gao wrote: > https://github.com/openjdk/jdk/pull/8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines. > > Currently, `SuperWordMaxVectorSize` behaves differently between x86 and other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32`, but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` will be kept at 64 on other platforms while x86 machine would change `SuperWordMaxVectorSize` to `MaxVectorSize`. Other platforms except x86 miss similar implementations like [2]. > > Also, `SuperWordMaxVectorSize` limits the max vector size of auto-vectorization as `64`, which is fine for current aarch64 hardware, but SVE architecture supports larger than 512 bits. 
> > The patch is to drop the global option and use an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting > `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on JVM command line. > > So my question is: > > Before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set > `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change? > > [1] https://github.com/openjdk/jdk/pull/12350#discussion_r1126106213 > [2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333 Hi @vnkozlov @TobiHartmann, could you please help review the patch? Since I don't have sufficient x86 systems, I would appreciate it if you could help verify that the patch will not introduce any performance regression on x86, especially on Cascade Lake. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13112#issuecomment-1478971819 From thartmann at openjdk.org Wed Mar 22 06:21:42 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 22 Mar 2023 06:21:42 GMT Subject: RFR: JDK-8304683: Memory leak in WB_IsMethodCompatible In-Reply-To: <4V-_5ZAkbj0Kw1UsgBnat-JrwffKM_mHRNmWZwB3ox8=.94ce5e0d-9552-4a96-9bb1-8a3f26c2dd7f@github.com> References: <4V-_5ZAkbj0Kw1UsgBnat-JrwffKM_mHRNmWZwB3ox8=.94ce5e0d-9552-4a96-9bb1-8a3f26c2dd7f@github.com> Message-ID: On Tue, 21 Mar 2023 16:45:30 GMT, Justin King wrote: > Add missing call to `DirectivesStack::release`. 
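According to the description, the leak came from a directive lookup that was never followed by the matching `DirectivesStack::release` call. The PR fixes this by adding that call. A general C++ technique for keeping acquire/release pairs balanced on every return path is an RAII guard. The sketch below is not the WhiteBox code. It uses a hypothetical `Directive` type with `acquire`/`release` functions and a hypothetical `DirectiveGuard` class as stand-ins for the real directive machinery.

```cpp
#include <cassert>

// Hypothetical reference-counted resource standing in for a looked-up directive.
struct Directive {
  int refcount = 0;
};

Directive* acquire(Directive& d) {
  ++d.refcount;
  return &d;
}

void release(Directive* d) {
  assert(d->refcount > 0);
  --d->refcount;
}

// RAII guard: release() runs on every exit path, including early returns.
class DirectiveGuard {
 public:
  explicit DirectiveGuard(Directive& d) : d_(acquire(d)) {}
  ~DirectiveGuard() { release(d_); }
  DirectiveGuard(const DirectiveGuard&) = delete;
  DirectiveGuard& operator=(const DirectiveGuard&) = delete;
  Directive* get() const { return d_; }

 private:
  Directive* d_;
};

// Hypothetical caller with two return paths; neither can leak the reference.
bool is_compatible(Directive& d, bool early_exit) {
  DirectiveGuard guard(d);
  if (early_exit) {
    return false;  // guard's destructor still releases here
  }
  return guard.get()->refcount > 0;
}
```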
Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13124#pullrequestreview-1351792326 From kvn at openjdk.org Wed Mar 22 06:35:41 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 22 Mar 2023 06:35:41 GMT Subject: RFR: 8304301: Remove the global option SuperWordMaxVectorSize In-Reply-To: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> References: <2XQwCCx_ficJ1bn2WX0Ud9m-QmuGkkZAH8N6yLugXqQ=.90a136da-db9a-49b4-a4c4-d6c644cf7d5a@github.com> Message-ID: <50dOJmvInI6foWz9kH7iVPz6TfYH5Crbc4kyXxN4HNY=.f7cc3709-f0a4-49dc-85b8-0fc9f5ce1afa@github.com> On Tue, 21 Mar 2023 02:26:55 GMT, Fei Gao wrote: > https://github.com/openjdk/jdk/pull/8877 introduced the global option `SuperWordMaxVectorSize` as a temporary solution to fix the performance regression on some x86 machines. > > Currently, `SuperWordMaxVectorSize` behaves differently between x86 and other platforms [1]. For example, if the current machine only supports `MaxVectorSize <= 32`, but we set `SuperWordMaxVectorSize = 64`, then `SuperWordMaxVectorSize` will be kept at 64 on other platforms while x86 machine would change `SuperWordMaxVectorSize` to `MaxVectorSize`. Other platforms except x86 miss similar implementations like [2]. > > Also, `SuperWordMaxVectorSize` limits the max vector size of auto-vectorization as `64`, which is fine for current aarch64 hardware, but SVE architecture supports larger than 512 bits. > > The patch is to drop the global option and use an architecture-dependent interface to consult the max vector size for auto-vectorization, fixing the performance issue on x86 and reducing side effects for other platforms. After the patch, auto-vectorization is still limited to 32-byte vectors by default on Cascade Lake and users can override this by either setting > `-XX:UseAVX=3` or `-XX:MaxVectorSize=64` on JVM command line. 
> > So my question is: > > Before the patch, we could have a smaller max vector size for auto-vectorization than `MaxVectorSize` on x86. For example, users could have `MaxVectorSize=64` and `SuperWordMaxVectorSize=32`. But after the change, if we set > `-XX:MaxVectorSize=64` explicitly, then the max vector size for auto-vectorization would be `MaxVectorSize`, i.e. 64 bytes, which I believe is more reasonable. @sviswa7 @jatin-bhateja, are you happy about the change? > > [1] https://github.com/openjdk/jdk/pull/12350#discussion_r1126106213 > [2] https://github.com/openjdk/jdk/blob/33bec207103acd520eb99afb093cfafa44aecfda/src/hotspot/cpu/x86/vm_version_x86.cpp#L1314-L1333 I will test it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13112#issuecomment-1478986856 From dholmes at openjdk.org Wed Mar 22 06:54:43 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Mar 2023 06:54:43 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v13] In-Reply-To: References: Message-ID: On Tue, 14 Mar 2023 12:45:42 GMT, Jan Kratochvil wrote: >> I have OCA already processed/approved. I am not Author but my Author request is being processed these days (sent to Rob McKenna). >> I did regression test x86_64 OpenJDK-8. I will leave other regression testing on GHA. >> The patch (and former GCC performance regression) affects only x86_64+i686. > > Jan Kratochvil has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - Merge branch 'master' into modulo > - Fix missing SharedRuntime::frem and SharedRuntime::drem on aarch64. > - bugreported by sviswa7. > - Merge branch 'master' into modulo > - Fix #endif comment - found by dholmes-ora. > - Merge branch 'master' into modulo > - Fix win32 broken build. 
> - Merge remote-tracking branch 'origin/master' into modulo > - Always include the _WIN64 workaround - a review by dholmes-ora. > - Remove comments to be moved to JBS (Bug System) - a review by jddarcy. > - Uppercase L - a review by turbanoff. > - ... and 4 more: https://git.openjdk.org/jdk/compare/539f86a7...65af58da Nothing further from me. Thanks for your patience here. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12508#pullrequestreview-1351821663 From dholmes at openjdk.org Wed Mar 22 06:54:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Mar 2023 06:54:45 GMT Subject: RFR: 8302191: Performance degradation for float/double modulo on Linux [v13] In-Reply-To: <88lAXNroRn3vodO7xcBefm3MumrW5grX4s8yCsV0W5s=.f7c1a32f-0ba3-4b56-b38c-2cd0dba932b1@github.com> References: <88lAXNroRn3vodO7xcBefm3MumrW5grX4s8yCsV0W5s=.f7c1a32f-0ba3-4b56-b38c-2cd0dba932b1@github.com> Message-ID: On Tue, 21 Mar 2023 23:32:28 GMT, Sandhya Viswanathan wrote: >> Functional CI testing in tiers 1-4 is good. >> >> I'm also running some benchmarks on Linux-x64 > > @dholmes-ora Were the benchmark runs for float/double modulo ok? @sviswa7 the general benchmarks I ran showed some small losses, some small wins and a few larger wins. So no reason for this not to go in based on that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12508#issuecomment-1479003124 From dholmes at openjdk.org Wed Mar 22 07:02:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Mar 2023 07:02:45 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v5] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: On Tue, 21 Mar 2023 17:12:00 GMT, Coleen Phillimore wrote: >> This change converts TraceDependencies to UL and removes the develop option. 
I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. >> Tested with tier1-4 > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Drop UL check for log file output (still have PrintDependencies). Nothing further from me - update is good. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13007#pullrequestreview-1351837055 From dholmes at openjdk.org Wed Mar 22 07:17:40 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Mar 2023 07:17:40 GMT Subject: RFR: 8304687: Move add_to_hierarchy In-Reply-To: <8lZkIOj_sW-s5fenIUyqOflXokTscrf8v6Zi6VOic4o=.847bbf8a-430b-46f5-a4c2-335d3355c320@github.com> References: <8lZkIOj_sW-s5fenIUyqOflXokTscrf8v6Zi6VOic4o=.847bbf8a-430b-46f5-a4c2-335d3355c320@github.com> Message-ID: <8GQkulzonIlR8El5Ql1MVrQg8X9Ka7mXjoa6fjsNgt8=.15551bb3-12e9-40b1-b6e6-13d504d9e18b@github.com> On Tue, 21 Mar 2023 21:36:33 GMT, Coleen Phillimore wrote: > Moved SystemDictionary::add_to_hierarchy to InstanceKlass::add_to_hierarchy where it more logically belongs and next to other functions that also care about dependencies. > Tested with tier1-4, and tier1 on linux,windows,macos to check header file changes. Refactoring looks fine. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13129#pullrequestreview-1351852298

From dholmes at openjdk.org Wed Mar 22 07:33:43 2023
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 22 Mar 2023 07:33:43 GMT
Subject: RFR: JDK-8301498: Replace NULL with nullptr in cpu/x86 [v4]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 21 Mar 2023 10:04:04 GMT, Johan Sjölen wrote:

>> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86. Unfortunately the script that does the change isn't perfect, and so we need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>>
>> Here are some typical things to look out for:
>>
>> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr; these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:
>
> - Merge remote-tracking branch 'origin/master' into JDK-8301498
> - Fix vnkozlov's suggestions
> - Merge remote-tracking branch 'origin/master' into JDK-8301498
> - Some more fixes
> - Fixes
> - Replace NULL with nullptr in cpu/x86

Still good. Thanks.

(incremental change looked weird but end result seems fine)

-------------

Marked as reviewed by dholmes (Reviewer).
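One concrete reason why the `NULL` to `nullptr` conversion is more than cosmetic is that `nullptr` has its own type, `std::nullptr_t`, which converts to pointer types but not to integers. A null pointer constant such as `0` (and `NULL`, whose exact definition is implementation-defined) can instead select an integer overload. The overloaded `kind` function below is hypothetical and exists only to show overload resolution. The literal `0` stands in for `NULL` to keep the example portable across compilers.

```cpp
// Hypothetical overload set used only to show which overload is chosen.
enum class Kind { Integer, Pointer };

constexpr Kind kind(int) { return Kind::Integer; }
constexpr Kind kind(const char*) { return Kind::Pointer; }

// 0 is an int literal: the int overload is an exact match and wins.
static_assert(kind(0) == Kind::Integer, "0 selects the int overload");
// nullptr has type std::nullptr_t, which converts to pointers but not to int.
static_assert(kind(nullptr) == Kind::Pointer, "nullptr selects the pointer overload");
```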
PR Review: https://git.openjdk.org/jdk/pull/12326#pullrequestreview-1351869217 From xgong at openjdk.org Wed Mar 22 08:08:46 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 22 Mar 2023 08:08:46 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v3] In-Reply-To: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com> References: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com> Message-ID: On Tue, 21 Mar 2023 16:16:31 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>>
>> Upon these changes, a `rearrange` can emit more efficient code:
>>
>> var species = IntVector.SPECIES_128;
>> var v1 = IntVector.fromArray(species, SRC1, 0);
>> var v2 = IntVector.fromArray(species, SRC2, 0);
>> v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>>
>> Before:
>> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})}
>> vmovdqu 0x10(%r10),%xmm2
>> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})}
>> vmovdqu 0x10(%r10),%xmm1
>> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})}
>> vmovdqu 0x10(%r10),%xmm0
>> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask
>> ; {external_word}
>> vpackusdw %xmm0,%xmm0,%xmm0
>> vpackuswb %xmm0,%xmm0,%xmm0
>> vpmovsxbd %xmm0,%xmm3
>> vpcmpgtd %xmm3,%xmm1,%xmm3
>> vtestps %xmm3,%xmm3
>> jne 0x00007fc2acb4e0d8
>> vpmovzxbd %xmm0,%xmm0
>> vpermd %ymm2,%ymm0,%ymm0
>> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})}
>> vmovdqu %xmm0,0x10(%r10)
>>
>> After:
>> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})}
>> vmovdqu 0x10(%r10),%xmm1
>> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})}
>> vmovdqu 0x10(%r10),%xmm2
>> vpxor %xmm0,%xmm0,%xmm0
>> vpcmpgtd %xmm2,%xmm0,%xmm3
>> vtestps %xmm3,%xmm3
>> jne 0x00007fa818b27cb1
>> vpermd %ymm1,%ymm2,%ymm0
>> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})}
>> vmovdqu %xmm0,0x10(%r10)
>>
>> Please take a look and leave reviews. Thanks a lot.
>
> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
>
> - missing casts
> - clean up

src/hotspot/cpu/aarch64/aarch64_vector.ad line 6082:

> 6080: // to implement rearrange.
> 6081:
> 6082: // Maybe move the shuffle preparation to VectorLoadShuffle

Agreed on moving the shuffle computation code to `VectorLoadShuffle`. Thanks!
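The "After" assembly quoted above checks the shuffle with one signed compare against a zeroed register (`vpxor`, `vpcmpgtd`, `vtestps`) and then permutes with `vpermd`. There is no byte-to-int expansion in between. The scalar model below reproduces that shape. It assumes, based on my reading of the Vector API documentation, that shuffle indices are kept partially wrapped, so an out-of-range source index becomes a negative "exceptional" index in `[-VLENGTH, -1]`. Under that assumption, `rearrange` only needs to test for negative lanes. The Java method throws `IndexOutOfBoundsException` in that case; the sketch throws `std::out_of_range` instead. `floor_mod`, `partially_wrap`, and this `rearrange` are illustrative C++ functions, not the Java implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Floor modulus: result in [0, n) for n > 0, including negative x.
int floor_mod(int x, int n) {
  int r = x % n;
  return r < 0 ? r + n : r;
}

// Partial wrapping (assumed semantics): in-range indices are kept; others
// are wrapped into [0, n) and shifted down by n, giving a negative
// "exceptional" index in [-n, -1].
int partially_wrap(int idx, int n) {
  if (idx >= 0 && idx < n) {
    return idx;
  }
  return floor_mod(idx, n) - n;
}

// Scalar model of rearrange on already partially wrapped int indices:
// one "any lane negative?" check, then a plain permutation.
template <std::size_t N>
std::array<int, N> rearrange(const std::array<int, N>& src,
                             const std::array<int, N>& shuffle) {
  for (int idx : shuffle) {
    assert(idx < static_cast<int>(N));  // holds for partially wrapped indices
    if (idx < 0) {
      throw std::out_of_range("exceptional shuffle index");
    }
  }
  std::array<int, N> dst{};
  for (std::size_t i = 0; i < N; ++i) {
    dst[i] = src[static_cast<std::size_t>(shuffle[i])];
  }
  return dst;
}
```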
src/hotspot/share/opto/vectorIntrinsics.cpp line 2059:

> 2057: if (need_load_shuffle) {
> 2058: shuffle = gvn().transform(new VectorLoadShuffleNode(shuffle, vt));
> 2059: }

How about generating `VectorLoadShuffleNode` on all platforms that support the Vector API and removing the helper method `vector_needs_load_shuffle()`? On platforms that do not need this shuffle preparation, we can emit nothing in codegen.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2426:

> 2424: if (is_vector_shuffle(vbox_klass_from)) {
> 2425: return false; // vector shuffles aren't supported
> 2426: }

Would it be better to change this to an assertion, or to print the log details?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144366812
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144360349
PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144363416

From xgong at openjdk.org Wed Mar 22 08:11:43 2023
From: xgong at openjdk.org (Xiaohong Gong)
Date: Wed, 22 Mar 2023 08:11:43 GMT
Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v3]
In-Reply-To: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com>
References: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com>
Message-ID: 

On Tue, 21 Mar 2023 16:16:31 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks:
>>
>> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically.
>> 2. Redundant expansions in `rearrange` operations.
On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. 
>
> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
>
> - missing casts
> - clean up

Please also update the copyright year to 2023 in the touched files, such as `vectorSupport.hpp`, and in Java files such as `AbstractShuffle.java`, `AbstractVector.java`, `VectorShape.java`, and `VectorShuffle.java`. Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1479076010

From fjiang at openjdk.org Wed Mar 22 08:18:42 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Wed, 22 Mar 2023 08:18:42 GMT
Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5]
In-Reply-To: 
References: 
Message-ID: <2VN0jPv3cwJuzneVkIBwHReDI-2zj0qBknbc_AeFy1k=.d306ef5f-c104-496b-b6fd-f404ffc33d39@github.com>

On Tue, 21 Mar 2023 13:35:25 GMT, Per Minborg wrote:

>> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add example for Option::captureStateLayout
>
> A review of all the copyright years shall be made in this PR.

Hi @minborg, it looks like some changes were missed in the RISC-V port. I've added these changes and submitted tests on linux-riscv. `jdk_foreign` still passes with both release and fastdebug builds. Could you please add these extra changes for RISC-V? Thanks.
Here is the patch: [foreign_riscv_port_patch.txt](https://github.com/openjdk/jdk/files/11037700/foreign_riscv_port_patch.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13079#issuecomment-1479083052 From xgong at openjdk.org Wed Mar 22 08:33:47 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 22 Mar 2023 08:33:47 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v3] In-Reply-To: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com> References: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com> Message-ID: On Tue, 21 Mar 2023 16:16:31 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - missing casts > - clean up src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 118: > 116: return (VectorShuffle) v.rearrange(shuffle.cast(vspecies().asIntegral())) > 117: .toShuffle() > 118: .cast(vspecies()); Style issue. I suggest changing it to: return (VectorShuffle) v.rearrange(shuffle.cast(vspecies().asIntegral())) .toShuffle() .cast(vspecies()); I also noticed that the same shuffle-cast pattern is used in several places.
Could we move such code (`toShuffle().cast(vspecies())`) into a separate method? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 130: > 128: } else { > 129: v = v.blend(v.lanewise(VectorOperators.ADD, length()), > 130: v.compare(VectorOperators.LT, 0)); Style issue. I suggest changing it to: v = v.blend(v.lanewise(VectorOperators.ADD, length()), v.compare(VectorOperators.LT, 0)); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 198: > 196: if ((length() & (length() - 1)) != 0) { > 197: return wrap ? shuffleFromOp(i -> (VectorIntrinsics.wrapToRange(i * step + start, length()))) > 198: : shuffleFromOp(i -> i * step + start); Code style issue. I suggest: return wrap ? shuffleFromOp(i -> (VectorIntrinsics.wrapToRange(i * step + start, length()))) : shuffleFromOp(i -> i * step + start); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 204: > 202: Vector iota = species.iota(); > 203: iota = iota.lanewise(VectorOperators.MUL, step) > 204: .lanewise(VectorOperators.ADD, start); Style issue. Same as above.
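The following scalar sketch illustrates the lane indices that the `AbstractVector` snippets above compute for a (start, step) shuffle, with and without wrapping. It assumes that `VectorIntrinsics.wrapToRange(i, n)` behaves like `Math.floorMod(i, n)`. The class and method names are hypothetical, and this is not the JDK code. It relies on the fact that, for power-of-two lengths, masking with `length - 1` gives the same result as `floorMod`, even for negative values. That equivalence is what allows the cheap `iota * step + start` vector path.

```java
import java.util.Arrays;

// Hypothetical scalar model of shuffle indices computed as i * step + start.
final class ShuffleIndexSketch {

    // Assumed to match VectorIntrinsics.wrapToRange: wrap any int into [0, length).
    static int wrapToRange(int index, int length) {
        if ((length & (length - 1)) == 0) {
            // Power of two: masking equals floorMod, even for negative indices.
            return index & (length - 1);
        }
        return Math.floorMod(index, length);
    }

    // Lane i receives i * step + start, optionally wrapped into range.
    static int[] shuffleIndices(int length, int start, int step, boolean wrap) {
        int[] indices = new int[length];
        for (int i = 0; i < length; i++) {
            int raw = i * step + start;
            indices[i] = wrap ? wrapToRange(raw, length) : raw;
        }
        return indices;
    }

    public static void main(String[] args) {
        // Power-of-two length: 3, 4, 5, 6 wrap to 3, 0, 1, 2.
        System.out.println(Arrays.toString(shuffleIndices(4, 3, 1, true)));
        // Non-power-of-two length: 2, 4, 6 wrap to 2, 1, 0.
        System.out.println(Arrays.toString(shuffleIndices(3, 2, 2, true)));
        // Without wrapping, the raw values are kept: 3, 4, 5, 6.
        System.out.println(Arrays.toString(shuffleIndices(4, 3, 1, false)));
    }
}
```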
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144384585 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144389023 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144390218 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144390692 From aturbanov at openjdk.org Wed Mar 22 09:18:50 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 22 Mar 2023 09:18:50 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 09:02:29 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Add example for Option::captureStateLayout src/java.base/share/classes/java/lang/foreign/Arena.java line 224: > 222: static Arena global() { > 223: class Holder { > 224: final static Arena GLOBAL = MemorySessionImpl.GLOBAL.asArena(); Nit: use blessed modifiers order Suggestion: static final Arena GLOBAL = MemorySessionImpl.GLOBAL.asArena(); src/java.base/share/classes/jdk/internal/foreign/abi/aarch64/windows/WindowsAArch64Linker.java line 58: > 56: @Override > 57: protected UpcallStubFactory arrangeUpcall(MethodType targetType, FunctionDescriptor function, LinkerOptions options) { > 58: return CallArranger.WINDOWS.arrangeUpcall(targetType, function, options); Suggestion: return CallArranger.WINDOWS.arrangeUpcall(targetType, function, options); src/java.base/share/classes/jdk/internal/foreign/abi/fallback/LibFallback.java line 159: > 157: */ > 158: static void getStructOffsets(MemorySegment structType, MemorySegment offsetsOut, FFIABI abi) > 159: throws 
IllegalStateException { Suggestion: throws IllegalStateException { test/jdk/java/foreign/trivial/TestTrivial.java line 74: > 72: VarHandle vhX = bigLayout.varHandle(MemoryLayout.PathElement.groupElement("x")); > 73: VarHandle vhY = bigLayout.varHandle(MemoryLayout.PathElement.groupElement("y")); > 74: try (Arena arena = Arena.ofConfined()) { nit: Suggestion: try (Arena arena = Arena.ofConfined()) { test/jdk/java/foreign/trivial/TestTrivial.java line 89: > 87: StructLayout capturedStateLayout = Linker.Option.captureStateLayout(); > 88: VarHandle errnoHandle = capturedStateLayout.varHandle(MemoryLayout.PathElement.groupElement("errno")); > 89: try (Arena arena = Arena.ofConfined()) { nit Suggestion: try (Arena arena = Arena.ofConfined()) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144450371 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144453152 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144452221 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144451413 PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144451634 From dholmes at openjdk.org Wed Mar 22 09:22:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Mar 2023 09:22:00 GMT Subject: RFR: 8257967: JFR: Events for loaded agents [v10] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 00:53:31 GMT, Serguei Spitsyn wrote: >> Markus Grönlund has updated the pull request incrementally with one additional commit since the last revision: >> >> more cleanup > > src/hotspot/share/prims/jvmtiEnvBase.hpp line 166: > >> 164: >> 165: const void* get_env_local_storage() { return _env_local_storage; } >> 166: > > Why was this change/move necessary? Am I missing anything? It is now public, not protected.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12923#discussion_r1144458026 From rkennke at openjdk.org Wed Mar 22 09:51:08 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Mar 2023 09:51:08 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v29] In-Reply-To: References: <_WvW_1rgaeDPAzM9DferkAgb6IhT-kZXPsINY8o_uA4=.bb1a0483-dc87-4e61-8272-f41618e27f53@github.com> Message-ID: On Wed, 22 Mar 2023 00:25:43 GMT, Vladimir Kozlov wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2 >> - Set condition flags correctly after fast-lock call on aarch64 > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 670: > >> 668: get_thread (scrReg); // beware: clobbers ICCs >> 669: movptr(Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), scrReg); >> 670: xorptr(boxReg, boxReg); // set icc.ZFlag = 1 to indicate success > > Should this be under `if (UseFastLocking)`? I don't think so, unless we also want to change all the stuff in x86_32.ad to not fetch the thread before calling into fast_unlock(). However, I think it is a nice and useful change. I could break it out of this PR and get it reviewed separately; it is a side effect of the new locking implementation, insofar as we always require a thread register and allocate and fetch it before going into fast_lock(). > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 791: > >> 789: Compile::current()->output()->add_stub(stub); >> 790: jcc(Assembler::notEqual, stub->entry()); >> 791: bind(stub->continuation()); > > Why use a stub here and not inline the code? Because the branch is mostly not taken? Yes, the branch is mostly not taken. If we inlined the code, we would have to take a forward branch on the very common path to skip over the (rare) part that handles the ANON monitor owner.
This would throw off static branch prediction and is discouraged by the Intel optimization guide. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144501909 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1144504528 From mcimadamore at openjdk.org Wed Mar 22 11:57:46 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 22 Mar 2023 11:57:46 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 09:02:29 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Add example for Option::captureStateLayout src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 694: > 692: * @param bitSize the padding size in bits. > 693: * @return the new selector layout. > 694: * @throws IllegalArgumentException if {@code bitSize < 0} or {@code bitSize % 8 != 0} I'm not sure if this change in the `@throws` was deliberate - e.g. the new API seems to allow creation of padding layouts of zero size (which I did not realize). This can hide issues for code generators (e.g. I stumbled upon this when working on jextract, which was silently emitting zero-length paddings in some struct layouts). Perhaps better to revert to old semantics? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144680066 From pminborg at openjdk.org Wed Mar 22 12:10:46 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 22 Mar 2023 12:10:46 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v5] In-Reply-To: References: Message-ID: On Wed, 22 Mar 2023 11:55:09 GMT, Maurizio Cimadamore wrote: >> Per Minborg has updated the pull request incrementally with one additional commit since the last revision: >> >> Add example for Option::captureStateLayout > > src/java.base/share/classes/java/lang/foreign/MemoryLayout.java line 694: > >> 692: * @param bitSize the padding size in bits. >> 693: * @return the new selector layout. >> 694: * @throws IllegalArgumentException if {@code bitSize < 0} or {@code bitSize % 8 != 0} > > I'm not sure if this change in the `@throws` was deliberate - e.g. the new API seems to allow creation of padding layouts of zero size (which I did not realize). This can hide issues for code generators (e.g. I stumbled upon this when working on jextract, which was silently emitting zero-length paddings in some struct layouts). Perhaps better to revert to old semantics? Agreed. Having zero-length padding does not make much sense. 
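The sketch below shows the stricter precondition agreed on here. It rejects zero-length and negative padding, as well as sizes that are not a whole number of bytes. The helper names and the exception message are hypothetical; this is not the code in `MemoryLayout`.

```java
// Hypothetical helper mirroring the precondition discussed for padding layouts.
final class PaddingCheckSketch {

    // Reject zero or negative sizes, and sizes that are not a multiple of 8 bits.
    static long requireValidPaddingBits(long bitSize) {
        if (bitSize <= 0 || bitSize % 8 != 0) {
            throw new IllegalArgumentException("Invalid padding size: " + bitSize);
        }
        return bitSize;
    }

    // Convenience wrapper that reports validity instead of throwing.
    static boolean isValid(long bitSize) {
        try {
            requireValidPaddingBits(bitSize);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(0));   // zero-length padding is rejected
        System.out.println(isValid(12));  // not a whole number of bytes
        System.out.println(isValid(64));  // accepted
    }
}
```

Rejecting zero up front surfaces generator bugs, such as silently emitted empty paddings, at layout-creation time rather than hiding them.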
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144707890 From pminborg at openjdk.org Wed Mar 22 12:28:14 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 22 Mar 2023 12:28:14 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v6] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with two additional commits since the last revision: - Fix formating and modifier order - Fix typo -> shared arena ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/21ef0607..192050d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=04-05 Stats: 6 lines in 5 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From coleenp at openjdk.org Wed Mar 22 12:36:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 22 Mar 2023 12:36:54 GMT Subject: RFR: 8304089: Convert TraceDependencies to UL [v5] In-Reply-To: References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: <8MoLfNrpMWxIi4YI8BzYzYa0i59kTkfCvI3I1QP2QL0=.64d3f6e9-4f02-4637-984a-f2c251cb0080@github.com> On Tue, 21 Mar 2023 17:12:00 GMT, Coleen Phillimore wrote: >> This change converts TraceDependencies to UL and removes the develop option. 
I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. >> Tested with tier1-4 > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Drop UL check for log file output (still have PrintDependencies). Thank you Vladimir and David. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13007#issuecomment-1479491476 From coleenp at openjdk.org Wed Mar 22 12:36:57 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 22 Mar 2023 12:36:57 GMT Subject: Integrated: 8304089: Convert TraceDependencies to UL In-Reply-To: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> References: <0aGc1NdAjpvksCWmXb1gZOPp9MV0n6xWvG8EaEp2ZLg=.b79cf218-9a23-420d-bec5-7509b7f8f1c1@github.com> Message-ID: On Mon, 13 Mar 2023 21:32:14 GMT, Coleen Phillimore wrote: > This change converts TraceDependencies to UL and removes the develop option. I think this provides further flexibility to add tags to only trace certain things in dependency analysis, as I did when trying to understand a PR for a deoptimization change. For now, the messages are the same and the option is -Xlog:dependencies=debug. > Tested with tier1-4 This pull request has now been integrated. 
Changeset: ddf1e34c Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/ddf1e34c1a0815e8677212f1a7860ca7cf9fc2c9 Stats: 78 lines in 14 files changed: 22 ins; 4 del; 52 mod 8304089: Convert TraceDependencies to UL Reviewed-by: vlivanov, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13007 From coleenp at openjdk.org Wed Mar 22 12:43:25 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 22 Mar 2023 12:43:25 GMT Subject: RFR: 8304687: Move add_to_hierarchy [v2] In-Reply-To: <8lZkIOj_sW-s5fenIUyqOflXokTscrf8v6Zi6VOic4o=.847bbf8a-430b-46f5-a4c2-335d3355c320@github.com> References: <8lZkIOj_sW-s5fenIUyqOflXokTscrf8v6Zi6VOic4o=.847bbf8a-430b-46f5-a4c2-335d3355c320@github.com> Message-ID: <88wO6xUBgdQseupIPo93t7gKRlhKgME3O98C-dKwmbI=.e21b9600-a2f4-43cf-9e7d-9241f704d20a@github.com> > Moved SystemDictionary::add_to_hierarchy to InstanceKlass::add_to_hierarchy where it more logically belongs and next to other functions that also care about dependencies. > Tested with tier1-4, and tier1 on linux,windows,macos to check header file changes. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add back method.inline.hpp include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13129/files - new: https://git.openjdk.org/jdk/pull/13129/files/cd8c564a..b86abf50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13129&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13129&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13129/head:pull/13129 PR: https://git.openjdk.org/jdk/pull/13129 From qamai at openjdk.org Wed Mar 22 12:46:33 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 22 Mar 2023 12:46:33 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, VectorShuffle is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > 1. Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > 2. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > 3. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > 4. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {external_word} > vpackusdw %xmm0,%xmm0,%xmm0 > vpackuswb %xmm0,%xmm0,%xmm0 > vpmovsxbd %xmm0,%xmm3 > vpcmpgtd %xmm3,%xmm1,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fc2acb4e0d8 > vpmovzxbd %xmm0,%xmm0 > vpermd %ymm2,%ymm0,%ymm0 > movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} > vmovdqu %xmm0,0x10(%r10) > > After: > movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} > vmovdqu 0x10(%r10),%xmm2 > vpxor %xmm0,%xmm0,%xmm0 > vpcmpgtd %xmm2,%xmm0,%xmm3 > vtestps %xmm3,%xmm3 > jne 0x00007fa818b27cb1 > vpermd %ymm1,%ymm2,%ymm0 > movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} > vmovdqu %xmm0,0x10(%r10) > > Please take a look and leave reviews. Thanks a lot. 
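Here is a scalar model of the `rearrange` semantics behind the Before/After listings above. It assumes that out-of-range indices are partially wrapped into [-length, 0) when the shuffle is created. The single signed compare against zero (`vpcmpgtd` followed by `vtestps`/`jne`) in the After listing suggests this, and it would mean one check for negative lanes suffices before the permute. All names are hypothetical; this is not the Vector API implementation.

```java
import java.util.Arrays;

// Hypothetical scalar model of rearrange with shuffle indices held as ints.
final class RearrangeSketch {

    // Assumed partial wrapping: valid indices are unchanged, others land in [-length, 0).
    static int partialWrap(int index, int length) {
        if (index >= 0 && index < length) {
            return index;
        }
        return Math.floorMod(index, length) - length;
    }

    // Builds the shuffle from raw lane indices.
    static int[] toShuffle(int[] indices) {
        int[] shuffle = new int[indices.length];
        for (int i = 0; i < indices.length; i++) {
            shuffle[i] = partialWrap(indices[i], indices.length);
        }
        return shuffle;
    }

    // result[i] = src[shuffle[i]]; any negative lane is an exceptional index.
    static int[] rearrange(int[] src, int[] shuffle) {
        for (int s : shuffle) {
            if (s < 0) {
                // Loosely corresponds to the compare-against-zero slow path in the listing.
                throw new IndexOutOfBoundsException("exceptional shuffle index " + s);
            }
        }
        int[] result = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            result[i] = src[shuffle[i]];  // the permute step (vpermd in the listing)
        }
        return result;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};
        System.out.println(Arrays.toString(rearrange(src, toShuffle(new int[]{3, 2, 1, 0}))));
        System.out.println(Arrays.toString(toShuffle(new int[]{4, -1, 2, 9})));
    }
}
```

Keeping the indices in an int vector of the same width as the data is what lets the check and the permute run without the byte-packing and re-expansion visible in the Before listing.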
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13093/files - new: https://git.openjdk.org/jdk/pull/13093/files/4caa9d10..e0b9ee88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13093&range=02-03 Stats: 17 lines in 5 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/13093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13093/head:pull/13093 PR: https://git.openjdk.org/jdk/pull/13093 From qamai at openjdk.org Wed Mar 22 12:46:36 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 22 Mar 2023 12:46:36 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v3] In-Reply-To: References: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com> Message-ID: On Wed, 22 Mar 2023 08:09:03 GMT, Xiaohong Gong wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - missing casts >> - clean up > > Please also update the copyright to 2023 for some touched files like `vectorSupport.hpp` and other java files like `AbstractShuffle.java`, `AbstractVector.java`, `VectorShape.java`, and `VectorShuffle.java`. Thanks! 
@XiaohongGong Thanks, I have updated the copyright year and the code styles as you suggested ------------- PR Comment: https://git.openjdk.org/jdk/pull/13093#issuecomment-1479500096 From qamai at openjdk.org Wed Mar 22 12:46:39 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 22 Mar 2023 12:46:39 GMT Subject: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v4] In-Reply-To: References: <4Op0Z8whnyDXDC6zGyMbx4ugcZp5TEoAqW_myB5flxM=.1c7b59ba-efb2-4f68-90d7-2d6e33e39572@github.com> Message-ID: On Wed, 22 Mar 2023 07:59:40 GMT, Xiaohong Gong wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> reviews > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2059: > >> 2057: if (need_load_shuffle) { >> 2058: shuffle = gvn().transform(new VectorLoadShuffleNode(shuffle, vt)); >> 2059: } > > How about generating `VectorLoadShuffleNode` for all platforms that support Vector API, and remove the helper method `vector_needs_load_shuffle()` ? For those platforms that do not need this shuffle preparation, we can emit nothing in codegen. I think not emitting `VectorLoadShuffleNode` is more common so it is better to emit them only when needed, as it will simplify the graph and may allow better inspections of the indices in the future. Additionally, a do-nothing node does not alias with its input and therefore kills the input, which leads to an additional spill if they both need to live. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2426: > >> 2424: if (is_vector_shuffle(vbox_klass_from)) { >> 2425: return false; // vector shuffles aren't supported >> 2426: } > > Is it better to change this as an "assertion" or print the log details? 
With this change, C2 no longer distinguishes a vector shuffle from a normal vector, so this check should be removed: vector shuffles are converted to and from normal vectors using this routine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144748489 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144744663 From coleenp at openjdk.org Wed Mar 22 13:03:27 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 22 Mar 2023 13:03:27 GMT Subject: RFR: 8304687: Move add_to_hierarchy [v3] In-Reply-To: <8lZkIOj_sW-s5fenIUyqOflXokTscrf8v6Zi6VOic4o=.847bbf8a-430b-46f5-a4c2-335d3355c320@github.com> References: <8lZkIOj_sW-s5fenIUyqOflXokTscrf8v6Zi6VOic4o=.847bbf8a-430b-46f5-a4c2-335d3355c320@github.com> Message-ID: > Moved SystemDictionary::add_to_hierarchy to InstanceKlass::add_to_hierarchy where it more logically belongs and next to other functions that also care about dependencies. > Tested with tier1-4, and tier1 on linux,windows,macos to check header file changes.
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Correct comments referring to SystemDictionary::add_to_hierarchy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13129/files - new: https://git.openjdk.org/jdk/pull/13129/files/b86abf50..31c5a9c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13129&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13129&range=01-02 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13129/head:pull/13129 PR: https://git.openjdk.org/jdk/pull/13129 From pminborg at openjdk.org Wed Mar 22 13:58:35 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 22 Mar 2023 13:58:35 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v7] In-Reply-To: References: Message-ID: > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Disalow padding layouts of size zero ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/192050d6..45febe9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=05-06 Stats: 10 lines in 2 files changed: 3 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: https://git.openjdk.org/jdk/pull/13079 From jbechberger at openjdk.org Wed Mar 22 14:08:55 2023 From: jbechberger at openjdk.org (Johannes Bechberger) Date: Wed, 22 Mar 2023 14:08:55 GMT 
Subject: Withdrawn: 8303444: AsyncGetCallTrace obtains too few frames with instrumentation agent In-Reply-To: References: Message-ID: On Wed, 1 Mar 2023 12:32:19 GMT, Johannes Bechberger wrote: > This fixes the bug by removing the faulty completeness check for runtime blobs. > > I tested it using the [trace_validation](https://github.com/parttimenerd/trace_validation) tool successfully as described in the issue. I furthermore ran the [jdk-profiling-tester](https://github.com/parttimenerd/jdk-profiling-tester) to ensure that this fix did not introduce any stability issues and ran the serviceability JTREG tests successfully. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12804 From pminborg at openjdk.org Wed Mar 22 14:09:07 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 22 Mar 2023 14:09:07 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v8] In-Reply-To: References: Message-ID: <4JIhKmX2VnDfArfFl-1YJfoUzGGBVA5Uvd3mdpatW-s=.5d86f29e-5475-4a4a-91df-d6418356e204@github.com> > API changes for the FFM API (third preview) > > Specdiff: > https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html > > Javadoc: > https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html Per Minborg has updated the pull request incrementally with one additional commit since the last revision: Improve javadocs for Linker::captureStateLayout ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13079/files - new: https://git.openjdk.org/jdk/pull/13079/files/45febe9d..6df28a78 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13079&range=06-07 Stats: 11 lines in 1 file changed: 0 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13079.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13079/head:pull/13079 PR: 
https://git.openjdk.org/jdk/pull/13079 From pminborg at openjdk.org Wed Mar 22 14:09:56 2023 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 22 Mar 2023 14:09:56 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v8] In-Reply-To: References: Message-ID: On Tue, 21 Mar 2023 12:12:15 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/java/lang/foreign/Linker.java line 621: >> >>> 619: * to a downcall handle linked with {@link #captureCallState(String...)}} >>> 620: * >>> 621: * @see #captureCallState(String...) >> >> How does a caller know what the structure may contain? Should we document the platform specific structures? > > Back to @PaulSandoz's question: "how does the caller know what the structure contains?". The caller typically doesn't care much about what the returned struct is. But it might care about which "values" can be captured. That said, the set of such interesting values is not too surprising. As demonstrated in the example in the `Linker.capturedCallState` method, once the client knows the name (and "errno" is likely to be the 90% case), everything else follows from there, as the layout can be used to obtain var handles for the required values. > > But, perhaps, now that I write this, I realize that what @PaulSandoz might _really_ be asking is: how do I know that the returned struct will not contain, e.g., nested structs, sequences, or other nonsense? So we might specify (in a normative way) that the returned layout is a struct layout whose member layouts are one or more value layouts (possibly with some added padding layouts). The names of the value layouts reflect the names of the values that can be captured. > > And _then_ we show an example of the layout we return for Windows/x64 (as that's more interesting). I've updated the specs according to how I interpret the comments above. Let me know your thoughts on this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13079#discussion_r1144872865 From jsjolen at openjdk.org Wed Mar 22 14:22:01 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 Mar 2023 14:22:01 GMT Subject: RFR: JDK-8301498: Replace NULL with nullptr in cpu/x86 [v4] In-Reply-To: References: Message-ID: <0A-D-P6RuP8u8dtlXBf5ox_2-15SnHi_ixwC5MZW5OM=.0971e7bf-5380-42ce-a496-eedd8cf2fff8@github.com> On Tue, 21 Mar 2023 10:04:04 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr; these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lowercase "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301498 > - Fix vnkozlov's suggestions > - Merge remote-tracking branch 'origin/master' into JDK-8301498 > - Some more fixes > - Fixes > - Replace NULL with nullptr in cpu/x86 Thanks. All of tier1 and tier2 are passing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12326#issuecomment-1479646756 From jsjolen at openjdk.org Wed Mar 22 14:22:02 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 22 Mar 2023 14:22:02 GMT Subject: Integrated: JDK-8301498: Replace NULL with nullptr in cpu/x86 In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:40:19 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/x86. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr; these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lowercase "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated.
Changeset: 4154a980 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/4154a980ca28c1ae56db26e3dce64c07c225de12 Stats: 656 lines in 54 files changed: 0 ins; 0 del; 656 mod 8301498: Replace NULL with nullptr in cpu/x86 Reviewed-by: dholmes, kvn ------------- PR: https://git.openjdk.org/jdk/pull/12326 From mcimadamore at openjdk.org Wed Mar 22 14:38:30 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 22 Mar 2023 14:38:30 GMT Subject: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v8] In-Reply-To: <4JIhKmX2VnDfArfFl-1YJfoUzGGBVA5Uvd3mdpatW-s=.5d86f29e-5475-4a4a-91df-d6418356e204@github.com> References: <4JIhKmX2VnDfArfFl-1YJfoUzGGBVA5Uvd3mdpatW-s=.5d86f29e-5475-4a4a-91df-d6418356e204@github.com> Message-ID: On Wed, 22 Mar 2023 14:09:07 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.html > > Per Minborg has updated the pull request incrementally with one additional commit since the last revision: > > Improve javadocs for Linker::captureStateLayout src/java.base/share/classes/java/lang/foreign/Linker.java line 628: > 626: * and possibly {@linkplain PaddingLayout padding layouts}. > 627: * As an example, on Windows, the returned layout might contain three value layouts named: > 628: *