From njian at openjdk.java.net Mon Feb 1 07:51:45 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 1 Feb 2021 07:51:45 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 10:30:06 GMT, Dong Bo wrote: >> This is a typo introduced by JDK-8255949. >> Compiler will generate `ushr` for shifting right and accumulating four short integers. >> It produces wrong results for specific case. The instruction should be `usra`. > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > make empty ins_encode when shift >= 16 (chars) Looks good to me. ------------- Marked as reviewed by njian (Committer). PR: https://git.openjdk.java.net/jdk16/pull/136 From ngasson at openjdk.java.net Mon Feb 1 08:10:10 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 1 Feb 2021 08:10:10 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v2] In-Reply-To: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> Message-ID: <3E3DtTPnJhqfGu8vR-HaKp642TBt9AoIox_qwba56I4=.8df17541-a42e-4f2d-a075-7f5ef176b13f@github.com> > This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub > doesn't save vector registers on x86". The problem is that a vector > produced by the Vector API may be stored in a register when the deopt > blob is called. Because the deopt blob only stores the lower half of > vector registers, the full vector object cannot be rematerialized during > deoptimization. 
So the following will crash on AArch64 with current JDK: > > make test TEST="jdk/incubator/vector" \ > JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" > > The fix is to store the full vector registers by passing > save_vectors=true to save_live_registers() in the deopt blob. Because > save_live_registers() places the integer registers above the floating > registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs > to calculate the SP offset based on whether full vectors were saved, and > whether those vectors were NEON or SVE, rather than using a static > offset as it does currently. > > The change to VectorSupport::allocate_vector_payload_helper() is > required because we only store the lowest VMReg slot in the oop map. > However unlike x86 the vector registers are always saved in a contiguous > region of memory, so we can calculate the address of each vector element > as an offset from the address of the first slot. X86 handles this in > RegisterMap::pd_location() but that won't work on AArch64 because with > SVE there isn't a unique VMReg corresponding to each four-byte physical > slot in the vector (there are always exactly eight logical VMRegs > regardless of the actual vector length). > > Tested hotspot_all_no_apps and jdk_core. 
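The contiguous layout described in the quoted text can be illustrated with a small standalone sketch. It is not HotSpot code: `kSlotSize`, `SlotPos`, `element_position` and `element_address` are hypothetical names, and the 4-byte slot size is an assumption taken from the "four-byte physical slot" wording above. It only shows how an element index maps to a slot index plus an in-slot byte offset, and how that becomes an address relative to the first saved slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: one stack slot is 4 bytes, as in the "four-byte physical slot" above.
constexpr std::size_t kSlotSize = 4;

// Hypothetical helper type: which slot an element starts in, and where inside it.
struct SlotPos {
    std::size_t slot;    // index of the slot, counted from the first saved slot
    std::size_t offset;  // byte offset of the element within that slot
};

// Split the byte position of element `index` into (slot, offset).
inline SlotPos element_position(std::size_t index, std::size_t elem_size) {
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);
    const std::size_t byte_off = index * elem_size;
    return SlotPos{byte_off / kSlotSize, byte_off % kSlotSize};
}

// Because the saved register occupies one contiguous region, the element
// address is simply an offset from the address of the first slot.
inline const std::uint8_t* element_address(const std::uint8_t* first_slot,
                                           std::size_t index,
                                           std::size_t elem_size) {
    const SlotPos pos = element_position(index, elem_size);
    return first_slot + pos.slot * kSlotSize + pos.offset;
}
```

With a contiguous save area the result always equals `first_slot + index * elem_size`; the slot/offset split matters only when each slot has to be looked up separately.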
Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Move SVE slot handling to RegisterMap::pd_location ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2279/files - new: https://git.openjdk.java.net/jdk/pull/2279/files/83d07c58..2364f3dd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=00-01 Stats: 222 lines in 20 files changed: 136 ins; 13 del; 73 mod Patch: https://git.openjdk.java.net/jdk/pull/2279.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279 PR: https://git.openjdk.java.net/jdk/pull/2279 From dongbo at openjdk.java.net Mon Feb 1 08:12:58 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 1 Feb 2021 08:12:58 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v4] In-Reply-To: References: Message-ID: > This is a typo introduced by JDK-8255949. > Compiler will generate `ushr` for shifting right and accumulating four short integers. > It produces wrong results for specific case. The instruction should be `usra`. 
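The difference between the two instructions named in the description can be modelled with a scalar sketch. This is not the C2 code or the patch itself: `U16x4`, `ushr_lanes` and `usra_lanes` are hypothetical names for a four-lane model of 16-bit unsigned elements. It assumes the usual semantics: `ushr` overwrites each destination lane with the shifted source, while `usra` adds the shifted source to the existing destination lane.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a vector holding four unsigned 16-bit lanes.
using U16x4 = std::array<std::uint16_t, 4>;

// ushr-like behaviour: each result lane is src >> shift; any accumulator is ignored.
inline U16x4 ushr_lanes(const U16x4& src, unsigned shift) {
    assert(shift >= 1 && shift <= 16);
    U16x4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint16_t>(src[i] >> shift);
    }
    return out;
}

// usra-like behaviour: each lane becomes acc + (src >> shift), wrapping at 16 bits.
inline U16x4 usra_lanes(U16x4 acc, const U16x4& src, unsigned shift) {
    assert(shift >= 1 && shift <= 16);
    for (std::size_t i = 0; i < acc.size(); ++i) {
        acc[i] = static_cast<std::uint16_t>(acc[i] + (src[i] >> shift));
    }
    return acc;
}
```

The two functions agree only when every accumulator lane is zero, which would explain why the wrong instruction gives wrong results only in specific cases.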
Dong Bo has updated the pull request incrementally with two additional commits since the last revision: - fix trailing whitespace - add tests for shifting counts ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/b7ef8fb8..f2e490a3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=02-03 Stats: 286 lines in 1 file changed: 245 ins; 6 del; 35 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From ngasson at openjdk.java.net Mon Feb 1 08:38:41 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 1 Feb 2021 08:38:41 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v2] In-Reply-To: References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <_4sNUEnp1D8ufU2DBKfAVKa0AzBXOdKT9-ya8RwFIKo=.37bbf7b0-d3eb-47a9-9057-cc0569577207@github.com> Message-ID: On Thu, 28 Jan 2021 12:12:32 GMT, Vladimir Ivanov wrote: >> For Arm NEON (and PPC) we don't set VMReg::next() in oopmap either, and their vector slots are contiguous, so that's x86-specific? But yes, NEON can also generate correct full oopmap as for fixed vector size. For SVE, I have no idea to have proper VMReg::next() support, so Nick's solution looks good to me. Regarding to introducing new cross-platform API, which API do you mean? If we could have some better api, that would be perfect. Currently, allocate_vector_payload_helper() is the only one I can see that is vector related for RegisterMap::location() call. > > Probably, x86 is unique in using non-contiguous representation for vector values, but it doesn't make the code in question x86-specific. 
AArch64 is the only user of `VecA` and `VecA` is the only register type that has a mismatch in size between in-memory and RegMask representation. So, I conclude it is AArch64/SVE-specific.
>
> On x86 RegisterMap isn't fully populated for vector registers either, but there's `RegisterMap::pd_location()` to cover that.
>
> Regarding new API, I mean the alternative to `VMReg::next()`/`RegisterMap::location(VMReg)` which is able to handle the `VecA` case well. As Nick pointed out earlier, the problem with `VecA` is that there's no `VMReg` representation for all the slots which comprise the register value.
>
> Either enhancing `VMReg::next(int)` to produce special values for the `VecA` case or introducing `RegisterMap::location(VMReg base_reg, int slot)` is a better way to handle the problem.

@iwanowww please take a look at the latest set of changes and let me know what you think. There's now a `RegisterMap::location(VMReg base_reg, int slot)` method as you suggest. That in turn uses a new method `VMReg::is_expressible(int slot_delta)` which is true if offsetting a VMReg by slot_delta slots gives another valid VMReg which is also a slot of the same physical register (i.e. `reg->next(slot_delta)` is valid). We can use this to fall back to `pd_location` if a slot of a vector is not expressible as a VMReg (i.e. for SVE). Unfortunately it touches a lot of files but that seems unavoidable.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279

From github.com+4146708+a74nh at openjdk.java.net Mon Feb 1 09:34:47 2021
From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward)
Date: Mon, 1 Feb 2021 09:34:47 GMT
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 27 Jan 2021 14:58:27 GMT, Vladimir Kempik wrote:

>> Build changes per se now look okay.
However, I agree with Erik that unless this PR can wait for the JNF removal, at the very least the build docs need to be updated to explain how to successfully build for this platform. (I can live with the configure command line hack, since it's temporary -- otherwise I'd have requested a new configure argument.) This can be done in this PR or a follow-up PR.
>
>> Build changes per se now look okay. However, I agree with Erik that unless this PR can wait for the JNF removal, at the very least the build docs need to be updated to explain how to successfully build for this platform. (I can live with the configure command line hack, since it's temporary -- otherwise I'd have requested a new configure argument.) This can be done in this PR or a follow-up PR.
>
> I believe it's better done under a separate PR/bugfix, so it can be completely reverted once JNF is removed.

You need to add macos arm64 to hsdis. Having it working is fairly essential for debugging.

Inside src/utils/hsdis, after cloning binutils:

    make; make demo; ./build/macosx-arm64/hsdis-demo

Results in:

    Hello, world!
    ...And now for something completely different:

    Decoding from 0x1046e31a4 to 0x1046e3664...with decode_instructions_virtual
    hsdis: bad native mach=architecture not set in Makefile!; please port hsdis to this platform
    hsdis output options:

I fixed it by changing the makefile to force the build flags:

    ARCH=aarch64
    CFLAGS/aarch64 += -m64

Resulting in:

    Hello, world!
    ...And now for something completely different:

    Decoding from 0x10012719c to 0x10012765c...with decode_instructions_virtual
    Decoding for CPU 'aarch64'
    main:
    0x10012719c sub sp, sp, #0x60
    0x1001271a0 stp x29, x30, [sp, #80]
    ...etc

Putting the library in the right place then made disassembly in Java work for me.
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From ngasson at openjdk.java.net Mon Feb 1 09:36:05 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 1 Feb 2021 09:36:05 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v3] In-Reply-To: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> Message-ID: > This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub > doesn't save vector registers on x86". The problem is that a vector > produced by the Vector API may be stored in a register when the deopt > blob is called. Because the deopt blob only stores the lower half of > vector registers, the full vector object cannot be rematerialized during > deoptimization. So the following will crash on AArch64 with current JDK: > > make test TEST="jdk/incubator/vector" \ > JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" > > The fix is to store the full vector registers by passing > save_vectors=true to save_live_registers() in the deopt blob. Because > save_live_registers() places the integer registers above the floating > registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs > to calculate the SP offset based on whether full vectors were saved, and > whether those vectors were NEON or SVE, rather than using a static > offset as it does currently. > > The change to VectorSupport::allocate_vector_payload_helper() is > required because we only store the lowest VMReg slot in the oop map. > However unlike x86 the vector registers are always saved in a contiguous > region of memory, so we can calculate the address of each vector element > as an offset from the address of the first slot. 
X86 handles this in > RegisterMap::pd_location() but that won't work on AArch64 because with > SVE there isn't a unique VMReg corresponding to each four-byte physical > slot in the vector (there are always exactly eight logical VMRegs > regardless of the actual vector length). > > Tested hotspot_all_no_apps and jdk_core. Nick Gasson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8260355 - Move SVE slot handling to RegisterMap::pd_location - 8260355: AArch64: deoptimization stub should save vector registers This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub doesn't save vector registers on x86". The problem is that a vector produced by the Vector API may be stored in a register when the deopt blob is called. Because the deopt blob only stores the lower half of vector registers, the full vector object cannot be rematerialized during deoptimization. So the following will crash on AArch64 with current JDK: make test TEST="jdk/incubator/vector" \ JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" The fix is to store the full vector registers by passing save_vectors=true to save_live_registers() in the deopt blob. Because save_live_registers() places the integer registers above the floating registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs to calculate the SP offset based on whether full vectors were saved, and whether those vectors were NEON or SVE, rather than using a static offset as it does currently. The change to VectorSupport::allocate_vector_payload_helper() is required because we only store the lowest VMReg slot in the oop map. 
However unlike x86 the vector registers are always saved in a contiguous region of memory, so we can calculate the address of each vector element as an offset from the address of the first slot. X86 handles this in RegisterMap::pd_location() but that won't work on AArch64 because with SVE there isn't a unique VMReg corresponding to each four-byte physical slot in the vector (there are always exactly eight logical VMRegs regardless of the actual vector length). Tested hotspot_all_no_apps and jdk_core. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2279/files - new: https://git.openjdk.java.net/jdk/pull/2279/files/2364f3dd..498310d4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=01-02 Stats: 16985 lines in 325 files changed: 3685 ins; 4804 del; 8496 mod Patch: https://git.openjdk.java.net/jdk/pull/2279.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279 PR: https://git.openjdk.java.net/jdk/pull/2279 From vkempik at openjdk.java.net Mon Feb 1 11:22:47 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Mon, 1 Feb 2021 11:22:47 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v6] In-Reply-To: References: Message-ID: <-iXh6ikIWdG2YM4G3ZE33bE_bflntM9JZ0JWd1vSMKU=.7c380f16-0e1f-4fa8-8b47-f0e9e56fdba3@github.com> On Mon, 1 Feb 2021 09:31:31 GMT, Alan Hayward wrote: > You need add macos arm64 to hsdis. Having it working is fairly essential for debugging. > > Inside src/utils/hsdis, After cloning binutils > > ``` > make; make demo; ./build/macosx-arm64/hsdis-demo > ``` > > Results in: > > ``` > Hello, world! 
> ...And now for something completely different: > > Decoding from 0x1046e31a4 to 0x1046e3664...with decode_instructions_virtual > hsdis: bad native mach=architecture not set in Makefile!; please port hsdis to this platform > hsdis output options: > ``` > > I fixed it by changing the makefile to force the build flags: > > ``` > ARCH=aarch64 > CFLAGS/aarch64 += -m64 > ``` > > Resulting in: > > ``` > Hello, world! > ...And now for something completely different: > > Decoding from 0x10012719c to 0x10012765c...with decode_instructions_virtual > Decoding for CPU 'aarch64' > main: > 0x10012719c sub sp, sp, #0x60 > 0x1001271a0 stp x29, x30, [sp, #80] > ...etc > ``` > > Putting the library in the right place then made disassembly in java work for me. Hello, hsdis is a separate out-of-tree project and is not part of this jep. support for looking for proper hsdis dylib library was added as part of this jep. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From github.com+4146708+a74nh at openjdk.java.net Mon Feb 1 12:37:45 2021 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Mon, 1 Feb 2021 12:37:45 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v6] In-Reply-To: <-iXh6ikIWdG2YM4G3ZE33bE_bflntM9JZ0JWd1vSMKU=.7c380f16-0e1f-4fa8-8b47-f0e9e56fdba3@github.com> References: <-iXh6ikIWdG2YM4G3ZE33bE_bflntM9JZ0JWd1vSMKU=.7c380f16-0e1f-4fa8-8b47-f0e9e56fdba3@github.com> Message-ID: <1S0Z45oJKy8oesCN0k6SIHhZLzDYW9j-M_yfWZoHEzI=.2694c382-6613-4cd1-a917-16147bb00a9e@github.com> On Mon, 1 Feb 2021 11:19:34 GMT, Vladimir Kempik wrote: > Hello, hsdis is a separate out-of-tree project and is not part of this jep. Unless there's something I'm missing it only requires a few lines of change to src/utils/hsdis/makefile (it already has support for macos x86_64) >support for looking for proper hsdis dylib library was added as part of this jep. I'm a little confused. Are you planning on adding a new disassembler? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vlivanov at openjdk.java.net Mon Feb 1 12:44:40 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 1 Feb 2021 12:44:40 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers In-Reply-To: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> Message-ID: On Thu, 28 Jan 2021 08:27:22 GMT, Nick Gasson wrote: > This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub > doesn't save vector registers on x86". The problem is that a vector > produced by the Vector API may be stored in a register when the deopt > blob is called. Because the deopt blob only stores the lower half of > vector registers, the full vector object cannot be rematerialized during > deoptimization. So the following will crash on AArch64 with current JDK: > > make test TEST="jdk/incubator/vector" \ > JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" > > The fix is to store the full vector registers by passing > save_vectors=true to save_live_registers() in the deopt blob. Because > save_live_registers() places the integer registers above the floating > registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs > to calculate the SP offset based on whether full vectors were saved, and > whether those vectors were NEON or SVE, rather than using a static > offset as it does currently. > > The change to VectorSupport::allocate_vector_payload_helper() is > required because we only store the lowest VMReg slot in the oop map. > However unlike x86 the vector registers are always saved in a contiguous > region of memory, so we can calculate the address of each vector element > as an offset from the address of the first slot. 
X86 handles this in
> RegisterMap::pd_location() but that won't work on AArch64 because with
> SVE there isn't a unique VMReg corresponding to each four-byte physical
> slot in the vector (there are always exactly eight logical VMRegs
> regardless of the actual vector length).
>
> Tested hotspot_all_no_apps and jdk_core.

Much better, thanks.

I suggest the following changes:

* please, leave original `location(VMReg reg)`/`pd_location(VMReg reg)` intact and introduce `(VMReg reg, uint slot_idx)` overloads;

* it's fair for `location(VMReg reg, uint slot_idx)` to require `reg` to always be a base register:

    address RegisterMap::location(VMReg base_reg, uint slot_idx) const {
      if (slot_idx > 0) {
        return pd_location(base_reg, slot_idx);
      } else {
        return location(base_reg);
      }
    }

* on all platforms except AArch64 define `pd_location(VMReg base_reg, int slot_idx)` as:

    address RegisterMap::pd_location(VMReg base_reg, int slot_idx) const {
      return location(base_reg->next(slot_idx));
    }

* on AArch64 check for vector register case and special-case it to:

    address RegisterMap::pd_location(VMReg base_reg, int slot_idx) const {
      if (base_reg->is_FloatRegister()) {
        assert((base_reg->value() - ConcreteRegisterImpl::max_gpr) % FloatRegisterImpl::max_slots_per_register == 0, "not a base register");
        intptr_t offset_in_bytes = slot_idx * VMRegImpl::stack_slot_size;
        address base_location = location(base_reg);
        if (base_location != NULL) {
          return base_location + offset_in_bytes;
        }
      }
      return location(base_reg->next(slot_idx));
    }

  Or, as an alternative (since all the registers are stored contiguously on AArch64 anyway):

    address RegisterMap::pd_location(VMReg base_reg, int slot_idx) const {
      intptr_t offset_in_bytes = slot_idx * VMRegImpl::stack_slot_size;
      address base_location = location(base_reg);
      if (base_location != NULL) {
        return base_location + offset_in_bytes;
      }
      return NULL; // no location recorded for the base register
    }

* keep the assert in `VMReg::next(int)` if you wish, but refactor it into an out-of-bounds check instead.
Personally, I'd prefer to see it as a separate enhancement. ------------- PR: https://git.openjdk.java.net/jdk/pull/2279 From vlivanov at openjdk.java.net Mon Feb 1 12:44:45 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 1 Feb 2021 12:44:45 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v3] In-Reply-To: References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> Message-ID: On Mon, 1 Feb 2021 09:36:05 GMT, Nick Gasson wrote: >> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub >> doesn't save vector registers on x86". The problem is that a vector >> produced by the Vector API may be stored in a register when the deopt >> blob is called. Because the deopt blob only stores the lower half of >> vector registers, the full vector object cannot be rematerialized during >> deoptimization. So the following will crash on AArch64 with current JDK: >> >> make test TEST="jdk/incubator/vector" \ >> JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" >> >> The fix is to store the full vector registers by passing >> save_vectors=true to save_live_registers() in the deopt blob. Because >> save_live_registers() places the integer registers above the floating >> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs >> to calculate the SP offset based on whether full vectors were saved, and >> whether those vectors were NEON or SVE, rather than using a static >> offset as it does currently. >> >> The change to VectorSupport::allocate_vector_payload_helper() is >> required because we only store the lowest VMReg slot in the oop map. >> However unlike x86 the vector registers are always saved in a contiguous >> region of memory, so we can calculate the address of each vector element >> as an offset from the address of the first slot. 
X86 handles this in >> RegisterMap::pd_location() but that won't work on AArch64 because with >> SVE there isn't a unique VMReg corresponding to each four-byte physical >> slot in the vector (there are always exactly eight logical VMRegs >> regardless of the actual vector length). >> >> Tested hotspot_all_no_apps and jdk_core. > > Nick Gasson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into 8260355 > - Move SVE slot handling to RegisterMap::pd_location > - 8260355: AArch64: deoptimization stub should save vector registers > > This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub > doesn't save vector registers on x86". The problem is that a vector > produced by the Vector API may be stored in a register when the deopt > blob is called. Because the deopt blob only stores the lower half of > vector registers, the full vector object cannot be rematerialized during > deoptimization. So the following will crash on AArch64 with current JDK: > > make test TEST="jdk/incubator/vector" \ > JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" > > The fix is to store the full vector registers by passing > save_vectors=true to save_live_registers() in the deopt blob. Because > save_live_registers() places the integer registers above the floating > registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs > to calculate the SP offset based on whether full vectors were saved, and > whether those vectors were NEON or SVE, rather than using a static > offset as it does currently. > > The change to VectorSupport::allocate_vector_payload_helper() is > required because we only store the lowest VMReg slot in the oop map. 
> However unlike x86 the vector registers are always saved in a contiguous > region of memory, so we can calculate the address of each vector element > as an offset from the address of the first slot. X86 handles this in > RegisterMap::pd_location() but that won't work on AArch64 because with > SVE there isn't a unique VMReg corresponding to each four-byte physical > slot in the vector (there are always exactly eight logical VMRegs > regardless of the actual vector length). > > Tested hotspot_all_no_apps and jdk_core. src/hotspot/share/prims/vectorSupport.cpp line 141: > 139: int off = (i * elem_size) % VMRegImpl::stack_slot_size; > 140: > 141: address elem_addr = reg_map->location(vreg, vslot) + off; Unrelated to your change: please, put the following comment: address elem_addr = reg_map->location(vreg, vslot) + off; // assumes little endian element order ------------- PR: https://git.openjdk.java.net/jdk/pull/2279 From ihse at openjdk.java.net Mon Feb 1 14:09:51 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Mon, 1 Feb 2021 14:09:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v6] In-Reply-To: <1S0Z45oJKy8oesCN0k6SIHhZLzDYW9j-M_yfWZoHEzI=.2694c382-6613-4cd1-a917-16147bb00a9e@github.com> References: <-iXh6ikIWdG2YM4G3ZE33bE_bflntM9JZ0JWd1vSMKU=.7c380f16-0e1f-4fa8-8b47-f0e9e56fdba3@github.com> <1S0Z45oJKy8oesCN0k6SIHhZLzDYW9j-M_yfWZoHEzI=.2694c382-6613-4cd1-a917-16147bb00a9e@github.com> Message-ID: On Mon, 1 Feb 2021 12:35:05 GMT, Alan Hayward wrote: > > Hello, hsdis is a separate out-of-tree project and is not part of this jep. > > Unless there's something I'm missing it only requires a few lines of change to src/utils/hsdis/makefile (it already has support for macos x86_64) I agree with Alan that it makes sense to add this trivial change as part of this PR, if it allows the current hsdis solution to continue working on mac/aarch64. > > > support for looking for proper hsdis dylib library was added as part of this jep. 
> > I'm a little confused. Are you planning on adding a new disassembler?

@a74nh I think Vladimir is referring to https://github.com/openjdk/jdk/pull/392. The hsdis "component" has been left behind for a long time, and there are several requests to add new backends, which require a modernized build of hsdis. This is unfortunately not a high-priority project, and has been postponed several times already. :(

-------------

PR: https://git.openjdk.java.net/jdk/pull/2200

From dongbo at openjdk.java.net Mon Feb 1 14:36:04 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Mon, 1 Feb 2021 14:36:04 GMT
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v5]
In-Reply-To:
References:
Message-ID: <3CKNS6J7Rr4dq69Vw5-PofaIvMgZulMUZCYk1N9Hy9E=.e6dc1c3f-080c-4d30-9f94-2f149613d9a0@github.com>

> This is a typo introduced by JDK-8255949.
> Compiler will generate `ushr` for shifting right and accumulating four short integers.
> It produces wrong results for specific case. The instruction should be `usra`.
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: update tests for bytes and shorts ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/f2e490a3..ca3d2192 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=03-04 Stats: 74 lines in 1 file changed: 19 ins; 0 del; 55 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From lfoltan at openjdk.java.net Mon Feb 1 17:12:43 2021 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Mon, 1 Feb 2021 17:12:43 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v2] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 14:26:03 GMT, Ioi Lam wrote: >> This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. >> >> [1] Change calls like >> >> SystemDictionary::Object_klass() >> SystemDictionary::Throwable_klass_is_loaded() >> SystemDictionary::box_klass_type() >> >> to >> >> vmClasses::Object_klass() >> vmClasses::Throwable_klass_is_loaded() >> vmClasses::box_klass_type() >> >> [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. >> >> [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. 
>> >> Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. >> >> Review Notes: if you don't want to scroll through 185 files, you may want to try: >> >> curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ >> grep -v '^[+-][+-][+-]' | grep '^[+-]' > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > added missing #include systemDictionary.hpp Looks good!! Lois ------------- Marked as reviewed by lfoltan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2301 From stuefe at openjdk.java.net Mon Feb 1 18:12:42 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 1 Feb 2021 18:12:42 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v2] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 17:09:42 GMT, Lois Foltan wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> added missing #include systemDictionary.hpp > > Looks good!! > Lois Hi Ioi, Sorry, I still get build errors with this (linux x64, fastdebug, nopch): link_klass(SystemDictionary::Reference_klass()); ^~~~~~~~~~~~~~~ Last Commit: commit e1a09411c12cdd95bf1f8896100284e0697fafb7 (HEAD -> pull/2301) ? Author: iklam ? Date: Fri Jan 29 06:22:52 2021 -0800 ? ? added missing #include systemDictionary.hpp I'm also confused why no GA ran for this pr. I gave the patch a cursory read, it reads all okay but of course I cannot see from reading the patch whether we could miss some includes. 
I'd like to see the GA builds being successful, in this case also for the side platforms and minimal builds/zero. Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2301 From aph at openjdk.java.net Mon Feb 1 18:49:58 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 1 Feb 2021 18:49:58 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v8] In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 20:14:01 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. >> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 62 commits: > > - Merge branch 'master' into jdk-macos > - Update copyright year for BsdAARCH64ThreadContext.java > - Fix inclusing of StubRoutines header > - Redo buildsys fix > - Revert harfbuzz changes, disable warnings for it > - Little adjustement of SlowSignatureHandler > - Partially bring previous commit > - Revert "Address feedback for signature generators" > > This reverts commit 50b55f6684cd21f8b532fa979b7b6fbb4613266d. > - Refactor CDS disabling > - Redo builsys support for aarch64-darwin > - ... and 52 more: https://git.openjdk.java.net/jdk/compare/8a9004da...b421e0b4 src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 84: > 82: // on stack. Natural alignment for types are still in place, > 83: // for example double/long should be 8 bytes alligned > 84: This comment is a bit confusing because it's no longer #ifdef APPLE. Better move it up to Line 41. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 352: > 350: > 351: #ifdef __APPLE__ > 352: virtual void pass_byte() Please remove ```#ifdef __APPLE__``` around this region. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 839: > 837: // The code unable to handle this, bailout. > 838: return -1; > 839: #endif This looks like a bug to me. The caller doesn't necessarily check the return value. See CallRuntimeNode::calling_convention. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From iklam at openjdk.java.net Mon Feb 1 18:52:52 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 1 Feb 2021 18:52:52 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX Message-ID: - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed. 
- Implementations of APIs such as JVM_DTraceActivate were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h ------------- Commit messages: - 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX Changes: https://git.openjdk.java.net/jdk/pull/2338/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2338&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260193 Stats: 110 lines in 3 files changed: 0 ins; 109 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2338.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2338/head:pull/2338 PR: https://git.openjdk.java.net/jdk/pull/2338 From github.com+9200663+quaffel at openjdk.java.net Mon Feb 1 18:56:02 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Mon, 1 Feb 2021 18:56:02 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers Message-ID: At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change.
This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ ------------- Commit messages: - 8260368: Enhance gc interface for advanced gc barriers Changes: https://git.openjdk.java.net/jdk/pull/2302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2302&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260368 Stats: 306 lines in 19 files changed: 120 ins; 20 del; 166 mod Patch: https://git.openjdk.java.net/jdk/pull/2302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2302/head:pull/2302 PR: https://git.openjdk.java.net/jdk/pull/2302 From mdoerr at openjdk.java.net Mon Feb 1 18:56:02 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 1 Feb 2021 18:56:02 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers In-Reply-To: References: Message-ID: <581tClq3bai1nul8Bj98EB_hp8VUtqgOCWgf08Tfycw=.e985b2b9-85e9-4638-bd58-07f041a7c70b@github.com> On Thu, 28 Jan 2021 22:08:17 GMT, Niklas Radomski wrote: > At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. > > To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. > > _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change.
This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ > > _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ Thanks for reworking it this way! Nice improvement! Looks good. I'll sponsor it after review is completed. ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2302 From iklam at openjdk.java.net Mon Feb 1 19:04:05 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 1 Feb 2021 19:04:05 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v3] In-Reply-To: References: Message-ID: > This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. > > [1] Change calls like > > SystemDictionary::Object_klass() > SystemDictionary::Throwable_klass_is_loaded() > SystemDictionary::box_klass_type() > > to > > vmClasses::Object_klass() > vmClasses::Throwable_klass_is_loaded() > vmClasses::box_klass_type() > > [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. > > [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. > > Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. 
> > Review Notes: if you don't want to scroll through 185 files, you may want to try: > > curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ > grep -v '^[+-][+-][+-]' | grep '^[+-]' Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed AOT build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2301/files - new: https://git.openjdk.java.net/jdk/pull/2301/files/e1a09411..8df077d4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2301&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2301&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2301.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2301/head:pull/2301 PR: https://git.openjdk.java.net/jdk/pull/2301 From iklam at openjdk.java.net Mon Feb 1 19:14:38 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 1 Feb 2021 19:14:38 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v2] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 18:10:02 GMT, Thomas Stuefe wrote: >> Looks good!! >> Lois > > Hi Ioi, > > Sorry, I still get build errors with this (linux x64, fastdebug, nopch): > > > > link_klass(SystemDictionary::Reference_klass()); > ^~~~~~~~~~~~~~~ > > Last Commit: > commit e1a09411c12cdd95bf1f8896100284e0697fafb7 (HEAD -> pull/2301) > Author: iklam > Date: Fri Jan 29 06:22:52 2021 -0800 > > added missing #include systemDictionary.hpp > > I'm also confused why no GA ran for this pr. I gave the patch a cursory read, it reads all okay but of course I cannot see from reading the patch whether we could miss some includes. I'd like to see the GA builds being successful, in this case also for the side platforms and minimal builds/zero.
> > Thanks, Thomas > Hi Ioi, > > Sorry, I still get build errors with this (linux x64, fastdebug, nopch): > > ``` > /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/aot/aotCodeHeap.cpp: In member function 'void AOTCodeHeap::link_known_klasses()': > /shared/projects/openjdk/jdk-jdk/source/src/hotspot/share/aot/aotCodeHeap.cpp:397:32: error: 'Reference_klass' is not a member of 'SystemDictionary' > link_klass(SystemDictionary::Reference_klass()); > ^~~~~~~~~~~~~~~ > ``` > > Last Commit: > > ``` > commit e1a09411c12cdd95bf1f8896100284e0697fafb7 (HEAD -> pull/2301) > Author: iklam > Date: Fri Jan 29 06:22:52 2021 -0800 > > added missing #include systemDictionary.hpp > ``` > > I'm also confused why no GA ran for this pr. I gave the patch a cursory read, it reads all okay but of course I cannot see from reading the patch whether we could miss some includes. I'd like to see the GA builds being successful, in this case also for the side platforms and minimal builds/zero. > > Thanks, Thomas I disabled the GitHub Actions for my repo because they were creating too much noise. Instead, I've been testing my builds with Mach5, which has a much larger variety of builds. Unfortunately AOT is disabled by default in Mach5 (due to JDK-8255616: Removal of experimental features AOT and Graal JIT). Now I've added AOT to my local builds to make sure I don't break it unintentionally. I am also re-enabling GitHub actions on my repo.
I verified that AOT builds again with [8df077d](https://github.com/openjdk/jdk/pull/2301/commits/8df077d47201dbb88171c8137acb57d26f0dd007) ------------- PR: https://git.openjdk.java.net/jdk/pull/2301 From rkennke at openjdk.java.net Mon Feb 1 19:15:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 1 Feb 2021 19:15:41 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 22:08:17 GMT, Niklas Radomski wrote: > At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. > > To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. > > _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ > > _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ Thank you for doing that! I cannot really comment much on PPC specifics. Structurally and conceptually it looks reasonable to me. I've one question about the use of int vs enum though.
src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 111: > 109: > 110: void G1BarrierSetAssembler::g1_write_barrier_pre(MacroAssembler* masm, DecoratorSet decorators, Register obj, RegisterOrConstant ind_or_offs, Register pre_val, > 111: Register tmp1, Register tmp2, unsigned int preservation_level) { Wouldn't it make sense to use an enum type here? Or is this something that the PPC toolchain doesn't like? Applies to all occurrences of similar change all over the place. ------------- PR: https://git.openjdk.java.net/jdk/pull/2302 From mdoerr at openjdk.java.net Mon Feb 1 19:35:43 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 1 Feb 2021 19:35:43 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 19:07:22 GMT, Roman Kennke wrote: >> At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. >> >> To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. >> >> _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change.
This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ >> >> _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ > > src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 111: > >> 109: >> 110: void G1BarrierSetAssembler::g1_write_barrier_pre(MacroAssembler* masm, DecoratorSet decorators, Register obj, RegisterOrConstant ind_or_offs, Register pre_val, >> 111: Register tmp1, Register tmp2, unsigned int preservation_level) { > > Wouldn't it make sense to use an enum type here? Or is this something that the PPC toolchain doesn't like? Applies to all occurrences of similar change all over the place. Hi Roman, enum type works. We only don't want to use C++11 features like enum class or typed enums because we will probably backport this change to 11u. ------------- PR: https://git.openjdk.java.net/jdk/pull/2302 From hseigel at openjdk.java.net Mon Feb 1 19:49:41 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 1 Feb 2021 19:49:41 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v3] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 19:04:05 GMT, Ioi Lam wrote: >> This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. >> >> [1] Change calls like >> >> SystemDictionary::Object_klass() >> SystemDictionary::Throwable_klass_is_loaded() >> SystemDictionary::box_klass_type() >> >> to >> >> vmClasses::Object_klass() >> vmClasses::Throwable_klass_is_loaded() >> vmClasses::box_klass_type() >> >> [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp.
>> >> [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. >> >> Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. >> >> Review Notes: if you don't want to scroll through 185 files, you may want to try: >> >> curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ >> grep -v '^[+-][+-][+-]' | grep '^[+-]' > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed AOT build Changes look good! Thanks, Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2301 From rkennke at openjdk.java.net Mon Feb 1 19:50:43 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 1 Feb 2021 19:50:43 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 19:33:21 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp line 111: >> >>> 109: >>> 110: void G1BarrierSetAssembler::g1_write_barrier_pre(MacroAssembler* masm, DecoratorSet decorators, Register obj, RegisterOrConstant ind_or_offs, Register pre_val, >>> 111: Register tmp1, Register tmp2, unsigned int preservation_level) { >> >> Wouldn't it make sense to use an enum type here? Or is this something that the PPC toolchain doesn't like? Applies to all occurrences of similar change all over the place. > > Hi Roman, > enum type works.
We only don't want to use C++11 features like enum class or typed enums because we will probably backport this change to 11u. IOW, can we write MacroAssembler::RuntimeInvocationPreservationLevel instead of unsigned int there or not? I believe this should be possible (even though it'll still only be treated by the compiler as an int) ------------- PR: https://git.openjdk.java.net/jdk/pull/2302 From alanb at openjdk.java.net Mon Feb 1 19:54:41 2021 From: alanb at openjdk.java.net (Alan Bateman) Date: Mon, 1 Feb 2021 19:54:41 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 18:40:54 GMT, Ioi Lam wrote: > - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed. > - Implementations of APIs such as JVM_DTraceActivate, were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h Marked as reviewed by alanb (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From lfoltan at openjdk.java.net Mon Feb 1 19:58:39 2021 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Mon, 1 Feb 2021 19:58:39 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 18:40:54 GMT, Ioi Lam wrote: > - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed. > - Implementations of APIs such as JVM_DTraceActivate, were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h Marked as reviewed by lfoltan (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From iklam at openjdk.java.net Mon Feb 1 20:10:58 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 1 Feb 2021 20:10:58 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: References: Message-ID: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> > - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed. > - Implementations of APIs such as JVM_DTraceActivate, were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed macos build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2338/files - new: https://git.openjdk.java.net/jdk/pull/2338/files/c0307e7d..3a6415eb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2338&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2338&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2338.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2338/head:pull/2338 PR: https://git.openjdk.java.net/jdk/pull/2338 From github.com+9200663+quaffel at openjdk.java.net Mon Feb 1 20:15:40 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Mon, 1 Feb 2021 20:15:40 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 19:48:21 GMT, Roman Kennke wrote: >> Hi Roman, >> enum type works. We only don't want to use C++11 features like enum class or typed enums because we will probably backport this change to 11u. 
> > IOW, can we write MacroAssembler::RuntimeInvocationPreservationLevel instead of unsigned int there or not? I believe this should be possible (even though it'll still only be treated by the compiler as an int) Yes, that should be totally fine. As a side note: The compiler's behavior isn't well-defined, though. It could be `unsigned int`, `int`, or any other integral type that can represent the enumerators (that is not bigger than `unsigned int`). I'm not a huge fan of _potentially_ using signed integers for enum and/or flag-like parameters, but as an enum _does_ increase the code's readability and ultimately the developer experience by a lot, we should probably use that instead. Typed enums would have been perfect here, but Martin has already pointed out why that isn't an option. The true reason why I didn't use an enum in the first place, however, is a different one: I originally intended to implement a flag-like mechanism. As it turns out, that is an absolute overkill as the different preservation levels are incremental. Since then, I haven't touched the types, so yeah... TL;DR: Good catch, will change it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2302 From gziemski at openjdk.java.net Mon Feb 1 20:32:41 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Mon, 1 Feb 2021 20:32:41 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> References: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> Message-ID: On Mon, 1 Feb 2021 20:10:58 GMT, Ioi Lam wrote: >> - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed.
>> - Implementations of APIs such as JVM_DTraceActivate, were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed macos build Changes requested by gziemski (Committer). src/java.base/share/native/libjava/check_version.c line 33: > 31: DEF_JNI_OnLoad(JavaVM *vm, void *reserved) > 32: { > 33: return JNI_VERSION_1_2; This leaves an entire file with one trivial function implementation. Can we remove the file and implement `DEF_JNI_OnLoad()` in `jni_util.h` (or some other existing suitable file) ? ------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From iklam at openjdk.java.net Mon Feb 1 20:35:00 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 1 Feb 2021 20:35:00 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v4] In-Reply-To: References: Message-ID: <9diipTSLJjuGg_TauTufYb21HrRu3QFGltF2929nb0Y=.280e89f4-77a2-45bb-82af-2c1dffd11e9f@github.com> > This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. > > [1] Change calls like > > SystemDictionary::Object_klass() > SystemDictionary::Throwable_klass_is_loaded() > SystemDictionary::box_klass_type() > > to > > vmClasses::Object_klass() > vmClasses::Throwable_klass_is_loaded() > vmClasses::box_klass_type() > > [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. > > [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. 
> > Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > Review Notes: if you don't want to scroll through 185 files, you may want to try: > > curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ > grep -v '^[+-][+-][+-]' | grep '^[+-]' Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - fixed comments - Merge branch 'master' into 8260471-SystemDictionary-to-vmClasses-rename - fixed AOT build - added missing #include systemDictionary.hpp - 8260471: Change SystemDictionary::xxx_klass() calls to vmClasses::xxx_klass() ------------- Changes: https://git.openjdk.java.net/jdk/pull/2301/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2301&range=03 Stats: 705 lines in 190 files changed: 94 ins; 67 del; 544 mod Patch: https://git.openjdk.java.net/jdk/pull/2301.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2301/head:pull/2301 PR: https://git.openjdk.java.net/jdk/pull/2301 From iklam at openjdk.java.net Mon Feb 1 20:43:01 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 1 Feb 2021 20:43:01 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v5] In-Reply-To: References: Message-ID: > This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. 
> > [1] Change calls like > > SystemDictionary::Object_klass() > SystemDictionary::Throwable_klass_is_loaded() > SystemDictionary::box_klass_type() > > to > > vmClasses::Object_klass() > vmClasses::Throwable_klass_is_loaded() > vmClasses::box_klass_type() > > [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. > > [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. > > Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > Review Notes: if you don't want to scroll through 185 files, you may want to try: > > curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ > grep -v '^[+-][+-][+-]' | grep '^[+-]' Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed SA (serviceability/sa/CDSJMapClstats.java) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2301/files - new: https://git.openjdk.java.net/jdk/pull/2301/files/533615be..7489267b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2301&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2301&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2301.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2301/head:pull/2301 PR: https://git.openjdk.java.net/jdk/pull/2301 From iklam at openjdk.java.net Mon Feb 1 20:57:44 2021 
From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 1 Feb 2021 20:57:44 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: References: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> Message-ID: <3s1-hVjGvofyZ6o7jh6ayZWMp_RkPlt1Juig5U9zQfM=.3adf577e-6ee6-4f29-ba22-c7d15e742f3e@github.com> On Mon, 1 Feb 2021 20:29:10 GMT, Gerard Ziemski wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed macos build > > src/java.base/share/native/libjava/check_version.c line 33: > >> 31: DEF_JNI_OnLoad(JavaVM *vm, void *reserved) >> 32: { >> 33: return JNI_VERSION_1_2; > > This leaves an entire file with one trivial function implementation. Can we remove the file and implement `DEF_JNI_OnLoad()` in `jni_util.h` (or some other existing suitable file) ? I am not sure if jni_utils.c is the right file (it defines the `JNU_XXX` functions that are used by other shared libraries). There are other .c files that have trivial `DEF_JNI_OnLoad` functions (e.g., java.base/share/native/libnio/nio_util.c). @AlanBateman do you have any suggestions? ------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From github.com+9200663+quaffel at openjdk.java.net Mon Feb 1 21:18:54 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Mon, 1 Feb 2021 21:18:54 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v2] In-Reply-To: References: Message-ID: <0xOJFk5EaXbtIubyqdIQEGUW4u0z5ZtdclHS9PYL1Hg=.ebc3c5ce-8da6-4510-98d9-474aec1e85a6@github.com> > At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient.
This holds especially true for compiler stubs as those make heavy use of volatile registers. > > To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. > > _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ > > _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ Niklas Radomski has updated the pull request incrementally with three additional commits since the last revision: - Fix carg slot offset calculation - Use a distinct magic number in clobber_carg_stack_slots - Use enumeration instead of unsigned int ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2302/files - new: https://git.openjdk.java.net/jdk/pull/2302/files/46fcf931..46bfce16 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2302&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2302&range=00-01 Stats: 42 lines in 12 files changed: 4 ins; 1 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/2302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2302/head:pull/2302 PR: https://git.openjdk.java.net/jdk/pull/2302 From mdoerr at openjdk.java.net Mon Feb 1 21:46:44 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 1 Feb 2021 21:46:44 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v2] In-Reply-To: <0xOJFk5EaXbtIubyqdIQEGUW4u0z5ZtdclHS9PYL1Hg=.ebc3c5ce-8da6-4510-98d9-474aec1e85a6@github.com> References:
<0xOJFk5EaXbtIubyqdIQEGUW4u0z5ZtdclHS9PYL1Hg=.ebc3c5ce-8da6-4510-98d9-474aec1e85a6@github.com> Message-ID: On Mon, 1 Feb 2021 21:18:54 GMT, Niklas Radomski wrote: >> At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechansim is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. >> >> To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. >> >> _This is a preparational change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ >> >> _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ > > Niklas Radomski has updated the pull request incrementally with three additional commits since the last revision: > > - Fix carg slot offset calculation > - Use a distinct magic number in clobber_carg_stack_slots > - Use enumeration instead of unsigned int Even better, now. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2302 From iklam at openjdk.java.net Tue Feb 2 04:34:47 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 2 Feb 2021 04:34:47 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp Message-ID: collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. 
In many cases, an object file only directly includes this file via:

- memAllocator.hpp (which does not actually use collectedHeap.hpp)
- oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`).

By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242.

This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp.

Build time of HotSpot is reduced by about 1%.

Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero.

-------------

Commit messages:
 - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp

Changes: https://git.openjdk.java.net/jdk/pull/2347/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8260012
Stats: 110 lines in 60 files changed: 63 ins; 7 del; 40 mod
Patch: https://git.openjdk.java.net/jdk/pull/2347.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347

PR: https://git.openjdk.java.net/jdk/pull/2347

From dongbo at openjdk.java.net Tue Feb 2 07:08:11 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 2 Feb 2021 07:08:11 GMT
Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v6]
In-Reply-To:
References:
Message-ID:

> This is a typo introduced by JDK-8255949.
> Compiler will generate `ushr` for shifting right and accumulating four short integers.
> It produces wrong results for specific case. The instruction should be `usra`.

Dong Bo has updated the pull request incrementally with one additional commit since the last revision:

  Update tests to match .2I. Still cannot match ssra for .8B, sshr+add are not combined.
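To make the `ushr`/`usra` mix-up behind JDK-8260585 concrete, here is a minimal C++ sketch of the per-lane semantics, assuming four unsigned 16-bit lanes as in the bug title. The `Lanes` alias and the helper functions `ushr` and `usra` are illustrative models of the two AArch64 instructions and are not taken from HotSpot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four unsigned 16-bit lanes, modelling a 4H vector (assumption for illustration).
using Lanes = std::array<std::uint16_t, 4>;

// Model of USHR: every destination lane is overwritten with src >> shift.
Lanes ushr(const Lanes& src, unsigned shift) {
    assert(shift >= 1 && shift <= 16);  // immediate range for 16-bit lanes
    Lanes out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint16_t>(src[i] >> shift);
    }
    return out;
}

// Model of USRA: src >> shift is added to the existing destination lane,
// wrapping at 16 bits.
Lanes usra(const Lanes& dst, const Lanes& src, unsigned shift) {
    assert(shift >= 1 && shift <= 16);  // immediate range for 16-bit lanes
    Lanes out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint16_t>(dst[i] + (src[i] >> shift));
    }
    return out;
}
```

With an accumulator of {1, 2, 3, 4}, a source of {16, 32, 48, 64} and a shift of 4, `usra` yields {2, 4, 6, 8} while `ushr` yields {1, 2, 3, 4}. In other words, emitting `ushr` drops the previous accumulator contents, which is presumably the kind of wrong result the bug report refers to.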
------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/ca3d2192..693f8cbd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=04-05 Stats: 88 lines in 1 file changed: 39 ins; 41 del; 8 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From dholmes at openjdk.java.net Tue Feb 2 07:42:00 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 2 Feb 2021 07:42:00 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v5] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 20:43:01 GMT, Ioi Lam wrote: >> This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. >> >> [1] Change calls like >> >> SystemDictionary::Object_klass() >> SystemDictionary::Throwable_klass_is_loaded() >> SystemDictionary::box_klass_type() >> >> to >> >> vmClasses::Object_klass() >> vmClasses::Throwable_klass_is_loaded() >> vmClasses::box_klass_type() >> >> [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. >> >> [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. >> >> Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. 
HotSpot build time is reduced by about 2% >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. >> >> Review Notes: if you don't want to scroll through 185 files, you may want to try: >> >> curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ >> grep -v '^[+-][+-][+-]' | grep '^[+-]' > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed SA (serviceability/sa/CDSJMapClstats.java) LGTM! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2301 From stuefe at openjdk.java.net Tue Feb 2 07:53:51 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 2 Feb 2021 07:53:51 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v5] In-Reply-To: References: Message-ID: <-_LSd7J2e_ReZLdIKH0wlpoR8pv2CF75saI9eJkHWms=.dd6717de-5f5e-43fe-b3fc-8ebdd649bad8@github.com> On Mon, 1 Feb 2021 20:43:01 GMT, Ioi Lam wrote: >> This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. >> >> [1] Change calls like >> >> SystemDictionary::Object_klass() >> SystemDictionary::Throwable_klass_is_loaded() >> SystemDictionary::box_klass_type() >> >> to >> >> vmClasses::Object_klass() >> vmClasses::Throwable_klass_is_loaded() >> vmClasses::box_klass_type() >> >> [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. >> >> [3] In the previous PR, I incorrectly used the enum name `VMClassID`. 
This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. >> >> Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. >> >> Review Notes: if you don't want to scroll through 185 files, you may want to try: >> >> curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ >> grep -v '^[+-][+-][+-]' | grep '^[+-]' > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed SA (serviceability/sa/CDSJMapClstats.java) Marked as reviewed by stuefe (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2301 From stuefe at openjdk.java.net Tue Feb 2 07:53:51 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 2 Feb 2021 07:53:51 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v5] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 07:38:37 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed SA (serviceability/sa/CDSJMapClstats.java) > > LGTM! > > Thanks, > David Looks good to me too. Local build worked for me now. I see GA builds went through too. Thanks for this work! Anything giving us faster builds is appreciated, since build time keeps creeping up. 
..Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2301 From ihse at openjdk.java.net Tue Feb 2 08:17:51 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Tue, 2 Feb 2021 08:17:51 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> References: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> Message-ID: <9Blq7Y4KR8iZy6lazmOmdoY74SldevY4pir1anU9VM0=.6854695d-9dbb-4a31-9367-1a21369d61e6@github.com> On Mon, 1 Feb 2021 20:10:58 GMT, Ioi Lam wrote: >> - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed. >> - Implementations of APIs such as JVM_DTraceActivate, were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed macos build "Build" change looks good. ------------- Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2338 From dongbo at openjdk.java.net Tue Feb 2 08:21:48 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 2 Feb 2021 08:21:48 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 07:48:35 GMT, Ningsheng Jian wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> make empty ins_encode when shift >= 16 (chars) > > Looks good to me. Hi, Andrew. The reason `ssra` is not generated with .8B form is that if loop size is 16, the vector length is not 8 but 4. 
Since `vsraa8B_imm` only has `predicate(n->as_Vector()->length() == 8)`, these nodes are not matched. We should fix this with the following code:

 instruct vsraa8B_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 8);
+  predicate(n->as_Vector()->length() == 4 || n->as_Vector()->length() == 8);
   match(Set dst (AddVB dst (RShiftVB src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "ssra $dst, $src, $shift\t# vector (8B)" %}
@@ -18782,7 +18782,7 @@ instruct vsraa16B_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsraa4S_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 4);
+  predicate(n->as_Vector()->length() == 2 || n->as_Vector()->length() == 4);
   match(Set dst (AddVS dst (RShiftVS src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "ssra $dst, $src, $shift\t# vector (4H)" %}
@@ -18849,7 +18849,7 @@ instruct vsraa2L_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsrla8B_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 8);
+  predicate(n->as_Vector()->length() == 4 || n->as_Vector()->length() == 8);
   match(Set dst (AddVB dst (URShiftVB src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "usra $dst, $src, $shift\t# vector (8B)" %}
@@ -18879,7 +18879,7 @@ instruct vsrla16B_imm(vecX dst, vecX src, immI shift) %{
 %}

 instruct vsrla4S_imm(vecD dst, vecD src, immI shift) %{
-  predicate(n->as_Vector()->length() == 4);
+  predicate(n->as_Vector()->length() == 2 || n->as_Vector()->length() == 4);
   match(Set dst (AddVS dst (URShiftVS src (RShiftCntV shift))));
   ins_cost(INSN_COST);
   format %{ "usra $dst, $src, $shift\t# vector (4H)" %}

What do you think about making this modification as part of this PR?
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From rkennke at openjdk.java.net Tue Feb 2 09:20:49 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 2 Feb 2021 09:20:49 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v2] In-Reply-To: <0xOJFk5EaXbtIubyqdIQEGUW4u0z5ZtdclHS9PYL1Hg=.ebc3c5ce-8da6-4510-98d9-474aec1e85a6@github.com> References: <0xOJFk5EaXbtIubyqdIQEGUW4u0z5ZtdclHS9PYL1Hg=.ebc3c5ce-8da6-4510-98d9-474aec1e85a6@github.com> Message-ID: On Mon, 1 Feb 2021 21:18:54 GMT, Niklas Radomski wrote: >> At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechansim is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. >> >> To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. >> >> _This is a preparational change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ >> >> _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ > > Niklas Radomski has updated the pull request incrementally with three additional commits since the last revision: > > - Fix carg slot offset calculation > - Use a distinct magic number in clobber_carg_stack_slots > - Use enumeration instead of unsigned int Looks good! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2302

From goetz.lindenmaier at sap.com Tue Feb 2 10:01:15 2021
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 2 Feb 2021 10:01:15 +0000
Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers
In-Reply-To:
References:
Message-ID:

Hi Niklas,

I had a look at your change. Looks good to me, I only have some minor comments, see below.

Best regards,
Goetz

c1_LIRAssembler_ppc.cpp
ok

g1BarrierSetAssembler_ppc.cpp
ok
You remove a number of line breaks, making the lines quite long. In other files, you break the lines very nicely, with register arguments on one line, other args on another line etc.

g1BarrierSetAssembler_ppc.hpp
ok

barrierSetAssembler_ppc.cpp
ok

barrierSetAssembler_ppc.hpp
ok

cardTableBarrierSetAssembler_ppc.cpp/hpp
ok

modRefBarrierSetAssembler_ppc.cpp/hpp
ok

interp_masm_ppc.hpp
ok

interp_masm_ppc_64.cpp
Please add a comment to load_resolved_reference_at_index() that the index register content is destroyed.

macroAssembler_ppc.hpp:
RuntimeInvocationPreservationLevel
Document what it is good for, not why you use what C++ construct. Maybe "Indicate which registers must be preserved when calling into the runtime."?
+ // This is especially useful for making calls to the JRT in places in which this haven't been done before;
haven't --> hasn't

macroAssembler_ppc.inline.hpp
ok

methodHandles_ppc.cpp
As described below, I think it would be nice if you name
+ Register temp1 = R30;
R30_temp1. (This holds also for argbase and param_size, but no need to introduce additional changes.)
Why do you do these register changes here?
Some nice cleanups. The comments with the O5 register etc might stem from the sparc port ;)

sharedRuntime_ppc.cpp
ok

stubGenerator_ppc.cpp
+ Register tmp1 = R12_tmp, tmp2 = R11_klass;
I think the register rename is pointless. I would just use R12_tmp and R11_scratch1.
You could move the definition of R11_klass below this point, but the list of all regs used at the beginning gives a good overview of used regs, too.

templateInterpreterGenerator_ppc.cpp
ok

templateTable_ppc_64.cpp
+ const Register tmp1 = R11_scratch1,
+ tmp2 = R12_scratch2;
Why not use R11_scratch1 directly? The name basically has the same meaning as tmp. The advantage is that the register number is mentioned in the name, which makes debugging register problems easier. Look at this artificial example:
__ load_heap_oop(dst, offset, base, tmp1, tmp2, ...
looks good.
__ load_heap_oop(R11_dst, R12_offset, R12_base, tmp1, tmp2,
makes me suspicious: how can offset and base be the same register?
Further, when reading assembly, it is easier to match the emitting C++ code with the assembly if the register numbers are mentioned in the variable names.
The existing code was pointless, too:
- const Register Rscratch = R11_scratch1;
I would think this stems from all the renamings we had to do in the code when we contributed it to openjdk.

-----Original Message-----
From: hotspot-dev On Behalf Of Niklas Radomski
Sent: Monday, 1 February 2021 19:56
To: hotspot-dev at openjdk.java.net
Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers

At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers.

To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy.

_This is a preparatory change for the ShenandoahGC port to ppc.
As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ ------------- Commit messages: - 8260368: Enhance gc interface for advanced gc barriers Changes: https://git.openjdk.java.net/jdk/pull/2302/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2302&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260368 Stats: 306 lines in 19 files changed: 120 ins; 20 del; 166 mod Patch: https://git.openjdk.java.net/jdk/pull/2302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2302/head:pull/2302 PR: https://git.openjdk.java.net/jdk/pull/2302 From dholmes at openjdk.java.net Tue Feb 2 10:06:43 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 2 Feb 2021 10:06:43 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 09:18:19 GMT, Thomas Stuefe wrote: > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. 
> > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
> > - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. > > Output Before: > 663 Signal Handlers: > 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO > 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > > Now: > Signal Handlers: > SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO > SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > > ----- > Tests: GA, and the patch has been tested in our nighlies for over a month now. I manually executed the runtime/jni/checked tests too. 
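The bookkeeping described in the quoted summary (one `sigaction` slot per signal number plus a set recording which slots are filled, and a check that compares the live handler against the saved one) can be sketched roughly as follows. This is only an illustration under stated assumptions: `SavedHandlersSketch`, `handler_matches` and `kComparedFlags` are hypothetical names, `NSIG` is assumed to be provided by `<signal.h>` as on Linux and macOS, and the flag mask is one possible way to ignore platform-added bits such as `SA_RESTORER`. It is not the code from the pull request.

```cpp
#include <cassert>
#include <signal.h>

// Hypothetical stand-in for a per-signal store of handler information.
class SavedHandlersSketch {
    struct sigaction _sa[NSIG] = {};  // one slot per signal number
    sigset_t _set;                    // which slots currently hold a saved handler

    // Both bounds must hold, hence && rather than ||.
    static bool valid(int sig) { return sig > 0 && sig < NSIG; }

public:
    SavedHandlersSketch() { sigemptyset(&_set); }

    // Remember the handler information for `sig`.
    void set(int sig, const struct sigaction* act) {
        assert(valid(sig));
        _sa[sig] = *act;
        sigaddset(&_set, sig);
    }

    // Returns the saved information, or nullptr if nothing was saved for `sig`.
    const struct sigaction* get(int sig) const {
        assert(valid(sig));
        return sigismember(&_set, sig) == 1 ? &_sa[sig] : nullptr;
    }
};

// Only these POSIX flags take part in the comparison (an assumption of this
// sketch), so bits the platform may add on read-back are ignored.
const int kComparedFlags = SA_SIGINFO | SA_RESTART | SA_ONSTACK |
                           SA_NODEFER | SA_RESETHAND;

// True if the handler currently installed for `sig` has the same address and
// the same compared flags as `expected`.
bool handler_matches(int sig, const struct sigaction& expected) {
    struct sigaction current;
    if (sigaction(sig, nullptr, &current) != 0) {
        return false;
    }
    // SA_SIGINFO selects which member carries the handler address.
    const bool same_handler = (expected.sa_flags & SA_SIGINFO)
        ? current.sa_sigaction == expected.sa_sigaction
        : current.sa_handler == expected.sa_handler;
    return same_handler &&
           (current.sa_flags & kComparedFlags) == (expected.sa_flags & kComparedFlags);
}
```

A periodic check in this style would call `handler_matches(sig, *saved.get(sig))` for each signal that has a saved entry and report a mismatch, instead of deducing the expected handler address from context.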
Hi Thomas, I've taken a first pass at this. I like parts of it, perhaps most of it, but am having trouble seeing the before/after picture clearly. A few comments below. Thanks, David src/hotspot/os/posix/signals_posix.cpp line 99: > 97: bool check_signal_number(int sig) const { > 98: assert(sig > 0 || sig < NSIG, "invalid signal number %d", sig); > 99: return sig > 0 || sig < NSIG; Surely && not || src/hotspot/os/posix/signals_posix.cpp line 94: > 92: // structures and membership information. > 93: class SavedSignalHandlers { > 94: struct sigaction _v[NSIG]; Why _v ?? src/hotspot/os/posix/signals_posix.cpp line 105: > 103: void mark_as_set(int sig) { sigaddset(&_set, sig); } > 104: void mark_as_clear(int sig) { sigdelset(&_set, sig); } > 105: These don't really pay for themselves IMO. They are private and only used once each, and are a single line of code. The extra abstraction layer really doesn't add anything in terms of clarity. src/hotspot/os/posix/signals_posix.cpp line 142: > 140: // Our own hotspot signal handlers should never ever get replaced by a third > 141: // party one. To check that, store a copy of the handler setup and compare it > 142: // periodically against reality (see see os::run_periodic_checks()). typo: see see src/hotspot/os/posix/signals_posix.cpp line 844: > 842: if (sig == SHUTDOWN2_SIGNAL && !isatty(fileno(stdin))) { // Flags? > 843: tty->print_cr("Running in non-interactive shell, %s handler is replaced by shell", > 844: os::exception_name(sig, buf, O_BUFLEN)); // When comparing, ignore the SA_RESTORER flag on Linux I don't understand the flag comments in this block. ------------- Changes requested by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2251 From ngasson at openjdk.java.net Tue Feb 2 10:21:02 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 2 Feb 2021 10:21:02 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> Message-ID: <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> > This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub > doesn't save vector registers on x86". The problem is that a vector > produced by the Vector API may be stored in a register when the deopt > blob is called. Because the deopt blob only stores the lower half of > vector registers, the full vector object cannot be rematerialized during > deoptimization. So the following will crash on AArch64 with current JDK: > > make test TEST="jdk/incubator/vector" \ > JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" > > The fix is to store the full vector registers by passing > save_vectors=true to save_live_registers() in the deopt blob. Because > save_live_registers() places the integer registers above the floating > registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs > to calculate the SP offset based on whether full vectors were saved, and > whether those vectors were NEON or SVE, rather than using a static > offset as it does currently. > > The change to VectorSupport::allocate_vector_payload_helper() is > required because we only store the lowest VMReg slot in the oop map. > However unlike x86 the vector registers are always saved in a contiguous > region of memory, so we can calculate the address of each vector element > as an offset from the address of the first slot. 
X86 handles this in > RegisterMap::pd_location() but that won't work on AArch64 because with > SVE there isn't a unique VMReg corresponding to each four-byte physical > slot in the vector (there are always exactly eight logical VMRegs > regardless of the actual vector length). > > Tested hotspot_all_no_apps and jdk_core. Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2279/files - new: https://git.openjdk.java.net/jdk/pull/2279/files/498310d4..0809ccf1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=02-03 Stats: 132 lines in 19 files changed: 30 ins; 65 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/2279.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279 PR: https://git.openjdk.java.net/jdk/pull/2279 From ngasson at openjdk.java.net Tue Feb 2 10:25:40 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 2 Feb 2021 10:25:40 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers In-Reply-To: References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> Message-ID: On Mon, 1 Feb 2021 12:40:54 GMT, Vladimir Ivanov wrote: > Much better, thanks. > > I suggest the following changes: > I've changed it as suggested. This way seems much simpler, thanks. 
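The offset reasoning in the 8260355 description (integer registers sit above the floating-point save area, so the offset of r0 depends on how much of each vector register was saved) can be illustrated with a small C++ sketch. All names here (`SaveMode`, `bytes_per_fp_register`, `element_offset_in_bytes`, the constants) are hypothetical, and the layout is simplified: it assumes 32 SIMD/FP registers, 8 bytes saved per register when only the lower half is kept, 16 bytes for NEON, the run-time vector length for SVE, and nothing else below the integer registers. The real `RegisterSaver` may differ in detail.

```cpp
#include <cassert>

// How much of each SIMD/FP register the stub saved (hypothetical names).
enum class SaveMode { LowerHalf, Neon, Sve };

const int kNumFpRegisters = 32;  // v0..v31 on AArch64
const int kLowerHalfBytes = 8;   // assumed: only the low 64 bits are saved
const int kNeonBytes = 16;       // full 128-bit NEON register

// Bytes saved per SIMD/FP register. The SVE length is only known at run
// time, so the caller passes it in.
int bytes_per_fp_register(SaveMode mode, int sve_vector_bytes) {
    switch (mode) {
        case SaveMode::Neon:
            return kNeonBytes;
        case SaveMode::Sve:
            assert(sve_vector_bytes >= 16 && sve_vector_bytes % 16 == 0);
            return sve_vector_bytes;
        case SaveMode::LowerHalf:
            break;
    }
    return kLowerHalfBytes;
}

// Offset of r0 from SP, assuming the integer registers sit directly above
// the SIMD/FP save area.
int r0_offset_in_bytes(SaveMode mode, int sve_vector_bytes) {
    return kNumFpRegisters * bytes_per_fp_register(mode, sve_vector_bytes);
}

// Byte offset of one vector element relative to the first saved slot of its
// register, relying on the register being saved in one contiguous region.
int element_offset_in_bytes(int element_index, int element_bytes) {
    return element_index * element_bytes;
}
```

Under these assumptions the r0 offset is 256 bytes when only lower halves are saved, 512 bytes with full NEON registers, and 1024 bytes with 256-bit SVE vectors, which shows why a single static offset no longer works once full vectors are saved.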
------------- PR: https://git.openjdk.java.net/jdk/pull/2279 From vlivanov at openjdk.java.net Tue Feb 2 11:44:44 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 2 Feb 2021 11:44:44 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> Message-ID: On Tue, 2 Feb 2021 10:21:02 GMT, Nick Gasson wrote: >> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub >> doesn't save vector registers on x86". The problem is that a vector >> produced by the Vector API may be stored in a register when the deopt >> blob is called. Because the deopt blob only stores the lower half of >> vector registers, the full vector object cannot be rematerialized during >> deoptimization. So the following will crash on AArch64 with current JDK: >> >> make test TEST="jdk/incubator/vector" \ >> JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" >> >> The fix is to store the full vector registers by passing >> save_vectors=true to save_live_registers() in the deopt blob. Because >> save_live_registers() places the integer registers above the floating >> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs >> to calculate the SP offset based on whether full vectors were saved, and >> whether those vectors were NEON or SVE, rather than using a static >> offset as it does currently. >> >> The change to VectorSupport::allocate_vector_payload_helper() is >> required because we only store the lowest VMReg slot in the oop map. 
>> However unlike x86 the vector registers are always saved in a contiguous >> region of memory, so we can calculate the address of each vector element >> as an offset from the address of the first slot. X86 handles this in >> RegisterMap::pd_location() but that won't work on AArch64 because with >> SVE there isn't a unique VMReg corresponding to each four-byte physical >> slot in the vector (there are always exactly eight logical VMRegs >> regardless of the actual vector length). >> >> Tested hotspot_all_no_apps and jdk_core. > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Review comments `RegisterMap`-related changes look good. ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2279 From njian at openjdk.java.net Tue Feb 2 11:44:58 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 2 Feb 2021 11:44:58 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: References: Message-ID: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> On Tue, 2 Feb 2021 08:19:21 GMT, Dong Bo wrote: > Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. > We should fix this with the following code: I think this is an enhancement, and should be done in a separate patch in jdk mainline. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From vkempik at openjdk.java.net Tue Feb 2 11:45:04 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Tue, 2 Feb 2021 11:45:04 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v6] In-Reply-To: References: <-iXh6ikIWdG2YM4G3ZE33bE_bflntM9JZ0JWd1vSMKU=.7c380f16-0e1f-4fa8-8b47-f0e9e56fdba3@github.com> <1S0Z45oJKy8oesCN0k6SIHhZLzDYW9j-M_yfWZoHEzI=.2694c382-6613-4cd1-a917-16147bb00a9e@github.com> Message-ID: On Mon, 1 Feb 2021 14:06:32 GMT, Magnus Ihse Bursie wrote: >>> Hello, hsdis is a separate out-of-tree project and is not part of this jep. >> >> Unless there's something I'm missing it only requires a few lines of change to src/utils/hsdis/makefile (it already has support for macos x86_64) >> >>>support for looking for proper hsdis dylib library was added as part of this jep. >> >> I'm a little confused. Are you planning on adding a new disassembler? > >> > Hello, hsdis is a separate out-of-tree project and is not part of this jep. >> >> Unless there's something I'm missing it only requires a few lines of change to src/utils/hsdis/makefile (it already has support for macos x86_64) > > I agree with Alan that it makes sense to add this trivial change as part of this PR, if it allows the current hsdis solution to continue working on mac/aarch64. > >> >> > support for looking for proper hsdis dylib library was added as part of this jep. >> >> I'm a little confused. Are you planning on adding a new disassembler? > > @a74nh I think Vladimir is referring to https://github.com/openjdk/jdk/pull/392. The hsdis "component" has been left behind for a long time, and there are several requests to add new backends, which require a modernized build of hsdis. This is unfortunately not a high-priority project, and has been postponed several times already. :( > > > Hello, hsdis is a separate out-of-tree project and is not part of this jep.
> > > > > > Unless there's something I'm missing it only requires a few lines of change to src/utils/hsdis/makefile (it already has support for macos x86_64) > > I agree with Alan that it makes sense to add this trivial change as part of this PR, if it allows the current hsdis solution to continue working on mac/aarch64. > > > > support for looking for proper hsdis dylib library was added as part of this jep. > > > > > > I'm a little confused. Are you planning on adding a new disassembler? > > @a74nh I think Vladimir is referring to #392. The hsdis "component" has been left behind for a long time, and there are several requests to add new backends, which require a modernized build of hsdis. This is undfortunately not a high-priority project, and has been postponed several times already. :( Sorry I was under impression hsdis is not part of openjdk tree. Alan, could you please share with us the version of binutils you were using in your test ? Regards, Vladimir ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Tue Feb 2 11:59:08 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Tue, 2 Feb 2021 11:59:08 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). 
In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: support macos_aarch64 in hsdis ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/b421e0b4..3c705ae5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=07-08 Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Tue Feb 2 11:59:09 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Tue, 2 Feb 2021 11:59:09 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v6] In-Reply-To: References: <-iXh6ikIWdG2YM4G3ZE33bE_bflntM9JZ0JWd1vSMKU=.7c380f16-0e1f-4fa8-8b47-f0e9e56fdba3@github.com> <1S0Z45oJKy8oesCN0k6SIHhZLzDYW9j-M_yfWZoHEzI=.2694c382-6613-4cd1-a917-16147bb00a9e@github.com> Message-ID: On Tue, 2 Feb 2021 11:14:12 GMT, Vladimir Kempik wrote: > > > > Hello, hsdis is a separate out-of-tree project and is not part of this jep. 
> > > > > > > > > Unless there's something I'm missing it only requires a few lines of change to src/utils/hsdis/makefile (it already has support for macos x86_64) > > > > > > I agree with Alan that it makes sense to add this trivial change as part of this PR, if it allows the current hsdis solution to continue working on mac/aarch64. > > > > support for looking for proper hsdis dylib library was added as part of this jep. > > > > > > > > > I'm a little confused. Are you planning on adding a new disassembler? > > > > > > @a74nh I think Vladimir is referring to #392. The hsdis "component" has been left behind for a long time, and there are several requests to add new backends, which require a modernized build of hsdis. This is undfortunately not a high-priority project, and has been postponed several times already. :( > > Sorry I was under impression hsdis is not part of openjdk tree. > > Alan, could you please share with us the version of binutils you were using in your test ? > > Regards, Vladimir I have updated PR with patch for hsdis, one thing to notice, LP64 is not defined for arm64 in Makefile ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From stefank at openjdk.java.net Tue Feb 2 12:14:45 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 2 Feb 2021 12:14:45 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 04:18:24 GMT, Ioi Lam wrote: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. 
> > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Looks good. A few things that you might want to consider, but I'm also fine with the patch as it is. src/hotspot/share/gc/shared/memAllocator.hpp line 30: > 28: #include "memory/memRegion.hpp" > 29: #include "oops/oopsHierarchy.hpp" > 30: #include "runtime/thread.hpp" If we want to, this could be changed to a forward declaration if we removed the default value (Thread* thread = Thread::current()) of the constructors. Not needed for this RFE though. src/hotspot/cpu/arm/frame_arm.cpp line 518: > 516: obj = *(oop*)res_addr; > 517: } > 518: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); Could have been changed to is_in_heap_or_null. src/hotspot/cpu/ppc/frame_ppc.cpp line 308: > 306: case T_ARRAY: { > 307: oop obj = *(oop*)tos_addr; > 308: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); Could have been changed to is_in_heap_or_null. src/hotspot/cpu/s390/frame_s390.cpp line 321: > 319: case T_ARRAY: { > 320: oop obj = *(oop*)tos_addr; > 321: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); Could have been changed to is_in_heap_or_null. ------------- Marked as reviewed by stefank (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2347 From github.com+4146708+a74nh at openjdk.java.net Tue Feb 2 12:17:47 2021 From: github.com+4146708+a74nh at openjdk.java.net (Alan Hayward) Date: Tue, 2 Feb 2021 12:17:47 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v6] In-Reply-To: References: <-iXh6ikIWdG2YM4G3ZE33bE_bflntM9JZ0JWd1vSMKU=.7c380f16-0e1f-4fa8-8b47-f0e9e56fdba3@github.com> <1S0Z45oJKy8oesCN0k6SIHhZLzDYW9j-M_yfWZoHEzI=.2694c382-6613-4cd1-a917-16147bb00a9e@github.com> Message-ID: On Tue, 2 Feb 2021 11:56:12 GMT, Vladimir Kempik wrote: > > > > Hello, hsdis is a separate out-of-tree project and is not part of this jep. > > > > > > > > > Unless there's something I'm missing it only requires a few lines of change to src/utils/hsdis/makefile (it already has support for macos x86_64) > > > > > > I agree with Alan that it makes sense to add this trivial change as part of this PR, if it allows the current hsdis solution to continue working on mac/aarch64. > > > > support for looking for proper hsdis dylib library was added as part of this jep. > > > > > > > > > I'm a little confused. Are you planning on adding a new disassembler? > > > > > > @a74nh I think Vladimir is referring to #392. The hsdis "component" has been left behind for a long time, and there are several requests to add new backends, which require a modernized build of hsdis. This is unfortunately not a high-priority project, and has been postponed several times already. :( > > Sorry I was under the impression hsdis is not part of openjdk tree. > > Alan, could you please share with us the version of binutils you were using in your test?
> I was just using the latest HEAD: git clone git://sourceware.org/git/binutils-gdb.git src/utils/hsdis/build/binutils A slightly safer approach would be to grab the latest release: https://ftp.gnu.org/gnu/binutils/binutils-2.36.tar.gz Once hsdis-demo was working for me, for only other oddity I had was that the library needed renaming when copying/linking into the build dir: jdk/lib/server/libhsdis-aarch64.dylib ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From tschatzl at openjdk.java.net Tue Feb 2 12:33:47 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 2 Feb 2021 12:33:47 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 04:18:24 GMT, Ioi Lam wrote: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Checked a few includes for missing ones; obviously they are included transitively so add as you see fit. src/hotspot/share/gc/shared/memAllocator.hpp line 30: > 28: #include "memory/memRegion.hpp" > 29: #include "oops/oopsHierarchy.hpp" > 30: #include "runtime/thread.hpp" `utilities/globalDefinitions.hpp` for `HeapWord` is missing. 
src/hotspot/share/oops/compressedOops.inline.hpp line 28: > 26: #define SHARE_OOPS_COMPRESSEDOOPS_INLINE_HPP > 27: > 28: #include "gc/shared/collectedHeap.hpp" `utilities/globalDefinitions.hpp` for `*PTR_FORMAT` and others is missing. src/hotspot/share/oops/oop.inline.hpp line 28: > 26: #define SHARE_OOPS_OOP_INLINE_HPP > 27: > 28: #include "gc/shared/collectedHeap.hpp" `utilities/globalDefinitions.hpp` for `HeapWord` is missing. `globals.hpp` for some globals. `oopsHierarchy.hpp` for `narrowKlass` `utilities/debug.hpp` for `assert` ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2347 From github.com+9200663+quaffel at openjdk.java.net Tue Feb 2 12:53:42 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Tue, 2 Feb 2021 12:53:42 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v2] In-Reply-To: References: <0xOJFk5EaXbtIubyqdIQEGUW4u0z5ZtdclHS9PYL1Hg=.ebc3c5ce-8da6-4510-98d9-474aec1e85a6@github.com> Message-ID: <3AFu9m-UrpWr1Kv2GluPW0w-S1yvE7rGZR_LnWPM3xs=.b04631fc-1eee-441b-b3a0-91ac68729c18@github.com> On Tue, 2 Feb 2021 09:18:10 GMT, Roman Kennke wrote: >> Niklas Radomski has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix carg slot offset calculation >> - Use a distinct magic number in clobber_carg_stack_slots >> - Use enumeration instead of unsigned int > > Looks good! > You remove a row of line breaks making them quite long. > In other files, you break the lines very nicely, > with register arguments on one line, other args in > another line etc. My rule of thumb is: If a line fits into the (nowadays standard) 120-character line limit quite nicely, I usually don't use additional line breaks. If it doesn't, I do what you've just described. As far as I can tell, I'm not removing any line breaks that are necessary for keeping this limit.
Nonetheless, as the variable names are very compact, the rule might not be as applicable as in other codebases. IMHO, it's a matter of taste. As you mention it explicitly, I'll go back to the more aggressive approach. > interp_masm_ppc_64.cpp > > Please add a comment to load_resolved_reference_at_index() > that the index register content is destroyed. > > macroAssembler_ppc.hpp: > > RuntimeInvocationPreservationLevel > Document what it is good for, not why you use what C++ construct. > Maybe "Indicate which registers must be preserved when calling into the runtime." ? > That comment was removed in dbc18a3 because it really didn't provide any value. Suggestion looks good, I'll consider it. > + // This is especially useful for making calls to the JRT in places in which this haven't been done before; > haven't --> hasn't > Yikes. > methodHandles_ppc.cpp > > As described below, I think it would be nice if you name > + Register temp1 = R30; > R30_temp1. > (This holds also for argbase and param_size, but no need to > introduce additional changes.) As I have to edit that file anyway, I'll rename the other locals as well. Great opportunity to pay off some styling debts... Using different naming styles in the same function would presumably be much worse. > Why do you do these register changes here? `temp1` is used to store the receiver klass and must survive several calls to `MacroAssembler::load_heap_oop` (and thus JRT calls). If it still was a volatile register, we would have to use a higher preservation level; resulting in additional overhead. As future readers might stumble across this as well, I'll add a comment to make this design choice a little more obvious in that particular case. I also dislike that `tmp1` is still used after the assignment of `temp1` to `temp1_recv_klass`, but I cannot think of a better solution... The same holds true for all the other register changes. > Some nice cleanups. 
The comments with the O5 register etc > might stem from the sparc port ;) Although those comments may seem subtle, they actually did lead to some serious confusion! Nonetheless, quite interesting to find such "artifacts of the past" in the codebase. > stubGenerator_ppc.cpp > > + Register tmp1 = R12_tmp, tmp2 = R11_klass; > I think the register rename is pointless. > I would just use R12_tmp and R11_scratch1. > > You could move the definition of R11_klass > below this point, but the list of all regs > used at the beginning gives a good overview > of used regs, too. > If I moved R11_klass, I should probably do the same with all the other declarations (if applicable). So I'll just leave it as it is. > templateTable_ppc_64.cpp > + const Register tmp1 = R11_scratch1, > + tmp2 = R12_scratch2; > Why not use R11_scratch1 directly? The name basically > has the same meaning as tmp. The advantage is that > the register number is mentioned in the name, which > makes debugging register problems more easier. > > Look at this artificial example: > > __ load_heap_oop(dst, offset, base, tmp1, tmp2, ... > looks good. > __ load_heap_oop(R11_dst, R12_offset, R12_base, tmp1, tmp2, > makes me suspicious: how can offset and base be the same > register? That is a _really_ good point! I'll change it and keep it in mind for future changes. Thank you for your review! ------------- PR: https://git.openjdk.java.net/jdk/pull/2302 From github.com+9200663+quaffel at openjdk.java.net Tue Feb 2 13:44:58 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Tue, 2 Feb 2021 13:44:58 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v3] In-Reply-To: References: Message-ID: > At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. 
With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. > > To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. > > _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ > > _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: Apply feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2302/files - new: https://git.openjdk.java.net/jdk/pull/2302/files/46bfce16..1c833ab7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2302&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2302&range=01-02 Stats: 122 lines in 12 files changed: 57 ins; 4 del; 61 mod Patch: https://git.openjdk.java.net/jdk/pull/2302.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2302/head:pull/2302 PR: https://git.openjdk.java.net/jdk/pull/2302 From mdoerr at openjdk.java.net Tue Feb 2 14:46:42 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 2 Feb 2021 14:46:42 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v3] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 13:44:58 GMT, Niklas Radomski wrote: >> At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save
the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. >> >> To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. >> >> _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ >> >> _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ > > Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: > > Apply feedback Marked as reviewed by mdoerr (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2302 From goetz.lindenmaier at sap.com Tue Feb 2 14:56:01 2021 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 2 Feb 2021 14:56:01 +0000 Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v2] In-Reply-To: <3AFu9m-UrpWr1Kv2GluPW0w-S1yvE7rGZR_LnWPM3xs=.b04631fc-1eee-441b-b3a0-91ac68729c18@github.com> References: <0xOJFk5EaXbtIubyqdIQEGUW4u0z5ZtdclHS9PYL1Hg=.ebc3c5ce-8da6-4510-98d9-474aec1e85a6@github.com> <3AFu9m-UrpWr1Kv2GluPW0w-S1yvE7rGZR_LnWPM3xs=.b04631fc-1eee-441b-b3a0-91ac68729c18@github.com> Message-ID: > IMHO, it's a matter of taste. As you mention it explicitly, I'll go back to the > more aggressive approach. Thanks, not a game-changer anyway. And now they are really short!
> > RuntimeInvocationPreservationLevel > > Document what it is good for, not why you use what C++ construct. > > Maybe "Indicate which registers must be preserved when calling into the > runtime." ? > > That comment was removed in dbc18a3 because it really didn't provide any > value. > Suggestion looks good, I'll consider it. Thanks! > > methodHandles_ppc.cpp > > > > As described below, I think it would be nice if you name > > + Register temp1 = R30; > > R30_temp1. > > (This holds also for argbase and param_size, but no need to > > introduce additional changes.) > > As I have to edit that file anyway, I'll rename the other locals as well. > Great opportunity to pay off some styling debts... > Using different naming styles in the same function would presumably be > much worse. Good. > > Why do you do these register changes here? > > `temp1` is used to store the receiver klass and must survive several calls to > `MacroAssembler::load_heap_oop` (and thus JRT calls). If it still was a > volatile register, we would have to use a higher preservation level; resulting > in additional overhead. Thanks, makes sense! > As future readers might stumble across this as well, I'll add a comment to > make this design choice a little more obvious in that particular case. I also > dislike that `tmp1` is still used after the assignment of `temp1` to > `temp1_recv_klass`, but I cannot think of a better solution... > > The same holds true for all the other register changes. > > > stubGenerator_ppc.cpp > > > > + Register tmp1 = R12_tmp, tmp2 = R11_klass; > > I think the register rename is pointless. > > I would just use R12_tmp and R11_scratch1. > > > > You could move the definition of R11_klass > > below this point, but the list of all regs > > used at the beginning gives a good overview > > of used regs, too. > > > > If I moved R11_klass, I should probably do the same with all the other > declarations (if applicable). > So I'll just leave it as it is. OK. Looks good now, reviewed! 
And thanks a lot for doing this great work on Shenandoah! Best regards, Goetz. From goetz at openjdk.java.net Tue Feb 2 15:07:45 2021 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Tue, 2 Feb 2021 15:07:45 GMT Subject: RFR: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers [v3] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 13:44:58 GMT, Niklas Radomski wrote: >> At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. >> >> To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. >> >> _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change. This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ >> >> _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ > > Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: > > Apply feedback Looks good now, thanks! ------------- Marked as reviewed by goetz (Reviewer).
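The PR description quoted above replaces a boolean flag with an enumeration whose enumerators are "incremental". A minimal sketch of that idea follows, assuming each level preserves everything the previous level does. The enumerator names, the `ToySavePlan` struct and the split into general-purpose and floating-point registers are illustrative assumptions, not the names or levels used in the patch.

```cpp
#include <cassert>

// Hypothetical preservation levels, ordered so that a higher value
// implies everything a lower value preserves.
enum ToyPreservationLevel {
    TOY_PRESERVE_NONE = 0,
    TOY_PRESERVE_FRAME_LR,            // new stack frame + link register
    TOY_PRESERVE_FRAME_LR_GP_REGS,    // ... plus volatile general-purpose regs
    TOY_PRESERVE_FRAME_LR_GP_FP_REGS  // ... plus volatile floating-point regs
};

// What a barrier implementation would have to save for a given level.
struct ToySavePlan {
    bool frame_and_lr;
    bool gp_regs;
    bool fp_regs;
};

// Because the enumerators are incremental, each decision is a single
// ordered comparison instead of a set of independent flags.
ToySavePlan toy_plan_for(ToyPreservationLevel level) {
    ToySavePlan plan;
    plan.frame_and_lr = level >= TOY_PRESERVE_FRAME_LR;
    plan.gp_regs      = level >= TOY_PRESERVE_FRAME_LR_GP_REGS;
    plan.fp_regs      = level >= TOY_PRESERVE_FRAME_LR_GP_FP_REGS;
    return plan;
}
```

A caller that keeps a value in a non-volatile register across runtime calls can then request a lower level, which matches the reasoning given in the review thread for moving `temp1` to a non-volatile register.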
PR: https://git.openjdk.java.net/jdk/pull/2302 From kvn at openjdk.java.net Tue Feb 2 15:49:11 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 2 Feb 2021 15:49:11 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" Message-ID: On return, WB waits to acquire Compile_lock before checking the compilation status: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 This lock is used by ciEnv when publishing compiled code: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 So while WB waits for the lock, a compiler thread can finish the compilation, register the nmethod, and clear the method's queued_for_compilation bit. The problem is that WB checks the `nm` value (compiled code) that it obtained before taking the lock, when the method's compilation had not yet finished. The fix is to check the compiled code again, similar to the check in CompileBroker: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*.
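The race described above can be illustrated with a small standalone sketch. It is a toy model under stated assumptions, not the WhiteBox code: `ToyMethod`, `toy_compile_lock`, `toy_publish` and the two check functions are hypothetical stand-ins for the method, `Compile_lock`, the publishing step in ciEnv, and the status check in WB.

```cpp
#include <cassert>
#include <mutex>

// Toy stand-in for a method's compilation state (not a HotSpot type).
struct ToyMethod {
    const void* code = nullptr;           // published compiled code, if any
    bool queued_for_compilation = false;  // set while a compile task is pending
};

std::mutex toy_compile_lock;  // plays the role of Compile_lock

// Compiler-thread side: publish the code and clear the queued bit
// while holding the lock.
void toy_publish(ToyMethod& m, const void* code) {
    std::lock_guard<std::mutex> guard(toy_compile_lock);
    m.code = code;
    m.queued_for_compilation = false;
}

// Problematic check: relies on a code pointer sampled before the lock
// was taken, so a compilation that finished in between is missed.
bool toy_enqueue_ok_stale(const ToyMethod& m, const void* code_before_lock) {
    std::lock_guard<std::mutex> guard(toy_compile_lock);
    return m.queued_for_compilation || code_before_lock != nullptr;
}

// Sketch of the fix: read the compiled code again under the lock.
bool toy_enqueue_ok_rechecked(const ToyMethod& m) {
    std::lock_guard<std::mutex> guard(toy_compile_lock);
    return m.queued_for_compilation || m.code != nullptr;
}
```

If the compiler publishes between the moment the caller samples the code pointer and the moment it takes the lock, the stale variant sees neither the queued bit nor any code and reports failure, while the rechecked variant sees the freshly published code.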
------------- Commit messages: - 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" Changes: https://git.openjdk.java.net/jdk/pull/2356/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2356&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260301 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2356.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2356/head:pull/2356 PR: https://git.openjdk.java.net/jdk/pull/2356 From github.com+9200663+quaffel at openjdk.java.net Tue Feb 2 15:57:42 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Tue, 2 Feb 2021 15:57:42 GMT Subject: Integrated: 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers In-Reply-To: References: Message-ID: <8BZ7qmtWtPt8jXg_IX_CQgrcWpTqKu-s19-csGMi868=.90fc5d3f-a5a4-433d-9668-23f9130cc966@github.com> On Thu, 28 Jan 2021 22:08:17 GMT, Niklas Radomski wrote: > At present, the `needs_frame` flag is used on the ppc platform to determine whether gc barriers must emit a new stack frame (and save the link register, for that matter) or not. With the introduction of load reference barriers, however, this mechanism is no longer sufficient. This holds especially true for compiler stubs as those make heavy use of volatile registers. > > To mitigate this, this patch replaces the `needs_frame` flag with a simple enumeration. As the enumerators are incremental, handling the different "register preservation needs" in the actual gc barrier implementations is comparatively (pun intended) easy. > > _This is a preparatory change for the ShenandoahGC port to ppc. As such, it may provide some functionality this version doesn't make use of, but that is required for the upcoming change.
This way, the scope of the upcoming change is limited to GC-specific functionality; making its review a little easier._ > > _For the same reason, this patch also introduces patching support for `LIR_Assembler::leal`._ This pull request has now been integrated. Changeset: 0093183b Author: Quaffel Committer: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/0093183b Stats: 381 lines in 19 files changed: 174 ins; 18 del; 189 mod 8260368: [PPC64] GC interface needs enhancement to support GCs with load barriers Reviewed-by: mdoerr, rkennke, goetz ------------- PR: https://git.openjdk.java.net/jdk/pull/2302 From gziemski at openjdk.java.net Tue Feb 2 16:03:42 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 2 Feb 2021 16:03:42 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: <3s1-hVjGvofyZ6o7jh6ayZWMp_RkPlt1Juig5U9zQfM=.3adf577e-6ee6-4f29-ba22-c7d15e742f3e@github.com> References: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> <3s1-hVjGvofyZ6o7jh6ayZWMp_RkPlt1Juig5U9zQfM=.3adf577e-6ee6-4f29-ba22-c7d15e742f3e@github.com> Message-ID: On Mon, 1 Feb 2021 20:54:42 GMT, Ioi Lam wrote: >> src/java.base/share/native/libjava/check_version.c line 33: >> >>> 31: DEF_JNI_OnLoad(JavaVM *vm, void *reserved) >>> 32: { >>> 33: return JNI_VERSION_1_2; >> >> This leaves an entire file with one trivial function implementation. Can we remove the file and implement `DEF_JNI_OnLoad()` in `jni_util.h` (or some other existing suitable file) ? > > I am not sure if jni_utils.c is the right file (it defines the `JNU_XXX` functions that are used by other shared libraries). > > There are other .c files that have trivial `DEF_JNI_OnLoad` functions (e.g., java.base/share/native/libnio/nio_util.c). > > @AlanBateman do you have any suggestions? I'm fine with the way it is, just thought we might want to consider cleaning up a bit more, since it's a cleanup task itself. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From gziemski at openjdk.java.net Tue Feb 2 16:03:40 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 2 Feb 2021 16:03:40 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> References: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> Message-ID: <_PhEUVHOgMItPaEPIZH_KHEMPP6D4tPPTn0qjDLzd8Q=.7f3486a5-90f8-4ff2-931a-ea4dd238dc4b@github.com> On Mon, 1 Feb 2021 20:10:58 GMT, Ioi Lam wrote: >> - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed. >> - Implementations of APIs such as JVM_DTraceActivate, were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed macos build Marked as reviewed by gziemski (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From iklam at openjdk.java.net Tue Feb 2 17:22:04 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 2 Feb 2021 17:22:04 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v6] In-Reply-To: References: Message-ID: > This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. 
> > [1] Change calls like > > SystemDictionary::Object_klass() > SystemDictionary::Throwable_klass_is_loaded() > SystemDictionary::box_klass_type() > > to > > vmClasses::Object_klass() > vmClasses::Throwable_klass_is_loaded() > vmClasses::box_klass_type() > > [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. > > [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. > > Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > Review Notes: if you don't want to scroll through 185 files, you may want to try: > > curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ > grep -v '^[+-][+-][+-]' | grep '^[+-]' Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - updated copyright - Merge branch 'master' into 8260471-SystemDictionary-to-vmClasses-rename - fixed SA (serviceability/sa/CDSJMapClstats.java) - fixed comments - Merge branch 'master' into 8260471-SystemDictionary-to-vmClasses-rename - fixed AOT build - added missing #include systemDictionary.hpp - 8260471: Change SystemDictionary::xxx_klass() calls to vmClasses::xxx_klass() ------------- Changes: https://git.openjdk.java.net/jdk/pull/2301/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2301&range=05 Stats: 806 lines in 191 files changed: 94 ins; 67 del; 645 mod Patch: https://git.openjdk.java.net/jdk/pull/2301.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2301/head:pull/2301 PR: https://git.openjdk.java.net/jdk/pull/2301 From iklam at openjdk.java.net Tue Feb 2 17:22:05 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 2 Feb 2021 17:22:05 GMT Subject: RFR: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass [v5] In-Reply-To: <-_LSd7J2e_ReZLdIKH0wlpoR8pv2CF75saI9eJkHWms=.dd6717de-5f5e-43fe-b3fc-8ebdd649bad8@github.com> References: <-_LSd7J2e_ReZLdIKH0wlpoR8pv2CF75saI9eJkHWms=.dd6717de-5f5e-43fe-b3fc-8ebdd649bad8@github.com> Message-ID: On Tue, 2 Feb 2021 07:50:45 GMT, Thomas Stuefe wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed SA (serviceability/sa/CDSJMapClstats.java) > > Marked as reviewed by stuefe (Reviewer). Thanks @tstuefe, @hseigel, @dholmes-ora and @lfoltan for the review. I will merge/test/push tonight when the repo is relatively quiet. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2301 From stuefe at openjdk.java.net Tue Feb 2 17:31:03 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 2 Feb 2021 17:31:03 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v2] In-Reply-To: References: Message-ID: > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. > > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
> > - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. > > Output Before: > 663 Signal Handlers: > 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO > 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > > Now: > Signal Handlers: > SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO > SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > > ----- > Tests: GA, and the patch has been tested in our nighlies for over a month now. I manually executed the runtime/jni/checked tests too. 
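The simplified check described in (2), comparing the currently active sigaction with the one recorded at install time, could look roughly like the sketch below. It is not the patch's code: `vm_handler`, `foreign_handler`, `install_handler`, `comparable_flags` and `handler_is_intact` are hypothetical names, and the `0x04000000` value used to mask SA_RESTORER on Linux is an assumption of this example.

```cpp
#include <cassert>
#include <signal.h>

using handler_fn = void (*)(int, siginfo_t*, void*);

// Hypothetical handlers: one standing in for a VM handler, one for an intruder.
void vm_handler(int, siginfo_t*, void*) {}
void foreign_handler(int, siginfo_t*, void*) {}

// Install `fn` for `sig` and return the sigaction that was installed, so the
// caller can keep it as the "expected" state.
struct sigaction install_handler(int sig, handler_fn fn) {
    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sa.sa_sigaction = fn;
    int rc = sigaction(sig, &sa, nullptr);
    assert(rc == 0);
    (void)rc;
    return sa;
}

// Drop flags the C library may add behind our back before comparing.
int comparable_flags(int flags) {
#if defined(__linux__)
    const int kRestorerFlag = 0x04000000;  // assumed SA_RESTORER value on Linux
    flags &= ~kRestorerFlag;
#endif
    return flags;
}

// True if the handler currently installed for `sig` matches `expected`
// in both handler address and (comparable) flags.
bool handler_is_intact(int sig, const struct sigaction& expected) {
    struct sigaction current = {};
    if (sigaction(sig, nullptr, &current) != 0) {
        return false;
    }
    return current.sa_sigaction == expected.sa_sigaction &&
           comparable_flags(current.sa_flags) == comparable_flags(expected.sa_flags);
}
```

Querying with a null new-action pointer leaves the installed handler untouched, so a check of this kind can be run periodically.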
Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Make SavedSignalHandlers use C-heap for its items - Removed display-replaced-handler-logic - Feedback David - Merge - JDK-8260485-signal-handler-improvements ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2251/files - new: https://git.openjdk.java.net/jdk/pull/2251/files/016d35c2..f8deb8a7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=00-01 Stats: 23284 lines in 487 files changed: 8309 ins; 5639 del; 9336 mod Patch: https://git.openjdk.java.net/jdk/pull/2251.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2251/head:pull/2251 PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Tue Feb 2 17:36:45 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 2 Feb 2021 17:36:45 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 10:04:04 GMT, David Holmes wrote: > Hi Thomas, > > I've taken a first pass at this. I like parts of it, perhaps most of it, but am having trouble seeing the before/after picture clearly. > > A few comments below. > > Thanks, > David Thanks David. I reduced the complexity of the patch somewhat by removing case (3) completely. I found that it adds very little useful functionality. So all that is left now is (1) the simplified handling of chained handlers and (2) the new logic for implementing the signal handler jni check. I also changed/simplified `SavedSignalHandlers` somewhat. NSIG can be largeish (I believe 1024 on AIX) and that would have made the array-of-sigaction structures large too. Not that it really matters much. 
But I changed the class into a pointer array, with the sigaction structures living in C-Heap. Since this array is sparsely populated, that saves space. It also removes the need for the membership sigset, since a NULL slot indicates that the signal is not set. Further answers inline. Feel free to ask questions if something is unclear. Thanks, Thomas > src/hotspot/os/posix/signals_posix.cpp line 99: > >> 97: bool check_signal_number(int sig) const { >> 98: assert(sig > 0 || sig < NSIG, "invalid signal number %d", sig); >> 99: return sig > 0 || sig < NSIG; > > Surely && not || Good catch, this was an oversight. > src/hotspot/os/posix/signals_posix.cpp line 94: > >> 92: // structures and membership information. >> 93: class SavedSignalHandlers { >> 94: struct sigaction _v[NSIG]; > > Why _v ?? v as in vector... I changed it to _sa. > src/hotspot/os/posix/signals_posix.cpp line 105: > >> 103: void mark_as_set(int sig) { sigaddset(&_set, sig); } >> 104: void mark_as_clear(int sig) { sigdelset(&_set, sig); } >> 105: > > These don't really pay for themselves IMO. They are private and only used once each, and are a single line of code. The extra abstraction layer really doesn't add anything in terms of clarity. True. I removed the whole sigset. See my comment below. > src/hotspot/os/posix/signals_posix.cpp line 844: > >> 842: if (sig == SHUTDOWN2_SIGNAL && !isatty(fileno(stdin))) { // Flags? >> 843: tty->print_cr("Running in non-interactive shell, %s handler is replaced by shell", >> 844: os::exception_name(sig, buf, O_BUFLEN)); // When comparing, ignore the SA_RESTORER flag on Linux > > I don't understand the flag comments in this block. Just a typo.
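As a rough sketch of the reworked class discussed in this reply (a sparse array of pointers to heap-allocated sigaction structures, a null slot meaning "not set", and the range check using && rather than ||), something along these lines would do. It is not the code from the patch: the `set`/`get` names, the use of plain `malloc`/`free` instead of HotSpot's C-heap allocation, and the `NSIG` fallback value are assumptions of this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <signal.h>

#ifndef NSIG
#define NSIG 65  // fallback for this sketch only; an assumption
#endif

class SavedSignalHandlers {
    // Sparse: a nullptr slot means no handler was saved for that signal,
    // so no separate membership sigset is needed.
    struct sigaction* _sa[NSIG] = {};

    static bool check_signal_number(int sig) {
        return sig > 0 && sig < NSIG;  // both bounds must hold, hence &&
    }

public:
    SavedSignalHandlers() = default;
    SavedSignalHandlers(const SavedSignalHandlers&) = delete;
    SavedSignalHandlers& operator=(const SavedSignalHandlers&) = delete;

    ~SavedSignalHandlers() {
        for (struct sigaction* p : _sa) {
            std::free(p);  // free(nullptr) is a no-op
        }
    }

    // Save a copy of `act` for `sig`, allocating the slot on first use.
    void set(int sig, const struct sigaction& act) {
        assert(check_signal_number(sig));
        if (!check_signal_number(sig)) {
            return;
        }
        if (_sa[sig] == nullptr) {
            _sa[sig] = static_cast<struct sigaction*>(std::malloc(sizeof(struct sigaction)));
        }
        if (_sa[sig] != nullptr) {
            *_sa[sig] = act;
        }
    }

    // Returns the saved sigaction, or nullptr if none was saved or `sig`
    // is out of range.
    const struct sigaction* get(int sig) const {
        return check_signal_number(sig) ? _sa[sig] : nullptr;
    }
};
```

With the original `||`, every integer would pass the check, since any value is either greater than 0 or less than NSIG; that is why the reviewer's `&&` is needed before indexing the array.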
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Tue Feb 2 17:56:54 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 2 Feb 2021 17:56:54 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally Message-ID: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Analyzing out-of-resource situations in cloud scenarios is no fun. With CloudFoundry, a JVMTI agent (jvmkill) is hooked up to intercept the JVMTI "resource exhausted" event and then attempts to write a heap report. That may fail, e.g. due to bugs in the agent [1], but also because that report runs java code and may suffer from the same resource exhaustion. Successful or not, it unceremoniously kills the VM when done, often leaving us with no information about the actual resource. It would be very helpful if we had unconditional tracing here. We do have tracing, but it requires a non-product build and is triggered with TraceJVMTI. Also, it traces at trace level, which is far too fine-grained. I'd like to introduce another, unconditional trace line here. Arguably, resource exhaustion is fatal enough to justify unconditional tracing. This is a bit of a coin toss. Tracing unconditionally would help in most scenarios, where it would be either difficult or even impossible to specify a trace command line switch. OTOH it may trip up scripts parsing the VM output, or some of our tests (which can be fixed). Thoughts?
..Thomas [1] https://github.com/cloudfoundry/jvmkill/issues/18 ------------- Commit messages: - Add trace Changes: https://git.openjdk.java.net/jdk/pull/2350/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2350&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260926 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2350.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2350/head:pull/2350 PR: https://git.openjdk.java.net/jdk/pull/2350 From gziemski at openjdk.java.net Tue Feb 2 18:55:48 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 2 Feb 2021 18:55:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 11:59:08 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. 
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > support macos_aarch64 in hsdis Changes requested by gziemski (Committer). src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 390: > 388: store_and_inc(_to, from_obj, NativeStack::intSpace); > 389: > 390: _num_int_args++; `pass_byte()` and `pass_short()` use a single `_num_int_args++;` after the `if else`, but other methods use two of them, one inside each `if else` branch. We should be consistent. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 404: > 402: } else { > 403: store_and_inc(_to, from_obj, NativeStack::longSpace); > 404: _num_int_args++; `pass_byte()` and `pass_short()` use a single `_num_int_args++;` after the `if else`, but other methods use two of them, one inside each `if else` branch. We should be consistent. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 418: > 416: } else { > 417: store_and_inc(_to, (*from_addr == 0) ? (intptr_t)NULL : (intptr_t) from_addr, wordSize); > 418: _num_int_args++; `pass_byte()` and `pass_short()` use a single `_num_int_args++;` after the `if else`, but other methods use two of them, one inside each `if else` branch. We should be consistent. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 433: > 431: store_and_inc(_to, from_obj, NativeStack::floatSpace); > 432: > 433: _num_fp_args++; `pass_byte()` and `pass_short()` use a single `_num_int_args++;` after the `if else`, but other methods use two of them, one inside each `if else` branch. We should be consistent. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 448: > 446: } else { > 447: store_and_inc(_to, from_obj, NativeStack::doubleSpace); > 448: _num_fp_args++; `pass_byte()` and `pass_short()` use a single `_num_int_args++;` after the `if else`, but other methods use two of them, one inside each `if else` branch. We should be consistent.
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5271: > 5269: // > 5270: void MacroAssembler::get_thread(Register dst) { > 5271: RegSet saved_regs = RegSet::range(r0, r1) + BSD_ONLY(RegSet::range(r2, r17)) + lr - dst; The comment needs to be updated, since on BSD we also seem to clobber r2 through r17? src/hotspot/os/posix/signals_posix.cpp line 1297: > 1295: kern_return_t kr; > 1296: kr = task_set_exception_ports(mach_task_self(), > 1297: EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC, Could someone elaborate on why we need to add `EXC_MASK_BAD_INSTRUCTION` to the mask here? src/hotspot/cpu/aarch64/vm_version_aarch64.hpp line 93: > 91: CPU_MARVELL = 'V', > 92: CPU_INTEL = 'i', > 93: CPU_APPLE = 'a', The `ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile` has 8538 pages; can we be more specific and point to the particular section of the document where the CPU codes are defined? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Tue Feb 2 19:00:56 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 2 Feb 2021 19:00:56 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 18:52:29 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > Changes requested by gziemski (Committer). There was a bunch of assembly code that I couldn't review. I also gave only a shallow review to the brand-new files that are copies of the existing BSD x86_64 files; I hope to get back to those once I figure out how to build and run these changes.
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From burban at openjdk.java.net Tue Feb 2 19:25:49 2021 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Tue, 2 Feb 2021 19:25:49 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 18:23:04 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/os/posix/signals_posix.cpp line 1297: > >> 1295: kern_return_t kr; >> 1296: kr = task_set_exception_ports(mach_task_self(), >> 1297: EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC, > > Could someone elaborate on why we need to add `EXC_MASK_BAD_INSTRUCTION` to the mask here? See the comment above about `gdb`; the same applies to `lldb` today. The AArch64 backend uses `SIGILL` (~= `EXC_MASK_BAD_INSTRUCTION`) to initiate a deoptimization. Without this change you cannot continue debugging once the debuggee receives `SIGILL`. This wasn't needed before as x86 doesn't use `SIGILL`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From honguye at microsoft.com Tue Feb 2 21:46:25 2021 From: honguye at microsoft.com (Nhat Nguyen) Date: Tue, 2 Feb 2021 21:46:25 +0000 Subject: [EXTERNAL] Re: Follow up on [JDK-7018422] - JavaAgent code always interpreted during initialization phase In-Reply-To: <4e6a3167c955d66ef732e46c7fcda51566d19359.camel@redhat.com> References: <26ea45a4-bf19-6a5e-c885-5b628e6aebfd@oracle.com> <4e6a3167c955d66ef732e46c7fcda51566d19359.camel@redhat.com> Message-ID: Hi all, Thank you all for the thoughtful responses. As someone who's new to the community, the replies really help me understand the risks involved in such changes, especially since it's jdk8 that we're talking about here.
I totally agree and acknowledge that, given the risks, changing the initialization order for the sake of performance is not sufficiently justified. With that said, for completeness' sake only, I'm sharing more details on why I thought that changing the order can help trigger the JIT. But first I acknowledge that my understanding of this area of the code is limited, and so there are things that I could have overlooked. JvmtiExport::post_vm_initialized() [1] calls into eventHandlerVMInit() [2] which runs the premain method. At this point, CompileBroker::compilation_init() [3] hasn't been called yet, so the c1 and c2 compiler instances are always null [4]. This in turn makes the JVM deny all compilation requests [5]. The repro that I used to test the behaviour is [6]. The commands I used were: "java -javaagent:target/premain-test-1.0-SNAPSHOT.jar -version" for the premain version "java -jar target/premain-test-1.0-SNAPSHOT.jar" for the regular main version In this repro, the premain version is consistently slow as opposed to the main version which sees significant speedup after the JIT has kicked in. After changing the initialization order, running the premain version with -XX:+PrintCompilation showed that the JIT is in effect and allowed the premain version to run as fast as the main one. As I mentioned above, I'm sharing my findings for completeness only. I understand that even if the JIT is indeed in effect in this particular case, there are certain hidden dependencies that make this a very risky change. I'm also cc'ing Trask here, who is more familiar with the javaagent in question, so that he can discuss some potential workarounds that David mentioned.
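The ordering effect described above can be pictured with a toy model. This is purely illustrative and makes no claim about the real HotSpot code paths: `ToyVm`, `request_compile`, `post_vm_initialized`, `compilation_init` and `boot` are invented names that only mimic the sequence of events in the explanation.

```cpp
#include <cassert>
#include <functional>

// Toy model only; none of these names are HotSpot APIs.
struct ToyVm {
    bool compilers_ready = false;
    int accepted = 0;
    int denied = 0;

    // Stands in for compiler initialization creating the compiler instances.
    void compilation_init() { compilers_ready = true; }

    // Requests are dropped while no compiler instance exists.
    bool request_compile() {
        if (!compilers_ready) {
            ++denied;
            return false;
        }
        ++accepted;
        return true;
    }

    // Stands in for the VMInit event that ends up running premain.
    void post_vm_initialized(const std::function<void(ToyVm&)>& premain) {
        premain(*this);
    }
};

// Boot the toy VM with either ordering; premain issues `hot_calls` requests.
ToyVm boot(bool compilers_first, int hot_calls) {
    assert(hot_calls >= 0);
    ToyVm vm;
    auto premain = [hot_calls](ToyVm& v) {
        for (int i = 0; i < hot_calls; ++i) {
            v.request_compile();
        }
    };
    if (compilers_first) {
        vm.compilation_init();            // reordered sequence
        vm.post_vm_initialized(premain);
    } else {
        vm.post_vm_initialized(premain);  // sequence as described for jdk8u
        vm.compilation_init();
    }
    return vm;
}
```

In this model every request made from premain is denied under the original ordering and accepted under the reordered one, which matches the observation; it says nothing about the hidden dependencies that make the real reordering risky.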
Thank you, Nhat [1]: JvmtiExport::post_vm_initialized() https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/91a5bf6cd78c7c073f1e8851217ae3a2241d9441/hotspot/src/share/vm/runtime/thread.cpp#L3638 [2]: eventHandlerVMInit() calling premain https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/91a5bf6cd78c7c073f1e8851217ae3a2241d9441/jdk/src/share/instrument/InvocationAdapter.c#L490 [3]: CompileBroker::compilation_init() https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/91a5bf6cd78c7c073f1e8851217ae3a2241d9441/hotspot/src/share/vm/runtime/thread.cpp#L3648 [4]: CompileBroker::compilation_init() initializing c1 & c2 instances https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/91a5bf6cd78c7c073f1e8851217ae3a2241d9441/hotspot/src/share/vm/compiler/compileBroker.cpp#L897 [5]: https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/91a5bf6cd78c7c073f1e8851217ae3a2241d9441/hotspot/src/share/vm/compiler/compileBroker.cpp#L1350 [6]: https://github.com/trask/premain-test -----Original Message----- From: Severin Gehwolf Sent: Thursday, January 28, 2021 2:05 AM To: David Holmes ; Nhat Nguyen ; hotspot-dev at openjdk.java.net Cc: jdk8u-dev ; Andrew Haley Subject: [EXTERNAL] Re: Follow up on [JDK-7018422] - JavaAgent code always interpreted during initialization phase Hi, On Thu, 2021-01-28 at 14:37 +1000, David Holmes wrote: > Hi, > > On 28/01/2021 10:51 am, Nhat Nguyen wrote: > > Hi, > > > > I'm writing this email to follow up on JDK-7018422 [1] on jdk8u, > > where the JavaAgent code is always interpreted during the initialization phase on jdk8. > > Just FYI for changes to 8u you need to raise the issue on > jdk8u-dev at openjdk.java.net. Adding jdk8u-dev to cc. > > > For background, one of our tools is currently making use of a > > JavaAgent whose performance is important. As part of a performance > > investigation, I discovered that the CompilerBroker is not > > initialized by the time the premain method runs, therefore all compilation requests for the JavaAgent are discarded. 
> > > > Fortunately, I found JDK-7018422 which describes the exact issue that we are experiencing. > > However, the ticket was closed as "Not an issue" because the > > initialization order is changed with the introduction of the module system. > > > > So I would like to ask the mailing list for opinions on whether a > > fix for this issue can be considered for jdk8u. I have also attached > > a prototype fix [2] and would appreciate any suggestions and comments as well. > > You have to be extremely careful changing the initialization order as > there are many hidden inter-dependencies. Yes, and we've been bitten by this before[i]. There is a significant risk involved changing initialization order in stable jdk8u. The reasons given here don't strike me as a good enough reason to accept such a patch. YMMV. Andrew Haley, thoughts? Thanks, Severin [i] https://bugs.openjdk.java.net/browse/JDK-8249158 and https://bugs.openjdk.java.net/browse/JDK-8233197 > And as Serguei stated, that change may potentially allow JIT'ing to be > possible, but that doesn't mean it will actually occur.
> > Cheers, > David > ----- > > > > [1]: > > https://bugs.openjdk.java.net/browse/JDK-7018422 > > [2]: > > > > --- > >   hotspot/src/share/vm/runtime/thread.cpp | 10 +++++----- > >   1 file changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/hotspot/src/share/vm/runtime/thread.cpp > > b/hotspot/src/share/vm/runtime/thread.cpp > > index 330593acb3..3918f989cc 100644 > > --- a/hotspot/src/share/vm/runtime/thread.cpp > > +++ b/hotspot/src/share/vm/runtime/thread.cpp > > @@ -3634,11 +3634,6 @@ jint Threads::create_vm(JavaVMInitArgs* > > args, bool* canTryAgain) { > >       create_vm_init_libraries(); > >     } > >   > > -  // Notify JVMTI agents that VM initialization is complete - nop > > if no agents. > > -  JvmtiExport::post_vm_initialized(); > > - > > -  JFR_ONLY(Jfr::on_vm_start();) > > - > >     if (CleanChunkPoolAsync) { > >       Chunk::start_chunk_pool_cleaner_task(); > >     } > > @@ -3648,6 +3643,11 @@ jint Threads::create_vm(JavaVMInitArgs* > > args, bool* canTryAgain) { > >     CompileBroker::compilation_init(); > >   #endif > >   > > +  // Notify JVMTI agents that VM initialization is complete - nop > > if no agents. > > +  JvmtiExport::post_vm_initialized(); > > + > > +  JFR_ONLY(Jfr::on_vm_start();) > > + > >     if (EnableInvokeDynamic) { > >       // Pre-initialize some JSR292 core classes to avoid deadlock > > during class loading. > >       
// It is done after compilers are initialized, because > > otherwise compilations of > > -- > > > From dcubed at openjdk.java.net Tue Feb 2 23:13:00 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 2 Feb 2021 23:13:00 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 11:59:08 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. >> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > support macos_aarch64 in hsdis For platform files that were copied from other ports to this port, if the file wasn't changed I presume the copyright years are left alone. If the file required changes for this port, I expect the year to be updated to 2021. How are you verifying that these copyright years are being properly managed on the new files? 
For the new W^X helpers, e.g., WXWriteFromExecSetter, a short comment explaining why one was landed in a particular place would help reviewers. Also see my comment about creating a new ThreadToNativeWithWXExecFromVM helper. I'm stopping my review with all the src/hotspot files done for now. make/autoconf/flags.m4 line 140: > 138: else > 139: MACOSX_VERSION_MIN=10.12.0 > 140: fi Not something that needs to be addressed here, but these changes illustrate that our collective use of macOSX/MACOSX/MacOSX names are tied to the fact that the macOS major version number was at 10 for a very long time. @magicus - Do we have an RFE to rename MACOSX or are we sticking with it and evolving our interpretation of the 'X' from '10' to */splat/asterik? make/common/NativeCompilation.gmk line 1178: > 1176: endif > 1177: # This only works if the openjdk_codesign identity is present on the system. Let > 1178: # silently fail otherwise. Might want to add a comment here: # The '-f' option will replace an existing signature if one exists. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 41: > 39: #define __ _masm-> > 40: > 41: //describe amount of space in bytes occupied by type on native stack nit - needs a space after `//` src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 83: > 81: // On macos/aarch64 native stack is packed, int/float are using only 4 bytes > 82: // on stack. Natural alignment for types are still in place, > 83: // for example double/long should be 8 bytes alligned nit typo: s/alligned/aligned/ And sentence should end with a period. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 323: > 321: str(zr, Address(rthread, JavaThread::last_Java_pc_offset())); > 322: > 323: str(zr, Address(rthread, JavaFrameAnchor::saved_fp_address_offset())); I don't think this switch from `JavaThread::saved_fp_address_offset()` to `JavaFrameAnchor::saved_fp_address_offset()` is correct since `rthread` is still used and is a JavaThread*. 
The new code will give you: `rthread` + offset of the `saved_fp_address` field in a JavaFrameAnchor The old code gave you: `rthread` + offset of the `saved_fp_address` field in the JavaFrameAnchor field in the JavaThread Those are not the same things. src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 31: > 29: #include "asm/assembler.inline.hpp" > 30: #include "oops/compressedOops.hpp" > 31: #include "runtime/vm_version.hpp" It's not clear why this include needed to be added. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 810: > 808: #ifdef __APPLE__ > 809: // Less-than word types are stored one after another. > 810: // The code unable to handle this, bailout. Perhaps: // The code is unable to handle this so bailout. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 837: > 835: #ifdef __APPLE__ > 836: // Less-than word types are stored one after another. > 837: // The code unable to handle this, bailout. Perhaps: // The code is unable to handle this so bailout. src/hotspot/os/aix/os_aix.cpp line 69: > 67: #include "runtime/sharedRuntime.hpp" > 68: #include "runtime/statSampler.hpp" > 69: #include "runtime/stubRoutines.inline.hpp" It is not clear why this include to inline-include change was made. src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp line 37: > 35: #include "runtime/init.hpp" > 36: #include "runtime/os.hpp" > 37: #include "runtime/stubRoutines.inline.hpp" It is not clear why this include to inline-include change was made. src/hotspot/os/windows/os_windows.cpp line 65: > 63: #include "runtime/sharedRuntime.hpp" > 64: #include "runtime/statSampler.hpp" > 65: #include "runtime/stubRoutines.inline.hpp" It is not clear why this include to inline-include change was made. src/hotspot/os_cpu/bsd_aarch64/atomic_bsd_aarch64.hpp line 59: > 57: } > 58: > 59: // __attribute__((unused)) on dest is to get rid of spurious GCC warnings. Is the GCC comment appropriate? Does this file get built with GCC or just Apple compilers? 
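To illustrate the offset concern raised above for macroAssembler_aarch64.cpp line 323, here is a minimal sketch. The struct names and layouts (`FrameAnchorModel`, `ThreadModel`) and both helper functions are hypothetical stand-ins, not the real HotSpot classes or their `*_offset()` accessors. The sketch only shows why an offset measured from an embedded struct is not interchangeable with one measured from the enclosing object when the base register holds the enclosing object's address.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a frame anchor embedded in a thread object.
struct FrameAnchorModel {
  void* last_sp;
  void* last_pc;
  void* saved_fp_address;
};

// Hypothetical stand-in for the thread object; the anchor is not the
// first member, so its own offset inside the thread is non-zero.
struct ThreadModel {
  std::uint64_t other_state[4];
  FrameAnchorModel anchor;
};

// Offset of the field measured from the start of the anchor only.
inline std::size_t offset_from_anchor() {
  return offsetof(FrameAnchorModel, saved_fp_address);
}

// Offset of the same field measured from the start of the thread:
// the anchor's position in the thread plus the field's position in the anchor.
inline std::size_t offset_from_thread() {
  return offsetof(ThreadModel, anchor) + offset_from_anchor();
}
```

With a `ThreadModel*` as the base address, only `offset_from_thread()` lands on the field; adding `offset_from_anchor()` to the same base would address some unrelated part of the thread object.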
src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 92: > 90: # define DEFAULT_MAIN_THREAD_STACK_PAGES 2048 > 91: # define OS_X_10_9_0_KERNEL_MAJOR_VERSION 13 > 92: #endif Does this workaround for Mavericks (10.9.0) need to be here? src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 103: > 101: // 10.5 UNIX03 member name prefixes > 102: #define DU3_PREFIX(s, m) __ ## s.__ ## m > 103: # else Does this workaround for macOS 10.5 need to be here? src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 195: > 193: frame os::get_sender_for_C_frame(frame* fr) { > 194: return frame(fr->link(), fr->link(), fr->sender_pc()); > 195: } Is this file going to be built by GCC or just macOS compilers? src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 221: > 219: assert(sig == info->si_signo, "bad siginfo"); > 220: } > 221: */ Should this code be deleted? Both here and in the file it was copied from, if it is also commented out there... src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 363: > 361: address pc = os::Posix::ucontext_get_pc(uc); > 362: > 363: if (pc != addr && uc->context_esr == 0x9200004F) { //TODO: figure out what this value means Is this TODO going to be resolved by this port? src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 374: > 372: > 373: last_addr = (address) -1; > 374: } else if (pc == addr && uc->context_esr == 0x8200000f) { //TODO: figure out what this value means Is this TODO going to be resolved by this port? src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 435: > 433: // | |\ Java thread created by VM does not have glibc > 434: // | glibc guard page | - guard, attached Java thread usually has > 435: // | |/ 1 glibc guard page. Is this code going to be built by GCC (with glibc) or will only macOS compilers and libraries be used? src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 486: > 484: } > 485: } > 486: } This appears to be a mix of Mavericks (10.9) and 10.12 workarounds. Is this code needed by this project?
src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.hpp line 45: > 43: // Atomically copy 64 bits of data > 44: static void atomic_copy64(const volatile void *src, volatile void *dst) { > 45: *(jlong *) dst = *(const jlong *) src; Is this construct actually atomic on aarch64? src/hotspot/os_cpu/bsd_aarch64/thread_bsd_aarch64.cpp line 43: > 41: assert(Thread::current() == this, "caller must be current thread"); > 42: return pd_get_top_frame(fr_addr, ucontext, isInJava); > 43: } Is AsyncGetCallTrace() being supported by this port? src/hotspot/os_cpu/aix_ppc/os_aix_ppc.hpp line 43: > 41: > 42: private: > 43: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/aix_ppc/os_aix_ppc.hpp line 46: > 44: static void current_thread_enable_wx_impl(WXMode mode) { } > 45: > 46: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/bsd_x86/os_bsd_x86.hpp line 41: > 39: > 40: private: > 41: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/bsd_x86/os_bsd_x86.hpp line 44: > 42: static void current_thread_enable_wx_impl(WXMode mode) { } > 43: > 44: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/bsd_zero/os_bsd_zero.hpp line 57: > 55: > 56: private: > 57: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/bsd_zero/os_bsd_zero.hpp line 60: > 58: static void current_thread_enable_wx_impl(WXMode mode) { } > 59: > 60: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.hpp line 43: > 41: > 42: private: > 43: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.hpp line 46: > 44: static void current_thread_enable_wx_impl(WXMode mode) { } > 45: > 46: public: 'public' usually has one space in front of it.
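On the `atomic_copy64` question above (os_bsd_aarch64.hpp line 45): the C++ language itself does not promise that a plain dereference through a cast pointer is performed as one indivisible access, even if compilers commonly emit a single 64-bit load and store for it. A minimal sketch of one way to state the intent explicitly follows. It assumes a GCC or Clang toolchain (it uses their `__atomic_load_n` and `__atomic_store_n` builtins), the function name `atomic_copy64_sketch` is hypothetical, and it is not the code from the pull request.

```cpp
#include <cstdint>

// Hypothetical variant of a 64-bit copy helper. Each side of the copy is
// a single relaxed atomic access; the copy as a whole is still two
// separate operations (one load, then one store), not one transaction.
// Both pointers are assumed to be 8-byte aligned.
inline void atomic_copy64_sketch(const volatile void* src, volatile void* dst) {
  const std::int64_t value = __atomic_load_n(
      static_cast<const volatile std::int64_t*>(src), __ATOMIC_RELAXED);
  __atomic_store_n(
      static_cast<volatile std::int64_t*>(dst), value, __ATOMIC_RELAXED);
}
```

Relaxed ordering is chosen here because the original comment only asks for an untorn 64-bit copy, not for any ordering with respect to other memory accesses; whether that matches the callers' needs is an assumption.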
src/hotspot/os_cpu/linux_arm/os_linux_arm.hpp line 74: > 72: > 73: private: > 74: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/linux_arm/os_linux_arm.hpp line 77: > 75: static void current_thread_enable_wx_impl(WXMode mode) { } > 76: > 77: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/linux_ppc/os_linux_ppc.hpp line 36: > 34: > 35: private: > 36: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/linux_ppc/os_linux_ppc.hpp line 39: > 37: static void current_thread_enable_wx_impl(WXMode mode) { } > 38: > 39: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/linux_s390/os_linux_s390.hpp line 35: > 33: > 34: private: > 35: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/linux_s390/os_linux_s390.hpp line 38: > 36: static void current_thread_enable_wx_impl(WXMode mode) { } > 37: > 38: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/linux_x86/os_linux_x86.hpp line 54: > 52: > 53: private: > 54: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/linux_x86/os_linux_x86.hpp line 57: > 55: static void current_thread_enable_wx_impl(WXMode mode) { } > 56: > 57: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/linux_zero/os_linux_zero.hpp line 94: > 92: > 93: private: > 94: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/linux_zero/os_linux_zero.hpp line 97: > 95: static void current_thread_enable_wx_impl(WXMode mode) { } > 96: > 97: private: I think this should 'public' (with one space in front of it). src/hotspot/os_cpu/windows_aarch64/os_windows_aarch64.hpp line 37: > 35: > 36: private: > 37: 'private' usually has one space in front of it. Also, why the blank line after it? 
src/hotspot/os_cpu/windows_aarch64/os_windows_aarch64.hpp line 40: > 38: static void current_thread_enable_wx_impl(WXMode mode) { } > 39: > 40: public: 'public' usually has one space in front of it. src/hotspot/os_cpu/windows_x86/os_windows_x86.hpp line 47: > 45: > 46: private: > 47: 'private' usually has one space in front of it. Also, why the blank line after it? src/hotspot/os_cpu/windows_x86/os_windows_x86.hpp line 50: > 48: static void current_thread_enable_wx_impl(WXMode mode) { } > 49: > 50: public: 'public' usually has one space in front of it. src/hotspot/share/gc/shared/oopStorage.cpp line 40: > 38: #include "runtime/os.hpp" > 39: #include "runtime/safepoint.hpp" > 40: #include "runtime/stubRoutines.inline.hpp" The reason for switching from include to inline-include is not clear. src/hotspot/share/logging/logStream.hpp line 35: > 33: class LogStream : public outputStream { > 34: // see test/hotspot/gtest/logging/test_logStream.cpp > 35: friend class LogStreamTest; It's not clear why this change is made for this port. src/hotspot/share/prims/jni.cpp line 3930: > 3928: > 3929: // Go to the execute mode, the initial state of the thread on creation. > 3930: // Use os interface as the thread is not a java one anymore. Perhaps: s/not a java one anymore./not a JavaThread anymore./ src/hotspot/share/prims/nativeEntryPoint.cpp line 45: > 43: guarantee(status == JNI_OK && !env->ExceptionOccurred(), > 44: "register jdk.internal.invoke.NativeEntryPoint natives"); > 45: JNI_END I thought that jcheck caught a missing new-line? src/hotspot/share/runtime/globals.hpp line 1855: > 1853: \ > 1854: product(intx, UnguardOnExecutionViolation, 0, \ > 1855: "Unguard page and retry on no-execute fault " \ Taking away the "(Win32 only)" and not replacing it with new text feels wrong. I can't imagine that this flag works on any additional platforms except for macos-aarch64. 
src/hotspot/share/runtime/os.cpp line 58: > 56: #include "runtime/osThread.hpp" > 57: #include "runtime/sharedRuntime.hpp" > 58: #include "runtime/stubRoutines.inline.hpp" The reason for switching from include to inline include is not clear. src/hotspot/share/runtime/objectMonitor.cpp line 52: > 50: #include "runtime/safepointMechanism.inline.hpp" > 51: #include "runtime/sharedRuntime.hpp" > 52: #include "runtime/stubRoutines.inline.hpp" The reason for switching from include to inline include is not clear. src/hotspot/share/runtime/stubRoutines.cpp line 34: > 32: #include "runtime/timerTrace.hpp" > 33: #include "runtime/sharedRuntime.hpp" > 34: #include "runtime/stubRoutines.inline.hpp" The reason for switching from include to inline include is not clear. src/hotspot/share/runtime/stubRoutines.inline.hpp line 1: > 1: /* NOW I understand the reason for switching from include to inline-include. Is there a reason that this change is part of this project and not extracted into a separate RFE. That would reduce the number of files touched by this project. src/hotspot/share/runtime/thread.cpp line 3991: > 3989: JavaThread* thread = JavaThread::current(); > 3990: ThreadToNativeFromVM ttn(thread); > 3991: Thread::WXExecFromWriteSetter wx_exec; I saw somewhere in this review a comment about why this new WXExecFromWriteSetter helper isn't folded into ThreadToNativeFromVM and I understand that not every current ThreadToNativeFromVM needs the new helper. If the vast majority of the ThreadToNativeFromVM locations need WXExecFromWriteSetter, then perhaps those locations that currently have a ThreadToNativeFromVM followed by the new WXExecFromWriteSetter should use a new helper: ThreadToNativeWithWXExecFromVM so we'll see changes from: ThreadToNativeFromVM -> ThreadToNativeWithWXExecFromVM where we need them and hopefully a short comment can be added at the same time to explain the need for WXExec. 
This will allow us to easily distinguish ThreadToNativeFromVM locations that DO NOT need WXExec from those that DO need it. src/java.base/macosx/native/libjli/java_md_macosx.m line 210: > 208: if (preferredJVM == NULL) { > 209: #if defined(__i386__) > 210: preferredJVM = "client"; #if defined(__i386__) preferredJVM = "client"; Not your bug, but Oracle/OpenJDK never supported 32-bit __i386__ on macOS. ------------- Changes requested by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2200 From iklam at openjdk.java.net Tue Feb 2 23:43:55 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 2 Feb 2021 23:43:55 GMT Subject: Integrated: 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 21:00:04 GMT, Ioi Lam wrote: > This is the second step of https://github.com/openjdk/jdk/pull/2246 (8260467: Move well-known classes from systemDictionary.hpp to vmClasses.hpp). These are mostly boiler-plate changes done by scripts. > > [1] Change calls like > > SystemDictionary::Object_klass() > SystemDictionary::Throwable_klass_is_loaded() > SystemDictionary::box_klass_type() > > to > > vmClasses::Object_klass() > vmClasses::Throwable_klass_is_loaded() > vmClasses::box_klass_type() > > [2] Remove unnecessary inclusion of systemDictionary.hpp (replace with vmClasses.hpp if necessary). In some cases, I have to add signature.hpp to some files, which only indirectly included signature.hpp through systemDictionary.hpp. > > [3] In the previous PR, I incorrectly used the enum name `VMClassID`. This PR changes it to `vmClassID` to match the existing use of `vmSymbolID` and `vmIntrinsicID`. > > Due to the refactoring of these two PRs, the number of HotSpot .o files that include systemDictionary.hpp decreases from 491 to 91. HotSpot build time is reduced by about 2% > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. 
Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > Review Notes: if you don't want to scroll through 185 files, you may want to try: > > curl https://github.com/openjdk/jdk/compare/1de3c554477497d1ceee573180940e8d38c364ee...e2f77252c8b3edd4d0071cfc014290568a16de9d.diff | \ > grep -v '^[+-][+-][+-]' | grep '^[+-]' This pull request has now been integrated. Changeset: ffbcf1b0 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/ffbcf1b0 Stats: 806 lines in 191 files changed: 94 ins; 67 del; 645 mod 8260471: Change SystemDictionary::X_klass calls to vmClasses::X_klass Reviewed-by: lfoltan, hseigel, dholmes, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/2301 From dholmes at openjdk.java.net Wed Feb 3 01:00:55 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 3 Feb 2021 01:00:55 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 17:31:03 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. 
>> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. >> >> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 >> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 >> and additionally to not trip this warning here: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 >> >> ------ >> >> Changes in this patch: >> >> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. >> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. >> - I used that class to cover cases (1)..(3): >> - `chained_handlers` contains all information of chained handlers >> - `expected_handlers` contains a copy of the handlers the hotspot installed >> - `replaced_handlers` contains information about replaced handlers >> >> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. >> >> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
>> >> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. >> >> Output Before: >> 663 Signal Handlers: >> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO >> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> >> Now: >> Signal Handlers: >> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO >> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> >> ----- >> Tests: GA, and the patch has been tested in our nighlies for over a month now. 
I manually executed the runtime/jni/checked tests too. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Make SavedSignalHandlers use C-heap for its items > - Removed display-replaced-handler-logic > - Feedback David > - Merge > - JDK-8260485-signal-handler-improvements Hi Thomas, Still a few minor comments and queries, but overall it is looking good. Thanks, David src/hotspot/os/posix/signals_posix.cpp line 152: > 150: // For signal-chaining: > 151: // if chaining is active, chained_handlers contains all handlers which we > 152: // did replace with our own and to which we must delegate. Nit: s/which we did replace/we replaced/ src/hotspot/os/posix/signals_posix.cpp line 842: > 840: if (sig == SHUTDOWN2_SIGNAL && !isatty(fileno(stdin))) { > 841: tty->print_cr("Running in non-interactive shell, %s handler is replaced by shell", > 842: os::exception_name(sig, buf, O_BUFLEN)); // When comparing, ignore the SA_RESTORER flag on Linux What does this comment mean in this context ??? src/hotspot/os/posix/signals_posix.cpp line 1393: > 1391: st->print(", flags="); > 1392: int flags = act->sa_flags; > 1393: // On Linux, hide the SA_RESTORE flag typo: SA_RESTORE -> SA_RESTORER src/hotspot/os/posix/signals_posix.cpp line 1401: > 1399: // Print established signal handler for this signal. > 1400: // - if this signal handler was installed by us and is chained to a pre-established user handler > 1401: // it did replace, print that one too. s/it did replace/it replaced/ src/hotspot/os/posix/signals_posix.cpp line 1392: > 1390: > 1391: // Print established signal handler for this signal. > 1392: // - if this signal handler was installed by us and is chained to a pre-established user handler It is not clear to me that this check still remains somewhere. 
src/hotspot/os/posix/vmError_posix.cpp line 60: > 58: } > 59: > 60: void VMError::interrupt_reporting_thread() { I'm not clear what happened to these ?? (I'm not clear how they were used in the first place. :( ) ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2251 From dholmes at openjdk.java.net Wed Feb 3 01:45:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 3 Feb 2021 01:45:42 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally In-Reply-To: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Message-ID: <-ZUFjDD0KRceDLzTEHDuyfhPq47VOKzZAtPXECxwNMI=.a4b9069a-30a8-4aed-9625-94ee854531ba@github.com> On Tue, 2 Feb 2021 11:02:18 GMT, Thomas Stuefe wrote: > Analyzing out-of-resource situations in cloud scenarios is no fun. With CloudFoundry, a JVMTI agent (jvmkill) is hooked up intercepting the jvmti "resource exhausted" event, then attempts to write up a heap report. That may fail, e.g. due to bugs in the agent [1], but also because that report runs java code and may suffer from the same resource exhaustion. Successful or not, it unceremoniously kills the VM when done, often leaving us with no information about the actual resource. > > It would be very helpful if we had unconditional tracing here. We do have tracing, but it requires a non-product build and is triggered with TraceJVMTI. Also, it traces at trace level which is way to fine granular. > > I'd like to introduce another, unconditional trace line here. Arguably, resource exhausted is fatal enough that it justifies unconditional tracing. > > This is a bit of a coin toss. Tracing unconditionally would help in most scenarios, where it would be either difficult or even impossible to specify a trace command line switch. 
OTOH it may trip up scripts parsing the VM output, or some of our tests (which can be fixed). > > Thoughts? > > ..Thomas > > [1] https://github.com/cloudfoundry/jvmkill/issues/18 Hi Thomas, Approval in principle, but changes suggested. Thanks, David src/hotspot/share/prims/jvmtiExport.cpp line 1509: > 1507: JavaThread *thread = JavaThread::current(); > 1508: > 1509: log_error(os)("Resource Exhausted (%s)", description != nullptr ? description : "no info"); Shouldn't that be `log_error(jvmti)`? I'd also suggest the text "Posting Resource Exhausted event: %s" with "unknown" for a null description. ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2350 From dongbo at openjdk.java.net Wed Feb 3 02:04:03 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 02:04:03 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v7] In-Reply-To: References: Message-ID: > This is a typo introduced by JDK-8255949. > Compiler will generate `ushr` for shifting right and accumulating four short integers. > It produces wrong results for specific case. The instruction should be `usra`. 
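The difference between the two instructions named in the description above can be modelled with a small scalar sketch. This is only an illustration of the instruction semantics on four unsigned 16-bit lanes: `Lanes4H`, `ushr_4h` and `usra_4h` are hypothetical names, not HotSpot code, and the shift amount is assumed to be in the range 1 to 16. USHR overwrites the destination with the shifted source, while USRA adds the shifted source to the existing destination lanes, which is what "shift right and accumulate" requires.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four unsigned 16-bit lanes, modelling a 4H vector arrangement.
using Lanes4H = std::array<std::uint16_t, 4>;

// Model of USHR: each result lane is src >> shift.
// The previous destination contents are discarded.
inline Lanes4H ushr_4h(Lanes4H dst, const Lanes4H& src, unsigned shift) {
  for (std::size_t i = 0; i < dst.size(); ++i) {
    dst[i] = static_cast<std::uint16_t>(src[i] >> shift);
  }
  return dst;
}

// Model of USRA: each result lane is dst + (src >> shift),
// wrapping modulo 2^16 like the hardware lane would.
inline Lanes4H usra_4h(Lanes4H dst, const Lanes4H& src, unsigned shift) {
  for (std::size_t i = 0; i < dst.size(); ++i) {
    dst[i] = static_cast<std::uint16_t>(dst[i] + (src[i] >> shift));
  }
  return dst;
}
```

For an accumulator of `{10, 20, 30, 65535}`, a source of `{16, 32, 64, 16}` and a shift of 4, the USRA model yields `{11, 22, 34, 0}` while the USHR model yields `{1, 2, 4, 1}`, so emitting `ushr` silently drops the accumulated value.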
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: match ssra with 8B ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/136/files - new: https://git.openjdk.java.net/jdk16/pull/136/files/693f8cbd..9e71e0f5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=136&range=05-06 Stats: 32 lines in 1 file changed: 15 ins; 9 del; 8 mod Patch: https://git.openjdk.java.net/jdk16/pull/136.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/136/head:pull/136 PR: https://git.openjdk.java.net/jdk16/pull/136 From dongbo at openjdk.java.net Wed Feb 3 02:04:03 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 02:04:03 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> References: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> Message-ID: On Tue, 2 Feb 2021 11:04:24 GMT, Ningsheng Jian wrote: > > Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. > > We should fix this with the following code: > > I think this is an enhancement, and should be done in a separate patch in jdk mainline. OK, I update a test with loop size 80 for bytes so that `ssra` for 8B can be matched now. 
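For readers following the `ushr`/`usra` discussion, the difference between the two instructions can be modeled lane by lane in scalar C++. This is only an illustrative sketch: the names `U16x4`, `model_ushr` and `model_usra` are made up for this example and are not part of the patch or of HotSpot.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Four unsigned 16-bit lanes, standing in for a 4H vector register.
using U16x4 = std::array<std::uint16_t, 4>;

// Scalar model of `ushr Vd.4H, Vn.4H, #shift`: every result lane is the
// source lane shifted right; the previous destination value is discarded.
U16x4 model_ushr(const U16x4& src, unsigned shift) {
  assert(shift >= 1 && shift <= 16);  // immediate range for 16-bit lanes
  U16x4 out{};
  for (std::size_t i = 0; i < out.size(); i++) {
    out[i] = static_cast<std::uint16_t>(src[i] >> shift);
  }
  return out;
}

// Scalar model of `usra Vd.4H, Vn.4H, #shift`: the shifted source lane is
// added to the existing destination lane, wrapping modulo 2^16.
U16x4 model_usra(const U16x4& acc, const U16x4& src, unsigned shift) {
  assert(shift >= 1 && shift <= 16);  // immediate range for 16-bit lanes
  U16x4 out{};
  for (std::size_t i = 0; i < out.size(); i++) {
    out[i] = static_cast<std::uint16_t>(acc[i] + (src[i] >> shift));
  }
  return out;
}
```

With an accumulator of {1, 2, 3, 4}, a source of {16, 32, 48, 65535} and a shift of 4, `model_usra` yields {2, 4, 6, 4099} while `model_ushr` yields {1, 2, 3, 4095}. Emitting the plain shift where the accumulating form was intended therefore silently drops the accumulator, which is the kind of wrong result the bug report describes.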
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From dongbo at openjdk.java.net Wed Feb 3 02:20:51 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 02:20:51 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v3] In-Reply-To: References: <4KBCYEYrcTMEqEo89p3m4OfqSolqi2agyexH-k1vZrU=.813f7a97-b7ef-4cc0-809b-91e4c5f3c7fd@github.com> Message-ID: On Wed, 3 Feb 2021 01:59:38 GMT, Dong Bo wrote: >>> Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. >>> We should fix this with the following code: >> >> I think this is an enhancement, and should be done in a separate patch in jdk mainline. > >> > Because we only have predicate(n->as_Vector()->length() == 8) in vsraa8B_imm, so they are not matched. >> > We should fix this with the following code: >> >> I think this is an enhancement, and should be done in a separate patch in jdk mainline. > > OK, I update a test with loop size 80 for bytes so that `ssra` for 8B can be matched now. Ping... Can I get a review for the newest changes? Please let me know if we are ready to go. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From dlong at openjdk.java.net Wed Feb 3 03:17:41 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 3 Feb 2021 03:17:41 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:44:16 GMT, Vladimir Kozlov wrote: > On return WB wait to acquire Compile_lock before checking compilation status > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 > > This lock is used by ciEnv for compiled code publishing: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 > > So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. > > The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. > > The fix is to check compiled code again similar to check in CompileBroker: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 > > Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. Looks good. ------------- Marked as reviewed by dlong (Reviewer). 
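The race described above (a value read before taking the lock is used for a decision made under the lock) can be sketched in plain C++. Everything here is a hypothetical stand-in: `FakeMethod`, `fake_compile_lock` and the three functions are invented for illustration and do not reproduce the WhiteBox or CompileBroker code.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a method; not a HotSpot type.
struct FakeMethod {
  const void* code = nullptr;  // published compiled code, if any
  bool queued = false;         // "queued for compilation" bit
};

std::mutex fake_compile_lock;  // plays the role of Compile_lock

// Compiler-thread side: publish the code and clear the queued bit,
// both while holding the lock.
void publish_code(FakeMethod& m, const void* code) {
  assert(code != nullptr);
  std::lock_guard<std::mutex> guard(fake_compile_lock);
  m.code = code;
  m.queued = false;
}

// Racy variant: the decision uses a snapshot of the code pointer that was
// taken before the lock was acquired, so it can miss a finished compile.
bool enqueue_succeeded_stale(FakeMethod& m, const void* code_before_lock) {
  std::lock_guard<std::mutex> guard(fake_compile_lock);
  return m.queued || code_before_lock != nullptr;
}

// Fixed variant: the code pointer is read again after the lock is held.
bool enqueue_succeeded_recheck(FakeMethod& m) {
  std::lock_guard<std::mutex> guard(fake_compile_lock);
  return m.queued || m.code != nullptr;
}
```

If the compile finishes between the snapshot and the lock acquisition, the stale variant sees neither the queued bit nor any code and reports failure, while the re-checking variant reports success.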
PR: https://git.openjdk.java.net/jdk/pull/2356 From iklam at openjdk.java.net Wed Feb 3 03:59:43 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 03:59:43 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: <_PhEUVHOgMItPaEPIZH_KHEMPP6D4tPPTn0qjDLzd8Q=.7f3486a5-90f8-4ff2-931a-ea4dd238dc4b@github.com> References: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> <_PhEUVHOgMItPaEPIZH_KHEMPP6D4tPPTn0qjDLzd8Q=.7f3486a5-90f8-4ff2-931a-ea4dd238dc4b@github.com> Message-ID: On Tue, 2 Feb 2021 15:59:47 GMT, Gerard Ziemski wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed macos build > > Marked as reviewed by gziemski (Committer). Thanks @gerard-ziemski @magicus @AlanBateman @lfoltan for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From iklam at openjdk.java.net Wed Feb 3 03:59:44 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 03:59:44 GMT Subject: RFR: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX [v2] In-Reply-To: References: <188th_PzKn-dtdX8nHylqBZEa7Dddi7cU13bkoDzigc=.6a12ee5d-b027-4012-a137-0169440d61b6@github.com> <3s1-hVjGvofyZ6o7jh6ayZWMp_RkPlt1Juig5U9zQfM=.3adf577e-6ee6-4f29-ba22-c7d15e742f3e@github.com> Message-ID: On Tue, 2 Feb 2021 16:00:43 GMT, Gerard Ziemski wrote: >> I am not sure if jni_utils.c is the right file (it defines the `JNU_XXX` functions that are used by other shared libraries). >> >> There are other .c files that have trivial `DEF_JNI_OnLoad` functions (e.g., java.base/share/native/libnio/nio_util.c). >> >> @AlanBateman do you have any suggestions? > > I'm fine with the way it is, just thought we might want to consider cleaning up a bit more, since it's a cleanup task itself. Thanks Gerard. The main purpose of this PR is to clean up the JVM side. 
I'll leave the refactoring of check_version.c to the core-lib team. ------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From iklam at openjdk.java.net Wed Feb 3 03:59:45 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 03:59:45 GMT Subject: Integrated: 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 18:40:54 GMT, Ioi Lam wrote: > - JVM_GetInterfaceVersion() was used by "HotSpot Express" (HSX) which allowed the same JDK library to use different version of HotSpot. However, HSX is no longer supported so this API should be removed. > - Implementations of APIs such as JVM_DTraceActivate, were removed in [JDK-8068976](https://bugs.openjdk.java.net/browse/JDK-8068976), so their declarations should be removed from jvm.h This pull request has now been integrated. Changeset: b9d4211b Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/b9d4211b Stats: 112 lines in 4 files changed: 0 ins; 110 del; 2 mod 8260193: Remove JVM_GetInterfaceVersion() and JVM_DTraceXXX Reviewed-by: alanb, lfoltan, gziemski, ihse ------------- PR: https://git.openjdk.java.net/jdk/pull/2338 From stuefe at openjdk.java.net Wed Feb 3 06:39:01 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 06:39:01 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally [v2] In-Reply-To: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Message-ID: > Analyzing out-of-resource situations in cloud scenarios is no fun. With CloudFoundry, a JVMTI agent (jvmkill) is hooked up intercepting the jvmti "resource exhausted" event, then attempts to write up a heap report. That may fail, e.g. due to bugs in the agent [1], but also because that report runs java code and may suffer from the same resource exhaustion. 
Successful or not, it unceremoniously kills the VM when done, often leaving us with no information about the actual resource. > > It would be very helpful if we had unconditional tracing here. We do have tracing, but it requires a non-product build and is triggered with TraceJVMTI. Also, it traces at trace level which is way too fine-grained. > > I'd like to introduce another, unconditional trace line here. Arguably, resource exhausted is fatal enough that it justifies unconditional tracing. > > This is a bit of a coin toss. Tracing unconditionally would help in most scenarios, where it would be either difficult or even impossible to specify a trace command line switch. OTOH it may trip up scripts parsing the VM output, or some of our tests (which can be fixed). > > Thoughts? > > ..Thomas > > [1] https://github.com/cloudfoundry/jvmkill/issues/18 Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback David ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2350/files - new: https://git.openjdk.java.net/jdk/pull/2350/files/abe2bf60..40e3af87 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2350&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2350&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2350.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2350/head:pull/2350 PR: https://git.openjdk.java.net/jdk/pull/2350 From stuefe at openjdk.java.net Wed Feb 3 06:39:03 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 06:39:03 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally [v2] In-Reply-To: <-ZUFjDD0KRceDLzTEHDuyfhPq47VOKzZAtPXECxwNMI=.a4b9069a-30a8-4aed-9625-94ee854531ba@github.com> References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com>
<-ZUFjDD0KRceDLzTEHDuyfhPq47VOKzZAtPXECxwNMI=.a4b9069a-30a8-4aed-9625-94ee854531ba@github.com> Message-ID: On Wed, 3 Feb 2021 01:43:25 GMT, David Holmes wrote: > Hi Thomas, > > Approval in principle, but changes suggested. > > Thanks, > David Hi David, thanks for looking at this. I changed text and tag as requested. ..Thomas > src/hotspot/share/prims/jvmtiExport.cpp line 1509: > >> 1507: JavaThread *thread = JavaThread::current(); >> 1508: >> 1509: log_error(os)("Resource Exhausted (%s)", description != nullptr ? description : "no info"); > > Shouldn't that be `log_error(jvmti)`? > I'd also suggest the text "Posting Resource Exhausted event: %s" with "unknown" for a null description. Sure, no problem. ------------- PR: https://git.openjdk.java.net/jdk/pull/2350 From iklam at openjdk.java.net Wed Feb 3 06:40:04 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 06:40:04 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - @tschatzl and @stefank comments - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2347/files - new: https://git.openjdk.java.net/jdk/pull/2347/files/a1bdc2f7..529e77e4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=00-01 Stats: 3635 lines in 268 files changed: 1458 ins; 983 del; 1194 mod Patch: https://git.openjdk.java.net/jdk/pull/2347.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347 PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Wed Feb 3 06:40:08 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 06:40:08 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:09:22 GMT, Stefan Karlsson wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - @tschatzl and @stefank comments >> - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp >> - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp > > src/hotspot/share/gc/shared/memAllocator.hpp line 30: > >> 28: #include "memory/memRegion.hpp" >> 29: #include "oops/oopsHierarchy.hpp" >> 30: #include "runtime/thread.hpp" > > If we want to, this could be changed to a forward declaration if we removed the default value (Thread* thread = Thread::current()) of the constructors. Not needed for this RFE though. 
memAllocator.hpp is not included very often (only 65 out of ~1000 .o files), so I decided to leave it as is. > src/hotspot/cpu/arm/frame_arm.cpp line 518: > >> 516: obj = *(oop*)res_addr; >> 517: } >> 518: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); > > Could have been changed to is_in_heap_or_null. Fixed > src/hotspot/cpu/ppc/frame_ppc.cpp line 308: > >> 306: case T_ARRAY: { >> 307: oop obj = *(oop*)tos_addr; >> 308: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); > > Could have been changed to is_in_heap_or_null. Fixed. I also change other frame_.cpp files to use is_in_heap_or_null. ------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Wed Feb 3 06:40:11 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 06:40:11 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:22:50 GMT, Thomas Schatzl wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - @tschatzl and @stefank comments >> - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp >> - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp > > src/hotspot/share/gc/shared/memAllocator.hpp line 30: > >> 28: #include "memory/memRegion.hpp" >> 29: #include "oops/oopsHierarchy.hpp" >> 30: #include "runtime/thread.hpp" > > `utilities/globalDefinitions.hpp` for `HeapWord` is missing. Fixed. > src/hotspot/share/oops/compressedOops.inline.hpp line 28: > >> 26: #define SHARE_OOPS_COMPRESSEDOOPS_INLINE_HPP >> 27: >> 28: #include "gc/shared/collectedHeap.hpp" > > `utilities/globalDefinitions.hpp` for `*PTR_FORMAT` and others is missing. Fixed. 
> src/hotspot/share/oops/oop.inline.hpp line 28: > >> 26: #define SHARE_OOPS_OOP_INLINE_HPP >> 27: >> 28: #include "gc/shared/collectedHeap.hpp" > > `utilities/globalDefinitions.hpp` for `HeapWord` is missing. > `globals.hpp` for some globals. > `oopsHierarchy.hpp` for `narrowKlass` > `utilities/debug.hpp` for `assert` Fixed. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From stuefe at openjdk.java.net Wed Feb 3 07:00:47 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 07:00:47 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v2] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 00:42:39 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Make SavedSignalHandlers use C-heap for its items >> - Removed display-replaced-handler-logic >> - Feedback David >> - Merge >> - JDK-8260485-signal-handler-improvements > > src/hotspot/os/posix/signals_posix.cpp line 842: > >> 840: if (sig == SHUTDOWN2_SIGNAL && !isatty(fileno(stdin))) { >> 841: tty->print_cr("Running in non-interactive shell, %s handler is replaced by shell", >> 842: os::exception_name(sig, buf, O_BUFLEN)); // When comparing, ignore the SA_RESTORER flag on Linux > > What does this comment mean in this context ??? Nothing. This, like the "flags" comment before, must have been accidental paste errors. I went through the patch again, but I hope this was the last one.
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From ngasson at openjdk.java.net Wed Feb 3 07:02:40 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 3 Feb 2021 07:02:40 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> Message-ID: On Tue, 2 Feb 2021 11:08:46 GMT, Vladimir Ivanov wrote: >> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > `RegisterMap`-related changes look good. @theRealAph are the sharedRuntime_aarch64.cpp changes ok? ------------- PR: https://git.openjdk.java.net/jdk/pull/2279 From stuefe at openjdk.java.net Wed Feb 3 07:06:50 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 07:06:50 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v2] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 00:54:23 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Make SavedSignalHandlers use C-heap for its items >> - Removed display-replaced-handler-logic >> - Feedback David >> - Merge >> - JDK-8260485-signal-handler-improvements > > src/hotspot/os/posix/signals_posix.cpp line 1392: > >> 1390: >> 1391: // Print established signal handler for this signal. >> 1392: // - if this signal handler was installed by us and is chained to a pre-established user handler > > It is not clear to me that this check still remains somewhere. I removed this check since I did not think it very useful. 
We have the checking code with Xcheck:jni, which does the same check. But I am now reconsidering. If we run without Xcheck, it still is useful. I'll re-add this test. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From kvn at openjdk.java.net Wed Feb 3 07:19:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Feb 2021 07:19:39 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 03:15:16 GMT, Dean Long wrote: >> On return WB wait to acquire Compile_lock before checking compilation status >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 >> >> This lock is used by ciEnv for compiled code publishing: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 >> >> So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. >> >> The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. >> >> The fix is to check compiled code again similar to check in CompileBroker: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 >> >> Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. > > Looks good. Thank you, Dean ------------- PR: https://git.openjdk.java.net/jdk/pull/2356 From stuefe at openjdk.java.net Wed Feb 3 07:57:49 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 07:57:49 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v2] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 00:57:07 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Make SavedSignalHandlers use C-heap for its items >> - Removed display-replaced-handler-logic >> - Feedback David >> - Merge >> - JDK-8260485-signal-handler-improvements > > src/hotspot/os/posix/vmError_posix.cpp line 60: > >> 58: } >> 59: >> 60: void VMError::interrupt_reporting_thread() { > > I'm not clear what happened to these ?? (I'm not clear how they were used in the first place. :( ) I think they were used for path (3). When printing signal handlers, we do this little test to check - similar to what we do in Xcheck:jni - to print out if someone changed the handler under us. See: https://github.com/openjdk/jdk/blob/cb127a4bb5d95d19eb1f5e625b600311a2490135/src/hotspot/os/posix/signals_posix.cpp#L1372 and https://github.com/openjdk/jdk/blob/cb127a4bb5d95d19eb1f5e625b600311a2490135/src/hotspot/os/posix/signals_posix.cpp#L1389 . But when printing signal handlers as part of a hs-err file, we replaced the hotspot signal handlers with the secondary crash handler and would trigger this test. I think this whole logic is needed just to prevent this. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From aph at openjdk.java.net Wed Feb 3 09:10:47 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 3 Feb 2021 09:10:47 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v7] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 02:04:03 GMT, Dong Bo wrote: >> This is a typo introduced by JDK-8255949. >> Compiler will generate `ushr` for shifting right and accumulating four short integers. >> It produces wrong results for specific case. The instruction should be `usra`. 
> > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > match ssra with 8B OK, thanks. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/136 From stuefe at openjdk.java.net Wed Feb 3 09:15:12 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 09:15:12 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v3] In-Reply-To: References: Message-ID: > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. > > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
>
> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError.
>
> Output Before:
> Signal Handlers:
> SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
> SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
> SIGTRAP: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>
> Now:
> Signal Handlers:
> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>   replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>   replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>   replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>   replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO
> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>
> -----
> Tests: GA, and the patch has been tested in our nightlies for over a month now. I manually executed the runtime/jni/checked tests too.
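The description above mentions a convenience class that keeps handler information by signal number, filled in whenever a handler is replaced. A minimal sketch of that idea, using only POSIX `sigaction`, might look as follows. This is not the class from the patch: `SavedHandlersSketch`, `kMaxSignals` and `install_and_remember` are hypothetical names, and the sketch uses inline storage where the patch is said to use the C heap for its items.

```cpp
#include <cassert>
#include <signal.h>
#include <string.h>

// Hypothetical illustration of a per-signal store of sigaction structures.
class SavedHandlersSketch {
  enum { kMaxSignals = 65 };  // assumed upper bound, chosen for this example
  struct sigaction _sa[kMaxSignals];
  bool _present[kMaxSignals];

 public:
  SavedHandlersSketch() {
    memset(_sa, 0, sizeof(_sa));
    memset(_present, 0, sizeof(_present));
  }

  // Remember a copy of `act` for signal `sig`, overwriting any older entry.
  void set(int sig, const struct sigaction* act) {
    assert(sig > 0 && sig < kMaxSignals);
    _sa[sig] = *act;
    _present[sig] = true;
  }

  // Return the remembered entry, or nullptr if nothing was stored.
  const struct sigaction* get(int sig) const {
    assert(sig > 0 && sig < kMaxSignals);
    return _present[sig] ? &_sa[sig] : nullptr;
  }
};

// Install `handler` for `sig` and record the handler it replaced.
bool install_and_remember(int sig, void (*handler)(int),
                          SavedHandlersSketch& replaced) {
  struct sigaction act;
  struct sigaction old;
  memset(&act, 0, sizeof(act));
  sigemptyset(&act.sa_mask);
  act.sa_handler = handler;
  act.sa_flags = SA_RESTART;
  if (sigaction(sig, &act, &old) != 0) {
    return false;  // nothing was installed, so nothing is recorded
  }
  replaced.set(sig, &old);
  return true;
}
```

Keeping the full `sigaction` structure, instead of separate arrays for address and flags, is what lets one store serve all three use cases listed in the description (chained, expected and replaced handlers).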
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: David Feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2251/files - new: https://git.openjdk.java.net/jdk/pull/2251/files/f8deb8a7..44fa2199 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=01-02 Stats: 27 lines in 3 files changed: 23 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2251.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2251/head:pull/2251 PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Wed Feb 3 09:17:49 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 09:17:49 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v2] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 00:58:11 GMT, David Holmes wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Make SavedSignalHandlers use C-heap for its items >> - Removed display-replaced-handler-logic >> - Feedback David >> - Merge >> - JDK-8260485-signal-handler-improvements > > Hi Thomas, > > Still a few minor comments and queries, but overall it is looking good. > > Thanks, > David I added a printout to print_signal_handler to print out a message if the expected handler changed. This is the same logic, somewhat abridged, applied at Xcheck:jni. Only here, it fires whenever we print signal handlers independently from Xcheck:jni. So it gets also displayed in hs-err files or as part of a VM.info jcmd. 
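The "did someone change the handler under us" check mentioned above boils down to comparing the currently installed `sigaction` with the one that is expected. A rough sketch under stated assumptions: `handler_address`, `significant_flags` and `handler_unchanged` are hypothetical helpers, not functions from signals_posix.cpp, and the `SA_RESTORER` masking only takes effect when the system headers expose that macro.

```cpp
#include <cassert>
#include <signal.h>

// Pick the handler address out of a sigaction, depending on whether the
// three-argument form (SA_SIGINFO) is in use.
static void* handler_address(const struct sigaction& sa) {
  return (sa.sa_flags & SA_SIGINFO)
             ? reinterpret_cast<void*>(sa.sa_sigaction)
             : reinterpret_cast<void*>(sa.sa_handler);
}

// Drop flags that should not take part in the comparison. The C library may
// add SA_RESTORER on its own on Linux, so it is masked out when visible.
static int significant_flags(int flags) {
#ifdef SA_RESTORER
  flags &= ~SA_RESTORER;
#endif
  return flags;
}

// True if the handler currently installed for `sig` has the same address
// and the same significant flags as `expected`.
bool handler_unchanged(int sig, const struct sigaction& expected) {
  assert(sig > 0);
  struct sigaction current;
  if (sigaction(sig, nullptr, &current) != 0) {
    return false;  // cannot query the handler, so report a mismatch
  }
  return handler_address(current) == handler_address(expected) &&
         significant_flags(current.sa_flags) ==
             significant_flags(expected.sa_flags);
}
```

A caller would capture `expected` when it installs its own handler and call `handler_unchanged` later, for instance while printing the handler table, to flag a handler that third-party code has swapped out.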
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From aph at openjdk.java.net Wed Feb 3 09:25:52 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 3 Feb 2021 09:25:52 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 18:03:50 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5271: > >> 5269: // >> 5270: void MacroAssembler::get_thread(Register dst) { >> 5271: RegSet saved_regs = RegSet::range(r0, r1) + BSD_ONLY(RegSet::range(r2, r17)) + lr - dst; > > The comment needs to be updated, since on BSD we also seem to clobber r2,r17 ? Surely this should be `saved_regs = RegSet::range(r0, r1) BSD_ONLY(+ RegSet::range(r2, r17)) + lr - dst;` Shouldn't it? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From aph at openjdk.java.net Wed Feb 3 09:25:51 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 3 Feb 2021 09:25:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 21:49:36 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 323: > >> 321: str(zr, Address(rthread, JavaThread::last_Java_pc_offset())); >> 322: >> 323: str(zr, Address(rthread, JavaFrameAnchor::saved_fp_address_offset())); > > I don't think this switch from `JavaThread::saved_fp_address_offset()` > to `JavaFrameAnchor::saved_fp_address_offset()` is correct since > `rthread` is still used and is a JavaThread*.
The new code will give you: > > `rthread` + offset of the `saved_fp_address` field in a JavaFrameAnchor > > The old code gave you: > > `rthread` + offset of the `saved_fp_address` field in the JavaFrameAnchor field in the JavaThread > > Those are not the same things. I agree, I don't understand why this change was made. > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.hpp line 45: > >> 43: // Atomically copy 64 bits of data >> 44: static void atomic_copy64(const volatile void *src, volatile void *dst) { >> 45: *(jlong *) dst = *(const jlong *) src; > > Is this construct actually atomic on aarch64? Yes. > src/hotspot/os_cpu/windows_aarch64/os_windows_aarch64.hpp line 37: > >> 35: >> 36: private: >> 37: > > 'private' usually has one space in front of it. > Also, why the blank line after it? It reads better with the blank line, and it's not in violation of HotSpot conventions. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From dongbo at openjdk.java.net Wed Feb 3 09:31:45 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 09:31:45 GMT Subject: [jdk16] RFR: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers [v7] In-Reply-To: References: Message-ID: <9WisRAG9qBk4FL87nQy3kNCLUyhKanVfnc_ZY2ZxkB8=.abf07896-4b00-4769-85b4-670d88f25aa3@github.com> On Wed, 3 Feb 2021 09:07:36 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional commit since the last revision: >> >> match ssra with 8B > > OK, thanks. Thank you all for the review. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From aph at redhat.com Wed Feb 3 09:36:27 2021 From: aph at redhat.com (Andrew Haley) Date: Wed, 3 Feb 2021 09:36:27 +0000 Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> Message-ID: On 2/3/21 7:02 AM, Nick Gasson wrote: > On Tue, 2 Feb 2021 11:08:46 GMT, Vladimir Ivanov wrote: > >>> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: >>> >>> Review comments >> >> `RegisterMap`-related changes look good. > > @theRealAph are the sharedRuntime_aarch64.cpp changes ok? I guess so, but the code changes are so complex and delicate it's extremely hard to tell. What have you done about stress testing? I guess we need some code that's repeatedly deoptimized and recompiled millions of times, with continuous testing. I guess that in order to make sure nothing has regressed, a bootstrap with DeoptimizeALot would help gain some confidence. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tschatzl at openjdk.java.net Wed Feb 3 10:09:52 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 10:09:52 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: <47QCVJeDPnsUak4dH0LXGJxDmqutyQeY91MIcPwyi-Q=.19b4ed7a-750a-47a8-aee0-76427c1752cf@github.com> On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? 
> > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` is only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. > > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) (latest tier1-4 testing still stuck on linux-aarch64, but everything else passed. I think there is no particular aarch64 specific change in there...) ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Wed Feb 3 10:09:52 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 10:09:52 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal Message-ID: Hi, can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` is only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code.
Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/2354/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2354&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8234534 Stats: 197 lines in 7 files changed: 0 ins; 185 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/2354.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2354/head:pull/2354 PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Wed Feb 3 10:31:44 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 10:31:44 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:30:51 GMT, Thomas Schatzl wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - @tschatzl and @stefank comments >> - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp >> - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp > > Checked a few includes for missing ones; obviously they are included transitively so add as you see fit. Still good. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From redestad at openjdk.java.net Wed Feb 3 12:26:51 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 3 Feb 2021 12:26:51 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM Message-ID: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. ------------- Commit messages: - Copyrights - Move class name checking for findBootstrapClass/findLoadedClass into native/VM Changes: https://git.openjdk.java.net/jdk/pull/2378/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2378&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261031 Stats: 20 lines in 3 files changed: 11 ins; 4 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2378.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2378/head:pull/2378 PR: https://git.openjdk.java.net/jdk/pull/2378 From stuefe at openjdk.java.net Wed Feb 3 16:21:48 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 3 Feb 2021 16:21:48 GMT Subject: RFR: 8259643: ZGC can return metaspace OOM prematurely [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 13:24:10 GMT, Erik Österlund wrote: >> There exists a race condition for ZGC metaspace allocations, where an allocation can throw OOM due to unbounded starvation from other threads. Towards the end of the allocation dance, we conceptually do this: >> >> 1. full_gc() >> 2. final_allocation_attempt() >> >> And if we still fail at 2 after doing a full GC, we conclude that there isn't enough metaspace memory. However, if the thread gets preempted between 1 and 2, then an unbounded number of metaspace allocations from other threads can fill up the entire metaspace, making the final allocation attempt fail and hence throw.
This can cause a situation where almost the entire metaspace is unreachable from roots, yet we throw OOM. I managed to reproduce this with the right sleeps. >> >> The way we deal with this particular issue for heap allocations, is to have an allocation request queue, and satisfy those allocations before others, preventing starvation. My solution to this metaspace OOM problem will be to basically do exactly that - have a queue of "critical" allocations, that get precedence over normal metaspace allocations. >> >> The solution should work for other concurrent GCs (who likely have the same issue), but I only tried this with ZGC, so I am only hooking in ZGC to the new API (for concurrently unloading GCs to manage critical metaspace allocations) at this point. >> >> Passes ZGC tests from tier 1-5, and the particular test that failed (with the JVM sleeps that make it fail deterministically). > > Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: > > polish code alignment and rename register/unregister to add/remove Hi Erik, sorry for the delay, I got swamped. Thanks for your patient explanations. I think I understand most of it now, but I still have a number of questions. Also see the code remarks, though most are just more questions. > > One issue with your patch just came to me: the block-on-allocate may be too early. `Metaspace::allocate()` is a bit hot. I wonder about the performance impact of pulling and releasing a lock on each atomic allocation, even if it's uncontended. Ideally I'd like to keep this path as close to a simple pointer bump allocation as possible (which it isn't unfortunately). > > I have a global flag that denotes there being at least 1 critical allocation in the system. It is set by the first critical allocation, and cleared by the GC if all critical allocations could be satisfied. The global lock is only taken in Metaspace::allocate() if said flag is on.
Normal apps should never see this flag being on. So the only overhead in the common case is to check the flag and see that no locking is required. I think that should be fast enough, right? And when you are in the critical mode, you obviously have more to worry about than lock contention, in terms of how the system is performing. I overlooked the flag check. I think this is fine then in its current form. > > > > I think I get this now. IIUC we have the problem that memory release happens delayed, and we carry around a baggage of "potentially free" memory which needs a collector run to materialize. So many threads jumping up and down and doing class loading and unloading drive up metaspace use rate and increase the "potentially free" overhead, right? So the thing is to time collector runs right. > > Exactly. > > > One concern I have is that if the customer runs with too tight a limit, we may bounce from full GC to full GC. Always scraping the barrel enough to just keep going - maybe collecting some short lived loaders - but not enough to get the VM into clear waters. I think this may be an issue today already. What is unclear to me is when it would be just better to give up and throw an OOM. To motivate the customer to increase the limits. > > Right - this is a bit of a philosophical one. There is always a balance there between precision, code complexity, and when to put the VM out of its misery when it is performing poorly. We deal with the same trade-off really with heap allocations, which is why I am also solving the starvation problem in pretty much the same way: with a queue satisfied by the GC, and locking out starvation. Then we throw OOM in fairly similar conditions. What they have in common is that when they throw, we will have a live portion of metaspace that is "pretty much" full, and there is no point in continuing, while allowing unfortunate timings on a less full (in terms of temporal liveness) metaspace to always succeed. 
> > One might argue that the trade-off should be moved in some direction, and that it is okay for it to be more or less exact, but I was hoping that by doing the same dance that heap OOM situations do, we can at least follow a trade-off that is pretty well established and has worked pretty well for heap OOM situations for many years. And I think heap OOM situations in the wild are far more common than metaspace OOM, so I don't think that the metaspace OOM mechanism needs to do better than what the heap OOM mechanism does. If that makes sense. It makes sense. If its an established proven pattern lets use it here too. Its not that complex. I think there are differences between heap and metaspace in elasticity. Metaspace is way less "spongy", chance of recovering metaspace is slimmer than with the heap, so a Full GC is more expensive in relation to its effects. I think I have seen series of Full GCs on quite empty heaps when customers set MaxMetaspaceSize too low (people seem to like doing that). I'm worried about small loaders with short lifetimes which just allow the VM to reclaim enough to keep going. Reflection invocation loaders, or projects like jruby. But I think this is not the norm, nor does it have anything to do with your patch. Typically we do just one futile GC and then throw OOM. > > > > Why not just cover the whole synchronous GC collect call? I'd put that barrier up as early as possible, to prevent as many threads as possible from entering the more expensive fail path. At that point we know we are near exhaustion. Any thread allocating could just as well wait inside MetaspaceArena::allocate. If the GC succeeds in releasing lots of memory, they will not have been disturbed much. > > Do you mean > > 1. why I don't hold the MetaspaceCritical_lock across the collect() call at the mutator side of the code, or > 2. why I don't hold the MetaspaceCritical_lock across the entire GC cycle instead of purge? 
> > I think you meant 2), so will answer that: > a) Keeping the lock across the entire GC cycle is rather problematic when it comes to constraining lock ranks. It would have to be above any lock ever needed in an entire GC cycle, yet we take a bunch of other locks that mutators hold at different point, interacting with class unloading. It would be very hard to find the right rank for this. > b) The GC goes in and out of safepoints, and needs to sometimes block out safepoints. Holding locks in and out of safepoints while blocking in and out safepoints, is in my experience rather deadlock prone. > c) There is no need to keep the lock across more than the operation that frees metaspace memory, which in a concurrent GC always happens when safepoints are blocked out. If a mutator succeeds during a concurrent GC due to finding memory in a local free list or something, while another allocation failed and needs a critical allocation, then that is absolutely fine, as the successful allocation is never what causes the failing allocation to fail. Thanks, that makes sense. I was completely offtrack, thinking more in direction of (1), wrt the current coding, not your patch. I thought if the mutator thread locks across the collect call and the subsequent allocation attempt (which would have to be some sort of priority allocation, ignoring the lock) this would be a simple solution. But I think that has a number of holes, never mind the allocation request ordering. > > > > > Why do we even need a queue? Why could we not just let the first thread attempting a synchronous gc block metaspace allocation path for all threads, including others running into a limit, until the gc is finished and it had its first-allocation-right served? > > > > > > > > > Each "critical" allocation rides on one particular GC cycle, that denotes the make-or-break point of the allocation. 
> > > > > > I feel like I should know this, but if multiple threads enter satisfy_failed_metadata_allocation around the same time and call a synchronous collect(), they would wait on the same GC, right? They won't start individual GCs for each thread?
>
> The rule we have in ZGC to ensure what we want in terms of OOM situations, is that GCs that are _requested_ before an equivalent GC _starts_, can be satisfied with the same GC cycle.
>
> Let's go through what will likely happen in practice with my solution when you have, let's say 1000 concurrent calls to satisfy a failing metaspace allocation.
>
> 1. Thread 1 registers its allocation, sees it is the first one, and starts a metaspace GC.
> 2. GC starts running
> 3. Threads 2-999 register their allocations, and see that there was already a critical allocation before them. This causes them to wait for the GC to purge, opportunistically, riding on that first GC.
> 4. The GC satisfies allocations. For the sake of example, let's say that allocations 1-500 could be satisfied, but not the rest.
> 5. Threads 2-500 who were waiting for purge to finish, wake up, and run off happily with their satisfied allocations.
> 6. Threads 501-1000 wake up seeing that their allocations didn't get satisfied. They now stop being opportunistic, and request a gc each before finally giving up.
> 7. The first GC cycle finishes.
> 8. Thread 1 wakes up after the entire first GC cycle is done and sees its satisfied allocation, happily running off with it.
> 9. The next GC cycle starts
> 10. The next GC cycle successfully satisfies metadata allocations for threads 501-750, but not the rest.
> 11. The entire next GC cycle finishes, satisfying the GC requested by threads 2-1000, as they all _requested_ a metaspace GC, _before_ it started running. Therefore, no further GC will run.
> 12. Threads 501-750 wake up, happily running off with their satisfied allocations.
> 13. Threads 751-1000 wake up, grumpy about the lack of memory after their GC.
They are all gonna throw. > > So basically, if they can all get satisfied with 1 GC, then 1 GC will be enough. But we won't throw until a thread has had a full GC run _after_ it was requested, but multiple threads can ride on the same GC there. In this example, threads 2-1000 all ride on the same GC. This is a nice elegant solution. I had some trouble understanding your explanation: When you write "threads 2-1000 all ride on the same GC" this confused me since threads 2-500 were lucky and were satisfied by the purge in GC cycle 1. So I would have expected threads 501-1000 to ride on the second GC, thread 1-500 on the first. Unless with "ride" you mean "guaranteed to be processed"? > > Note though, that we would never allow a GC that is already running to satisfy a GC request that comes in while the GC is already running, as we then wouldn't catch situations when a thread releases a lot of memory, and then expects it to be available just after. > > > > In order to prevent starvation, we have to satisfy all critical allocations who have their make-or-break GC cycle associated with the current purge() operation, before we release the lock in purge(), letting new allocations in, or we will rely on luck again. However, among the pending critical allocations, they will all possibly have different make-or-break GC cycles associated with them. So in purge() some of them need to be satisfied, and others do not, yet can happily get their allocations satisfied opportunistically if possible. So we need to make sure they are ordered somehow, such that the earliest arriving pending critical allocations are satisfied first, before subsequent critical allocations (possibly waiting for a later GC), or we can get premature OOM situations again, where a thread releases a bunch of memory, expecting to be able to allocate, yet fails due to races with various threads. 
> > > The queue basically ensures the ordering of critical allocation satisfaction is sound, so that the pending critical allocations with the associated make-or-break GC being the one running purge(), are satisfied first, before satisfying (opportunistically) other critical allocations, that are really waiting for the next GC to happen. > > > > > > I still don't get why the order of the critical allocations matter. I understand that even with your barrier, multiple threads can fail the initial allocation, enter the "satisfy_failed_metadata_allocation()" path, and now their allocation count as critical since if they fail again they will throw an OOM. But why does the order of the critical allocations among themselves matter? Why not just let the critical allocations trickle out unordered? Is the relation to the GC cycle not arbitrary? > > If we don't preserve the order, we would miss situations when a thread releases a large chunk of metaspace (e.g. by releasing a class loader reference), and then expects that memory to be available. An allocation from a critical allocation that is associated with a subsequent GC, could starve a thread that is associated with the current GC cycle, hence causing a premature OOM for that thread, while not really needing that allocation until next GC cycle, while doing it in the right order would satisfy both allocations. > > One might argue it might be okay to throw in such a scenario with an unordered solution. We are pretty close to running out of memory anyway I guess. But I'm not really sure what such a solution would look like in more detail, and thought writing this little list was easy enough, and for me easier to reason about, partially because we do the same dance in the GC to deal with heap OOM, which has been a success. I think I get this now. Its still a brain teaser if you are not used to the pattern, but I think I got most of it. If the pattern is well established and proofed it makes sense. 
Thanks for the great explanations! > > Thanks, > /Erik > > ^--- I know this will be treated as a PR bot command, but I can't give up on my slash! > Alias it to integrate :) > > > Just FYI, I have very vague plans to extend usage of the metaspace allocator to other areas. To get back the cost of implementation. Eg one candidate may be replacement of the current resource areas, which are just more primitive arena based allocators. This is very vague and not a high priority, but I am a bit interested in keeping the coding generic if its not too much effort. But I don't see your patch causing any problems there. > > That sounds very interesting. We had internal discussions about riding on the MetaspaceExpand lock which I believe would also work, but thought this really ought to be a separate thing so that we don't tie ourselves too tight to the internal implementation of the allocator. Given that you might want to use the allocator for other stuff, it sounds like that is indeed the right choice. Yes, I think this is definitely better. The expand lock (I renamed it to be just the Metaspace_lock since its also used for reclamation and other stuff) is used in a more fine granular fashion. I cannot see it working in the same way as the critical lock, protecting the queue and preventing entry for non-priority allocation. Thanks, Thomas src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 37: > 35: ClassLoaderData* _loader_data; > 36: size_t _word_size; > 37: Metaspace::MetadataType _type; These could be const, right? ptr const in the case of the CLD. src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 40: > 38: MetadataAllocationRequest* _next; > 39: MetaWord* _result; > 40: bool _has_result; I think `_has_result` is confusing, since I first mistook it for `_result != NULL` . Suggestion: `_processed` or `_handled`? 
src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 170:

> 168: return request.result();
> 169: }
> 170:

I don't fully understand yet how in the second GC cycle the leftover critical allocations from the first cycle are handled:

100 threads allocate, run into the limit, enter MetaspaceCriticalAllocation::allocate, and each create a request object on their stacks. These are linked, so now they are all queued.

1) t1 (passing thru try_allocate_critical) calls collect and waits.
2) t2-t99 enter try_allocate_critical, enter wait loop in wait_for_purge()
3) GC locks critical lock, frees up metaspace, calls MetaspaceCriticalAllocation::satisfy(), attempts to allocate for all queued requests, in the order they came in since that's the queue. Satisfies let's say t1 and t2, but t3-99 fail to allocate. But all requests are marked as "have result" though. GC unlocks critical lock
4) t1 gets control and returns != NULL
5) t2 gets control, try_allocate_critical() returns true, returns != NULL
6) t3 gets control, try_allocate_critical() returns false, now it is the second GC instigator. Calls collect and waits.
7) t4 gets control, try_allocate_critical() returns false, also calls collect?
8) t5..99 too?

Why do we not keep t4-t99 in wait_for_purge()? They all now wait directly on the collect call? Also, any critical allocation not satisfied by this second GC cycle now is considered failing? Or is that the point?

Edit: wait, I did not read your description closely. This is case (6) in your list. Have I got that right?

src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 150:

> 148: MetaWord* result = curr->loader_data()->metaspace_non_null()->allocate(curr->word_size(), curr->type());
> 149: if (result == NULL) {
> 150: result = curr->loader_data()->metaspace_non_null()->expand_and_allocate(curr->word_size(), curr->type());

I am not sure it makes sense retrying to expand here.
The HWM at this point must be fully extended, since the hard limit is the problem, not the HWM. HWM may get reduced at the end of the GC, eg ZUnload::finish, but this is yet to come at this point, right? src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 171: > 169: } > 170: > 171: // Always perform a synchronous full GC before bailing Can we have a comment here stating that this stalls the calling thread? Its only obvious if you know the collector details. src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 126: > 124: > 125: void MetaspaceCriticalAllocation::wait_for_purge(MetadataAllocationRequest* request) { > 126: while (!request->has_result()) { I'm unclear on how we could leave `wait` and still not be handled here. Since the lock is only notified at the end of Metaspace::purge, which itself is under critical lock protection, which must mean this request was queued before Metaspace::purge started. src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 143: > 141: bool all_satisfied = true; > 142: for (MetadataAllocationRequest* curr = _requests_head; curr != NULL; curr = curr->next()) { > 143: if (curr->result() != NULL) { So this means the thread had no occasion to continue and leave wait_for_purge() until the second GC ran? src/hotspot/share/memory/metaspaceCriticalAllocation.cpp line 114: > 112: } > 113: > 114: bool MetaspaceCriticalAllocation::try_allocate_critical(MetadataAllocationRequest* request) { I think this is a bit of a misnomer. Since we do nothing here but wait for the purge. Call it `wait_for_purge` maybe, and merge the current `wait_for_purge` function into this function (since its only called from here and quite small)? 
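To make the ordering argument in this thread easier to follow, here is a minimal, single-threaded model of the idea: critical requests queue up in arrival order, the purge step serves them in that order, and the normal allocation path is refused while anything is still queued. This is only a sketch under simplifying assumptions; the names `Request` and `CriticalQueue` are hypothetical, a plain word budget stands in for metaspace, there is no locking, and, unlike the code under review, the purge step empties the queue itself.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical model, not the code under review.
struct Request {
  size_t word_size;
  bool processed;  // a purge has looked at this request
  bool satisfied;  // the allocation succeeded during that purge
  explicit Request(size_t words)
      : word_size(words), processed(false), satisfied(false) {}
};

class CriticalQueue {
  size_t _free_words;             // stand-in for available metaspace
  std::deque<Request*> _pending;  // critical requests in arrival order

 public:
  explicit CriticalQueue(size_t free_words) : _free_words(free_words) {}

  bool has_pending() const { return !_pending.empty(); }

  // Normal allocation path: refused while critical requests are queued,
  // so newcomers cannot consume memory the queue is waiting for.
  bool allocate(size_t words) {
    if (has_pending() || words > _free_words) {
      return false;
    }
    _free_words -= words;
    return true;
  }

  // A thread whose allocation failed registers a critical request.
  void add(Request* r) { _pending.push_back(r); }

  // Models the purge at the end of a GC that reclaimed `reclaimed` words:
  // every queued request is tried in arrival order and marked as processed,
  // whether or not it could be satisfied.
  void purge(size_t reclaimed) {
    _free_words += reclaimed;
    for (Request* r : _pending) {
      r->processed = true;
      if (r->word_size <= _free_words) {
        _free_words -= r->word_size;
        r->satisfied = true;
      }
    }
    _pending.clear();
  }
};
```

In this model a request that is processed but not satisfied corresponds to a thread that stops waiting opportunistically and has to request its own GC before giving up.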
------------- PR: https://git.openjdk.java.net/jdk/pull/2289 From iignatyev at openjdk.java.net Wed Feb 3 16:30:41 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 3 Feb 2021 16:30:41 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:44:16 GMT, Vladimir Kozlov wrote: > On return WB wait to acquire Compile_lock before checking compilation status > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 > > This lock is used by ciEnv for compiled code publishing: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 > > So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. > > The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. > > The fix is to check compiled code again similar to check in CompileBroker: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 > > Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. LGTM ------------- Marked as reviewed by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2356 From ayang at openjdk.java.net Wed Feb 3 16:31:58 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 3 Feb 2021 16:31:58 GMT Subject: RFR: 8259668: Make SubTasksDone use-once Message-ID: After JDK-8260574, a instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it is removed. With this patch, `all_tasks_completed` contains only assertion. Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. 
For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running btw the instance when all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome. ------------- Commit messages: - once Changes: https://git.openjdk.java.net/jdk/pull/2383/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2383&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259668 Stats: 95 lines in 4 files changed: 31 ins; 49 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/2383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2383/head:pull/2383 PR: https://git.openjdk.java.net/jdk/pull/2383 From kvn at openjdk.java.net Wed Feb 3 18:08:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Feb 2021 18:08:45 GMT Subject: Integrated: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:44:16 GMT, Vladimir Kozlov wrote: > On return WB wait to acquire Compile_lock before checking compilation status > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 > > This lock is used by ciEnv for compiled code publishing: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 > > So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. > > The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. 
> > The fix is to check compiled code again similar to check in CompileBroker: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 > > Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. This pull request has now been integrated. Changeset: f025bc1d Author: Vladimir Kozlov URL: https://git.openjdk.java.net/jdk/commit/f025bc1d Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" Reviewed-by: dlong, iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/2356 From kvn at openjdk.java.net Wed Feb 3 18:08:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 3 Feb 2021 18:08:44 GMT Subject: RFR: 8260301: misc gc/g1/unloading tests fails with "RuntimeException: Method could not be enqueued for compilation at level N" In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 16:27:54 GMT, Igor Ignatyev wrote: >> On return WB wait to acquire Compile_lock before checking compilation status >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L988 >> >> This lock is used by ciEnv for compiled code publishing: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/ci/ciEnv.cpp#L981 >> >> So while WB waits the lock compiler thread can finish compilation, register nmethod and clear method's queued_for_compilation bit. >> >> The problem is that WB check `nm` value (compiled code) which it got before the lock and when method compilation is not finished. >> >> The fix is to check compiled code again similar to check in CompileBroker: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compileBroker.cpp#L1501 >> >> Passed hs-tier1-4 testing and 100 x vmTestbase/gc/g1/unloading/tests/unloading_compilation_*. 
> > LGTM Thank you, Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/2356 From mchung at openjdk.java.net Wed Feb 3 19:55:45 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Wed, 3 Feb 2021 19:55:45 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM In-Reply-To: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: On Wed, 3 Feb 2021 12:21:30 GMT, Claes Redestad wrote: > This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. I'm unsure the benefit of moving the check done by `checkName` to the VM but `preDefineClass` still calls `checkName`. The overhead of `checkName` should be fairly negligible? src/java.base/share/native/libjava/ClassLoader.c line 291: > 289: } > 290: // disallow slashes in input, change '.' to '/' > 291: if (verifyFixClassname(clname)) { perhaps we should replace all use of `fixClassname` with `verifyFixClassname` and remove `fixClassname`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2378 From akozlov at openjdk.java.net Wed Feb 3 20:01:15 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:01:15 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. 
It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos - Add comments to WX transitions + minor change of placements - Use macro conditionals instead of empty functions - Add W^X to tests - Do not require known W^X state - Revert w^x in gtests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/3c705ae5..80827176 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=08-09 Stats: 444 lines in 64 files changed: 112 ins; 278 del; 54 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 3 20:01:16 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:01:16 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> On Tue, 2 Feb 2021 23:10:17 GMT, Daniel D. 
Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > For platform files that were copied from other ports to this port, if the file wasn't > changed I presume the copyright years are left alone. If the file required changes > for this port, I expect the year to be updated to 2021. How are you verifying that > these copyright years are being properly managed on the new files? > > For the new W^X helpers, e.g., WXWriteFromExecSetter, a short comment > explaining why one was landed in a particular place would help reviewers. > Also see my comment about creating a new ThreadToNativeWithWXExecFromVM > helper. > > I'm stopping my review with all the src/hotspot files done for now. Thank you all for your comments regarding W^X implementation. I've made a change that reduces the footprint of the implementation, also addressing most of the comments. I'll revisit them individually to make sure nothing is forgotten. The basic principle has not changed: when we execute JVM code (owned by libjvm.so, starting from JVM entry function), we switch to Write state. When we leave JVM to execute generated or JNI code, we switch to Executable state. I would like to highlight that JVM code does not mean the VM state of the java thread. After @stefank's suggestion, I could also drop a few W^X state switches, so now it should be more clear that switches are tied to JVM entry functions. 
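A minimal sketch of the principle described above, assuming a scoped (RAII) setter: Write mode is entered at a JVM entry function, and Exec mode is entered only around a call into generated code. All names here (`WXMode`, `ScopedWX`, `g_wx_mode`, `generated_stub`, `jvm_entry`) are hypothetical stand-ins for illustration, not the classes in this PR, and the actual mode switch (done with `pthread_jit_write_protect_np` on macOS/AArch64) is replaced by a per-thread variable so the sketch builds on any platform.

```cpp
#include <cassert>

// Hypothetical names throughout: this models the idea, not HotSpot's classes.
enum class WXMode { Write, Exec };

// Per-thread record of the current mode. A real implementation on
// macOS/AArch64 would also flip the hardware protection here; that step is
// omitted so the sketch stays portable.
thread_local WXMode g_wx_mode = WXMode::Exec;

// RAII setter: switches to the requested mode and restores the previous
// mode when the scope ends.
class ScopedWX {
 public:
  explicit ScopedWX(WXMode mode) : _previous(g_wx_mode) { g_wx_mode = mode; }
  ~ScopedWX() { g_wx_mode = _previous; }
  ScopedWX(const ScopedWX&) = delete;
  ScopedWX& operator=(const ScopedWX&) = delete;

 private:
  WXMode _previous;
};

// Stand-in for a generated stub: it may only run in Exec mode.
int generated_stub() {
  assert(g_wx_mode == WXMode::Exec);
  return 42;
}

// Stand-in for a JVM entry function: Write mode for the whole body,
// Exec mode only while the generated stub runs.
int jvm_entry() {
  ScopedWX wx(WXMode::Write);
  assert(g_wx_mode == WXMode::Write);
  int result;
  {
    ScopedWX exec(WXMode::Exec);
    result = generated_stub();
  }
  assert(g_wx_mode == WXMode::Write);  // restored after the stub call
  return result;
}
```

Tying the switch to scope exit means that an early return from the entry function cannot leave the thread in the wrong mode, which is presumably why a setter object is preferable to paired enable/disable calls.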
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 3 20:11:50 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:11:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: On Wed, 3 Feb 2021 19:57:24 GMT, Anton Kozlov wrote: >> For platform files that were copied from other ports to this port, if the file wasn't >> changed I presume the copyright years are left alone. If the file required changes >> for this port, I expect the year to be updated to 2021. How are you verifying that >> these copyright years are being properly managed on the new files? >> >> For the new W^X helpers, e.g., WXWriteFromExecSetter, a short comment >> explaining why one was landed in a particular place would help reviewers. >> Also see my comment about creating a new ThreadToNativeWithWXExecFromVM >> helper. >> >> I'm stopping my review with all the src/hotspot files done for now. > > Thank you all for your comments regarding W^X implementation. I've made a change that reduces the footprint of the implementation, also addressing most of the comments. I'll revisit them individually to make sure nothing is forgotten. > > The basic principle has not changed: when we execute JVM code (owned by libjvm.so, starting from JVM entry function), we switch to Write state. When we leave JVM to execute generated or JNI code, we switch to Executable state. I would like to highlight that JVM code does not mean the VM state of the java thread. After @stefank's suggestion, I could also drop a few W^X state switches, so now it should be more clear that switches are tied to JVM entry functions. > I wonder if this is the right choice > ... 
> ``` > OopStorageParIterPerf::~OopStorageParIterPerf() { > ... > ``` > The transition in OopStorageParIterPerf was made for gtest setup to execute in WXWrite context. For tests themselves, defining macro set WXWrite. I've simplified the scheme and now we switch to WXWrite once at the gtest launcher. So this transition was dropped. I've also refreshed my memory and tried to switch to WXWrite as close as possible to each place where we'll be writing executable memory. There are a lot of such places! As you correctly noted, code cache contains objects, not plain data. For example, CodeCache memory management structures, CompiledMethod, ... are there, so we need more WXWrite switches than we have in the current approach. I had a comparable amount of them just to run -version, but certainly not enough to run tier1 tests. Following your advice, I don't require a known "from" state anymore. So a few W^X transitions were dropped, e.g. when the JVM code calls a JNI entry function, which expects to be called from the native code. I had to switch to WXExec just only to satisfy the expectations. After the update, we don't need this anymore. W^X switches are mostly hidden by VM_ENTRY and similar macros. Some JVM functions are not marked as entries for some reason, although they are called directly from e.g. interpreter. I added W^X management to such functions. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From mikael at openjdk.java.net Wed Feb 3 20:11:50 2021 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Wed, 3 Feb 2021 20:11:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: On Wed, 3 Feb 2021 20:05:29 GMT, Anton Kozlov wrote: >> Thank you all for your comments regarding W^X implementation. 
I've made a change that reduces the footprint of the implementation, also addressing most of the comments. I'll revisit them individually to make sure nothing is forgotten. >> >> The basic principle has not changed: when we execute JVM code (owned by libjvm.so, starting from JVM entry function), we switch to Write state. When we leave JVM to execute generated or JNI code, we switch to Executable state. I would like to highlight that JVM code does not mean the VM state of the java thread. After @stefank's suggestion, I could also drop a few W^X state switches, so now it should be more clear that switches are tied to JVM entry functions. > >> I wonder if this is the right choice >> ... >> ``` >> OopStorageParIterPerf::~OopStorageParIterPerf() { >> ... >> ``` >> > > The transition in OopStorageParIterPerf was made for gtest setup to execute in WXWrite context. For tests themselves, defining macro set WXWrite. > > I've simplified the scheme and now we switch to WXWrite once at the gtest launcher. So this transition was dropped. > > I've also refreshed my memory and tried to switch to WXWrite as close as possible to each place where we'll be writing executable memory. There are a lot of such places! As you correctly noted, code cache contains objects, not plain data. For example, CodeCache memory management structures, CompiledMethod, ... are there, so we need more WXWrite switches than we have in the current approach. I had a comparable amount of them just to run -version, but certainly not enough to run tier1 tests. > > Following your advice, I don't require a known "from" state anymore. So a few W^X transitions were dropped, e.g. when the JVM code calls a JNI entry function, which expects to be called from the native code. I had to switch to WXExec just only to satisfy the expectations. After the update, we don't need this anymore. > > W^X switches are mostly hidden by VM_ENTRY and similar macros. 
Some JVM functions are not marked as entries for some reason, although they are called directly from e.g. interpreter. I added W^X management to such functions. Out of curiosity, have you looked at the performance of the W^X state transition? In particular I'm wondering if the cost is effectively negligible so doing it unconditionally on JVM entry is a no-brainer and just easier/cleaner than the alternatives, or if there are reasons to look at only doing the transition if/when needed (perhaps do it lazily and revert back to X when leaving the JVM?). ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 3 20:11:50 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:11:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v2] In-Reply-To: References: <2wn66gOh0ezQssR63oCL82zCvBkga3mRlycGNbCtUqE=.e1c58219-93a0-47d2-8358-80a78f16d513@github.com> Message-ID: On Tue, 26 Jan 2021 12:50:22 GMT, Anton Kozlov wrote: >> Yes, that's why I thought it should be added to the classes ThreadInVMfromNative, etc like: >> class ThreadInVMfromNative : public ThreadStateTransition { >> ResetNoHandleMark __rnhm; >> >> We can look at it with cleaning up the thread transitions RFE or as a follow-on. If every line of ThreadInVMfromNative has to have one of these Thread::WXWriteVerifier __wx_write; people are going to miss them when adding the former. > > Not every ThreadInVMfromNative needs this, for example JIT goes to Native state without changing of W^X state. But from some experience of maintaining this patch, I actually had a duty to add missing W^X transitions after assert failures. A possible solution is actually to make W^X transition a part of ThreadInVMfromNative (and similar), but controlled by an optional constructor parameter with possible values "do exec->write", "verify write"...;. 
So in a common case ThreadInVMfromNative would implicitly do the transition and still would allow something uncommon like verification of the state for the JIT. I have to think about this. I've dropped this transition here and in similar places now that state tracking is always available. As a benefit, only a few places actually use the setter, and all of them are tied to the VM_ENTRY macro or a similar one. I expect we don't need to do W^X management near every java thread state change. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 3 20:11:51 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:11:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v2] In-Reply-To: References: <2wn66gOh0ezQssR63oCL82zCvBkga3mRlycGNbCtUqE=.e1c58219-93a0-47d2-8358-80a78f16d513@github.com> <51L0YGiOtGUEJUjhJkGnTRsoNofBsnSbopxjamQSesE=.0c2f3387-9f62-4d37-b2e8-73b5a5002641@github.com> <-_A-bf8i3jWY1awmyxwzi3yv4UoJQelRbJrMQVWQGLU=.5103b95d-3ad7-4a70-8d46-60c0eb0a301f@github.com> Message-ID: On Tue, 26 Jan 2021 12:01:30 GMT, Coleen Phillimore wrote: >> I assume a WXVerifier class that tracks W^X mode in debug mode and does nothing in release mode. I've considered to do this, it's relates to small inefficiencies, while this patch brings zero overhead (in release) for a platform that does not need W^X. >> * We don't need thread instance in release to call `os::current_thread_enable_wx`. Having WXVerifier a part of the Thread will require calling `Thread::current()` first and we could only hope for compiler to optimize this out, not sure if it will happen at all. In some contexts the Thread instance is available, in some it's not. >> * An instance of the empty class (as WXVerifier will be in the release) will occupy non-zero space in the Thread instance. >> >> If such costs are negligible, I can do as suggested. > > I really just want the minimal number of lines of code and hooks in thread.hpp.
You can still access it through the thread, just like these lines currently. Look at HandshakeState as an example. Please take a look at the recent changes. Changes in thread.hpp were reduced: https://github.com/openjdk/jdk/pull/2200/files#diff-abdc409967d04172ecc20e3747aa55a79e755584d570b57c4d58902a9813d188 thread.inline.hpp provides definitions of accessors (non-trivial): https://github.com/openjdk/jdk/pull/2200/files#diff-3a29f7f952bf2bd936f49e97cb3b86a7324851133e879c142dec724455310b50 And new threadWXSetters.hpp defines RAII context setter: https://github.com/openjdk/jdk/pull/2200/files#diff-6424782ec43941031282f079e89adaa76d341ce340a3b78b0e9657358ec16278 ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 3 20:11:53 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:11:53 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 23:03:45 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/share/runtime/thread.cpp line 3991: > >> 3989: JavaThread* thread = JavaThread::current(); >> 3990: ThreadToNativeFromVM ttn(thread); >> 3991: Thread::WXExecFromWriteSetter wx_exec; > > I saw somewhere in this review a comment about why this new > WXExecFromWriteSetter helper isn't folded into ThreadToNativeFromVM > and I understand that not every current ThreadToNativeFromVM needs > the new helper. 
If the vast majority of the ThreadToNativeFromVM > locations need WXExecFromWriteSetter, then perhaps those locations > that currently have a ThreadToNativeFromVM followed by the new > WXExecFromWriteSetter should use a new helper: > > ThreadToNativeWithWXExecFromVM > > so we'll see changes from: > > ThreadToNativeFromVM -> ThreadToNativeWithWXExecFromVM > > where we need them and hopefully a short comment can be added > at the same time to explain the need for WXExec. This will allow us > to easily distinguish ThreadToNativeFromVM locations that DO NOT > need WXExec from those that DO need it. With a small overhead for tracking the current W^X state, I avoided WXExecFromWriteSetter near ThreadToNativeFromVM at all. New ThreadWXEnable(WXExec) is used only to call a generated function. More common ThreadWXEnable(WXWrite) is tied to JVM entry functions. I added comments for functions that are not clear to be entries, although they are. Thank you for the suggestion! ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Wed Feb 3 20:11:52 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 3 Feb 2021 20:11:52 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 19:23:16 GMT, Bernhard Urban-Forster wrote: >> src/hotspot/os/posix/signals_posix.cpp line 1297: >> >>> 1295: kern_return_t kr; >>> 1296: kr = task_set_exception_ports(mach_task_self(), >>> 1297: EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC, >> >> Could someone elaborate on why we need to add `EXC_MASK_BAD_INSTRUCTION` to the mask here? > > See comment above about `gdb`, the same applies to `lldb` today. The AArch64 backend uses `SIGILL` (~= `EXC_MASK_BAD_INSTRUCTION`) to initiate a deoptimization. Without this change you cannot continue debugging once you the debuggee receives `SIGILL`. This wasn't needed before as x86 doesn't use `SIGILL`. 
Part of the comment said `This work-around is not necessary for 10.5+, as CrashReporter no longer intercedes on caught fatal signals.` so I thought it was no longer needed, but it sounds like the part about `gdb` still applies then. We should update the comment to just say the `gdb` relevant part perhaps (and evaluate which of the EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC) we actually need for gdb: `// gdb installs both standard BSD signal handlers, and mach exception` `// handlers. By replacing the existing task exception handler, we disable gdb's mach` `// exception handling, while leaving the standard BSD signal handlers functional.` Do you know if this also apply to `lldb` or is it `gdb` only specific? How do you run `gdb` on macOS nowadays anyhow? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 3 20:11:54 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:11:54 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 18:00:06 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 390: > >> 388: store_and_inc(_to, from_obj, NativeStack::intSpace); >> 389: >> 390: _num_int_args++; > > `pass_byte()` and `pass_short()` use only one `_num_int_args++;` after the `if else` but other methods use 2 of them inside `if else` branches. > > We should be consistent. Agree. 
I'm going to do as much refactoring as needed before this patch under JDK-8261071 ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 3 20:16:52 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 3 Feb 2021 20:16:52 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 09:14:24 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5271: >> >>> 5269: // >>> 5270: void MacroAssembler::get_thread(Register dst) { >>> 5271: RegSet saved_regs = RegSet::range(r0, r1) + BSD_ONLY(RegSet::range(r2, r17)) + lr - dst; >> >> The comment needs to be updated, since on BSD we also seem to clobber r2,r17 ? > > Surely this should be > > saved_regs = RegSet::range(r0, r1) BSD_ONLY(+ RegSet::range(r2, r17)) + lr - dst; > > Shouldn't it? Interesting, I wonder why it has built successfully on Linux. I'm going to fix this under JDK-8261072 ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Wed Feb 3 20:32:47 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 3 Feb 2021 20:32:47 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:04:18 GMT, Gerard Ziemski wrote: >> See comment above about `gdb`, the same applies to `lldb` today. The AArch64 backend uses `SIGILL` (~= `EXC_MASK_BAD_INSTRUCTION`) to initiate a deoptimization. Without this change you cannot continue debugging once you the debuggee receives `SIGILL`. This wasn't needed before as x86 doesn't use `SIGILL`. > > Part of the comment said `This work-around is not necessary for 10.5+, as CrashReporter no longer intercedes on caught fatal signals.` so I thought it was no longer needed, but it sounds like the part about `gdb` still applies then.
> > We should update the comment to just say the `gdb` relevant part perhaps (and evaluate which of the EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC) we actually need for gdb: > > `// gdb installs both standard BSD signal handlers, and mach exception` > `// handlers. By replacing the existing task exception handler, we disable gdb's mach` > `// exception handling, while leaving the standard BSD signal handlers functional.` > > Do you know if this also apply to `lldb` or is it `gdb` only specific? How do you run `gdb` on macOS nowadays anyhow? To answer my own question, it seems that code is still needed on `x86_64` for `lldb` with `EXC_MASK_BAD_ACCESS` or we keep tripping over `EXC_BAD_ACCESS` Remaining questions: a) why we need `EXC_MASK_ARITHMETIC` ? b) we hit `signal SIGSEGV` in debugger even with the code in place, any way to avoid that? c) does `BSD aarch64` need only `EXC_MASK_BAD_INSTRUCTION` or does it need `EXC_MASK_BAD_ACCESS` as well? d) can we `#ifdef` the `EXC_MASK_BAD_INSTRUCTION` part of the mask only to apply to `aarch64`? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From mdoerr at openjdk.java.net Wed Feb 3 20:48:56 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 3 Feb 2021 20:48:56 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 Message-ID: I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. I have to change register usage. That's what makes this change a bit larger. 
------------- Commit messages: - add resolve_weak_handle which is also available on other platforms - remove debugging code - 8260369: [PPC64] Add support for JDK-8200555 Changes: https://git.openjdk.java.net/jdk/pull/2358/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2358&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260369 Stats: 89 lines in 7 files changed: 25 ins; 2 del; 62 mod Patch: https://git.openjdk.java.net/jdk/pull/2358.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2358/head:pull/2358 PR: https://git.openjdk.java.net/jdk/pull/2358 From dongbo at openjdk.java.net Wed Feb 3 21:43:50 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 3 Feb 2021 21:43:50 GMT Subject: [jdk16] Integrated: 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 13:01:07 GMT, Dong Bo wrote: > This is a typo introduced by JDK-8255949. > Compiler will generate `ushr` for shifting right and accumulating four short integers. > It produces wrong results for specific case. The instruction should be `usra`. This pull request has now been integrated. 
Changeset: 5307afa9 Author: Dong Bo Committer: Dean Long URL: https://git.openjdk.java.net/jdk16/commit/5307afa9 Stats: 479 lines in 2 files changed: 458 ins; 16 del; 5 mod 8260585: AArch64: Wrong code generated for shifting right and accumulating four unsigned short integers Reviewed-by: iveresov, dlong, njian, aph ------------- PR: https://git.openjdk.java.net/jdk16/pull/136 From burban at openjdk.java.net Wed Feb 3 22:19:58 2021 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Wed, 3 Feb 2021 22:19:58 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:29:48 GMT, Gerard Ziemski wrote: >> Part of the comment said `This work-around is not necessary for 10.5+, as CrashReporter no longer intercedes on caught fatal signals.` so I thought it was no longer needed, but it sounds like the part about `gdb` still applies then. >> >> We should update the comment to just say the `gdb` relevant part perhaps (and evaluate which of the EXC_MASK_BAD_ACCESS | EXC_MASK_BAD_INSTRUCTION | EXC_MASK_ARITHMETIC) we actually need for gdb: >> >> `// gdb installs both standard BSD signal handlers, and mach exception` >> `// handlers. By replacing the existing task exception handler, we disable gdb's mach` >> `// exception handling, while leaving the standard BSD signal handlers functional.` >> >> Do you know if this also apply to `lldb` or is it `gdb` only specific? How do you run `gdb` on macOS nowadays anyhow? > > To answer my own question, it seems that code is still needed on `x86_64` for `lldb` with `EXC_MASK_BAD_ACCESS` or we keep tripping over `EXC_BAD_ACCESS` > > Remaining questions: > > a) why we need `EXC_MASK_ARITHMETIC` ? > b) we hit `signal SIGSEGV` in debugger even with the code in place, any way to avoid that? > c) does `BSD aarch64` need only `EXC_MASK_BAD_INSTRUCTION` or does it need `EXC_MASK_BAD_ACCESS` as well? 
> d) can we `#ifdef` the `EXC_MASK_BAD_INSTRUCTION` part of the mask only to apply to `aarch64`? Thanks for your questions, Gerard. > Part of the comment said This work-around is not necessary for 10.5+, as CrashReporter no longer intercedes on caught fatal signals. That comment can probably be deleted since the minimum version is 10.9 anyway (and soon 10.12 https://github.com/openjdk/jdk/pull/2268 ). > Do you know if this also apply to lldb or is it gdb only specific? How do you run gdb on macOS nowadays anyhow? `lldb` is shipped with Xcode, `gdb` isn't. You would need to build and sign it yourself; I haven't tried that in a while. So, we should update that comment to talk about `lldb`. > a) why we need `EXC_MASK_ARITHMETIC` ? I _believe_ this dates back to i386. As far as I can tell this isn't needed for x86_64 or aarch64. I guess we can remove it; the worst case is that it makes the debugging experience of the runtime a little bit worse. OTOH it doesn't hurt either to have it here. > b) we hit signal SIGSEGV in debugger even with the code in place, any way to avoid that? The equivalent for `handle SIGSEGV nostop noprint` (gdb) in lldb is `process handle -n false -p true -s false SIGSEGV`. > c) does `BSD aarch64` need only `EXC_MASK_BAD_INSTRUCTION` or does it need `EXC_MASK_BAD_ACCESS` as well? aarch64 needs `EXC_MASK_BAD_ACCESS` at least for implicit null checking; there might be other cases. > d) can we `#ifdef` the `EXC_MASK_BAD_INSTRUCTION` part of the mask only to apply to `aarch64`? Maybe. I don't see any value in it though, except making the code more complicated to read.
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Wed Feb 3 23:55:06 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 3 Feb 2021 23:55:06 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 22:17:02 GMT, Bernhard Urban-Forster wrote: >> To answer my own question, it seems that code is still needed on `x86_64` for `lldb` with `EXC_MASK_BAD_ACCESS` or we keep tripping over `EXC_BAD_ACCESS` >> >> Remaining questions: >> >> a) why we need `EXC_MASK_ARITHMETIC` ? >> b) we hit `signal SIGSEGV` in debugger even with the code in place, any way to avoid that? >> c) does `BSD aarch64` need only `EXC_MASK_BAD_INSTRUCTION` or does it need `EXC_MASK_BAD_ACCESS` as well? >> d) can we `#ifdef` the `EXC_MASK_BAD_INSTRUCTION` part of the mask only to apply to `aarch64`? > > Thanks for your questions Gerard. > >> Part of the comment said This work-around is not necessary for 10.5+, as CrashReporter no longer intercedes on caught fatal signals. > > That comment can probably be deleted since minversion is anyway 10.9 (and soon 10.12 https://github.com/openjdk/jdk/pull/2268 ). > >> Do you know if this also apply to lldb or is it gdb only specific? How do you run gdb on macOS nowadays anyhow? > > `lldb` is shipped with Xcode, `gdb` isn't. You would need to build and sign it yourself, I haven't tried that in a while. So, we should update that comment to talk about `lldb` ?? > >> a) why we need `EXC_MASK_ARITHMETIC` ? > > I _believe_ this dates back to i386. As far as I can tell this isn't needed for x86_64 or aarch64. I guess we can remove it, the worst case is that it makes the debugging experience of the runtime a little bit worse. OTOH it doesn't hurt either to have it here. > >> b) we hit signal SIGSEGV in debugger even with the code in place, any way to avoid that? 
>
> The equivalent for `handle SIGSEGV nostop noprint` (gdb) in lldb is `process handle -n false -p true -s false SIGSEGV`.
>
>> c) does `BSD aarch64` need only `EXC_MASK_BAD_INSTRUCTION` or does it need `EXC_MASK_BAD_ACCESS` as well?
>
> aarch64 needs `EXC_MASK_BAD_ACCESS` at least for implicit null checking, there might be other cases.
>
>> d) can we `#ifdef` the `EXC_MASK_BAD_INSTRUCTION` part of the mask only to apply to `aarch64`?
>
> Maybe. I don't see any value in it though, except making the code more complicated to read.

I don't like the idea of using masks on architectures that do not require them. How about something like this?

`#if defined(__APPLE__)`
` // lldb (gdb) installs both standard BSD signal handlers, and mach exception`
` // handlers. By replacing the existing task exception handler, we disable lldb's mach`
` // exception handling, while leaving the standard BSD signal handlers functional.`
` //`
` // EXC_MASK_BAD_ACCESS needed by all architectures for NULL ptr checking`
` // EXC_MASK_ARITHMETIC needed by i386`
` // EXC_MASK_BAD_INSTRUCTION needed by aarch64 to initiate deoptimization`
` kern_return_t kr;`
` kr = task_set_exception_ports(mach_task_self(),`
` EXC_MASK_BAD_ACCESS`
` NOT_LP64(| EXC_MASK_ARITHMETIC)`
` AARCH64_ONLY(| EXC_MASK_BAD_INSTRUCTION),`
` MACH_PORT_NULL,`
` EXCEPTION_STATE_IDENTITY,`
` MACHINE_THREAD_STATE);`
` `
` assert(kr == KERN_SUCCESS, "could not set mach task signal handler");`
`#endif`

If I just knew why i386 needs `EXC_MASK_ARITHMETIC` and could add that to the comment, I would personally be happy with that chunk of code.
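
For readers less familiar with the conditional macros used in the suggestion above, the following is a minimal, self-contained sketch of how such a mask could be composed per platform. Everything prefixed with `SKETCH_` or `sketch_` is hypothetical: the macros are stand-ins for the HotSpot-style `NOT_LP64` / `AARCH64_ONLY` helpers, and the numeric bit values are placeholders rather than the Mach definitions. Only the shape of the expression follows the proposal.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder mask bits, for illustration only. The real constants come from
// the Mach headers on macOS and are deliberately not reproduced here.
constexpr std::uint32_t SKETCH_EXC_MASK_BAD_ACCESS      = 1u << 1;
constexpr std::uint32_t SKETCH_EXC_MASK_BAD_INSTRUCTION = 1u << 2;
constexpr std::uint32_t SKETCH_EXC_MASK_ARITHMETIC      = 1u << 3;

// Hypothetical stand-ins for the conditional macros: each expands to its
// argument on the matching platform and to nothing everywhere else.
#if defined(__aarch64__)
#define SKETCH_AARCH64_ONLY(code) code
#else
#define SKETCH_AARCH64_ONLY(code)
#endif

#if defined(__LP64__) || defined(_LP64)
#define SKETCH_NOT_LP64(code)
#else
#define SKETCH_NOT_LP64(code) code
#endif

// Same shape as the proposed argument: one unconditional bit, plus bits that
// are OR-ed in only where the platform needs them.
constexpr std::uint32_t sketch_exception_mask() {
  return SKETCH_EXC_MASK_BAD_ACCESS
         SKETCH_NOT_LP64(| SKETCH_EXC_MASK_ARITHMETIC)
         SKETCH_AARCH64_ONLY(| SKETCH_EXC_MASK_BAD_INSTRUCTION);
}

// Sanity check: the unconditional bit is always present and no unknown bits
// are ever set, whichever platform this is compiled for.
inline void sketch_check_mask() {
  constexpr std::uint32_t known = SKETCH_EXC_MASK_BAD_ACCESS |
                                  SKETCH_EXC_MASK_BAD_INSTRUCTION |
                                  SKETCH_EXC_MASK_ARITHMETIC;
  assert((sketch_exception_mask() & SKETCH_EXC_MASK_BAD_ACCESS) != 0);
  assert((sketch_exception_mask() & ~known) == 0);
}
```

Because the `|` operator lives inside the macro argument, a platform that does not need a bit sees no trace of it in the expanded expression, which is what keeps the call site free of `#ifdef` blocks.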
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Wed Feb 3 23:55:07 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 3 Feb 2021 23:55:07 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 22:44:18 GMT, Gerard Ziemski wrote: >> Thanks for your questions Gerard. >> >>> Part of the comment said This work-around is not necessary for 10.5+, as CrashReporter no longer intercedes on caught fatal signals. >> >> That comment can probably be deleted since minversion is anyway 10.9 (and soon 10.12 https://github.com/openjdk/jdk/pull/2268 ). >> >>> Do you know if this also apply to lldb or is it gdb only specific? How do you run gdb on macOS nowadays anyhow? >> >> `lldb` is shipped with Xcode, `gdb` isn't. You would need to build and sign it yourself, I haven't tried that in a while. So, we should update that comment to talk about `lldb`. >> >>> a) why we need `EXC_MASK_ARITHMETIC` ? >> >> I _believe_ this dates back to i386. As far as I can tell this isn't needed for x86_64 or aarch64. I guess we can remove it, the worst case is that it makes the debugging experience of the runtime a little bit worse. OTOH it doesn't hurt either to have it here. >> >>> b) we hit signal SIGSEGV in debugger even with the code in place, any way to avoid that? >> >> The equivalent for `handle SIGSEGV nostop noprint` (gdb) in lldb is `process handle -n false -p true -s false SIGSEGV`. >> >>> c) does `BSD aarch64` need only `EXC_MASK_BAD_INSTRUCTION` or does it need `EXC_MASK_BAD_ACCESS` as well? >> >> aarch64 needs `EXC_MASK_BAD_ACCESS` at least for implicit null checking, there might be other cases. >> >>> d) can we `#ifdef` the `EXC_MASK_BAD_INSTRUCTION` part of the mask only to apply to `aarch64`? >> >> Maybe. I don't see any value in it though, except making the code more complicated to read.
> > I don't like the idea of using masks on architectures that do not require them. How about something like this? > > `#if defined(__APPLE__)` > ` // lldb (gdb) installs both standard BSD signal handlers, and mach exception` > ` // handlers. By replacing the existing task exception handler, we disable lldb's mach` > ` // exception handling, while leaving the standard BSD signal handlers functional.` > ` //` > ` // EXC_MASK_BAD_ACCESS needed by all architectures for NULL ptr checking` > ` // EXC_MASK_ARITHMETIC needed by i386` > ` // EXC_MASK_BAD_INSTRUCTION needed by aarch64 to initiate deoptimization` > ` kern_return_t kr;` > ` kr = task_set_exception_ports(mach_task_self(),` > ` EXC_MASK_BAD_ACCESS` > ` NOT_LP64(| EXC_MASK_ARITHMETIC)` > ` AARCH64_ONLY(| EXC_MASK_BAD_INSTRUCTION),` > ` MACH_PORT_NULL,` > ` EXCEPTION_STATE_IDENTITY,` > ` MACHINE_THREAD_STATE);` > ` ` > ` assert(kr == KERN_SUCCESS, "could not set mach task signal handler");` > `#endif` > > If I just knew why i386 needs `EXC_MASK_ARITHMETIC` and add that to the comment I would be personally happy with that chunk of code. 
No idea how to insert spaces and make text align :-( ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Wed Feb 3 23:55:07 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 3 Feb 2021 23:55:07 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 23:13:12 GMT, Bernhard Urban-Forster wrote: >> No idea how to insert spaces and make text align :-( > > using ` ```c ` https://docs.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks > > I was wrong about `SIGFPE` / `EXC_MASK_ARITHMETIC`, it's used on i386, x86_64: > https://github.com/openjdk/jdk/blob/2be60e37e0e433141b2e3d3e32f8e638a4888e3a/src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp#L467-L524 > and aarch64: > https://github.com/AntonKozlov/jdk/blob/80827176cbc5f0dd26003cf234a8076f3f557928/src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp#L309-L323 > (What happened with the formatting here, ugh?) > > Your suggestion sounds good otherwise. @AntonKozlov, do you mind to integrate that? So it should be:

    #if defined(__APPLE__)
      // lldb (gdb) installs both standard BSD signal handlers, and mach exception
      // handlers. By replacing the existing task exception handler, we disable lldb's mach
      // exception handling, while leaving the standard BSD signal handlers functional.
      //
      // EXC_MASK_BAD_ACCESS needed by all architectures for NULL ptr checking
      // EXC_MASK_ARITHMETIC needed by all architectures for div by 0 checking
      // EXC_MASK_BAD_INSTRUCTION needed by aarch64 to initiate deoptimization
      kern_return_t kr;
      kr = task_set_exception_ports(mach_task_self(),
                                    EXC_MASK_BAD_ACCESS | EXC_MASK_ARITHMETIC
                                      AARCH64_ONLY(| EXC_MASK_BAD_INSTRUCTION),
                                    MACH_PORT_NULL,
                                    EXCEPTION_STATE_IDENTITY,
                                    MACHINE_THREAD_STATE);

      assert(kr == KERN_SUCCESS, "could not set mach task signal handler");
    #endif

------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From burban at openjdk.java.net Wed Feb 3 23:55:07 2021 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Wed, 3 Feb 2021 23:55:07 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 22:48:33 GMT, Gerard Ziemski wrote: >> I don't like the idea of using masks on architectures that do not require them. How about something like this? >> >> `#if defined(__APPLE__)` >> ` // lldb (gdb) installs both standard BSD signal handlers, and mach exception` >> ` // handlers.
By replacing the existing task exception handler, we disable lldb's mach` >> ` // exception handling, while leaving the standard BSD signal handlers functional.` >> ` //` >> ` // EXC_MASK_BAD_ACCESS needed by all architectures for NULL ptr checking` >> ` // EXC_MASK_ARITHMETIC needed by i386` >> ` // EXC_MASK_BAD_INSTRUCTION needed by aarch64 to initiate deoptimization` >> ` kern_return_t kr;` >> ` kr = task_set_exception_ports(mach_task_self(),` >> ` EXC_MASK_BAD_ACCESS` >> ` NOT_LP64(| EXC_MASK_ARITHMETIC)` >> ` AARCH64_ONLY(| EXC_MASK_BAD_INSTRUCTION),` >> ` MACH_PORT_NULL,` >> ` EXCEPTION_STATE_IDENTITY,` >> ` MACHINE_THREAD_STATE);` >> ` ` >> ` assert(kr == KERN_SUCCESS, "could not set mach task signal handler");` >> `#endif` >> >> If I just knew why i386 needs `EXC_MASK_ARITHMETIC` and add that to the comment I would be personally happy with that chunk of code. > > No idea how to insert spaces and make text align :-( using ` ```c ` https://docs.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks I was wrong about `SIGFPE` / `EXC_MASK_ARITHMETIC`, it's used on i386, x86_64: https://github.com/openjdk/jdk/blob/2be60e37e0e433141b2e3d3e32f8e638a4888e3a/src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp#L467-L524 and aarch64: https://github.com/AntonKozlov/jdk/blob/80827176cbc5f0dd26003cf234a8076f3f557928/src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp#L309-L323 (What happened with the formatting here, ugh?) Your suggestion sounds good otherwise. @AntonKozlov, do you mind to integrate that? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From iklam at openjdk.java.net Wed Feb 3 23:57:12 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 23:57:12 GMT Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp Message-ID: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com> thread.hpp is 2133 lines long and is included by about 800 out of 1000 HotSpot .o files. 
It also pulls in many other header files. Many of the Thread subtypes are infrequently used. This RFE moves the easy ones to compilerThread.hpp and nonJavaThread.hpp. This reduces thread.hpp to 1888 lines. (I hope to do more refactoring of thread.hpp in future RFEs). Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. ------------- Commit messages: - 8260019: Move some Thread subtypes out of thread.hpp Changes: https://git.openjdk.java.net/jdk/pull/2390/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2390&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260019 Stats: 1392 lines in 18 files changed: 760 ins; 622 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2390/head:pull/2390 PR: https://git.openjdk.java.net/jdk/pull/2390 From jwilhelm at openjdk.java.net Thu Feb 4 01:24:02 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 4 Feb 2021 01:24:02 GMT Subject: RFR: Merge jdk16 Message-ID: Forwardport JDK 16 -> JDK 17 ------------- Commit messages: - Merge - 8259794: Remove EA from JDK 16 version string starting with Initial RC promotion on Feb 04, 2021(B35) - 8260704: ParallelGC: oldgen expansion needs release-store for _end - 8260927: StringBuilder::insert is incorrect without Compact Strings - 8258378: Final nroff manpage update for JDK 16 - 8257215: JFR: Events dropped when streaming over a chunk rotation - 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled - 8260632: Build failures after JDK-8253353 - 8260339: JVM crashes when executing PhaseIdealLoop::match_fill_loop - 8260608: add a regression test for 8260370 - ...
and 2 more: https://git.openjdk.java.net/jdk/compare/f025bc1d...dad835ee The webrevs contain the adjustments done while merging with regards to each parent branch: - master: https://webrevs.openjdk.java.net/?repo=jdk&pr=2392&range=00.0 - jdk16: https://webrevs.openjdk.java.net/?repo=jdk&pr=2392&range=00.1 Changes: https://git.openjdk.java.net/jdk/pull/2392/files Stats: 2645 lines in 56 files changed: 2497 ins; 69 del; 79 mod Patch: https://git.openjdk.java.net/jdk/pull/2392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2392/head:pull/2392 PR: https://git.openjdk.java.net/jdk/pull/2392 From iklam at openjdk.java.net Thu Feb 4 01:38:49 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 01:38:49 GMT Subject: RFR: 8261106: Reduce inclusion of jniHandles.hpp Message-ID:

jniHandles.hpp is included by about 800 out of 1000 HotSpot .o files. Most of these are transitively included from these header files, which don't actually need to include jniHandles.hpp.

- ci/ciBaseObject.hpp
- ci/ciMetadata.hpp
- ci/ciObject.hpp
- classfile/moduleEntry.hpp
- gc/shared/gcVMOperations.hpp
- jvmci/jvmciJavaClasses.hpp
- runtime/thread.hpp
- services/threadService.hpp

Fixing these headers reduces the number of .o files that include jniHandles.hpp to 145.

Note: 43 files were changed in this PR. Most of them were using jniHandles.hpp but were not including it directly.

Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero.
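
As a rough illustration of the kind of cleanup described above, here is a small sketch of how a widely included header can avoid pulling in a heavier one when it only handles pointers to that type. All names (`SketchHandles`, `SketchBaseObject`) are hypothetical and the three "files" are collapsed into one listing so it compiles on its own. It is not taken from the pull request, which may use other techniques as well.

```cpp
#include <cassert>

// --- would live in the widely included header ---
// A forward declaration is sufficient here: the class below only stores and
// returns a pointer, so it never needs the size or members of SketchHandles.
class SketchHandles;

class SketchBaseObject {
 public:
  explicit SketchBaseObject(SketchHandles* handles) : _handles(handles) {}
  SketchHandles* handles() const { return _handles; }
  int handle_count() const;  // defined where the full type is visible
 private:
  SketchHandles* _handles;
};

// --- would live in the heavy header, now included only where needed ---
class SketchHandles {
 public:
  int make_handle() { return ++_count; }
  int count() const { return _count; }
 private:
  int _count = 0;
};

// --- would live in a .cpp file that includes the heavy header directly ---
int SketchBaseObject::handle_count() const {
  return _handles->count();  // needs the complete type
}
```

The trade-off matches the note in the message: source files that genuinely use the full type must now include its header themselves instead of receiving it transitively.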
------------- Commit messages: - 8261106: Reduce inclusion of jniHandles.hpp Changes: https://git.openjdk.java.net/jdk/pull/2393/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2393&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261106 Stats: 44 lines in 43 files changed: 36 ins; 8 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2393.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2393/head:pull/2393 PR: https://git.openjdk.java.net/jdk/pull/2393 From iklam at openjdk.java.net Thu Feb 4 02:00:07 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 02:00:07 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v3] In-Reply-To: References: Message-ID: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - @tschatzl and @stefank comments - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2347/files - new: https://git.openjdk.java.net/jdk/pull/2347/files/529e77e4..7d9015d2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=01-02 Stats: 2516 lines in 114 files changed: 1237 ins; 850 del; 429 mod Patch: https://git.openjdk.java.net/jdk/pull/2347.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347 PR: https://git.openjdk.java.net/jdk/pull/2347 From jwilhelm at openjdk.java.net Thu Feb 4 02:09:40 2021 From: jwilhelm at openjdk.java.net (Jesper Wilhelmsson) Date: Thu, 4 Feb 2021 02:09:40 GMT Subject: Integrated: Merge jdk16 In-Reply-To: References: Message-ID: <9bIQlW7tZAt5i8quzTGD6Z6vHAG4-Q8-_saIecOJ4dM=.b1dacc8a-2678-417c-958b-650ff659723f@github.com> On Thu, 4 Feb 2021 01:17:48 GMT, Jesper Wilhelmsson wrote: > Forwardport JDK 16 -> JDK 17 This pull request has now been integrated. Changeset: 9b7a8f19 Author: Jesper Wilhelmsson URL: https://git.openjdk.java.net/jdk/commit/9b7a8f19 Stats: 2645 lines in 56 files changed: 2497 ins; 69 del; 79 mod Merge ------------- PR: https://git.openjdk.java.net/jdk/pull/2392 From iklam at openjdk.java.net Thu Feb 4 04:09:06 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 04:09:06 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v4] In-Reply-To: References: Message-ID: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. 
> > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - Merge branch 'master' of https://github.com/openjdk/jdk into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - @tschatzl and @stefank comments - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2347/files - new: https://git.openjdk.java.net/jdk/pull/2347/files/7d9015d2..cfd70b3c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=02-03 Stats: 2645 lines in 56 files changed: 2497 ins; 69 del; 79 mod Patch: https://git.openjdk.java.net/jdk/pull/2347.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347 PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Thu Feb 4 04:09:07 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 04:09:07 GMT Subject: Integrated: 
8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 04:18:24 GMT, Ioi Lam wrote: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. This pull request has now been integrated. Changeset: 82028e70 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/82028e70 Stats: 110 lines in 60 files changed: 69 ins; 7 del; 34 mod 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp Reviewed-by: stefank, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Thu Feb 4 06:16:00 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 06:16:00 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp Message-ID: vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). After the refactoring, vmOperations.hpp is included only 64 times. 
The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. ------------- Commit messages: - 8261125: Move VM_Operation to vmOperation.hpp Changes: https://git.openjdk.java.net/jdk/pull/2398/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2398&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261125 Stats: 347 lines in 24 files changed: 189 ins; 142 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/2398.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2398/head:pull/2398 PR: https://git.openjdk.java.net/jdk/pull/2398 From dholmes at openjdk.java.net Thu Feb 4 06:32:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 4 Feb 2021 06:32:42 GMT Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp In-Reply-To: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com> References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com> Message-ID: On Wed, 3 Feb 2021 22:55:56 GMT, Ioi Lam wrote: > thread.hpp is 2133 lines long and is included by about 800 out of 1000 HotSpot .o files. It also pulls in many other header files. > > Many of the Thread subtypes are infrequently used. This RFE move the easy ones to compilerThread.hpp and nonJavaThread.hpp. This reduces thread.hpp to 1888 lines. > > (I hope to do more refactoring of thread.hpp in future RFEs). > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Hi Ioi, I'm assuming the code is basically just a cut'n'paste with few actual changes needed thereafter - though I did spot the need to make the thread_entry member functions. 
I'm not sure about the choice of new file names given they contain code for multiple thread types. I don't have great names to suggest as they stand. I wouldn't object to the sweeper and watcher threads getting their own files - but don't insist at this time. I can't decide whether including both fooThread.hpp and thread.hpp in the same cpp file is good code hygiene or a complete waste of time. It is kind of obvious that fooThread.hpp must include thread.hpp. Overall I come down on the side of approval. :) Thanks, David src/hotspot/share/compiler/compilerThread.hpp line 2: > 1: /* > 2: * Copyright (c) 1997, 2021, Oracle and/or its affiliates. All rights reserved. Should a new file only have a single copyright year? I'm not sure what the rule is about splitting existing files but you use a single year in the new cpp files. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2390 From nick.gasson at arm.com Thu Feb 4 07:21:17 2021 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 04 Feb 2021 15:21:17 +0800 Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> Message-ID: <85im78cn82.fsf@nicgas01-pc.shanghai.arm.com> On 02/03/21 17:36 pm, Andrew Haley wrote: >> >> @theRealAph are the sharedRuntime_aarch64.cpp changes ok? > > I guess so, but the code changes are so complex and delicate it's extremely > hard to tell. > > What have you done about stress testing? I guess we need some code that's > repeatedly deoptimized and recompiled millions of times, with continuous > testing. I guess that in order to make sure nothing has regressed, a > bootstrap with DeoptimizeALot would help gain some confidence. 
I tried make bootcycle-images as you suggest with -XX:+DeoptimizeALot added to JAVA_FLAGS and JAVA_FLAGS_BIG in bootcycle-spec.gmk.in (not sure if there's a better way to do that...). I've also previously run the tier1 and java/incubator/vector/* tests with -XX:+DeoptimizeALot. My experience of modifying that code is that DeoptimizeALot fails pretty quickly if you get something wrong. -- Thanks, Nick From aph at redhat.com Thu Feb 4 08:18:41 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 4 Feb 2021 08:18:41 +0000 Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: <85im78cn82.fsf@nicgas01-pc.shanghai.arm.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> <85im78cn82.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <2084d66e-ab46-f909-51a0-c6f03c8da51c@redhat.com> On 2/4/21 7:21 AM, Nick Gasson wrote: > On 02/03/21 17:36 pm, Andrew Haley wrote: >>> >>> @theRealAph are the sharedRuntime_aarch64.cpp changes ok? >> >> I guess so, but the code changes are so complex and delicate it's extremely >> hard to tell. >> >> What have you done about stress testing? I guess we need some code that's >> repeatedly deoptimized and recompiled millions of times, with continuous >> testing. I guess that in order to make sure nothing has regressed, a >> bootstrap with DeoptimizeALot would help gain some confidence. > > I tried make bootcycle-images as you suggest with -XX:+DeoptimizeALot > added to JAVA_FLAGS and JAVA_FLAGS_BIG in bootcycle-spec.gmk.in (not > sure if there's a better way to do that...). > > I've also previously run the tier1 and java/incubator/vector/* tests > with -XX:+DeoptimizeALot. My experience of modifying that code is that > DeoptimizeALot fails pretty quickly if you get something wrong. Yeah. 
The problem here is that safepoints with live vectors aren't so common, so it's hard to test, I get it. Maybe this will have to do. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Feb 4 08:20:26 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 4 Feb 2021 08:20:26 +0000 Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: <8f81546a-eee9-5966-2dc7-f0cde6cc1a14@redhat.com> On 2/3/21 8:01 PM, Anton Kozlov wrote: > The basic principle has not changed: when we execute JVM code (owned by libjvm.so, starting from JVM entry function), we switch to Write state. When we leave JVM to execute generated or JNI code, we switch to Executable state. I would like to highlight that JVM code does not mean the VM state of the java thread. After @stefank's suggestion, I could also drop a few W^X state switches, so now it should be more clear that switches are tied to JVM entry functions. I haven't got a MacOS AArch64 system right now. Is it possible to enable W^X in Linux in order to kick the tyres? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Feb 4 08:22:22 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 4 Feb 2021 08:22:22 +0000 Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: <9b2aaf97-6dd5-e203-ac73-603dca7fb47e@redhat.com> On 2/3/21 8:11 PM, Mikael Vidstedt wrote: > Out of curiosity, have you looked at the performance of the W^X state transition? In particular I'm wondering if the cost is effectively negligible so doing it unconditionally on JVM entry is a no-brainer and just easier/cleaner than the alternatives, or if there are reasons to look at only doing the transition if/when needed (perhaps do it lazily and revert back to X when leaving the JVM?). It has to change page permissions, doesn't it? That's never going to be cheap, given that it requires a TLB teardown broadcast across all cores. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From mikael at openjdk.java.net Thu Feb 4 08:30:48 2021 From: mikael at openjdk.java.net (Mikael Vidstedt) Date: Thu, 4 Feb 2021 08:30:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: On Wed, 3 Feb 2021 20:08:28 GMT, Mikael Vidstedt wrote: >>> I wonder if this is the right choice >>> ... >>> ``` >>> OopStorageParIterPerf::~OopStorageParIterPerf() { >>> ... >>> ``` >>> >> >> The transition in OopStorageParIterPerf was made for gtest setup to execute in WXWrite context. For tests themselves, defining macro set WXWrite. >> >> I've simplified the scheme and now we switch to WXWrite once at the gtest launcher.
So this transition was dropped. >> >> I've also refreshed my memory and tried to switch to WXWrite as close as possible to each place where we'll be writing executable memory. There are a lot of such places! As you correctly noted, code cache contains objects, not plain data. For example, CodeCache memory management structures, CompiledMethod, ... are there, so we need more WXWrite switches than we have in the current approach. I had a comparable amount of them just to run -version, but certainly not enough to run tier1 tests. >> >> Following your advice, I don't require a known "from" state anymore. So a few W^X transitions were dropped, e.g. when the JVM code calls a JNI entry function, which expects to be called from the native code. I had to switch to WXExec just only to satisfy the expectations. After the update, we don't need this anymore. >> >> W^X switches are mostly hidden by VM_ENTRY and similar macros. Some JVM functions are not marked as entries for some reason, although they are called directly from e.g. interpreter. I added W^X management to such functions. >> >> Thank you! > > Out of curiosity, have you looked at the performance of the W^X state transition? In particular I'm wondering if the cost is effectively negligible so doing it unconditionally on JVM entry is a no-brainer and just easier/cleaner than the alternatives, or if there are reasons to look at only doing the transition if/when needed (perhaps do it lazily and revert back to X when leaving the JVM?). You read my mind, Andrew. Unless, of course, it's optimized to leverage the fact that it's thread-specific.. 
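
To make the quoted scheme concrete (switch to the write state on JVM entry, restore whatever state was active on exit, with no requirement for a known "from" state), here is a minimal sketch of a scoped guard. All `Sketch`/`sketch_` names are hypothetical and this is not the code from the pull request. On macOS/AArch64 the sketch assumes `pthread_jit_write_protect_np` is the underlying per-thread switch; on every other platform only a thread-local variable is updated so the listing still compiles and runs.

```cpp
#include <cassert>

#if defined(__APPLE__) && defined(__aarch64__)
#include <pthread.h>
#endif

// Hypothetical names; Write = JIT pages writable, Exec = JIT pages executable.
enum class SketchWXMode { Write, Exec };

// Per-thread record of the current mode. Starting in Exec is an assumption
// made for this sketch.
thread_local SketchWXMode sketch_wx_mode = SketchWXMode::Exec;

inline void sketch_set_wx_mode(SketchWXMode mode) {
#if defined(__APPLE__) && defined(__aarch64__)
  // Non-zero enables write protection (executable), zero disables it
  // (writable). The call affects only the calling thread.
  pthread_jit_write_protect_np(mode == SketchWXMode::Exec ? 1 : 0);
#endif
  sketch_wx_mode = mode;
}

// Scoped switch: remembers the mode that was active on construction and
// restores it on destruction, so callers need not know the previous state.
class SketchWXScope {
 public:
  explicit SketchWXScope(SketchWXMode mode) : _saved(sketch_wx_mode) {
    sketch_set_wx_mode(mode);
  }
  ~SketchWXScope() { sketch_set_wx_mode(_saved); }
  SketchWXScope(const SketchWXScope&) = delete;
  SketchWXScope& operator=(const SketchWXScope&) = delete;
 private:
  SketchWXMode _saved;
};

// Stand-in for a JVM entry function: runs in Write mode and reports the mode
// it observed; the previous mode is restored when the scope ends.
inline SketchWXMode sketch_vm_entry() {
  SketchWXScope scope(SketchWXMode::Write);
  return sketch_wx_mode;
}
```

Saving and restoring the previous mode is what allows nesting, for example an entry function that is reached from code already running in the write state.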
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Thu Feb 4 09:51:57 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Thu, 4 Feb 2021 09:51:57 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: On Thu, 4 Feb 2021 08:26:35 GMT, Mikael Vidstedt wrote: > You read my mind, Andrew. Unless, of course, it's optimized to leverage the fact that it's thread-specific.. it's thread-specific https://developer.apple.com/documentation/apple_silicon/porting_just-in-time_compilers_to_apple_silicon >Because pthread_jit_write_protect_np changes only the current thread's permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From nick.gasson at arm.com Thu Feb 4 10:02:29 2021 From: nick.gasson at arm.com (Nick Gasson) Date: Thu, 04 Feb 2021 18:02:29 +0800 Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: <2084d66e-ab46-f909-51a0-c6f03c8da51c@redhat.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> <85im78cn82.fsf@nicgas01-pc.shanghai.arm.com> <2084d66e-ab46-f909-51a0-c6f03c8da51c@redhat.com> Message-ID: <85ft2ccfre.fsf@nicgas01-pc.shanghai.arm.com> On 02/04/21 16:18 pm, Andrew Haley wrote: > > Yeah. The problem here is that safepoints with live vectors aren't so > common, so it's hard to test, I get it. Maybe this will have to do.
You can test that situation quite readily with: make test TEST="jdk/incubator/vector" \ JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" Which will segfault with current JDK. I guess the difficulty is showing it hasn't regressed something else. -- Thanks, Nick From vladimir.x.ivanov at oracle.com Thu Feb 4 10:03:00 2021 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 4 Feb 2021 13:03:00 +0300 Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: <2084d66e-ab46-f909-51a0-c6f03c8da51c@redhat.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> <85im78cn82.fsf@nicgas01-pc.shanghai.arm.com> <2084d66e-ab46-f909-51a0-c6f03c8da51c@redhat.com> Message-ID: <8d4f63c0-41a6-5def-3e6f-8de014476750@oracle.com> >> I've also previously run the tier1 and java/incubator/vector/* tests >> with -XX:+DeoptimizeALot. My experience of modifying that code is that >> DeoptimizeALot fails pretty quickly if you get something wrong. > > Yeah. The problem here is that safepoints with live vectors aren't so > common, so it's hard to test, I get it. Maybe this will have to do. FTR jdk/java/incubator/vector tests w/ -XX:+DeoptimizeALot are very good at verifying that in-register vector values are properly preserved: vectors (as exposed by Vector API) are routinely kept in registers across safepoints and during deoptimization they are rematerialized into full-blown Vector instances, so the tests fail quickly on broken vector values. 
Best regards, Vladimir Ivanov From stuefe at openjdk.java.net Thu Feb 4 10:08:07 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 4 Feb 2021 10:08:07 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v4] In-Reply-To: References: Message-ID: > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. > > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
> > - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. > > Output Before: > 663 Signal Handlers: > 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO > 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > > Now: > Signal Handlers: > SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO > SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > > ----- > Tests: GA, and the patch has been tested in our nightlies for over a month now. I manually executed the runtime/jni/checked tests too. 
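As a rough illustration of the approach described for case (2), comparing the active sigaction with the one recorded at install time, here is a minimal sketch. It is not the code from the pull request: the class layout, `kComparedFlags`, `same_handler`, `handler_is_intact` and `demo_handler` are assumptions made for this example, and only the class name `SavedSignalHandlers` comes from the description above.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>

// Flags worth comparing. Some libcs report extra, undocumented bits when an
// action is read back, so comparing raw sa_flags could give false mismatches.
static const int kComparedFlags =
    SA_RESTART | SA_SIGINFO | SA_ONSTACK | SA_NODEFER | SA_RESETHAND;

// Hypothetical stand-in for the SavedSignalHandlers class named in the patch:
// one optional sigaction per signal number.
class SavedSignalHandlers {
  struct sigaction _sa[NSIG];
  bool _used[NSIG];

  static bool in_range(int sig) { return sig > 0 && sig < NSIG; }

 public:
  SavedSignalHandlers() {
    std::memset(_sa, 0, sizeof(_sa));
    std::memset(_used, 0, sizeof(_used));
  }
  void set(int sig, const struct sigaction& act) {
    assert(in_range(sig));
    _sa[sig] = act;
    _used[sig] = true;
  }
  // Returns nullptr if nothing was recorded for this signal.
  const struct sigaction* get(int sig) const {
    return (in_range(sig) && _used[sig]) ? &_sa[sig] : nullptr;
  }
};

// Two actions match if the compared flags agree and the handler address
// (one-argument or three-argument form, depending on SA_SIGINFO) agrees.
static bool same_handler(const struct sigaction& a, const struct sigaction& b) {
  if ((a.sa_flags & kComparedFlags) != (b.sa_flags & kComparedFlags)) {
    return false;
  }
  return (a.sa_flags & SA_SIGINFO) ? a.sa_sigaction == b.sa_sigaction
                                   : a.sa_handler == b.sa_handler;
}

// Compares the currently installed action with the recorded expectation.
bool handler_is_intact(const SavedSignalHandlers& expected, int sig) {
  const struct sigaction* want = expected.get(sig);
  if (want == nullptr) return true;  // nothing recorded, nothing to check
  struct sigaction now;
  if (sigaction(sig, nullptr, &now) != 0) return false;
  return same_handler(*want, now);
}

// Trivial handler used only to have something to install in an example.
extern "C" void demo_handler(int) {}
```

The point of the design is that the checker needs no knowledge of which handler belongs to which signal; it only compares what is installed now against what was recorded.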
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Fix build error on zlinux ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2251/files - new: https://git.openjdk.java.net/jdk/pull/2251/files/44fa2199..40201e1d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=02-03 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2251.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2251/head:pull/2251 PR: https://git.openjdk.java.net/jdk/pull/2251 From ayang at openjdk.java.net Thu Feb 4 10:15:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 4 Feb 2021 10:15:41 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` is only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. > > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) Marked as reviewed by ayang (Author). src/hotspot/share/gc/shared/cardTableRS.cpp line 442: > 440: CardTable(whole_heap, scanned_concurrently) { } > 441: > 442: CardTableRS::~CardTableRS() { } Now that it's empty, is it possible to remove it completely? 
src/hotspot/share/gc/shared/cardTableRS.hpp line 55: > 53: virtual void verify_used_region_at_save_marks(Space* sp) const NOT_DEBUG_RETURN; > 54: > 55: void inline_write_ref_field_gc(void* field, oop new_val) { It seems that the arg `new_val` is not used. Maybe remove it or add a comment saying it's an intentional omission. ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From kbarrett at openjdk.java.net Thu Feb 4 10:31:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 4 Feb 2021 10:31:41 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` is only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. > > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) Looks good to me, with the one minor nit I commented on and Albert's suggestions. src/hotspot/share/gc/shared/cardTableRS.cpp line 43: > 41: inline bool ClearNoncleanCardWrapper::clear_card(CardValue* entry) { > 42: CardValue entry_val = *entry; > 43: assert(entry_val == CardTableRS::dirty_card_val(), Consider eliminating `entry_val` - just use `*entry` in the assert. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2354 From lucy at openjdk.java.net Thu Feb 4 11:48:45 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 4 Feb 2021 11:48:45 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 16:14:17 GMT, Martin Doerr wrote: > I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. > > I have to change register usage. That's what makes this change a bit larger. Changes look good to me. Please review and act on my inline comment on interp_masm_ppc_64.cpp(line 507). Thanks, Lutz src/hotspot/cpu/ppc/interp_masm_ppc_64.cpp line 507: > 505: load_heap_oop(result, arrayOopDesc::base_offset_in_bytes(T_OBJECT), result, > 506: tmp1, tmp2, > 507: MacroAssembler::MacroAssembler::PRESERVATION_NONE, Are you sure you need this duplicate class specification? ------------- PR: https://git.openjdk.java.net/jdk/pull/2358 From akozlov at openjdk.java.net Thu Feb 4 12:00:51 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 4 Feb 2021 12:00:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: <9N0zJ8ZgY9e3yIF3IKnkdbkRu80waLyh8GHBti22DK8=.949c0612-f514-44a0-9d2b-ff9e2eb539d1@github.com> On Tue, 2 Feb 2021 22:56:55 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/share/runtime/stubRoutines.inline.hpp line 1: > >> 1: /* > > NOW I understand the reason for switching from include to inline-include. > Is there a reason that this change is part of this project and not extracted > into a separate RFE. That would reduce the number of files touched by > this project. Makes sense, thanks. I'll do this as JDK-8261075. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From mdoerr at openjdk.java.net Thu Feb 4 12:53:58 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 4 Feb 2021 12:53:58 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v2] In-Reply-To: References: Message-ID: <3Bx5IPnaueCwXmRJ2SYoktEV6wAX1LnH2cpyb-Ta6vY=.7fb9f5df-b00b-4d8e-bb74-aecc61ebd5d8@github.com> > I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. > > I have to change register usage. That's what makes this change a bit larger. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: remove duplicate MacroAssembler::MacroAssembler:: ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2358/files - new: https://git.openjdk.java.net/jdk/pull/2358/files/7636d1af..021aec12 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2358&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2358&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2358.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2358/head:pull/2358 PR: https://git.openjdk.java.net/jdk/pull/2358 From lucy at openjdk.java.net Thu Feb 4 12:56:45 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 4 Feb 2021 12:56:45 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v2] In-Reply-To: <3Bx5IPnaueCwXmRJ2SYoktEV6wAX1LnH2cpyb-Ta6vY=.7fb9f5df-b00b-4d8e-bb74-aecc61ebd5d8@github.com> References: <3Bx5IPnaueCwXmRJ2SYoktEV6wAX1LnH2cpyb-Ta6vY=.7fb9f5df-b00b-4d8e-bb74-aecc61ebd5d8@github.com> Message-ID: On Thu, 4 Feb 2021 12:53:58 GMT, Martin Doerr wrote: >> I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. >> >> I have to change register usage. 
That's what makes this change a bit larger. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > remove duplicate MacroAssembler::MacroAssembler:: LGTM ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2358 From mdoerr at openjdk.java.net Thu Feb 4 13:08:07 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 4 Feb 2021 13:08:07 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v3] In-Reply-To: References: Message-ID: > I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. > > I have to change register usage. That's what makes this change a bit larger. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: use PRESERVATION_NONE in load_field_cp_cache_entry ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2358/files - new: https://git.openjdk.java.net/jdk/pull/2358/files/021aec12..be66f4e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2358&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2358&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2358.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2358/head:pull/2358 PR: https://git.openjdk.java.net/jdk/pull/2358 From mdoerr at openjdk.java.net Thu Feb 4 13:10:41 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 4 Feb 2021 13:10:41 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v2] In-Reply-To: References: <3Bx5IPnaueCwXmRJ2SYoktEV6wAX1LnH2cpyb-Ta6vY=.7fb9f5df-b00b-4d8e-bb74-aecc61ebd5d8@github.com> Message-ID: <7TZlgU-vxKbe2cBstru58Cy2KsOwq4Q0PPlBn4KVuYo=.f2e4b462-f8f8-4c8f-a77f-0f905b282f5f@github.com> On Thu, 4 Feb 2021 12:53:57 GMT, Lutz Schmidt wrote: >> Martin Doerr has updated the pull request 
incrementally with one additional commit since the last revision: >> >> remove duplicate MacroAssembler::MacroAssembler:: > > LGTM Thanks for the review! MacroAssembler::MacroAssembler::PRESERVATION_NONE was a copy&paste bug. And while fixing it, I noticed that I should use PRESERVATION_NONE in load_field_cp_cache_entry, too, which fits better to the interpreter design on PPC64 where we can usually call C without any save&restore code: https://github.com/openjdk/jdk/pull/2358/commits/be66f4e533e46549bdfc43019920bcff09c4e49a Tested by using clobber code and running a benchmark. ------------- PR: https://git.openjdk.java.net/jdk/pull/2358 From coleenp at openjdk.java.net Thu Feb 4 13:16:43 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:16:43 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM In-Reply-To: References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: On Wed, 3 Feb 2021 19:49:30 GMT, Mandy Chung wrote: >> This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. > > src/java.base/share/native/libjava/ClassLoader.c line 291: > >> 289: } >> 290: // disallow slashes in input, change '.' to '/' >> 291: if (verifyFixClassname(clname)) { > > perhaps we should replace all use of `fixClassname` with `verifyFixClassname` and remove `fixClassname`. This suggestion makes sense to me. verifyClassName is only used once in Class.c passing false so you could remove that argument. It's hard to see how fixClassName then verifyClassname is equivalent to verifyFixClassname but verifyFixClassname makes more sense than verifyClassname. I think this return: return (p != 0 && p - name == (ptrdiff_t)length); implies a non-utf8 character was found? 
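The check named in the quoted comment (disallow slashes in the input, change '.' to '/') could look roughly like the sketch below. It is a simplified, ASCII-only illustration under assumptions: it does no UTF-8 validation, the function names are hypothetical, and the return convention (true when the input contained no slash) is chosen for this example and is not claimed to match the JDK's `verifyFixClassname`.

```cpp
#include <cassert>
#include <string>

// Converts '.' to '/' in place and reports whether the input was acceptable,
// i.e. contained no '/' to begin with. Hypothetical helper, ASCII only.
bool verify_fix_classname_sketch(std::string& name) {
  bool slash_found = false;
  for (char& c : name) {
    if (c == '/') {
      slash_found = true;   // binary names passed from Java must use '.'
    } else if (c == '.') {
      c = '/';              // convert to the internal form
    }
  }
  return !slash_found;
}

// A non-checking variant can be a thin wrapper for callers that already know
// the name is well formed; the assert documents that expectation.
void fix_classname_sketch(std::string& name) {
  bool ok = verify_fix_classname_sketch(name);
  assert(ok && "caller passed a name that already contained '/'");
  (void)ok;  // keeps release builds free of unused-variable warnings
}
```

Sharing one loop between the checking and non-checking variants keeps the conversion rule in a single place, at the cost of one extra flag in the non-checking path.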
------------- PR: https://git.openjdk.java.net/jdk/pull/2378 From coleenp at openjdk.java.net Thu Feb 4 13:16:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:16:41 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM In-Reply-To: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: On Wed, 3 Feb 2021 12:21:30 GMT, Claes Redestad wrote: > This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. Changes requested by coleenp (Reviewer). src/java.base/share/classes/java/lang/ClassLoader.java line 1259: > 1257: Class findBootstrapClassOrNull(String name) { > 1258: return findBootstrapClass(name); > 1259: } I'm confused why this would improve performance. Wouldn't avoiding the transition between Java to the VM be good? Or is checkName seldom false, so we're checking valid names for nothing? ------------- PR: https://git.openjdk.java.net/jdk/pull/2378 From coleenp at openjdk.java.net Thu Feb 4 13:16:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:16:44 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM In-Reply-To: References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: On Thu, 4 Feb 2021 13:11:47 GMT, Coleen Phillimore wrote: >> src/java.base/share/native/libjava/ClassLoader.c line 291: >> >>> 289: } >>> 290: // disallow slashes in input, change '.' to '/' >>> 291: if (verifyFixClassname(clname)) { >> >> perhaps we should replace all use of `fixClassname` with `verifyFixClassname` and remove `fixClassname`. > > This suggestion makes sense to me. verifyClassName is only used once in Class.c passing false so you could remove that argument. 
> It's hard to see how fixClassName then verifyClassname is equivalent to verifyFixClassname but verifyFixClassname makes more sense than verifyClassname. > I think this return: > return (p != 0 && p - name == (ptrdiff_t)length); > implies a non-utf8 character was found? Actually I think replacing fixClassName with verifyFixClassname will be awkward since the latter returns a value that's not checked in all the callers of fixClassName. Maybe you could write fixClassName as: void fixClassName() { verifyFixClassName(); with some assertion it passed? } ------------- PR: https://git.openjdk.java.net/jdk/pull/2378 From coleenp at openjdk.java.net Thu Feb 4 13:20:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:20:41 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally [v2] In-Reply-To: References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Message-ID: On Wed, 3 Feb 2021 06:39:01 GMT, Thomas Stuefe wrote: >> Analyzing out-of-resource situations in cloud scenarios is no fun. With CloudFoundry, a JVMTI agent (jvmkill) is hooked up intercepting the jvmti "resource exhausted" event, then attempts to write up a heap report. That may fail, e.g. due to bugs in the agent [1], but also because that report runs java code and may suffer from the same resource exhaustion. Successful or not, it unceremoniously kills the VM when done, often leaving us with no information about the actual resource. >> >> It would be very helpful if we had unconditional tracing here. We do have tracing, but it requires a non-product build and is triggered with TraceJVMTI. Also, it traces at trace level which is way too fine-grained. >> >> I'd like to introduce another, unconditional trace line here. Arguably, resource exhausted is fatal enough that it justifies unconditional tracing. >> >> This is a bit of a coin toss. 
Tracing unconditionally would help in most scenarios, where it would be either difficult or even impossible to specify a trace command line switch. OTOH it may trip up scripts parsing the VM output, or some of our tests (which can be fixed). >> >> Thoughts? >> >> ..Thomas >> >> [1] https://github.com/cloudfoundry/jvmkill/issues/18 > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Feedback David This looks good! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2350 From coleenp at openjdk.java.net Thu Feb 4 13:25:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:25:41 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 05:38:49 GMT, Ioi Lam wrote: > vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. > > Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. > > So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). > > After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ok! ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2398 From redestad at openjdk.java.net Thu Feb 4 13:25:52 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 4 Feb 2021 13:25:52 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM In-Reply-To: References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: <5FaAQkARB-2Vg67MU55A6zFI0icKnjO5aegBSpq6cDM=.ba72b609-a85f-4424-942f-2fdd6a973e2f@github.com> On Thu, 4 Feb 2021 12:54:43 GMT, Coleen Phillimore wrote: >> This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. > > src/java.base/share/classes/java/lang/ClassLoader.java line 1259: > >> 1257: Class findBootstrapClassOrNull(String name) { >> 1258: return findBootstrapClass(name); >> 1259: } > > I'm confused why this would improve performance. Wouldn't avoiding the transition between Java to the VM be good? Or is checkName seldom false, so we're checking valid names for nothing? It's practically never false, so the checking done here is just extra work. The patch skips execution of a few thousand bytecode on startup as is, but I'm reworking it to try and get rid of the last remaining checkName use and clean up the verifyFixClassName/fixClassName use to perhaps consolidate code there for a bit. ------------- PR: https://git.openjdk.java.net/jdk/pull/2378 From coleenp at openjdk.java.net Thu Feb 4 13:26:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:26:42 GMT Subject: RFR: 8261106: Reduce inclusion of jniHandles.hpp In-Reply-To: References: Message-ID: <4hWwb7Jpf35oaGMbU_YB8MiT2F87wTGJxvL_Mq0UCHs=.a4589f14-f2a0-4e64-884d-c81a7b4b3540@github.com> On Thu, 4 Feb 2021 01:31:39 GMT, Ioi Lam wrote: > jniHandles.hpp is included by about 800 out of 1000 HotSpot .o files. Most of these are transitively included from these header files, which don't actually need to include jniHandles.hpp. 
> > - ci/ciBaseObject.hpp > - ci/ciMetadata.hpp > - ci/ciObject.hpp > - classfile/moduleEntry.hpp > - gc/shared/gcVMOperations.hpp > - jvmci/jvmciJavaClasses.hpp > - runtime/thread.hpp > - services/threadService.hpp > > Fixing these headers reduces the number of .o files that include jniHandles.hpp to 145. > > Note: 43 files were changed in this PR. Most of them were using jniHandles.hpp but were not including it directly. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ok! Seems trivial since I assume you've built this everywhere. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2393 From stuefe at openjdk.java.net Thu Feb 4 13:28:45 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 4 Feb 2021 13:28:45 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally [v2] In-Reply-To: References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Message-ID: On Thu, 4 Feb 2021 13:18:22 GMT, Coleen Phillimore wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Feedback David > > This looks good! Thanks, Coleen! @dholmes-ora : are you fine with the latest version? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2350 From coleenp at openjdk.java.net Thu Feb 4 13:52:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:52:41 GMT Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp In-Reply-To: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com> References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com> Message-ID: On Wed, 3 Feb 2021 22:55:56 GMT, Ioi Lam wrote: > thread.hpp is 2133 lines long and is included by about 800 out of 1000 HotSpot .o files. It also pulls in many other header files. > > Many of the Thread subtypes are infrequently used. This RFE moves the easy ones to compilerThread.hpp and nonJavaThread.hpp. This reduces thread.hpp to 1888 lines. > > (I hope to do more refactoring of thread.hpp in future RFEs). > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. I like this a lot. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2390 From coleenp at openjdk.java.net Thu Feb 4 13:52:43 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 4 Feb 2021 13:52:43 GMT Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp In-Reply-To: References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com> Message-ID: On Thu, 4 Feb 2021 06:19:21 GMT, David Holmes wrote: >> thread.hpp is 2133 lines long and is included by about 800 out of 1000 HotSpot .o files. It also pulls in many other header files. >> >> Many of the Thread subtypes are infrequently used. This RFE moves the easy ones to compilerThread.hpp and nonJavaThread.hpp. This reduces thread.hpp to 1888 lines. >> >> (I hope to do more refactoring of thread.hpp in future RFEs). 
>> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > src/hotspot/share/compiler/compilerThread.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 1997, 2021, Oracle and/or its affiliates. All rights reserved. > > Should a new file only have a single copyright year? I'm not sure what the rule is about splitting existing files but you use a single year in the new cpp files. I think the rule has been: if it's a new file, it gets a new year. ------------- PR: https://git.openjdk.java.net/jdk/pull/2390 From tschatzl at openjdk.java.net Thu Feb 4 13:56:58 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Feb 2021 13:56:58 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal [v2] In-Reply-To: References: Message-ID: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` is only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. 
> > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: kimbarret, albertnetymk review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2354/files - new: https://git.openjdk.java.net/jdk/pull/2354/files/5aa23d74..849c79bb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2354&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2354&range=00-01 Stats: 11 lines in 4 files changed: 0 ins; 6 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2354.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2354/head:pull/2354 PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Thu Feb 4 13:56:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Feb 2021 13:56:59 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal [v2] In-Reply-To: References: Message-ID: <33XHcZDMFLFqOngnBQUpiuaQ_VlxfZ9HPhinJoDGIYY=.838ade60-1bc8-43c7-98d9-9d8c21ba3d26@github.com> On Thu, 4 Feb 2021 10:29:18 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> kimbarret, albertnetymk review > > Looks good to me, with the one minor nit I commented on and Albert's suggestions. All fixed as suggested. Still compiles. ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From aph at openjdk.java.net Thu Feb 4 14:30:49 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 4 Feb 2021 14:30:49 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: On Thu, 4 Feb 2021 09:49:17 GMT, Vladimir Kempik wrote: > > You read my mind, Andrew. 
Unless, of course, it's optimized to leverage the fact that it's thread-specific.. > > it's thread-specific > > https://developer.apple.com/documentation/apple_silicon/porting_just-in-time_compilers_to_apple_silicon > > > Because pthread_jit_write_protect_np changes only the current thread's permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region. Umm, so how does patching work? We don't even know if other threads are executing the code we need to patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Thu Feb 4 14:43:51 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Thu, 4 Feb 2021 14:43:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: On Thu, 4 Feb 2021 14:27:53 GMT, Andrew Haley wrote: > > > You read my mind, Andrew. Unless, of course, it's optimized to leverage the fact that it's thread-specific.. > > > > > > it's thread-specific > > https://developer.apple.com/documentation/apple_silicon/porting_just-in-time_compilers_to_apple_silicon > > > Because pthread_jit_write_protect_np changes only the current thread's permissions, avoid accessing the same memory region from multiple threads. Giving multiple threads access to the same memory region opens up a potential attack vector, in which one thread has write access and another has executable access to the same region. > > Umm, so how does patching work? We don't even know if other threads are executing the code we need to patch.
I thought Java can handle that scenario on usual (non-W^X) systems just fine, so we simply trust that the JVM did everything right and that it is safe to rewrite some code at a specific moment. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From hseigel at openjdk.java.net Thu Feb 4 15:03:42 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 4 Feb 2021 15:03:42 GMT Subject: RFR: 8261106: Reduce inclusion of jniHandles.hpp In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 01:31:39 GMT, Ioi Lam wrote: > jniHandles.hpp is included by about 800 out of 1000 HotSpot .o files. Most of these are transitively included from these header files, which don't actually need to include jniHandles.hpp. > > - ci/ciBaseObject.hpp > - ci/ciMetadata.hpp > - ci/ciObject.hpp > - classfile/moduleEntry.hpp > - gc/shared/gcVMOperations.hpp > - jvmci/jvmciJavaClasses.hpp > - runtime/thread.hpp > - services/threadService.hpp > > Fixing these headers reduces the number of .o files that include jniHandles.hpp to 145. > > Note: 43 files were changed in this PR. Most of them were using jniHandles.hpp but were not including it directly. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. LGTM! Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2393 From tschatzl at openjdk.java.net Thu Feb 4 15:06:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Feb 2021 15:06:40 GMT Subject: RFR: 8259668: Make SubTasksDone use-once In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 16:26:33 GMT, Albert Mingkun Yang wrote: > After JDK-8260574, an instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it are removed. > > With this patch, `all_tasks_completed` contains only an assertion.
Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running between the instant when all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having a more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome. Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/workgroup.hpp line 314: > 312: > 313: void all_tasks_completed_impl(uint skipped[], size_t skipped_size) { > 314: #ifdef ASSERT Please keep the definition of the method in the .cpp file. It's too long. You can use the DEBUG_ONLY macro here to not need to define it in non-assert code. ------------- PR: https://git.openjdk.java.net/jdk/pull/2383 From akozlov at openjdk.java.net Thu Feb 4 15:16:49 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 4 Feb 2021 15:16:49 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> Message-ID: <73MpfeAfsTCSMLBU7q_Ho-oOWdCevCHXTIuxCHHWZBA=.26d22cdc-cb67-490f-9c7b-03623aa049c6@github.com> On Wed, 3 Feb 2021 20:08:28 GMT, Mikael Vidstedt wrote: > Out of curiosity, have you looked at the performance of the W^X state transition? Earlier it was possible to disable W^X protection (unfortunately, I don't know a way to do this now). We compared Renaissance results with W^X transitions like the proposed one vs. no transitions with the protection disabled once. Results were identical for a small and large number of iterations. From the other hand, I've used https://github.com/AntonKozlov/macos-aarch64-transition-bench to estimate the cost of the transition.
I re-did measurements with the current implementation and on consumer hardware:

testJNI            thrpt   25   277997000.151 ±   4095685.956  ops/s
testJniNanoTime    thrpt   25    17851098.010 ±    119489.599  ops/s
testNanoTime       thrpt   25    78007491.762 ±    628455.971  ops/s
testNothing        thrpt   25  1724298829.088 ± 100537565.068  ops/s
testTwoStateAndWX  thrpt   25    21958839.057 ±    210490.755  ops/s
testWX             thrpt   25    23299813.266 ±    149837.302  ops/s

There is an overhead, but it does not look like it should block the first implementation. I'm not refusing future optimizations like enabling W only when necessary. But for now, I don't have a robust and maintainable solution for this, sorry. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From ayang at openjdk.java.net Thu Feb 4 15:41:59 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 4 Feb 2021 15:41:59 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v2] In-Reply-To: References: Message-ID: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> > After JDK-8260574, an instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it are removed. > > With this patch, `all_tasks_completed` contains only an assertion. Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running between the instant when all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having a more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome.
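The trade-off described above (checking at an explicit point right after the last claim, rather than in the destructor) can be illustrated with a small standalone sketch. `UseOnceSubTasks` and all of its member names are hypothetical and are not HotSpot's `SubTasksDone`; the check returns a `bool` instead of asserting so that it is easy to exercise.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical use-once subtask tracker (illustration only, not HotSpot code).
// There is deliberately no clear(): an instance is never reused.
class UseOnceSubTasks {
  std::vector<std::atomic<bool>> _claimed;  // one flag per subtask

 public:
  explicit UseOnceSubTasks(std::size_t n_tasks) : _claimed(n_tasks) {
    for (auto& flag : _claimed) {
      flag.store(false, std::memory_order_relaxed);
    }
  }

  // Returns true for exactly one caller per task index, even under contention.
  bool try_claim_task(std::size_t t) {
    assert(t < _claimed.size());
    return !_claimed[t].exchange(true);
  }

  // Explicit verification point: every task is either claimed or listed as
  // intentionally skipped. Calling this right after the claiming phase
  // reports a problem earlier than a check in the destructor would.
  bool all_claimed_or_skipped(const std::vector<std::size_t>& skipped) const {
    for (std::size_t t = 0; t < _claimed.size(); ++t) {
      const bool is_skipped =
          std::find(skipped.begin(), skipped.end(), t) != skipped.end();
      if (!_claimed[t].load() && !is_skipped) {
        return false;
      }
    }
    return true;
  }
};
```

A caller would claim tasks with `try_claim_task(i)` and then check `all_claimed_or_skipped({...})` once at the end of the claiming phase; a debug build could wrap that check in an assertion.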
Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2383/files - new: https://git.openjdk.java.net/jdk/pull/2383/files/16cf3cec..90add10d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2383&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2383&range=00-01 Stats: 61 lines in 2 files changed: 30 ins; 29 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2383/head:pull/2383 PR: https://git.openjdk.java.net/jdk/pull/2383 From stuefe at openjdk.java.net Thu Feb 4 17:18:45 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 4 Feb 2021 17:18:45 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 05:38:49 GMT, Ioi Lam wrote: > vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. > > Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. > > So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). > > After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Hi Ioi, I like all these include cleanups! How do you find these, do you analyze the include tree? I think vmOperation vs vmOperations could be confusing. But have no immediate better idea. If others are fine with it, I am too. Can you please enable GA so we see that our weirder platforms build? Otherwise, looks good. 
..Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2398 From iklam at openjdk.java.net Thu Feb 4 17:20:00 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 17:20:00 GMT Subject: RFR: 8261106: Reduce inclusion of jniHandles.hpp [v2] In-Reply-To: References: Message-ID: > jniHandles.hpp is included by about 800 out of 1000 HotSpot .o files. Most of these are transitively included from these header files, which don't actually need to include jniHandles.hpp. > > - ci/ciBaseObject.hpp > - ci/ciMetadata.hpp > - ci/ciObject.hpp > - classfile/moduleEntry.hpp > - gc/shared/gcVMOperations.hpp > - jvmci/jvmciJavaClasses.hpp > - runtime/thread.hpp > - services/threadService.hpp > > Fixing these headers reduces the number of .o files that include jniHandles.hpp to 145. > > Note: 43 files were changed in this PR. Most of them were using jniHandles.hpp but were not including it directly. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8261106-reduce-jniHandles-hpp - 8261106: Reduce inclusion of jniHandles.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2393/files - new: https://git.openjdk.java.net/jdk/pull/2393/files/e7f2d097..6e77fbea Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2393&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2393&range=00-01 Stats: 7440 lines in 439 files changed: 4576 ins; 1518 del; 1346 mod Patch: https://git.openjdk.java.net/jdk/pull/2393.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2393/head:pull/2393 PR: https://git.openjdk.java.net/jdk/pull/2393 From iklam at openjdk.java.net Thu Feb 4 19:08:43 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 19:08:43 GMT Subject: Integrated: 8261106: Reduce inclusion of jniHandles.hpp In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 01:31:39 GMT, Ioi Lam wrote: > jniHandles.hpp is included by about 800 out of 1000 HotSpot .o files. Most of these are transitively included from these header files, which don't actually need to include jniHandles.hpp. > > - ci/ciBaseObject.hpp > - ci/ciMetadata.hpp > - ci/ciObject.hpp > - classfile/moduleEntry.hpp > - gc/shared/gcVMOperations.hpp > - jvmci/jvmciJavaClasses.hpp > - runtime/thread.hpp > - services/threadService.hpp > > Fixing these headers reduces the number of .o files that include jniHandles.hpp to 145. > > Note: 43 files were changed in this PR. Most of them were using jniHandles.hpp but were not including it directly. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. This pull request has now been integrated.
Changeset: c59e4b66 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/c59e4b66 Stats: 44 lines in 43 files changed: 36 ins; 8 del; 0 mod 8261106: Reduce inclusion of jniHandles.hpp Reviewed-by: coleenp, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/2393 From iklam at openjdk.java.net Thu Feb 4 19:08:42 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 19:08:42 GMT Subject: RFR: 8261106: Reduce inclusion of jniHandles.hpp [v2] In-Reply-To: <4hWwb7Jpf35oaGMbU_YB8MiT2F87wTGJxvL_Mq0UCHs=.a4589f14-f2a0-4e64-884d-c81a7b4b3540@github.com> References: <4hWwb7Jpf35oaGMbU_YB8MiT2F87wTGJxvL_Mq0UCHs=.a4589f14-f2a0-4e64-884d-c81a7b4b3540@github.com> Message-ID: On Thu, 4 Feb 2021 13:24:24 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8261106-reduce-jniHandles-hpp >> - 8261106: Reduce inclusion of jniHandles.hpp > > Ok! Seems trivial since I assume you've built this everywhere. Thanks @coleenp and @hseigel for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2393 From coleen.phillimore at oracle.com Thu Feb 4 19:19:23 2021 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 4 Feb 2021 14:19:23 -0500 Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: References: Message-ID: On 2/4/21 12:18 PM, Thomas Stuefe wrote: > On Thu, 4 Feb 2021 05:38:49 GMT, Ioi Lam wrote: > >> vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. >> >> Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. 
>> >> So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). >> >> After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > Hi Ioi, > > I like all these include cleanups! How do you find these, do you analyze the include tree? > > I think vmOperation vs vmOperations could be confusing. But have no immediate better idea. If others are fine with it, I am too. I thought briefly that I will be annoyed if I keep typing the extra 's' when editing the file, but I don't edit that file very much, so I'm fine with it. Coleen > > Can you please enable GA so we see that our weirder platforms build? > > Otherwise, looks good. > > ..Thomas > > ------------- > > Marked as reviewed by stuefe (Reviewer). > > PR: https://git.openjdk.java.net/jdk/pull/2398 From ayang at openjdk.java.net Thu Feb 4 19:29:43 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 4 Feb 2021 19:29:43 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v2] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 15:00:21 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/shared/workgroup.hpp line 314: > >> 312: >> 313: void all_tasks_completed_impl(uint skipped[], size_t skipped_size) { >> 314: #ifdef ASSERT > > Please keep the definition of the method into the .cpp file. It's too long. You can use the DEBUG_ONLY macro here to not need to define it in non-assert code. Revised. 
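The review suggestion quoted above (declare the method in the header, define it in the .cpp file, and compile it only in assert-enabled builds) can be sketched as follows. This is a minimal standalone illustration under stated assumptions: `SKETCH_ASSERT`, `SKETCH_DEBUG_ONLY` and `ClaimCounter` are hypothetical names modeled on the idea of a debug-only macro, not HotSpot's actual definitions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a build-time debug flag and a debug-only macro:
// the argument is kept in debug builds and dropped entirely otherwise.
#ifdef SKETCH_ASSERT
#define SKETCH_DEBUG_ONLY(code) code
#else
#define SKETCH_DEBUG_ONLY(code)
#endif

// --- would live in the header ---
class ClaimCounter {
  std::size_t _claimed = 0;

 public:
  void claim() { ++_claimed; }
  std::size_t claimed() const { return _claimed; }

  // Declared only in debug builds; the (possibly long) body stays out of
  // the header.
  SKETCH_DEBUG_ONLY(void verify_claimed(std::size_t expected) const;)
};

// --- would live in the .cpp file ---
#ifdef SKETCH_ASSERT
void ClaimCounter::verify_claimed(std::size_t expected) const {
  assert(_claimed == expected && "unexpected number of claimed subtasks");
}
#endif
```

Call sites wrap the call the same way, for example `SKETCH_DEBUG_ONLY(counter.verify_claimed(2);)`, so that non-debug builds need neither the declaration nor the definition.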
------------- PR: https://git.openjdk.java.net/jdk/pull/2383 From dcubed at openjdk.java.net Thu Feb 4 19:47:45 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 4 Feb 2021 19:47:45 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 17:15:37 GMT, Thomas Stuefe wrote: >> vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. >> >> Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. >> >> So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). >> >> After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > Hi Ioi, > > I like all these include cleanups! How do you find these, do you analyze the include tree? > > I think vmOperation vs vmOperations could be confusing. But have no immediate better idea. If others are fine with it, I am too. > > Can you please enable GA so we see that our weirder platforms build? > > Otherwise, looks good. > > ..Thomas A drive-by comment... The class is named VM_Operation, but the file is named vmOperation.hpp. The missing '_' in the file name is a bit of a disconnect, but there are worse in the code base... 
------------- PR: https://git.openjdk.java.net/jdk/pull/2398 From vkempik at openjdk.java.net Thu Feb 4 21:53:49 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Thu, 4 Feb 2021 21:53:49 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: <73MpfeAfsTCSMLBU7q_Ho-oOWdCevCHXTIuxCHHWZBA=.26d22cdc-cb67-490f-9c7b-03623aa049c6@github.com> References: <6oqFTX-wDWyEUpJ6FLHvgP8gi0zmnkF6Mzz87hC6A1w=.e0b31425-4750-472f-9335-0b187a59c834@github.com> <73MpfeAfsTCSMLBU7q_Ho-oOWdCevCHXTIuxCHHWZBA=.26d22cdc-cb67-490f-9c7b-03623aa049c6@github.com> Message-ID: On Thu, 4 Feb 2021 15:13:49 GMT, Anton Kozlov wrote: >> Out of curiosity, have you looked at the performance of the W^X state transition? In particular I'm wondering if the cost is effectively negligible so doing it unconditionally on JVM entry is a no-brainer and just easier/cleaner than the alternatives, or if there are reasons to look at only doing the transition if/when needed (perhaps do it lazily and revert back to X when leaving the JVM?). > >> Out of curiosity, have you looked at the performance of the W^X state transition? > > Earlier it was possible to disable W^X protection (unfortunately, I don't know a way to do this now). We compared Renaissance results with W^X transitions like the proposed one vs. no transitions with the protection disabled once. Results were identical for a small and large number of iterations. > > From the other hand, I've used https://github.com/AntonKozlov/macos-aarch64-transition-bench to estimate the cost of the transition. > I re-did measurements with the current implementation and on consumer hardware: >
> testJNI            thrpt   25   277997000.151 ±   4095685.956  ops/s
> testJniNanoTime    thrpt   25    17851098.010 ±    119489.599  ops/s
> testNanoTime       thrpt   25    78007491.762 ±    628455.971  ops/s
> testNothing        thrpt   25  1724298829.088 ± 100537565.068  ops/s
> testTwoStateAndWX  thrpt   25    21958839.057 ±    210490.755  ops/s
> testWX             thrpt   25    23299813.266 ±    149837.302  ops/s
> > There is an overhead, but it does not look like blocking the first implementation. I'm not refusing future optimizations like enabling W only when necessary. But for now, I don't have a robust and maintainable solution for this, sorry. > _Mailing list message from [erik.joelsson at oracle.com](mailto:erik.joelsson at oracle.com) on [2d-dev](mailto:2d-dev at openjdk.java.net):_ > > On 2021-01-26 04:44, Magnus Ihse Bursie wrote: > > > On 2021-01-26 13:09, Vladimir Kempik wrote: > > > On Tue, 26 Jan 2021 12:02:02 GMT, Alan Hayward > > > wrote: > > > > AIUI, the configure line needs passing a prebuilt > > > > JavaNativeFoundation framework > > > > ie: > > > > `--with-extra-ldflags='-F > > > > /Applications/Xcode.app/Contents/SharedFrameworks/ContentDeliveryServices.framework/Versions/A/itms/java/Frameworks/'` > > > > Otherwise there will be missing _JNFNative* functions. > > > > Is this the long term plan? Or will eventually the required code be > > > > moved into JDK and/or the xcode one automatically get picked up by > > > > the configure scripts? > > > > There is ongoing work by P. Race to eliminate dependence on JNF at all > > > > How far has that work come? Otherwise the logic should be added to > > > > configure to look for this framework automatically, and provide a way > > > > to override it/set it if not found. > > > > > > I don't think it's OK to publish a new port that cannot build > > out-of-the-box without hacks like this. > > My understanding is that Apple chose to not provide JNF for aarch64, so > if you want to build OpenJDK, you first need to build JNF yourself (it's > available in github). Phil is working on removing this dependency > completely, which will solve this issue [1]. > > In the meantime, I don't think we should rely on finding JNF in > unsupported locations inside Xcode. Could we wait with integrating this > port until it can be built without such hacks?
If not, then adding > something in the documentation on how to get a working JNF would at > least be needed. > > /Erik > > [1] https://bugs.openjdk.java.net/browse/JDK-8257852 This doesn't seem to be an issue anymore, After P.Race have finished with JDK-8257852, Macarm port can be build without extra ld flags. J2Demo works fine as example. This can be checked if one merges pull/2200 branch into his local copy of master branch. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Thu Feb 4 22:01:51 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 4 Feb 2021 22:01:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: <9Nasu4m7orJoGYjX4EYCuz5-aevYNno3Ru3jPHgwkvc=.168cfdf0-648b-46e4-9cb4-b24956eeba7d@github.com> On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. 
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: > > - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos > - Add comments to WX transitions > > + minor change of placements > - Use macro conditionals instead of empty functions > - Add W^X to tests > - Do not require known W^X state > - Revert w^x in gtests src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 194: > 192: // may get turned off by -fomit-frame-pointer. > 193: frame os::get_sender_for_C_frame(frame* fr) { > 194: return frame(fr->link(), fr->link(), fr->sender_pc()); Why is it return frame(fr->link(), fr->link(), fr->sender_pc()); and not return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); like in the bsd-x86 counterpart? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Thu Feb 4 22:05:46 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Thu, 4 Feb 2021 22:05:46 GMT Subject: RFR: 8261071: AArch64: Refactor interpreter native wrappers Message-ID: Please review refactoring of interpreter signature handlers on aarch64. The main objective is to prepare for the new calling convention of macOS/AArch64, although this patch brings nothing from the new convention. Tested with signature stress tests and tier1 on Linux/AArch64. I started with a single function implementing SlowSignatureHandler (https://github.com/openjdk/jdk/commit/5ef1bd15c3bb174f4aed5e358d1ce2fff2846858#diff-1ff58ce70aeea7e9842d34e8d8fd9c94dd91182999d455618b2a171efd8f742cR164). The single function was compact but obscure. I kept reshuffling it until I eventually arrived at something similar to the initial approach, with a few pieces abstracted away.
The most notable changes in the final version should be * we count only parameters passed in registers * ldrw/strw are used to pass via stack in SignatureHandlerGenerator::pass_int ------------- Commit messages: - Fix java stack access for int & float - Extract functions from SignatureHandlerGenerator - Finish SlowSignatureHandler - Simplify pass_fp/gp - Split load - Split SSh to pass_gp/fp - Single function for SlowSignatureHandler Changes: https://git.openjdk.java.net/jdk/pull/2413/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2413&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261071 Stats: 266 lines in 2 files changed: 39 ins; 147 del; 80 mod Patch: https://git.openjdk.java.net/jdk/pull/2413.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2413/head:pull/2413 PR: https://git.openjdk.java.net/jdk/pull/2413 From gziemski at openjdk.java.net Thu Feb 4 22:18:49 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 4 Feb 2021 22:18:49 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). 
In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. >> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: > > - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos > - Add comments to WX transitions > > + minor change of placements > - Use macro conditionals instead of empty functions > - Add W^X to tests > - Do not require known W^X state > - Revert w^x in gtests src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 291: > 289: bool is_unsafe_arraycopy = (thread->doing_unsafe_access() && UnsafeCopyMemory::contains_pc(pc)); > 290: if ((nm != NULL && nm->has_unsafe_access()) || is_unsafe_arraycopy) { > 291: address next_pc = pc + NativeCall::instruction_size; Replace address next_pc = pc + NativeCall::instruction_size; with address next_pc = Assembler::locate_next_instruction(pc); there is at least one other place that needs it. src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 322: > 320: #ifdef __APPLE__ > 321: } else if (sig == SIGFPE && info->si_code == FPE_NOOP) { > 322: Unimplemented(); Is there a follow up issue for this? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Thu Feb 4 22:22:48 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 4 Feb 2021 22:22:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. 
>> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. >> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: > > - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos > - Add comments to WX transitions > > + minor change of placements > - Use macro conditionals instead of empty functions > - Add W^X to tests > - Do not require known W^X state > - Revert w^x in gtests src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 198: > 196: > 197: NOINLINE frame os::current_frame() { > 198: intptr_t *fp = *(intptr_t **)__builtin_frame_address(0); In the bsd_x86 counter part we initialize `fp` to `_get_previous_fp()` - do we need to implement it on aarch64 as well (and using address 0 is just a temp workaround) or is it doing the right thing here? 
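For readers unfamiliar with the expression in question, here is a small standalone sketch of what `__builtin_frame_address(0)` and a dereference of its result yield. It assumes GCC or Clang, a downward-growing stack, and a frame-record layout in which the first word at the frame address is the caller's saved frame pointer (as on AArch64, and on x86-64 when frame pointers are kept). The helper names are hypothetical and this is not the HotSpot code under review.

```cpp
#include <cstdint>

// Frame address of this helper itself. noinline keeps the helper in its own
// stack frame instead of being merged into the caller.
__attribute__((noinline)) std::uintptr_t own_frame_address() {
  return reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
}

// First word stored at this helper's frame address. Assumption (see above):
// on targets with an {fp, lr}-style frame record this is the caller's saved
// frame pointer, which is the value the dereference in the reviewed line
// would pick up.
__attribute__((noinline)) std::uintptr_t saved_caller_fp() {
  void** frame_record = static_cast<void**>(__builtin_frame_address(0));
  return reinterpret_cast<std::uintptr_t>(frame_record[0]);
}
```

Under those assumptions, `own_frame_address()` is lower than the caller's own `__builtin_frame_address(0)`, and `saved_caller_fp()` should match the caller's frame address when the caller keeps a frame pointer. Whether that is the intended `fp` for `os::current_frame()` is exactly the question raised above, and the sketch does not answer it.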
-------------

PR: https://git.openjdk.java.net/jdk/pull/2200

From dholmes at openjdk.java.net Thu Feb 4 22:27:40 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 4 Feb 2021 22:27:40 GMT
Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp
In-Reply-To:
References:
Message-ID: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com>

On Thu, 4 Feb 2021 05:38:49 GMT, Ioi Lam wrote:

> vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame.
>
> Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class.
>
> So we should move VM_Operation to its own header: vmOperation.hpp (no "s").
>
> After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%.
>
> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero.

Hi Ioi,

The distinction between vmOperation versus vmOperations is far too subtle. Perhaps as Dan implied vm_Operation.hpp or VM_Operation.hpp (though that breaks normal - odd - naming convention).

I assume most files that include vmOperation.hpp are those that define VM_Operation subclasses?

Thanks,
David

src/hotspot/share/runtime/vmOperation.hpp line 2:

> 1: /*
> 2:  * Copyright (c) 1997, 2021, Oracle and/or its affiliates. All rights reserved.

New file so single copyright year.

-------------

Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2398

From gziemski at openjdk.java.net Thu Feb 4 22:36:49 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Thu, 4 Feb 2021 22:36:49 GMT
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10]
In-Reply-To:
References:
Message-ID: <9bn80xM9g41OytCtH71-krOZtAgy27TbvTFJHmfmKrE=.e9d223d5-2922-4891-8f45-ee039d88ced6@github.com>

On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote:

>> Please review the implementation of JEP 391: macOS/AArch64 Port.
>>
>> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64.
>>
>> Major changes are in:
>> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818)
>> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819)
>> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough.
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941)
>
> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos
> - Add comments to WX transitions
>
>   + minor change of placements
> - Use macro conditionals instead of empty functions
> - Add W^X to tests
> - Do not require known W^X state
> - Revert w^x in gtests

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 237:

> 235: os::Posix::ucontext_set_pc(uc, StubRoutines::continuation_for_safefetch_fault(pc));
> 236: return true;
> 237: }

Isn't this case already handled by `JVM_HANDLE_XXX_SIGNAL()`? Why do we need it here again?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2200

From gziemski at openjdk.java.net Thu Feb 4 22:44:47 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Thu, 4 Feb 2021 22:44:47 GMT
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10]
In-Reply-To:
References:
Message-ID: <5xcTYtv5IrRdAkhPa6uG0C__8L6IsXKQOCsAaeha0vk=.217371e5-85a4-4b62-a885-5236a94cd242@github.com>

On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote:

>> Please review the implementation of JEP 391: macOS/AArch64 Port.
>>
>> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64.
>>
>> Major changes are in:
>> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818)
>> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819)
>> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough.
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941)
>
> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos
> - Add comments to WX transitions
>
>   + minor change of placements
> - Use macro conditionals instead of empty functions
> - Add W^X to tests
> - Do not require known W^X state
> - Revert w^x in gtests

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 302:

> 300: const uint64_t *detail_msg_ptr
> 301: = (uint64_t*)(pc + NativeInstruction::instruction_size);
> 302: const char *detail_msg = (const char *)*detail_msg_ptr;

Where is `detail_msg` used?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2200

From gziemski at openjdk.java.net Thu Feb 4 22:51:50 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Thu, 4 Feb 2021 22:51:50 GMT
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote:

>> Please review the implementation of JEP 391: macOS/AArch64 Port.
>>
>> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64.
>>
>> Major changes are in:
>> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818)
>> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819)
>> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough.
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941)
>
> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos
> - Add comments to WX transitions
>
>   + minor change of placements
> - Use macro conditionals instead of empty functions
> - Add W^X to tests
> - Do not require known W^X state
> - Revert w^x in gtests

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 297:

> 295: stub = SharedRuntime::handle_unsafe_access(thread, next_pc);
> 296: }
> 297: } else if (sig == SIGILL && nativeInstruction_at(pc)->is_stop()) {

Can we add a comment here describing what this case means?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2200

From dcubed at openjdk.java.net Thu Feb 4 23:03:55 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 4 Feb 2021 23:03:55 GMT
Subject: RFR: 8261190: restore original Alibaba copyright line in two files
Message-ID:

A trivial fix to restore original Alibaba copyright line in two files.
-------------

Commit messages:
 - 8261190: restore original Alibaba copyright line in two files

Changes: https://git.openjdk.java.net/jdk/pull/2417/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2417&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8261190
Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/2417.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2417/head:pull/2417

PR: https://git.openjdk.java.net/jdk/pull/2417

From dholmes at openjdk.java.net Thu Feb 4 23:03:55 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 4 Feb 2021 23:03:55 GMT
Subject: RFR: 8261190: restore original Alibaba copyright line in two files
In-Reply-To:
References:
Message-ID: <_nUZ2MGD_nynkjPozD2LlEHeX5h2EjozCijcV86eNLw=.c4ec5bad-62af-4283-b873-9d998ab7638a@github.com>

On Thu, 4 Feb 2021 22:56:16 GMT, Daniel D. Daugherty wrote:

> A trivial fix to restore original Alibaba copyright line in two files.

Looks good and trivial.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2417

From dcubed at openjdk.java.net Thu Feb 4 23:03:55 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 4 Feb 2021 23:03:55 GMT
Subject: RFR: 8261190: restore original Alibaba copyright line in two files
In-Reply-To:
References:
Message-ID:

On Thu, 4 Feb 2021 22:58:26 GMT, Daniel D. Daugherty wrote:

>> A trivial fix to restore original Alibaba copyright line in two files.
>
> @dholmes-ora, @egahlin and @D-D-H - Would love for the three
> of you to chime in on this one.

The validate-source task in Mach5 Tier1 has passed.
-------------

PR: https://git.openjdk.java.net/jdk/pull/2417

From dcubed at openjdk.java.net Thu Feb 4 23:03:55 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 4 Feb 2021 23:03:55 GMT
Subject: RFR: 8261190: restore original Alibaba copyright line in two files
In-Reply-To:
References:
Message-ID:

On Thu, 4 Feb 2021 22:56:16 GMT, Daniel D. Daugherty wrote:

> A trivial fix to restore original Alibaba copyright line in two files.

@dholmes-ora, @egahlin and @D-D-H - Would love for the three
of you to chime in on this one.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2417

From gziemski at openjdk.java.net Thu Feb 4 23:08:50 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Thu, 4 Feb 2021 23:08:50 GMT
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote:

>> Please review the implementation of JEP 391: macOS/AArch64 Port.
>>
>> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64.
>>
>> Major changes are in:
>> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818)
>> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819)
>> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough.
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941)
>
> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos
> - Add comments to WX transitions
>
>   + minor change of placements
> - Use macro conditionals instead of empty functions
> - Add W^X to tests
> - Do not require known W^X state
> - Revert w^x in gtests

I reviewed bsd_aarch64.cpp, digging a bit deeper, and left some comments.

-------------

Changes requested by gziemski (Committer).

PR: https://git.openjdk.java.net/jdk/pull/2200

From dcubed at openjdk.java.net Thu Feb 4 23:13:41 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 4 Feb 2021 23:13:41 GMT
Subject: Integrated: 8261190: restore original Alibaba copyright line in two files
In-Reply-To:
References:
Message-ID:

On Thu, 4 Feb 2021 22:56:16 GMT, Daniel D. Daugherty wrote:

> A trivial fix to restore original Alibaba copyright line in two files.

This pull request has now been integrated.

Changeset: 08f7454f
Author: Daniel D. Daugherty
URL: https://git.openjdk.java.net/jdk/commit/08f7454f
Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod

8261190: restore original Alibaba copyright line in two files

Reviewed-by: dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/2417

From dcubed at openjdk.java.net Thu Feb 4 23:13:39 2021
From: dcubed at openjdk.java.net (Daniel D.Daugherty)
Date: Thu, 4 Feb 2021 23:13:39 GMT
Subject: RFR: 8261190: restore original Alibaba copyright line in two files
In-Reply-To: <_nUZ2MGD_nynkjPozD2LlEHeX5h2EjozCijcV86eNLw=.c4ec5bad-62af-4283-b873-9d998ab7638a@github.com>
References: <_nUZ2MGD_nynkjPozD2LlEHeX5h2EjozCijcV86eNLw=.c4ec5bad-62af-4283-b873-9d998ab7638a@github.com>
Message-ID:

On Thu, 4 Feb 2021 23:00:46 GMT, David Holmes wrote:

>> A trivial fix to restore original Alibaba copyright line in two files.
>
> Looks good and trivial.
>
> Thanks,
> David

@dholmes-ora - Thanks for the fast review.
@egahlin and @D-D-H - please chime in to confirm that you've seen this thread.
@D-D-H - please read my comments in:
    JDK-8261190: restore original Alibaba copyright line in two files
so that you know what had to change with your original copyright headers.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2417

From dholmes at openjdk.java.net Thu Feb 4 23:26:43 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Thu, 4 Feb 2021 23:26:43 GMT
Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally [v2]
In-Reply-To:
References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com>
Message-ID:

On Wed, 3 Feb 2021 06:39:01 GMT, Thomas Stuefe wrote:

>> Analyzing out-of-resource situations in cloud scenarios is no fun. With CloudFoundry, a JVMTI agent (jvmkill) is hooked up intercepting the jvmti "resource exhausted" event, then attempts to write up a heap report. That may fail, e.g. due to bugs in the agent [1], but also because that report runs java code and may suffer from the same resource exhaustion. Successful or not, it unceremoniously kills the VM when done, often leaving us with no information about the actual resource.
>>
>> It would be very helpful if we had unconditional tracing here. We do have tracing, but it requires a non-product build and is triggered with TraceJVMTI. Also, it traces at trace level, which is way too fine-grained.
>>
>> I'd like to introduce another, unconditional trace line here. Arguably, resource exhausted is fatal enough that it justifies unconditional tracing.
>>
>> This is a bit of a coin toss. Tracing unconditionally would help in most scenarios, where it would be either difficult or even impossible to specify a trace command line switch. OTOH it may trip up scripts parsing the VM output, or some of our tests (which can be fixed).
>>
>> Thoughts?
>>
>> ..Thomas
>>
>> [1] https://github.com/cloudfoundry/jvmkill/issues/18
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
>   Feedback David

I would have formatted the message differently as per my first comment. But ok.

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2350

From iklam at openjdk.java.net Fri Feb 5 01:18:57 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 5 Feb 2021 01:18:57 GMT
Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp [v2]
In-Reply-To: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com>
References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com>
Message-ID: <79S1rUtQj5EtlUyTx3o3vVzFowv98qgdrAiLlXbpwIw=.5593bd3d-34e1-4378-8726-255f669cff99@github.com>

> thread.hpp is 2133 lines long and is included by about 800 out of 1000 HotSpot .o files. It also pulls in many other header files.
>
> Many of the Thread subtypes are infrequently used. This RFE moves the easy ones to compilerThread.hpp and nonJavaThread.hpp. This reduces thread.hpp to 1888 lines.
>
> (I hope to do more refactoring of thread.hpp in future RFEs).
>
> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero.

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits:

 - fixed merge
 - Merge branch 'master' into 8260019-move-some-subtypes-out-of-thread-hpp
 - fixed copyright year
 - 8260019: Move some Thread subtypes out of thread.hpp

-------------

Changes: https://git.openjdk.java.net/jdk/pull/2390/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2390&range=01
Stats: 1392 lines in 19 files changed: 762 ins; 620 del; 10 mod
Patch: https://git.openjdk.java.net/jdk/pull/2390.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2390/head:pull/2390

PR: https://git.openjdk.java.net/jdk/pull/2390

From ddong at openjdk.java.net Fri Feb 5 02:00:41 2021
From: ddong at openjdk.java.net (Denghui Dong)
Date: Fri, 5 Feb 2021 02:00:41 GMT
Subject: RFR: 8261190: restore original Alibaba copyright line in two files
In-Reply-To: <_nUZ2MGD_nynkjPozD2LlEHeX5h2EjozCijcV86eNLw=.c4ec5bad-62af-4283-b873-9d998ab7638a@github.com>
References: <_nUZ2MGD_nynkjPozD2LlEHeX5h2EjozCijcV86eNLw=.c4ec5bad-62af-4283-b873-9d998ab7638a@github.com>
Message-ID:

On Thu, 4 Feb 2021 23:00:46 GMT, David Holmes wrote:

>> A trivial fix to restore original Alibaba copyright line in two files.
>
> Looks good and trivial.
>
> Thanks,
> David

> @dholmes-ora - Thanks for the fast review.
> @egahlin and @D-D-H - please chime in to confirm that you've seen this thread.
> @D-D-H - please read my comments in:
> JDK-8261190: restore original Alibaba copyright line in two files
> so that you know what had to change with your original copyright headers.

There is a problem with the copyright format we used before, thank you for your correction.
-------------

PR: https://git.openjdk.java.net/jdk/pull/2417

From iklam at openjdk.java.net Fri Feb 5 02:59:39 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 5 Feb 2021 02:59:39 GMT
Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp [v2]
In-Reply-To:
References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com>
Message-ID:

On Thu, 4 Feb 2021 13:46:28 GMT, Coleen Phillimore wrote:

>> src/hotspot/share/compiler/compilerThread.hpp line 2:
>>
>>> 1: /*
>>> 2:  * Copyright (c) 1997, 2021, Oracle and/or its affiliates. All rights reserved.
>>
>> Should a new file only have a single copyright year? I'm not sure what the rule is about splitting existing files but you use a single year in the new cpp files.
>
> I think the rule has been: if it's a new file, it gets a new year.

Hi David, you're correct that I just cut-and-pasted the code; the only exception was I made the `thread_entry` functions member functions.

I don't have a better idea for the file names, either. I'll keep them as is in the patch. Maybe someone else with a better sense for naming could fix our file names in the future.

I fixed the copyright year.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2390

From iklam at openjdk.java.net Fri Feb 5 03:05:44 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 5 Feb 2021 03:05:44 GMT
Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp [v2]
In-Reply-To:
References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com>
Message-ID:

On Thu, 4 Feb 2021 13:49:33 GMT, Coleen Phillimore wrote:

>> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:
>>
>> - fixed merge
>> - Merge branch 'master' into 8260019-move-some-subtypes-out-of-thread-hpp
>> - fixed copyright year
>> - 8260019: Move some Thread subtypes out of thread.hpp
>
> I like this a lot.
Thanks @coleenp and @dholmes-ora for the review.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2390

From iklam at openjdk.java.net Fri Feb 5 03:05:45 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Fri, 5 Feb 2021 03:05:45 GMT
Subject: Integrated: 8260019: Move some Thread subtypes out of thread.hpp
In-Reply-To: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com>
References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com>
Message-ID:

On Wed, 3 Feb 2021 22:55:56 GMT, Ioi Lam wrote:

> thread.hpp is 2133 lines long and is included by about 800 out of 1000 HotSpot .o files. It also pulls in many other header files.
>
> Many of the Thread subtypes are infrequently used. This RFE moves the easy ones to compilerThread.hpp and nonJavaThread.hpp. This reduces thread.hpp to 1888 lines.
>
> (I hope to do more refactoring of thread.hpp in future RFEs).
>
> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero.

This pull request has now been integrated.
Changeset: c5bb1092
Author: Ioi Lam
URL: https://git.openjdk.java.net/jdk/commit/c5bb1092
Stats: 1392 lines in 19 files changed: 762 ins; 620 del; 10 mod

8260019: Move some Thread subtypes out of thread.hpp

Reviewed-by: dholmes, coleenp

-------------

PR: https://git.openjdk.java.net/jdk/pull/2390

From dholmes at openjdk.java.net Fri Feb 5 04:21:46 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Fri, 5 Feb 2021 04:21:46 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v4]
In-Reply-To:
References:
Message-ID: <1xLKHl6bUXtDeCRlMtdFmSF6MQreM1Ck8PQt9cuQpGQ=.e27683e3-f6da-4112-9f7a-d540f99b5e72@github.com>

On Thu, 4 Feb 2021 10:08:07 GMT, Thomas Stuefe wrote:

>> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified.
>>
>> There are three places where we do this:
>>
>> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset:
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338
>>
>> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`):
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766
>> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context.
>>
>> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag:
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77
>> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`):
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372
>> and additionally to not trip this warning here:
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391
>>
>> ------
>>
>> Changes in this patch:
>>
>> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable.
>> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number.
>> - I used that class to cover cases (1)..(3):
>>   - `chained_handlers` contains all information of chained handlers
>>   - `expected_handlers` contains a copy of the handlers the hotspot installed
>>   - `replaced_handlers` contains information about replaced handlers
>>
>> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered.
>>
>> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way; now we just compare the active sigaction structure with the one we installed on VM start.
>>
>> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError.
>>
>> Output Before:
>> Signal Handlers:
>> SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> SIGTRAP: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>>
>> Now:
>> Signal Handlers:
>> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO
>> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>
>> -----
>> Tests: GA, and the patch has been tested in our nightlies for over a month now. I manually executed the runtime/jni/checked tests too.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix build error on zlinux

Looks good. Thanks for the updates. One query for further improvement below. :)

Thanks,
David

src/hotspot/os/posix/signals_posix.cpp line 121:

> 119: assert(_sa[sig] == NULL, "Overwriting signal handler?");
> 120: _sa[sig] = NEW_C_HEAP_OBJ(struct sigaction, mtInternal);
> 121: *_sa[sig] = *act;

Sorry I missed the fact we weren't actually copying the incoming sigaction struct!

src/hotspot/os/posix/signals_posix.cpp line 1426:

> 1424: LINUX_ONLY(this_flag &= SA_RESTORER_FLAG_MASK;)
> 1425: if (this_flag != expected_act->sa_flags) {
> 1426: st->print_cr(" *** Flags changed! Consider using jsig library.");

Can't we easily print the difference in flags?

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2251

From david.holmes at oracle.com Fri Feb 5 05:01:10 2021
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 5 Feb 2021 15:01:10 +1000
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10]
In-Reply-To:
References:
Message-ID: <8ac7e6ab-2e12-67fc-3440-5fd8f7a585ea@oracle.com>

Hi Anton,

On 4/02/2021 6:01 am, Anton Kozlov wrote:
>> Please review the implementation of JEP 391: macOS/AArch64 Port.
>
> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision:
>
> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos
> - Add comments to WX transitions
>
>   + minor change of placements
> - Use macro conditionals instead of empty functions
> - Add W^X to tests
> - Do not require known W^X state
> - Revert w^x in gtests

These updates to the w^x code look good to me, this is much improved in terms of the pervasiveness/intrusiveness. Hopefully there can still be further refinement in the future (after the initial integration).

Thanks,
David

> -------------
>
> Changes:
> - all: https://git.openjdk.java.net/jdk/pull/2200/files
> - new: https://git.openjdk.java.net/jdk/pull/2200/files/3c705ae5..80827176
>
> Webrevs:
> - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=09
> - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=08-09
>
> Stats: 444 lines in 64 files changed: 112 ins; 278 del; 54 mod
> Patch: https://git.openjdk.java.net/jdk/pull/2200.diff
> Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200
>
> PR: https://git.openjdk.java.net/jdk/pull/2200
>

From gziemski at openjdk.java.net Fri Feb 5 05:03:54 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Fri, 5 Feb 2021 05:03:54 GMT
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote:

>> Please review the implementation of JEP 391: macOS/AArch64 Port.
>>
>> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64.
>> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. >> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: > > - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos > - Add comments to WX transitions > > + minor change of placements > - Use macro conditionals instead of empty functions > - Add W^X to tests > - Do not require known W^X state > - Revert w^x in gtests src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 403: > 401: } > 402: > 403: return false; // Mute compiler Is this comment needed? src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 420: > 418: size_t os::Posix::_compiler_thread_min_stack_allowed = 72 * K; > 419: size_t os::Posix::_java_thread_min_stack_allowed = 72 * K; > 420: size_t os::Posix::_vm_internal_thread_min_stack_allowed = 72 * K; Those are slightly larger than their x86_64 counter parts. Are they conservative/aggressive values? How did we arrive at those? 
src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 652: > 650: > 651: void os::setup_fpu() { > 652: } Is there really nothing to do here, or does it still need to be implemented? A clarifying comment here would help. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Fri Feb 5 05:03:53 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 5 Feb 2021 05:03:53 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 22:09:58 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 363: > >> 361: address pc = os::Posix::ucontext_get_pc(uc); >> 362: >> 363: if (pc != addr && uc->context_esr == 0x9200004F) { //TODO: figure out what this value means > > Is this TODO going to be resolved by this port? Where did this come from - some snippet/example/tech note code? Maybe other people can help figure it out if we provide more info. > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 374: > >> 372: >> 373: last_addr = (address) -1; >> 374: } else if (pc == addr && uc->context_esr == 0x8200000f) { //TODO: figure out what this value means > > Is this TODO going to be resolved by this port? Where did this come from - some snippet/example/tech note code? Maybe other people can help figure it out if we provide more info. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From iklam at openjdk.java.net Fri Feb 5 05:39:40 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 5 Feb 2021 05:39:40 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 17:15:37 GMT, Thomas Stuefe wrote: > I like all these include cleanups! How do you find these, do you analyze the include tree?
I have a few hand-rolled tools for identifying what header files are easy/beneficial to refactor. See https://github.com/iklam/tools/tree/main/headers > I think vmOperation vs vmOperations could be confusing. But have no immediate better idea. If others are fine with it, I am too. > > Can you please enable GA so we see that our weirder platforms build? I think I finally re-enabled the GitHub actions properly for my PRs. Let's see if they get run for my next push to this PR :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/2398 From iklam at openjdk.java.net Fri Feb 5 05:53:39 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 5 Feb 2021 05:53:39 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com> References: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com> Message-ID: <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> On Thu, 4 Feb 2021 22:24:38 GMT, David Holmes wrote: > Hi Ioi, > > The distinction between vmOperation versus vmOperations is far too subtle. Perhaps as Dan implied vm_Operation.hpp or VM_Operation.hpp (though that breaks normal - odd - naming convention). 
How about this: - runtime/vmOperation.hpp --- (new file) this is the file that declares VM_Operation - runtime/commonVMOperations.hpp -- (renamed from vmOperations.hpp) these are the VM_Operation subclasses that no one cares to organize properly :-) this will be kinda consistent with these existing files: - gc/g1/g1VMOperations.hpp - gc/g1/g1VMOperations.cpp - gc/shenandoah/shenandoahVMOperations.cpp - gc/shenandoah/shenandoahVMOperations.hpp - gc/shared/gcVMOperations.cpp - gc/shared/gcVMOperations.hpp - gc/parallel/psVMOperations.cpp - gc/parallel/psVMOperations.hpp (I should also add a new vmOperation.cpp, and rename vmOperations.cpp to commonVMOperations.cpp) BTW, I need to refactor VM_Exit into its own file. It's used by the `JVM_LEAF` macro in interfaceSupport.inline.hpp, but I don't want to pull in all the other "common" operations in there. I am thinking of calling it vmExit.hpp (since exitVMOperation.hpp doesn't really look good). > I assume most files that include vmOperation.hpp are those that define VM_Operation subclasses? Yes, but there are also files that use the `VM_Operation::VMOp_Type` type, notably safepoint.hpp. ------------- PR: https://git.openjdk.java.net/jdk/pull/2398 From stuefe at openjdk.java.net Fri Feb 5 06:41:47 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 5 Feb 2021 06:41:47 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v4] In-Reply-To: <1xLKHl6bUXtDeCRlMtdFmSF6MQreM1Ck8PQt9cuQpGQ=.e27683e3-f6da-4112-9f7a-d540f99b5e72@github.com> References: <1xLKHl6bUXtDeCRlMtdFmSF6MQreM1Ck8PQt9cuQpGQ=.e27683e3-f6da-4112-9f7a-d540f99b5e72@github.com> Message-ID: On Fri, 5 Feb 2021 04:19:02 GMT, David Holmes wrote: > Looks good. Thanks for the updates. > > One query for further improvement below. :) > > Thanks, > David Thanks for approval, David. Unfortunately I still have a strange build problem on one of our s390 boxes, so this is not done yet. 
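On the question raised in this review of whether the changed flags could be printed: a minimal sketch of one way to name the differing `sa_flags` bits is below. This is not the code from the PR. `FlagName`, `kFlagNames` and `describe_flag_diff` are hypothetical names chosen for illustration, and only the standard POSIX `SA_*` constants are assumed. Platform-specific bits that are not in the table (for example the restorer bit that the Linux code masks out) are simply ignored here.

```cpp
#include <signal.h>
#include <string>

// Hypothetical helper, not HotSpot code: names the SA_* bits that differ
// between the flags the VM expects and the flags currently installed.
struct FlagName {
  unsigned int bit;   // unsigned: some platforms define SA_RESETHAND as 0x80000000
  const char* name;
};

static const FlagName kFlagNames[] = {
  { SA_NOCLDSTOP, "SA_NOCLDSTOP" },
  { SA_ONSTACK,   "SA_ONSTACK"   },
  { SA_RESETHAND, "SA_RESETHAND" },
  { SA_RESTART,   "SA_RESTART"   },
  { SA_SIGINFO,   "SA_SIGINFO"   },
  { SA_NODEFER,   "SA_NODEFER"   },
};

// Returns e.g. "+SA_ONSTACK -SA_RESTART": '+' marks a bit that is set now but
// was not expected, '-' marks an expected bit that is now missing.
// Bits without an entry in kFlagNames are not reported.
std::string describe_flag_diff(unsigned int expected, unsigned int actual) {
  std::string out;
  const unsigned int changed = expected ^ actual;
  for (const FlagName& f : kFlagNames) {
    if ((changed & f.bit) == 0) continue;
    if (!out.empty()) out += ' ';
    out += (actual & f.bit) ? '+' : '-';
    out += f.name;
  }
  return out;
}
```

A caller would pass the expected and the observed `sa_flags` and append the returned string to the existing "Flags changed" message; an empty string means that none of the listed bits differ.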
> src/hotspot/os/posix/signals_posix.cpp line 121: > >> 119: assert(_sa[sig] == NULL, "Overwriting signal handler?"); >> 120: _sa[sig] = NEW_C_HEAP_OBJ(struct sigaction, mtInternal); >> 121: *_sa[sig] = *act; > > Sorry I missed the fact we weren't actually copying the incoming sigaction struct! Yes me too, and I'm annoyed by the fact that no tier1 test did catch that. The full jtreg suite would have catched it but it did not run yet since the last commit. > src/hotspot/os/posix/signals_posix.cpp line 1426: > >> 1424: LINUX_ONLY(this_flag &= SA_RESTORER_FLAG_MASK;) >> 1425: if (this_flag != expected_act->sa_flags) { >> 1426: st->print_cr(" *** Flags changed! Consider using jsig library."); > > Can't we easily print the difference in flags? Sure. I'll change it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Fri Feb 5 07:33:59 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 5 Feb 2021 07:33:59 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally [v3] In-Reply-To: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Message-ID: <5Xj68CmxmlFgu8fPxmL9wfPFgLfGsuLxKSr1l8VhmW0=.9ad80a17-d485-443b-8e89-85e1ef7597e9@github.com> > Analyzing out-of-resource situations in cloud scenarios is no fun. With CloudFoundry, a JVMTI agent (jvmkill) is hooked up intercepting the jvmti "resource exhausted" event, then attempts to write up a heap report. That may fail, e.g. due to bugs in the agent [1], but also because that report runs java code and may suffer from the same resource exhaustion. Successful or not, it unceremoniously kills the VM when done, often leaving us with no information about the actual resource. > > It would be very helpful if we had unconditional tracing here. 
We do have tracing, but it requires a non-product build and is triggered with TraceJVMTI. Also, it traces at trace level which is way to fine granular. > > I'd like to introduce another, unconditional trace line here. Arguably, resource exhausted is fatal enough that it justifies unconditional tracing. > > This is a bit of a coin toss. Tracing unconditionally would help in most scenarios, where it would be either difficult or even impossible to specify a trace command line switch. OTOH it may trip up scripts parsing the VM output, or some of our tests (which can be fixed). > > Thoughts? > > ..Thomas > > [1] https://github.com/cloudfoundry/jvmkill/issues/18 Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Reformulate message ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2350/files - new: https://git.openjdk.java.net/jdk/pull/2350/files/40e3af87..b3d331ff Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2350&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2350&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2350.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2350/head:pull/2350 PR: https://git.openjdk.java.net/jdk/pull/2350 From stuefe at openjdk.java.net Fri Feb 5 07:33:59 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 5 Feb 2021 07:33:59 GMT Subject: RFR: JDK-8260926: Trace resource exhausted events unconditionally [v2] In-Reply-To: References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Message-ID: On Thu, 4 Feb 2021 23:23:56 GMT, David Holmes wrote: > I would have formatted the message differently as per my first comment. But ok. Sorry, I misread part your original comment. I changed the message according to your suggestion. 
Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2350 From stuefe at openjdk.java.net Fri Feb 5 07:34:00 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 5 Feb 2021 07:34:00 GMT Subject: Integrated: JDK-8260926: Trace resource exhausted events unconditionally In-Reply-To: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> References: <-LuicM5vRHrcchY00NJhTTgngnNKX3ZkbKZVM_pmHpE=.372c1b6a-120c-460d-a168-76f200408efd@github.com> Message-ID: <5S3aJ8TNsarqVNUIIoUhR2yoDXAK-Hiiu5x6-zmXNy8=.bcb50b06-4c91-4618-9fd7-7bc0807696ee@github.com> On Tue, 2 Feb 2021 11:02:18 GMT, Thomas Stuefe wrote: > Analyzing out-of-resource situations in cloud scenarios is no fun. With CloudFoundry, a JVMTI agent (jvmkill) is hooked up intercepting the jvmti "resource exhausted" event, then attempts to write up a heap report. That may fail, e.g. due to bugs in the agent [1], but also because that report runs java code and may suffer from the same resource exhaustion. Successful or not, it unceremoniously kills the VM when done, often leaving us with no information about the actual resource. > > It would be very helpful if we had unconditional tracing here. We do have tracing, but it requires a non-product build and is triggered with TraceJVMTI. Also, it traces at trace level which is way to fine granular. > > I'd like to introduce another, unconditional trace line here. Arguably, resource exhausted is fatal enough that it justifies unconditional tracing. > > This is a bit of a coin toss. Tracing unconditionally would help in most scenarios, where it would be either difficult or even impossible to specify a trace command line switch. OTOH it may trip up scripts parsing the VM output, or some of our tests (which can be fixed). > > Thoughts? > > ..Thomas > > [1] https://github.com/cloudfoundry/jvmkill/issues/18 This pull request has now been integrated. 
Changeset: ee2f2055 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/ee2f2055 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod 8260926: Trace resource exhausted events unconditionally Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/2350 From tschatzl at openjdk.java.net Fri Feb 5 08:36:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 08:36:40 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal [v2] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 10:29:18 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> kimbarret, albertnetymk review > > Looks good to me, with the one minor nit I commented on and Albert's suggestions. Thanks @kimbarrett @albertnetymk for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Fri Feb 5 08:36:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 08:36:41 GMT Subject: Integrated: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS-specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not, as `G1CardTable::scanned_concurrently()` is only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes, so I left it out for [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code.
> > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) This pull request has now been integrated. Changeset: 78b0d327 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/78b0d327 Stats: 205 lines in 9 files changed: 0 ins; 191 del; 14 mod 8234534: Simplify CardTable code after CMS removal Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From david.holmes at oracle.com Fri Feb 5 08:37:31 2021 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 Feb 2021 18:37:31 +1000 Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> References: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com> <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> Message-ID: <0b7b9e47-9ca6-afee-17bf-8063bc46df76@oracle.com> On 5/02/2021 3:53 pm, Ioi Lam wrote: > On Thu, 4 Feb 2021 22:24:38 GMT, David Holmes wrote: > >> Hi Ioi, >> >> The distinction between vmOperation versus vmOperations is far too subtle. Perhaps as Dan implied vm_Operation.hpp or VM_Operation.hpp (though that breaks normal - odd - naming convention). 
> > How about this: > > - runtime/vmOperation.hpp --- (new file) this is the file that declares VM_Operation > - runtime/commonVMOperations.hpp -- (renamed from vmOperations.hpp) these are the VM_Operation subclasses that no one cares to organize properly :-) > > this will be kinda consistent with these existing files: > > - gc/g1/g1VMOperations.hpp > - gc/g1/g1VMOperations.cpp > - gc/shenandoah/shenandoahVMOperations.cpp > - gc/shenandoah/shenandoahVMOperations.hpp > - gc/shared/gcVMOperations.cpp > - gc/shared/gcVMOperations.hpp > - gc/parallel/psVMOperations.cpp > - gc/parallel/psVMOperations.hpp > > (I should also add a new vmOperation.cpp, and rename vmOperations.cpp to commonVMOperations.cpp) Okay. > BTW, I need to refactor VM_Exit into its own file. It's used by the `JVM_LEAF` macro in interfaceSupport.inline.hpp, but I don't want to pull in all the other "common" operations in there. I am thinking of calling it vmExit.hpp (since exitVMOperation.hpp doesn't really look good). :( Does it really matter? We're going to end up with a bazillion files at this rate and the build time improvements are not even perceptible. David ----- >> I assume most files that include vmOperation.hpp are those that define VM_Operation subclasses? > > Yes, but there are also files that use the `VM_Operation::VMOp_Type` type, notably safepoint.hpp. 
> > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/2398 > From bulasevich at openjdk.java.net Fri Feb 5 09:24:40 2021 From: bulasevich at openjdk.java.net (Boris Ulasevich) Date: Fri, 5 Feb 2021 09:24:40 GMT Subject: RFR: 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register In-Reply-To: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com> References: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com> Message-ID: On Tue, 2 Feb 2021 09:24:20 GMT, Aleksey Shipilev wrote: > $ CONF=linux-arm-server-fastdebug make run-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java > ... > > # Internal Error (/home/pi/jdk/src/hotspot/cpu/arm/register_arm.hpp:155), pid=3793, tid=3808 > # assert(is_valid()) failed: invalid register > # > # JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc.pi.jdk) > # Java VM: OpenJDK Server VM (fastdebug 17-internal+0-adhoc.pi.jdk, compiled mode, emulated-client, g1 gc, linux-arm) > # Problematic frame: > # V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4 > > Current CompileTask: > C1: 318 2 !b java.lang.Class::desiredAssertionStatus (54 bytes) > > Stack: [0x72580000,0x72600000], sp=0x725fe170, free space=504k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4 > V [libjvm.so+0x43b6b4] C1_MacroAssembler::lock_object(RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, Label&)+0xcf8 > V [libjvm.so+0x3d731c] LIR_Assembler::emit_lock(LIR_OpLock*)+0x160 > > > The problem is in this code: > > if (DiagnoseSyncOnValueBasedClasses != 0) { > load_klass(tmp1, obj); <--- asserts > ldr_u32(tmp1, Address(tmp1, Klass::access_flags_offset())); > tst(tmp1, 
JVM_ACC_IS_VALUE_BASED_CLASS); > b(slow_case, ne); > } > > `tmp1` is `noreg` when `!BiasedLocking`, because `c1_LIRGenerator_arm.cpp` provides it only when `UseBiasedLocking` is enabled: > > void LIRGenerator::do_MonitorEnter(MonitorEnter* x) { > ... > // Need a scratch register for biased locking on arm > LIR_Opr scratch = LIR_OprFact::illegalOpr; > if(UseBiasedLocking) { > scratch = new_pointer_register(); > } else { > scratch = atomicLockOpr(); // <--- actually illegalOpr > } > > ... > > monitor_enter(obj.result(), lock, hdr, scratch, > x->monitor_no(), info_for_exception, info); > } > > The way out is to use `tmp2`, which is the alias for `Rtemp` and always available. > > Additional testing: > - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:+UseBiasedLocking` > - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:-UseBiasedLocking` Hi, The change is good! I did not notice this Pull Request, I studied the case and came up with exactly the same solution. Thanks for fixing this. Boris ------------- PR: https://git.openjdk.java.net/jdk/pull/2349 From tschatzl at openjdk.java.net Fri Feb 5 09:37:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 09:37:43 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v2] In-Reply-To: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> References: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> Message-ID: <8fS9uHBh4LevCcCjRGtlNBCUTSZVv85yO4nrVupRMAs=.1f5964dc-b304-4cc4-90eb-86b8ec238f4c@github.com> On Thu, 4 Feb 2021 15:41:59 GMT, Albert Mingkun Yang wrote: >> After JDK-8260574, a instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it is removed. >> >> With this patch, `all_tasks_completed` contains only assertion. Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. 
For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running between the point where all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having a more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. As an alternative to wrapping all (two) calls to `all_tasks_completed_impl` in `DEBUG_ONLY`, it is possible to declare it as `NOT_DEBUG_RETURN` outside of `#ifdef ASSERT` in the header file. The current approach is fine with me too. An additional (as an extra CR, preexisting) cleanup could be moving more method definitions into the cpp file as they do not seem to be performance critical; but as mentioned, this is fine too. Finally, could you sync the description of the CR in JIRA with the PR description, so that it also states that there are no actual instances of reuse of this class any more? The current one is somewhat misleading: I spent a few minutes searching for actual reuse after reading the CR description (and skipping over the PR description). Thanks. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2383 From shade at openjdk.java.net Fri Feb 5 09:51:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 5 Feb 2021 09:51:47 GMT Subject: RFR: 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register In-Reply-To: References: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com> Message-ID: On Fri, 5 Feb 2021 09:21:24 GMT, Boris Ulasevich wrote: >> $ CONF=linux-arm-server-fastdebug make run-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java >> ...
>> >> # Internal Error (/home/pi/jdk/src/hotspot/cpu/arm/register_arm.hpp:155), pid=3793, tid=3808 >> # assert(is_valid()) failed: invalid register >> # >> # JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc.pi.jdk) >> # Java VM: OpenJDK Server VM (fastdebug 17-internal+0-adhoc.pi.jdk, compiled mode, emulated-client, g1 gc, linux-arm) >> # Problematic frame: >> # V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4 >> >> Current CompileTask: >> C1: 318 2 !b java.lang.Class::desiredAssertionStatus (54 bytes) >> >> Stack: [0x72580000,0x72600000], sp=0x725fe170, free space=504k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4 >> V [libjvm.so+0x43b6b4] C1_MacroAssembler::lock_object(RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, Label&)+0xcf8 >> V [libjvm.so+0x3d731c] LIR_Assembler::emit_lock(LIR_OpLock*)+0x160 >> >> >> The problem is in this code: >> >> if (DiagnoseSyncOnValueBasedClasses != 0) { >> load_klass(tmp1, obj); <--- asserts >> ldr_u32(tmp1, Address(tmp1, Klass::access_flags_offset())); >> tst(tmp1, JVM_ACC_IS_VALUE_BASED_CLASS); >> b(slow_case, ne); >> } >> >> `tmp1` is `noreg` when `!BiasedLocking`, because `c1_LIRGenerator_arm.cpp` provides it only when `UseBiasedLocking` is enabled: >> >> void LIRGenerator::do_MonitorEnter(MonitorEnter* x) { >> ... >> // Need a scratch register for biased locking on arm >> LIR_Opr scratch = LIR_OprFact::illegalOpr; >> if(UseBiasedLocking) { >> scratch = new_pointer_register(); >> } else { >> scratch = atomicLockOpr(); // <--- actually illegalOpr >> } >> >> ... >> >> monitor_enter(obj.result(), lock, hdr, scratch, >> x->monitor_no(), info_for_exception, info); >> } >> >> The way out is to use `tmp2`, which is the alias for `Rtemp` and always available. 
>> >> Additional testing: >> - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:+UseBiasedLocking` >> - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:-UseBiasedLocking` > > Hi, > The change is good! I did not notice this Pull Request, I studied the case and came up with exactly the same solution. > Thanks for fixing this. > Boris Thanks! I need a formal Reviewer to ack this :) ------------- PR: https://git.openjdk.java.net/jdk/pull/2349 From aph at redhat.com Fri Feb 5 09:51:48 2021 From: aph at redhat.com (Andrew Haley) Date: Fri, 5 Feb 2021 09:51:48 +0000 Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: <8d4f63c0-41a6-5def-3e6f-8de014476750@oracle.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> <85im78cn82.fsf@nicgas01-pc.shanghai.arm.com> <2084d66e-ab46-f909-51a0-c6f03c8da51c@redhat.com> <8d4f63c0-41a6-5def-3e6f-8de014476750@oracle.com> Message-ID: <9e49d389-ff0c-b746-6b5f-5bf48f468bc9@redhat.com> On 2/4/21 10:03 AM, Vladimir Ivanov wrote: > >>> I've also previously run the tier1 and java/incubator/vector/* tests >>> with -XX:+DeoptimizeALot. My experience of modifying that code is that >>> DeoptimizeALot fails pretty quickly if you get something wrong. >> >> Yeah. The problem here is that safepoints with live vectors aren't so >> common, so it's hard to test, I get it. Maybe this will have to do. > > FTR jdk/java/incubator/vector tests w/ -XX:+DeoptimizeALot are very good > at verifying that in-register vector values are properly preserved: > vectors (as exposed by Vector API) are routinely kept in registers > across safepoints and during deoptimization they are rematerialized into > full-blown Vector instances, so the tests fail quickly on broken vector > values. Great, thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ihse at openjdk.java.net Fri Feb 5 09:51:54 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Fri, 5 Feb 2021 09:51:54 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 21:20:52 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > make/autoconf/flags.m4 line 140: > >> 138: else >> 139: MACOSX_VERSION_MIN=10.12.0 >> 140: fi > > Not something that needs to be addressed here, but these changes > illustrate that our collective use of macOSX/MACOSX/MacOSX names > are tied to the fact that the macOS major version number was at 10 > for a very long time. > > @magicus - Do we have an RFE to rename MACOSX or are we sticking > with it and evolving our interpretation of the 'X' from '10' to */splat/asterisk? @dcubed-ojdk There is no RFE for renaming "macosx" to "macos". I'm not sure it should be done. We can't follow all marketing trends (Apple recently renamed iOS to iPadOS for the iPad; we can't keep adapting to such schemes). Personally, I like the new name without the "x", but we have already spent some time trying to find and fix all (or at least, most) instances of "osx" in the code, so I don't really think it's worth the effort. If you can drum up enough enthusiasm for such a project, and get any objections down to a minimum, I can help implement it. But I won't be spearheading it. > make/common/NativeCompilation.gmk line 1178: > >> 1176: endif >> 1177: # This only works if the openjdk_codesign identity is present on the system. Let >> 1178: # silently fail otherwise. > > Might want to add a comment here: > # The '-f' option will replace an existing signature if one exists. We're not really in the habit of adding comments for various command line options.
Normally, you can check these with "man" if you are uncertain. If they do something surprising, sure, but here it's more of a "it's needed on aarch64 to work at all", so I don't think a comment will be anything but added clutter. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From aph at openjdk.java.net Fri Feb 5 09:51:52 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 5 Feb 2021 09:51:52 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 23:05:56 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos >> - Add comments to WX transitions >> >> + minor change of placements >> - Use macro conditionals instead of empty functions >> - Add W^X to tests >> - Do not require known W^X state >> - Revert w^x in gtests > > I reviewed bsd_aarch64.cpp digging bit deeper and left some comments. > > Umm, so how does patching work? We don't even know if other threads are executing the code we need to patch. > > I thought java can handle that scenario in usual (non W^X systems) just fine, so we just believe jvm did everything right and it's safe to rewrite some code at specific moment. Got it, OK. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From aph at openjdk.java.net Fri Feb 5 09:58:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 5 Feb 2021 09:58:41 GMT Subject: RFR: 8261071: AArch64: Refactor interpreter native wrappers In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 22:01:34 GMT, Anton Kozlov wrote: > Please review refactoring of interpreter signature handlers on aarch64. The main objective is to prepare for the new calling convention of macOS/AArch64, although this patch brings nothing from the new convention. 
> > Tested with signature stress tests and tier1 on Linux/AArch64. > > I have stared with a single function implementing SlowSignatureHandler (https://github.com/openjdk/jdk/commit/5ef1bd15c3bb174f4aed5e358d1ce2fff2846858#diff-1ff58ce70aeea7e9842d34e8d8fd9c94dd91182999d455618b2a171efd8f742cR164). The single function was compact but obscure. I was shuffling it until I eventually came to something similar of the initial approach with few pieces abstracted away. > > The most notable changes in the final version should be > * we count only parameters passed in registers > * ldrw/strw are used to pass via stack in SignatureHandlerGenerator::pass_int Thanks, that looks way better. Are all platform differences handled by Interpreter::local_offset_in_bytes() ? ------------- PR: https://git.openjdk.java.net/jdk/pull/2413 From ihse at openjdk.java.net Fri Feb 5 10:00:50 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Fri, 5 Feb 2021 10:00:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:01:15 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. 
The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. >> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: > > - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos > - Add comments to WX transitions > > + minor change of placements > - Use macro conditionals instead of empty functions > - Add W^X to tests > - Do not require known W^X state > - Revert w^x in gtests Marked as reviewed by ihse (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From ihse at openjdk.java.net Fri Feb 5 10:00:50 2021 From: ihse at openjdk.java.net (Magnus Ihse Bursie) Date: Fri, 5 Feb 2021 10:00:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 09:49:11 GMT, Andrew Haley wrote: >> I reviewed bsd_aarch64.cpp digging bit deeper and left some comments. > >> > Umm, so how does patching work? We don't even know if other threads are executing the code we need to patch. >> >> I thought java can handle that scenario in usual (non W^X systems) just fine, so we just believe jvm did everything right and it's safe to rewrite some code at specific moment. > > Got it, OK. > This doesn't seem to be an issue anymore, After P.Race have finished with JDK-8257852, Macarm port can be build without extra ld flags. @VladimirKempik I agree. That concludes the build issues with this PR. So from a build perspective, this is now good to go. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From tschatzl at openjdk.java.net Fri Feb 5 10:10:50 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 10:10:50 GMT Subject: RFR: 8260941: Remove the conc_scan parameter for CardTable Message-ID: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> Hi, can I have reviews for this removal of the last(?) CMS-specific code in CardTable, namely some provision to indicate that cards are being scanned concurrently in Serial/Parallel GC barrier code? The change simply follows the predicate into Serial/Parallel GC code which always returns false for them and removes that code. In the review for JDK-8234534 I mentioned that I split this out due to unexplainable errors; testing tier1-5 three times showed none of that any more (after updating to latest code). This change has only been built on Oracle-platforms and linux-x86 via github actions (https://github.com/tschatzl/jdk/actions/runs/539993964), so I would like to kindly ask maintainers of the others to compile and report issues (32 bit ARM, PPC etc). 
Testing: tier1-5

-------------

Commit messages:
 - Remove scan_concurrently()
 - Compilation fixes
 - First attempt to prune code

Changes: https://git.openjdk.java.net/jdk/pull/2425/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2425&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8260941
  Stats: 81 lines in 17 files changed: 4 ins; 66 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2425.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2425/head:pull/2425

PR: https://git.openjdk.java.net/jdk/pull/2425

From github.com+9200663+quaffel at openjdk.java.net Fri Feb 5 10:42:45 2021
From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski)
Date: Fri, 5 Feb 2021 10:42:45 GMT
Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Feb 2021 13:08:07 GMT, Martin Doerr wrote:

>> I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC.
>>
>> I have to change register usage. That's what makes this change a bit larger.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> use PRESERVATION_NONE in load_field_cp_cache_entry

Looks very good! It took some time to understand why `load_resolved_reference_at_index` can be used without the need for register preservation, but it seems to work just fine!
I just have some very minor styling complaints. It would be great if you could address those, but that's definitely not a requirement.

src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp line 941:

> 939: R10_tmp = R10_ARG8;
> 940:
> 941: assert_different_registers(Rsize_of_parameters, Rsize_of_locals, parent_frame_resize, top_frame_size);

Consider strengthening the assertion. `Rconst_method`, for instance, must not be equal to `Rsize_of_parameters`.
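
To illustrate what the assertion in the comment above guards against, here is a minimal standalone sketch of a pairwise-distinct check. It is not HotSpot code: `Reg` and `all_different` are hypothetical names invented for this example, and the real `assert_different_registers` may be implemented quite differently.

```cpp
#include <initializer_list>

// Hypothetical stand-in for a register handle: only the encoding matters here.
struct Reg {
    int encoding;
};

// Returns true when no two registers in the list share an encoding.
// The check is quadratic, which is fine for the handful of registers
// a generated stub typically names.
inline bool all_different(std::initializer_list<Reg> regs) {
    for (const Reg* a = regs.begin(); a != regs.end(); ++a) {
        for (const Reg* b = a + 1; b != regs.end(); ++b) {
            if (a->encoding == b->encoding) {
                return false;  // two names alias the same register
            }
        }
    }
    return true;
}
```

Adding one more name to the list extends the check to every pair that includes it, which is why listing `Rconst_method` alongside `Rsize_of_parameters` would catch an accidental alias between the two.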
src/hotspot/cpu/ppc/templateInterpreterGenerator_ppc.cpp line 939: > 937: Rconst_method = R8_ARG6, > 938: Rconst_pool = R9_ARG7, > 939: R10_tmp = R10_ARG8; Three variable naming styles in 5 LOCs. Would be great to have a consistent naming style throughout this function. src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3365: > 3363: // Load receiver if needed (after appendix is pushed so parameter size is correct). > 3364: if (load_receiver) { > 3365: const Register Rparam_count = Rscratch1; The other variables of type Register declared in this method aren't `const`, although they aren't subject to change as well. ------------- Marked as reviewed by Quaffel at github.com (no known OpenJDK username). PR: https://git.openjdk.java.net/jdk/pull/2358 From github.com+9200663+quaffel at openjdk.java.net Fri Feb 5 10:56:46 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Fri, 5 Feb 2021 10:56:46 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v3] In-Reply-To: References: Message-ID: <1Vh7RLb-Es2cNLmkfqedJtKaK31gL-EqtRlDHW-i6VQ=.bf7d4e25-e19d-456b-aab7-267f356dbbab@github.com> On Thu, 4 Feb 2021 13:08:07 GMT, Martin Doerr wrote: >> I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. >> >> I have to change register usage. That's what makes this change a bit larger. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > use PRESERVATION_NONE in load_field_cp_cache_entry Seems like GitHub didn't show the latest changes to me. Looks good as well, nice optimization! ------------- Marked as reviewed by Quaffel at github.com (no known OpenJDK username). 
PR: https://git.openjdk.java.net/jdk/pull/2358 From stuefe at openjdk.java.net Fri Feb 5 10:57:00 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 5 Feb 2021 10:57:00 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v5] In-Reply-To: References: Message-ID: > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. > > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
> > - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. > > Output Before: > 663 Signal Handlers: > 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO > 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > > Now: > Signal Handlers: > SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO > SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > > ----- > Tests: GA, and the patch has been tested in our nighlies for over a month now. I manually executed the runtime/jni/checked tests too. 
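
As a rough illustration of the `SavedSignalHandlers` idea described in the quoted summary above (a vector of handler information indexed by signal number), a minimal sketch might look like the following. The class name `SavedHandlers`, its members and its methods are assumptions made for this example; the class in the patch may differ.

```cpp
#include <signal.h>

// Hypothetical sketch: one sigaction slot per signal number, plus a flag
// recording whether anything was stored in that slot.
class SavedHandlers {
    struct sigaction _sa[NSIG] = {};
    bool _set[NSIG] = {};

    static bool in_range(int sig) { return sig > 0 && sig < NSIG; }

public:
    // Remember a copy of the handler for sig; out-of-range signals are ignored.
    void set(int sig, const struct sigaction* act) {
        if (in_range(sig)) {
            _sa[sig] = *act;
            _set[sig] = true;
        }
    }

    // Returns the stored handler, or nullptr if none was recorded for sig.
    const struct sigaction* get(int sig) const {
        return (in_range(sig) && _set[sig]) ? &_sa[sig] : nullptr;
    }
};
```

Three instances of such a class would correspond to the three roles named in the summary (`chained_handlers`, `expected_handlers`, `replaced_handlers`).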
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Further fixes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2251/files
  - new: https://git.openjdk.java.net/jdk/pull/2251/files/40201e1d..d2434ad2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=03-04

  Stats: 111 lines in 1 file changed: 34 ins; 45 del; 32 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2251.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2251/head:pull/2251

PR: https://git.openjdk.java.net/jdk/pull/2251

From stuefe at openjdk.java.net Fri Feb 5 11:00:44 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Fri, 5 Feb 2021 11:00:44 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v4]
In-Reply-To:
References: <1xLKHl6bUXtDeCRlMtdFmSF6MQreM1Ck8PQt9cuQpGQ=.e27683e3-f6da-4112-9f7a-d540f99b5e72@github.com>
Message-ID: <1YGfukBzw92WjnbYzMUSiB_-JkJgTLNOmJS_5RzsTY8=.2760dfbf-c305-4b9d-96c5-7b206b7fba05@github.com>

On Fri, 5 Feb 2021 06:38:58 GMT, Thomas Stuefe wrote:

>> Looks good. Thanks for the updates.
>>
>> One query for further improvement below. :)
>>
>> Thanks,
>> David
>
>> Looks good. Thanks for the updates.
>>
>> One query for further improvement below. :)
>>
>> Thanks,
>> David
>
> Thanks for approval, David. Unfortunately I still have a strange build problem on one of our s390 boxes, so this is not done yet.

Hi David,

sorry, it's not done yet. Changes in this version:

1) I found that on zlinux, sigaction.sa_flags is actually a long unsigned int, which surprised me since this is clearly not POSIX compatible. Therefore I rewrote the code, adding a new function `get_sanitized_sa_flags`, which hides the correct casting and also strips away unwanted flags set by the kernel (SA_RESTORER).
As a side effect this cleanly factors out the SA_RESTORER handling, which is nice.

2) You requested that `print_single_signal_handler` should print the flag difference when spotting a change. I agree and expanded on this: now `print_single_signal_handler` will print out the full expected handler information if it spots a difference (see below for examples). This contains more information than just the flags, and is actually less code.

3) Then I found that further code could be folded: We have a detailed printout for Xcheck:jni in `check_signal_handler`, but then we also print all signal handlers as part of that printout, which now (see 2) contains a verbose description of the differences found. Therefore the detailed printout in `check_signal_handler` was no longer needed and could be removed. That also allows me to factor out the "compare sigaction structures semantically" logic into one function, called `compare_handler_info`. That again saves some code.

Now the logic is like this:
- If Xcheck:jni is not set, we still store the expected signal handler info. When printing signal handlers, e.g. as part of a hs_err file, in `os::print_signal_handlers`, we print a description of any changed handlers to nudge the supporter in the right direction.
- If Xcheck:jni is active, we will do our periodic checks; upon finding a difference, we do not elaborate but call `os::print_signal_handlers()`, which will elaborate for us.

The final adjustment I had to make was to separate the "expected handler" info from the "check when Xcheck:jni is active" logic. That is because when Xcheck:jni is active and we find a mismatch, we want to disable checking for that signal in `check_signal_handler` but still see the difference in future printouts when calling `os::print_signal_handlers`, e.g. if later we crash and write a hs-err file.

I am really sorry for the added review work, but I think it's a worthwhile change and complexity gets further reduced.
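
The two helpers mentioned above can be pictured roughly as follows. This is only a sketch under assumptions: `sanitized_flags`, `handler_address` and `same_handler` are hypothetical names, not the functions from the patch, and leaving `sa_mask` out of the comparison is a simplification made here.

```cpp
#include <signal.h>

// Hypothetical helper: return sa_flags as a plain int with kernel-set bits
// removed. The cast hides platforms where sa_flags is wider than int.
static int sanitized_flags(const struct sigaction* sa) {
    int flags = (int) sa->sa_flags;
#ifdef SA_RESTORER
    flags &= ~SA_RESTORER;  // set by the kernel/libc, not by the caller
#endif
    return flags;
}

// Hypothetical helper: pick the handler field that is actually in use,
// which depends on whether SA_SIGINFO is set.
static void* handler_address(const struct sigaction* sa) {
    return (sa->sa_flags & SA_SIGINFO)
        ? reinterpret_cast<void*>(sa->sa_sigaction)
        : reinterpret_cast<void*>(sa->sa_handler);
}

// Semantic comparison: same handler address and same sanitized flags.
// The signal mask is deliberately not compared in this sketch.
static bool same_handler(const struct sigaction* a, const struct sigaction* b) {
    return handler_address(a) == handler_address(b)
        && sanitized_flags(a) == sanitized_flags(b);
}
```

Comparing through helpers like these, rather than with a bytewise compare of the two structures, keeps padding and kernel-added flags from producing false "handler was modified" reports.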
I thought about adding tests but testing this is quite difficult. So I tested manually by overwriting some of the hotspot signal handlers and check how the VM reacts. First a hs-err file, good case: 659 Signal Handlers: 660 SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 661 SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 662 SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 663 SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 664 SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 665 SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 666 SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO 667 SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 668 SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 669 SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 670 SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 671 SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO Bad cases (I changed handlers for SIGFPE, SIGPIPE, SIGXFSZ and SIGUSR2): hs-err file: 658 Signal Handlers: 659 SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 660 SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 661 SIGFPE: badwolf in libjvm.so, mask=10000000000000000000000000000000, flags=none 662 *** Handler was modified! 
663 *** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO 664 SIGPIPE: badwolf in libjvm.so, mask=10000000000000000000000000000000, flags=none 665 *** Handler was modified! 666 *** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO 667 SIGXFSZ: badwolf in libjvm.so, mask=10000000000000000000000000000000, flags=none 668 *** Handler was modified! 669 *** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO 670 SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 671 SIGUSR2: badwolf in libjvm.so, mask=10000000000000000000000000000000, flags=none 672 *** Handler was modified! 673 *** Expected: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO 674 SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 675 SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 676 SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 677 SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO 678 SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO jcmd VM.info: ... Signal Handlers: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGFPE: badwolf in libjvm.so, mask=00000111001010000101001100010001, flags=none *** Handler was modified! 
*** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO SIGPIPE: badwolf in libjvm.so, mask=00000111001010000101001100010001, flags=none *** Handler was modified! *** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO SIGXFSZ: badwolf in libjvm.so, mask=00000111001010000101001100010001, flags=none *** Handler was modified! *** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGUSR2: badwolf in libjvm.so, mask=00000111001010000101001100010001, flags=none *** Handler was modified! *** Expected: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGTRAP: SIG_DFL, mask=00000000000000000000000000000000, flags=none -Xcheck:jni output: Warning: SIGFPE handler modified! Signal Handlers: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGFPE: badwolf in libjvm.so, mask=11100100010111111101111111111110, flags=none *** Handler was modified! *** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO SIGPIPE: badwolf in libjvm.so, mask=11100100010111111101111111111110, flags=none *** Handler was modified! 
*** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO SIGXFSZ: badwolf in libjvm.so, mask=11100100010111111101111111111110, flags=none *** Handler was modified! *** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGUSR2: badwolf in libjvm.so, mask=11100100010111111101111111111110, flags=none *** Handler was modified! *** Expected: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO SIGTRAP: SIG_DFL, mask=00000000000000000000000000000000, flags=none Consider using jsig library. ... 
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From ayang at openjdk.java.net Fri Feb 5 11:10:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 5 Feb 2021 11:10:41 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v2] In-Reply-To: <8fS9uHBh4LevCcCjRGtlNBCUTSZVv85yO4nrVupRMAs=.1f5964dc-b304-4cc4-90eb-86b8ec238f4c@github.com> References: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> <8fS9uHBh4LevCcCjRGtlNBCUTSZVv85yO4nrVupRMAs=.1f5964dc-b304-4cc4-90eb-86b8ec238f4c@github.com> Message-ID: On Fri, 5 Feb 2021 09:34:58 GMT, Thomas Schatzl wrote: > An additional (as an extra CR, preexisting) cleanup could be moving more method definitions into the cpp file as they do not seem to be performance critical; but as mentioned, this is fine too. I think it's best to not to hide the definitions from the caller. The current approach ensures an optimized build will have zero cost from having the assertion; otherwise, the call site needs a `callq` instruction (without LTO), probably insignificant, but a few extra lines in the header is fine, IMO. > could you sync the description of the CR in the JIRA ... Done. Thank you for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2383 From mdoerr at openjdk.java.net Fri Feb 5 11:36:55 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 5 Feb 2021 11:36:55 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v4] In-Reply-To: References: Message-ID: > I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. > > I have to change register usage. That's what makes this change a bit larger. 
Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  feedback from Niklas

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2358/files
  - new: https://git.openjdk.java.net/jdk/pull/2358/files/be66f4e5..32e919d6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2358&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2358&range=02-03

  Stats: 31 lines in 2 files changed: 1 ins; 0 del; 30 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2358.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2358/head:pull/2358

PR: https://git.openjdk.java.net/jdk/pull/2358

From github.com+9200663+quaffel at openjdk.java.net Fri Feb 5 11:46:40 2021
From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski)
Date: Fri, 5 Feb 2021 11:46:40 GMT
Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 11:36:55 GMT, Martin Doerr wrote:

>> I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC.
>>
>> I have to change register usage. That's what makes this change a bit larger.
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> feedback from Niklas

Thanks for implementing the feedback. I have nothing to add!

-------------

Marked as reviewed by Quaffel at github.com (no known OpenJDK username).

PR: https://git.openjdk.java.net/jdk/pull/2358

From lucy at openjdk.java.net Fri Feb 5 12:00:41 2021
From: lucy at openjdk.java.net (Lutz Schmidt)
Date: Fri, 5 Feb 2021 12:00:41 GMT
Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 11:36:55 GMT, Martin Doerr wrote:

>> I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC.
>> >> I have to change register usage. That's what makes this change a bit larger. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > feedback from Niklas Still looks good to me. ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2358 From akozlov at openjdk.java.net Fri Feb 5 12:29:50 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 5 Feb 2021 12:29:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 09:57:54 GMT, Magnus Ihse Bursie wrote: >> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos >> - Add comments to WX transitions >> >> + minor change of placements >> - Use macro conditionals instead of empty functions >> - Add W^X to tests >> - Do not require known W^X state >> - Revert w^x in gtests > > Marked as reviewed by ihse (Reviewer). > I haven't got a MacOS AArch64 system right now. Is it possible to > enable W^X in Linux in order to kick the tyres? I've just got rid of asserts that fired on Linux sometime :) As for W^X like on macOS, I vaguely remember working with a Linux system with one-way transition W->X, probably provided by SELinux. But I don't think it allowed per-thread W^X state. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Fri Feb 5 12:41:41 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 5 Feb 2021 12:41:41 GMT Subject: RFR: 8261071: AArch64: Refactor interpreter native wrappers In-Reply-To: References: Message-ID: <5KA_bVe-BCpRge6rK1CoJFpEYcF0oS18nn6cnePp4_U=.1c84b225-ccd5-4076-96a5-1f13723e8cc1@github.com> On Fri, 5 Feb 2021 09:56:25 GMT, Andrew Haley wrote: > Are all platform differences handled by Interpreter::local_offset_in_bytes() ? 
Java local structure is very similar between CPUs. The function is a simple one:

    // Local values relative to locals[n]
    static int local_offset_in_bytes(int n) {
      return ((frame::interpreter_frame_expression_stack_direction() * n) * stackElementSize);
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/2413

From akozlov at openjdk.java.net Fri Feb 5 13:01:05 2021
From: akozlov at openjdk.java.net (Anton Kozlov)
Date: Fri, 5 Feb 2021 13:01:05 GMT
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v11]
In-Reply-To:
References:
Message-ID:

> Please review the implementation of JEP 391: macOS/AArch64 Port.
>
> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64.
>
> Major changes are in:
> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818)
> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819)
> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough.
> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: Cleanup SA changes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/80827176..8652d21d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=09-10 Stats: 11 lines in 1 file changed: 3 ins; 8 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Fri Feb 5 13:01:06 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 5 Feb 2021 13:01:06 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v3] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 22:48:50 GMT, Chris Plummer wrote: >> Anton Kozlov has updated the pull request incrementally with two additional commits since the last revision: >> >> - Refactor CDS disabling >> - Redo builsys support for aarch64-darwin > > src/jdk.hotspot.agent/macosx/native/libsaproc/MacosxDebuggerLocal.m line 702: > >> 700: primitiveArray = (*env)->GetLongArrayElements(env, registerArray, NULL); >> 701: >> 702: #undef REG_INDEX > > I'm not so sure why the #undef and subsequent #define of REG_INDEX is needed since it seems to just get #define'd back to the same value. We've merged two implementations of SA, this change slipped in. I've cleaned this up. Thanks for noticing! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From mdoerr at openjdk.java.net Fri Feb 5 13:01:42 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 5 Feb 2021 13:01:42 GMT Subject: RFR: 8260369: [PPC64] Add support for JDK-8200555 [v4] In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 11:57:37 GMT, Lutz Schmidt wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback from Niklas > > Still looks good to me. Thanks for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/2358 From mdoerr at openjdk.java.net Fri Feb 5 13:01:44 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 5 Feb 2021 13:01:44 GMT Subject: Integrated: 8260369: [PPC64] Add support for JDK-8200555 In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 16:14:17 GMT, Martin Doerr wrote: > I'd like to add the PPC64 part of JDK-8200555 "OopHandle should use Access API". This will be required to support ShenandoahGC and zGC. > > I have to change register usage. That's what makes this change a bit larger. This pull request has now been integrated. Changeset: 48f5220c Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/48f5220c Stats: 113 lines in 7 files changed: 26 ins; 2 del; 85 mod 8260369: [PPC64] Add support for JDK-8200555 Reviewed-by: lucy ------------- PR: https://git.openjdk.java.net/jdk/pull/2358 From aph at openjdk.java.net Fri Feb 5 13:19:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 5 Feb 2021 13:19:41 GMT Subject: RFR: 8261071: AArch64: Refactor interpreter native wrappers In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 22:01:34 GMT, Anton Kozlov wrote: > Please review refactoring of interpreter signature handlers on aarch64. The main objective is to prepare for the new calling convention of macOS/AArch64, although this patch brings nothing from the new convention. 
>
> Tested with signature stress tests and tier1 on Linux/AArch64.
>
> I have started with a single function implementing SlowSignatureHandler (https://github.com/openjdk/jdk/commit/5ef1bd15c3bb174f4aed5e358d1ce2fff2846858#diff-1ff58ce70aeea7e9842d34e8d8fd9c94dd91182999d455618b2a171efd8f742cR164). The single function was compact but obscure. I was shuffling it until I eventually came to something similar to the initial approach with a few pieces abstracted away.
>
> The most notable changes in the final version should be
> * we count only parameters passed in registers
> * ldrw/strw are used to pass via stack in SignatureHandlerGenerator::pass_int

Marked as reviewed by aph (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2413

From hseigel at openjdk.java.net Fri Feb 5 14:53:53 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Fri, 5 Feb 2021 14:53:53 GMT
Subject: RFR: 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests
Message-ID:

Please review this change to clean up warnings, such as the following, in the vmTestbase tests.

    warning: [synchronization] attempt to synchronize on an instance of a value-based class
    warning: [removal] Integer(int) in Integer has been deprecated and marked for removal

This change cleans up the warnings by using a non-value based class to synchronize on, and replacing calls such as Integer(int) with Integer.valueOf(int).

The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-8 on Linux x64.
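
For readers unfamiliar with the two warnings, here is a minimal sketch of the kind of cleanup described above. It is not taken from the patch: the class name `ValueBasedCleanup` and its members are hypothetical, and the vmTestbase tests themselves may be structured differently.

```java
// Hypothetical illustration only; not code from PR 2427.
public class ValueBasedCleanup {
    // A plain Object is an identity class, so synchronizing on it does not
    // trigger the [synchronization] warning that a boxed Integer would.
    private static final Object LOCK = new Object();
    private static int counter = 0;

    // Before: new Integer(value)      -> [removal] warning
    // After:  Integer.valueOf(value)  -> no warning
    static Integer box(int value) {
        return Integer.valueOf(value);
    }

    // Before: synchronized (someInteger) { ... } -> [synchronization] warning
    // After:  synchronize on a dedicated lock object instead.
    static int increment() {
        synchronized (LOCK) {
            counter++;
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println(box(42));
        System.out.println(increment());
    }
}
```

The dedicated lock object keeps the locking behaviour explicit, while `Integer.valueOf` avoids the constructor that is deprecated for removal.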
Thanks, Harold

-------------

Commit messages:
 - 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests

Changes: https://git.openjdk.java.net/jdk/pull/2427/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2427&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261161
  Stats: 747 lines in 129 files changed: 2 ins; 0 del; 745 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2427.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2427/head:pull/2427

PR: https://git.openjdk.java.net/jdk/pull/2427

From daniel.daugherty at oracle.com Fri Feb 5 15:05:25 2021
From: daniel.daugherty at oracle.com (daniel.daugherty at oracle.com)
Date: Fri, 5 Feb 2021 10:05:25 -0500
Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9]
In-Reply-To:
References:
Message-ID: <18c74331-6056-405f-1207-e62787f34830@oracle.com>

On 2/5/21 4:51 AM, Magnus Ihse Bursie wrote:
> On Tue, 2 Feb 2021 21:20:52 GMT, Daniel D. Daugherty wrote:
>
>>> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>>
>>> support macos_aarch64 in hsdis
>> make/autoconf/flags.m4 line 140:
>>
>>> 138: else
>>> 139: MACOSX_VERSION_MIN=10.12.0
>>> 140: fi
>> Not something that needs to be addressed here, but these changes
>> illustrate that our collective use of macOSX/MACOSX/MacOSX names
>> are tied to the fact that the macOS major version number was at 10
>> for a very long time.
>>
>> @magicus - Do we have an RFE to rename MACOSX or are we sticking
>> with it and evolving our interpretation of the 'X' from '10' to */splat/asterisk?
> @dcubed-ojdk There is no RFE for renaming "macosx" to "macos". I'm not sure it should be done. We can't follow all marketing trends (Apple recently renamed iOS to iPadOS for the iPad; we can't keep adapting to such schemes).
Personally, I like the new name without the "x", but we had already spent some time trying to find and fix all (or at least, most) instances of "osx" in the code, that I don't really think it's worth the effort. > > If you can drill up enough enthusiasm for such a project, and get any objections down to minimum, I can help implementing it. But I won't be spearheading it. > >> make/common/NativeCompilation.gmk line 1178: >> >>> 1176: endif >>> 1177: # This only works if the openjdk_codesign identity is present on the system. Let >>> 1178: # silently fail otherwise. >> Might want to add a comment here: >> # The '-f' option will replace an existing signature if one exists. > We're not really in the habit of adding comments for various command line options. Normally, you can check these with "man" if you are uncertain. If they do something surprising, sure, but here it's more of a "it's needed on aarch64 to work at all", so I don't think a comment will be anything but added clutter. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/2200 @magicus - I'm good with both of these answers. I personally like 'macosx'. Dan From gziemski at openjdk.java.net Fri Feb 5 15:47:52 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Fri, 5 Feb 2021 15:47:52 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: <1WBui88c5lauE5a6N6u6iqV93EDULATd6jx-VTi7ctY=.4fc4b9e2-1e6d-4ff4-9693-27cdbb06e7ef@github.com> On Fri, 5 Feb 2021 12:26:27 GMT, Anton Kozlov wrote: >> Marked as reviewed by ihse (Reviewer). > >> I haven't got a MacOS AArch64 system right now. Is it possible to >> enable W^X in Linux in order to kick the tyres? > > I've just got rid of asserts that fired on Linux sometime :) As for W^X like on macOS, I vaguely remember working with a Linux system with one-way transition W->X, probably provided by SELinux. But I don't think it allowed per-thread W^X state. 
> _Mailing list message from [daniel.daugherty at oracle.com](mailto:daniel.daugherty at oracle.com) on [security-dev](mailto:security-dev at openjdk.java.net):_ > > On 2/5/21 4:51 AM, Magnus Ihse Bursie wrote: > > @magicus - I'm good with both of these answers. I personally like 'macosx'. > > Dan It's no longer `macosx`, it's just `macos` now - see https://en.wikipedia.org/wiki/MacOS ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From lfoltan at openjdk.java.net Fri Feb 5 15:49:41 2021 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Fri, 5 Feb 2021 15:49:41 GMT Subject: RFR: 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 14:48:25 GMT, Harold Seigel wrote: > Please review this change to clean up warnings, such as the following, in the vmTestbase tests. > > warning: [synchronization] attempt to synchronize on an instance of a value-based class > warning: [removal] Integer(int) in Integer has been deprecated and marked for removal > > This change cleans up the warnings by using a non-value based class to synchronize on, and replacing calls such as Integer(int) with Integer.valueOf(int). > > The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-8 on Linux x64. > > Thanks, Harold Looks good. Lois ------------- Marked as reviewed by lfoltan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2427 From akozlov at openjdk.java.net Fri Feb 5 16:07:09 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 5 Feb 2021 16:07:09 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v12] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. 
> > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: Update signal handler part for debugger ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/8652d21d..0d0e9baf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=11 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=10-11 Stats: 16 lines in 1 file changed: 5 ins; 8 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Fri Feb 5 16:17:50 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 5 Feb 2021 16:17:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 23:29:30 GMT, Gerard Ziemski wrote: >> using ` ```c ` 
https://docs.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks
>>
>> I was wrong about `SIGFPE` / `EXC_MASK_ARITHMETIC`, it's used on i386, x86_64:
>> https://github.com/openjdk/jdk/blob/2be60e37e0e433141b2e3d3e32f8e638a4888e3a/src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp#L467-L524
>> and aarch64:
>> https://github.com/AntonKozlov/jdk/blob/80827176cbc5f0dd26003cf234a8076f3f557928/src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp#L309-L323
>> (What happened with the formatting here, ugh?)
>>
>> Your suggestion sounds good otherwise. @AntonKozlov, do you mind to integrate that?
>
> So it should be:
>
>     #if defined(__APPLE__)
>       // lldb (gdb) installs both standard BSD signal handlers, and mach exception
>       // handlers. By replacing the existing task exception handler, we disable lldb's mach
>       // exception handling, while leaving the standard BSD signal handlers functional.
>       //
>       // EXC_MASK_BAD_ACCESS needed by all architectures for NULL ptr checking
>       // EXC_MASK_ARITHMETIC needed by all architectures for div by 0 checking
>       // EXC_MASK_BAD_INSTRUCTION needed by aarch64 to initiate deoptimization
>       kern_return_t kr;
>       kr = task_set_exception_ports(mach_task_self(),
>                                     EXC_MASK_BAD_ACCESS | EXC_MASK_ARITHMETIC
>                                     AARCH64_ONLY(| EXC_MASK_BAD_INSTRUCTION),
>                                     MACH_PORT_NULL,
>                                     EXCEPTION_STATE_IDENTITY,
>                                     MACHINE_THREAD_STATE);
>
>       assert(kr == KERN_SUCCESS, "could not set mach task signal handler");
>     #endif

Thanks! I've updated the PR with this code, with extra indentation of `AARCH64_ONLY(...)` line, since this is continuation of the first parameter.
I'll fix the formatting in os_bsd_aarch64.cpp along with other changes to the `bsd_aarch64` directory.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2200

From coleenp at openjdk.java.net Fri Feb 5 16:55:40 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 5 Feb 2021 16:55:40 GMT
Subject: RFR: 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests
In-Reply-To:
References:
Message-ID: <1KH2oh0e-Qi8fsB4oxmJPN4AIdvEU2gYlhdGVZoPyjg=.1c2e2036-1fff-4ec5-97e4-616987dd7864@github.com>

On Fri, 5 Feb 2021 14:48:25 GMT, Harold Seigel wrote:

> Please review this change to clean up warnings, such as the following, in the vmTestbase tests.
>
>     warning: [synchronization] attempt to synchronize on an instance of a value-based class
>     warning: [removal] Integer(int) in Integer has been deprecated and marked for removal
>
> This change cleans up the warnings by using a non-value based class to synchronize on, and replacing calls such as Integer(int) with Integer.valueOf(int).
>
> The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-8 on Linux x64.
>
> Thanks, Harold

Wow. Thank you!

-------------

Marked as reviewed by coleenp (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2427 From akozlov at openjdk.java.net Fri Feb 5 17:23:50 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 5 Feb 2021 17:23:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 18:35:51 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/cpu/aarch64/vm_version_aarch64.hpp line 93: > >> 91: CPU_MARVELL = 'V', >> 92: CPU_INTEL = 'i', >> 93: CPU_APPLE = 'a', > > The `ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile` has 8538 pages, can we be more specific and point to the particular section of the document where the CPU codes are defined? They are defined in 13.2.95. MIDR_EL1, Main ID Register. Apple's code is not there, but "Arm can assign codes that are not published in this manual. All values not assigned by Arm are reserved and must not be used.". I assume the value was obtained by digging around https://github.com/apple/darwin-xnu/blob/main/osfmk/arm/cpuid.h#L62 ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From coleenp at openjdk.java.net Fri Feb 5 17:37:39 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 5 Feb 2021 17:37:39 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> References: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com> <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> Message-ID: On Fri, 5 Feb 2021 05:51:22 GMT, Ioi Lam wrote: >> Hi Ioi, >> >> The distinction between vmOperation versus vmOperations is far too subtle. 
Perhaps as Dan implied vm_Operation.hpp or VM_Operation.hpp (though that breaks normal - odd - naming convention). >> >> I assume most files that include vmOperation.hpp are those that define VM_Operation subclasses? >> >> Thanks, >> David > >> Hi Ioi, >> >> The distinction between vmOperation versus vmOperations is far too subtle. Perhaps as Dan implied vm_Operation.hpp or VM_Operation.hpp (though that breaks normal - odd - naming convention). > > How about this: > > - runtime/vmOperation.hpp --- (new file) this is the file that declares VM_Operation > - runtime/commonVMOperations.hpp -- (renamed from vmOperations.hpp) these are the VM_Operation subclasses that no one cares to organize properly :-) > > this will be kinda consistent with these existing files: > > - gc/g1/g1VMOperations.hpp > - gc/g1/g1VMOperations.cpp > - gc/shenandoah/shenandoahVMOperations.cpp > - gc/shenandoah/shenandoahVMOperations.hpp > - gc/shared/gcVMOperations.cpp > - gc/shared/gcVMOperations.hpp > - gc/parallel/psVMOperations.cpp > - gc/parallel/psVMOperations.hpp > > (I should also add a new vmOperation.cpp, and rename vmOperations.cpp to commonVMOperations.cpp) > > BTW, I need to refactor VM_Exit into its own file. It's used by the `JVM_LEAF` macro in interfaceSupport.inline.hpp, but I don't want to pull in all the other "common" operations in there. I am thinking of calling it vmExit.hpp (since exitVMOperation.hpp doesn't really look good). > >> I assume most files that include vmOperation.hpp are those that define VM_Operation subclasses? > > Yes, but there are also files that use the `VM_Operation::VMOp_Type` type, notably safepoint.hpp. I'm fine with leaving vmOperations.hpp and vmOperation.hpp. It's not a big deal. commonVMOperations.hpp - too much noise! I agree with David. interfaceSupport.inline.hpp imports a lot of things so importing vmOperations.hpp is not a big deal. 
vmOperations.hpp imports #include "runtime/threadSMR.hpp" otherwise it has all the same imports as interfaceSupport.inline.hpp anyway. All these files are going to increase compilation time too. I stand by my check mark above!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2398

From dsamersoff at openjdk.java.net Fri Feb 5 18:05:39 2021
From: dsamersoff at openjdk.java.net (Dmitry Samersoff)
Date: Fri, 5 Feb 2021 18:05:39 GMT
Subject: RFR: 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register
In-Reply-To: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com>
References: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com>
Message-ID:

On Tue, 2 Feb 2021 09:24:20 GMT, Aleksey Shipilev wrote:

> $ CONF=linux-arm-server-fastdebug make run-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java
> ...
>
> # Internal Error (/home/pi/jdk/src/hotspot/cpu/arm/register_arm.hpp:155), pid=3793, tid=3808
> # assert(is_valid()) failed: invalid register
> #
> # JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc.pi.jdk)
> # Java VM: OpenJDK Server VM (fastdebug 17-internal+0-adhoc.pi.jdk, compiled mode, emulated-client, g1 gc, linux-arm)
> # Problematic frame:
> # V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4
>
> Current CompileTask:
> C1: 318 2 !b java.lang.Class::desiredAssertionStatus (54 bytes)
>
> Stack: [0x72580000,0x72600000], sp=0x725fe170, free space=504k
> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4
> V [libjvm.so+0x43b6b4] C1_MacroAssembler::lock_object(RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, Label&)+0xcf8
> V [libjvm.so+0x3d731c] LIR_Assembler::emit_lock(LIR_OpLock*)+0x160
>
>
> The problem is in this code:
>
>     if (DiagnoseSyncOnValueBasedClasses != 0) {
>       load_klass(tmp1, obj); <--- asserts
>       ldr_u32(tmp1, Address(tmp1, Klass::access_flags_offset()));
>       tst(tmp1, JVM_ACC_IS_VALUE_BASED_CLASS);
>       b(slow_case, ne);
>     }
>
> `tmp1` is `noreg` when `!UseBiasedLocking`, because `c1_LIRGenerator_arm.cpp` provides it only when `UseBiasedLocking` is enabled:
>
>     void LIRGenerator::do_MonitorEnter(MonitorEnter* x) {
>       ...
>       // Need a scratch register for biased locking on arm
>       LIR_Opr scratch = LIR_OprFact::illegalOpr;
>       if(UseBiasedLocking) {
>         scratch = new_pointer_register();
>       } else {
>         scratch = atomicLockOpr(); // <--- actually illegalOpr
>       }
>
>       ...
>
>       monitor_enter(obj.result(), lock, hdr, scratch,
>                     x->monitor_no(), info_for_exception, info);
>     }
>
> The way out is to use `tmp2`, which is the alias for `Rtemp` and always available.
>
> Additional testing:
> - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:+UseBiasedLocking`
> - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:-UseBiasedLocking`

Marked as reviewed by dsamersoff (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2349

From dsamersoff at openjdk.java.net Fri Feb 5 18:05:40 2021
From: dsamersoff at openjdk.java.net (Dmitry Samersoff)
Date: Fri, 5 Feb 2021 18:05:40 GMT
Subject: RFR: 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register
In-Reply-To:
References: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com>
Message-ID:

On Fri, 5 Feb 2021 09:48:54 GMT, Aleksey Shipilev wrote:

>> Hi,
>> The change is good! I did not notice this Pull Request, I studied the case and came up with exactly the same solution.
>> Thanks for fixing this.
>> Boris
>
> Thanks!
I need a formal Reviewer to ack this :)

Looks good to me (R)

-------------

PR: https://git.openjdk.java.net/jdk/pull/2349

From aph at openjdk.java.net Fri Feb 5 19:07:01 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Fri, 5 Feb 2021 19:07:01 GMT
Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code
Message-ID:

Go back a few years, and there were simple atomic load/store exclusive instructions on Arm. Say you want to do an atomic increment of a counter. You'd do an atomic load to get the counter into your local cache in exclusive state, increment that counter locally, then write that incremented counter back to memory with an atomic store. All the time that cache line was in exclusive state, so you're guaranteed that no-one else changed anything on that cache line while you had it.

This is hard to scale on a very large system (e.g. Fugaku) because if many processors are incrementing that counter you get a lot of cache line ping-ponging between cores.

So, Arm decided to add a locked memory increment instruction that works without needing to load an entire line into local cache. It's a single instruction that loads, increments, and writes back. The secret is to send a cache control message to whichever processor owns the cache line containing the count, tell that processor to increment the counter and return the incremented value. That way cache coherency traffic is minimized. This new set of instructions is known as Large System Extensions, or LSE.

Unfortunately, in recent processors, the "old" load/store exclusive instructions sometimes perform very badly. Therefore, it's now necessary for software to detect which version of Arm it's running on, and use the "new" LSE instructions if they're available. Otherwise performance can be very poor under heavy contention.

GCC's -moutline-atomics does this by providing library calls which use LSE if it's available, but this option is only provided on newer versions of GCC.
This is particularly problematic with older versions of OpenJDK, which build using old GCC versions. Also, I suspect that some other operating systems could use this. Perhaps not MacOS, given that all Apple CPUs support LSE, but maybe Windows. ------------- Commit messages: - Hoist load of stub pointer before FULL_MEM_BARRIER. - Oops - Move stuff around - Cleanup - Intermediate for perf test - Untabify - Intermediate - Intermediate Changes: https://git.openjdk.java.net/jdk/pull/2434/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261027 Stats: 396 lines in 6 files changed: 362 ins; 6 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From github.com+9200663+quaffel at openjdk.java.net Fri Feb 5 19:07:46 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Fri, 5 Feb 2021 19:07:46 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 Message-ID: Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. _This is a preparational change for the Shenandoah GC port to ppc. As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. 
This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ ------------- Commit messages: - Remove extraneous whitespace - [PPC64] Introduce nmethod_entry_barrier and c2i_entry_barrier Changes: https://git.openjdk.java.net/jdk/pull/2432/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2432&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260372 Stats: 264 lines in 10 files changed: 254 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2432.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2432/head:pull/2432 PR: https://git.openjdk.java.net/jdk/pull/2432 From mdoerr at openjdk.java.net Fri Feb 5 22:37:43 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 5 Feb 2021 22:37:43 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 In-Reply-To: References: Message-ID: <41GbtKE8p2wI0QVgfjqSiL7hczENQL1M0g_0HceZEy8=.79ab0b4b-70d8-4357-b1a8-ce05376b95af@github.com> On Fri, 5 Feb 2021 18:04:34 GMT, Niklas Radomski wrote: > Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. > > _This is a preparational change for the Shenandoah GC port to ppc. As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ Thanks for the contribution. Looks very good. I only have a few minor requests. src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp line 44: > 42: > 43: public: > 44: unsigned short get_guard_value() const { Return type should be int. 
src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp line 49:

> 47: }
> 48:
> 49: void release_set_guard_value(unsigned short value) {

Parameter should be int.

src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp line 113:

> 111: // Nothing to do.
> 112: // Unlike other platforms, the frame resolution is done in the nmethod entry barrier stub.
> 113: // This way, writing frame information on the stack can be avoided.

[optional] I'd replace the last sentence:
// PPC64 always has a valid back chain so the stub can simply pop the frame and there's nothing to do here.

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3581:

> 3579:
> 3580: // Return to prologue if no deoptimization is required (bnelr)
> 3581: __ bclr(Assembler::bcondCRbiIs1_bhintIsTaken, Assembler::bi0(CCR0, Assembler::equal), Assembler::bhintIsTaken);

Branch hint is provided twice. Not an error, but bcondCRbiIs1 would be cleaner if you keep it at the end.

src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3592:

> 3590:
> 3591: // Restore link register. Points to an instruction in the previous Java frame; effectively resuming
> 3592: // its execution after this method's deoptimization. This method's prologue is aborted.

Imprecise: We're not resuming in the caller after deoptimization, we call handle_wrong_method which finds a better callee (e.g. interpreter) and jumps to it.

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2432

From iveresov at openjdk.java.net Sat Feb 6 06:21:54 2021
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Sat, 6 Feb 2021 06:21:54 GMT
Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3
Message-ID: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com>

Mostly a typo in compilation mode ergonomics that selected a quick-only mode essentially when the user specified TieredStopAtLevel={1,2,3}.
The quick-only mode has an optimization that eliminates parts of the MDO since they are not needed. Meanwhile, the WB API considered it fair game to request a level 3 compile, which requires a full MDO.

The fix corrects the original issue and also tries to be extra defensive with WB API (since its semantics are not clearly specified) by always allocating full MDO if WB API is on.

-------------

Commit messages:
 - Be defensive with respect to WB API
 - Fix compilation mode ergonomics

Changes: https://git.openjdk.java.net/jdk/pull/2444/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2444&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261229
  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2444.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2444/head:pull/2444

PR: https://git.openjdk.java.net/jdk/pull/2444

From aph at openjdk.java.net Sat Feb 6 09:27:41 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Sat, 6 Feb 2021 09:27:41 GMT
Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code
In-Reply-To:
References:
Message-ID:

On Fri, 5 Feb 2021 18:56:46 GMT, Andrew Haley wrote:

> Go back a few years, and there were simple atomic load/store exclusive
> instructions on Arm. Say you want to do an atomic increment of a
> counter. You'd do an atomic load to get the counter into your local cache
> in exclusive state, increment that counter locally, then write that
> incremented counter back to memory with an atomic store. All the time
> that cache line was in exclusive state, so you're guaranteed that
> no-one else changed anything on that cache line while you had it.
>
> This is hard to scale on a very large system (e.g. Fugaku) because if
> many processors are incrementing that counter you get a lot of cache
> line ping-ponging between cores.
> > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is minimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. With regard to performance, the overhead of the ```call ... ret``` sequence seems to be almost negligible on the systems I've tested. On ThunderX2, there is little difference, whatever you do. A straight-line count and increment loop is 5% slower. On Neoverse N1 there is some 25% straight-line improvement for a simple count and increment loop with this patch. GCC's -moutline-atomics isn't quite as good as this patch, with only a 17% improvement. But simple straight-line tests aren't really the point of LSE. The big performance hit with the "old" atomics happens at times of heavy contention, when fairness problems cause severe scaling issues.
This is more likely to be a problem on large systems with many cores and large heaps. **ThunderX2:** Baseline: real 0m24.001s Patched: -XX:+UseLSE real 0m25.222s -XX:-UseLSE real 0m25.215s Built with -moutline-atomics: real 0m25.227s **Neoverse N1:** Baseline: real 0m10.027s Patched: -XX:+UseLSE real 0m8.027s -XX:-UseLSE real 0m10.429s Built with -moutline-atomics: real 0m8.538s ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From iignatyev at openjdk.java.net Sat Feb 6 17:52:41 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 6 Feb 2021 17:52:41 GMT Subject: RFR: 8261229: MethodData is not correctly initialized with TieredStopAtLevel=3 In-Reply-To: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> References: <-ReDb8Jf-EIuYNyQltIaKSF1tUPmswp6kx_mciU5Fvk=.31d8565c-26fe-4f7f-8b88-55bcc996bfca@github.com> Message-ID: On Sat, 6 Feb 2021 06:16:45 GMT, Igor Veresov wrote: > Mostly a typo in compilation mode ergonomics that selected a quick-only mode essentially when the user specified TieredStopAtLevel={1,2,3}. The quick-only mode has an optimization that eliminates parts of the MDO since they are not needed. Meanwhile, the WB API considered it a fair game to request a level 3 compile, that requires a full MDO. > > The fix corrects the original issue and also tries to be extra defensive with WB API (since it's semantics is not clearly specified) by always allocating full MDO if WB API is on. I don't think we should adjust the product code to behave differently just to satisfy the incorrect assumptions of WhiteBox. it also kinda defeats the purpose of WhiteBox API as we won't be able to go thru the same code path. 
-- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/2444 From kim.barrett at oracle.com Sun Feb 7 05:30:23 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 7 Feb 2021 00:30:23 -0500 Subject: RFR: 8259668: Make SubTasksDone use-once [v2] In-Reply-To: References: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> <8fS9uHBh4LevCcCjRGtlNBCUTSZVv85yO4nrVupRMAs=.1f5964dc-b304-4cc4-90eb-86b8ec238f4c@github.com> Message-ID: <8ED33A9E-7DBE-4B25-AD9F-59AD31D6C865@oracle.com> > On Feb 5, 2021, at 6:10 AM, Albert Mingkun Yang wrote: > > On Fri, 5 Feb 2021 09:34:58 GMT, Thomas Schatzl wrote: > >> An additional (as an extra CR, preexisting) cleanup could be moving more method definitions into the cpp file as they do not seem to be performance critical; but as mentioned, this is fine too. > > I think it's best not to hide the definitions from the caller. The current approach ensures an optimized build will have zero cost from having the assertion; otherwise, the call site needs a `callq` instruction (without LTO), probably insignificant, but a few extra lines in the header is fine, IMO. Thomas's suggestion was to change the class to have DEBUG_ONLY(volatile bool _verification_done = false;) void all_tasks_completed_impl(uint skipped[], size_t skipped_size) NOT_DEBUG_RETURN; and eliminate the DEBUG_ONLY wrappers currently cluttering the calls to all_tasks_completed_impl. A call in a not-debug build will have a trivial empty-bodied function that will be inlined to nothing. With that change, I would expect any decent compiler to generate no code for a call to all_tasks_completed in a product build. We use this pattern a lot, with exactly that expectation. I also think this change should be made. 
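The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the patch under review: `TaskChecker`, `verify_impl`, `claim` and `claimed` are hypothetical names, and the two macros are local stand-ins assumed to behave like HotSpot's `DEBUG_ONLY` and `NOT_DEBUG_RETURN` (keyed on `ASSERT`).

```cpp
#include <cassert>

// Local stand-ins for the HotSpot debug macros (assumed, simplified).
// Compile with -DASSERT to get the "debug" flavour.
#ifdef ASSERT
#define DEBUG_ONLY(code) code
#define NOT_DEBUG_RETURN            /* declaration only; body defined below */
#else
#define DEBUG_ONLY(code)
#define NOT_DEBUG_RETURN {}         /* empty inline body in product builds */
#endif

// Hypothetical class for illustration; not the real SubTasksDone.
class TaskChecker {
  DEBUG_ONLY(bool _verification_done = false;)
  unsigned _claimed = 0;

  // Debug builds: declared here, defined out of line.
  // Product builds: an empty body the compiler can inline to nothing.
  void verify_impl(unsigned expected) NOT_DEBUG_RETURN;

public:
  void claim() { _claimed++; }
  unsigned claimed() const { return _claimed; }

  // The caller needs no DEBUG_ONLY wrapper around this call.
  void all_tasks_claimed(unsigned expected) { verify_impl(expected); }
};

#ifdef ASSERT
void TaskChecker::verify_impl(unsigned expected) {
  assert(_claimed == expected);
  _verification_done = true;
}
#endif
```

The point of the design is that the call site in `all_tasks_claimed` reads the same in both build flavours; only the declaration line changes meaning.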
From kbarrett at openjdk.java.net Sun Feb 7 05:51:43 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 7 Feb 2021 05:51:43 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v2] In-Reply-To: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> References: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> Message-ID: On Thu, 4 Feb 2021 15:41:59 GMT, Albert Mingkun Yang wrote: >> After JDK-8260574, a instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it is removed. >> >> With this patch, `all_tasks_completed` contains only assertion. Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running btw the instance when all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/shared/workgroup.cpp line 364: > 362: #ifdef ASSERT > 363: void SubTasksDone::all_tasks_completed_impl(uint skipped[], size_t skipped_size) { > 364: if (Atomic::cmpxchg(&_verification_done, false, true)) { This verification done check prevents detection of certain kinds of mistakes. For example, if the first thread did not claim a skipped task but a later one did, we'll miss that. Given that this function only does anything in a debug-build, and is usually pretty fast because the number of subtasks is small, I don't think there's a good reason to "optimize" it this way. 
(Assuming it even is an optimization, as a CAS operation may be relatively expensive.) src/hotspot/share/gc/shared/workgroup.hpp line 341: > 339: template 340: ENABLE_IF(Conjunction...>::value)> > 341: void all_tasks_completed(T0 first_skipped, Ts... more_skipped) { I think this overload should be treated as the primary that the documentation applies to, with the no-arg overload following and being commented as being the base case for the variadic function. src/hotspot/share/gc/shared/workgroup.cpp line 391: > 389: > 390: bool SubTasksDone::valid() { > 391: return _tasks != NULL; This function can never return false, since _tasks is initialized in the constructor and the value deleted in the destructor. It should be removed and any callers fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2383 From dongbo at openjdk.java.net Sun Feb 7 07:57:57 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sun, 7 Feb 2021 07:57:57 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: > As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. > > In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. > I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. > The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. > > This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes a few typos, e.g. `vor8B`, `vsrla4S_imm`. 
> > [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 > [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 > [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e > [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: 8258953: AArch64: move NEON instructions to aarch64_neon.ad ------------- Changes: https://git.openjdk.java.net/jdk/pull/2273/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2273&range=01 Stats: 5661 lines in 3 files changed: 3216 ins; 2435 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2273.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2273/head:pull/2273 PR: https://git.openjdk.java.net/jdk/pull/2273 From dongbo at openjdk.java.net Sun Feb 7 08:05:43 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sun, 7 Feb 2021 08:05:43 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 12:09:36 GMT, Dong Bo wrote: >> I managed to sort all the instructs and compare them with and without the patch. They are general the same except for some trailing whitespaces and typos you mentioned. > >> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ >> >> On 1/28/21 10:40 AM, Ningsheng Jian wrote: >> >> > I see you have fixed this typo, from ushr to usra. I presume original version generates wrong code and produces wrong results for specific case? If so, do you think it deserves a separate fix, e.g. for jdk16? >> >> It does. This patch should change nothing at all, except moving >> text from A to B. >> >> -- >> Andrew Haley (he/him) >> Java Platform Lead Engineer >> Red Hat UK Ltd. 
>> https://keybase.io/andrewhaley >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > @nsjian @theRealAph Thank you for the comments. I'll raise a separate PR to fix this right now. > > BTW, since Andrew says we should change nothing at all in this move, do you think we should also do the things below in separate PRs? > 1. fix the typo of `vor8B`. > 2. supporting vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`. Updated. The whitespaces mentioned are addressed. The format typo fix in `vor8B` is kept, other instructions are appended to aarch64_neon.ad. ------------- PR: https://git.openjdk.java.net/jdk/pull/2273 From ayang at openjdk.java.net Sun Feb 7 09:47:08 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sun, 7 Feb 2021 09:47:08 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v3] In-Reply-To: References: Message-ID: > After JDK-8260574, an instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it is removed. > > With this patch, `all_tasks_completed` contains only assertion. Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running between the instant when all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome. 
Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2383/files - new: https://git.openjdk.java.net/jdk/pull/2383/files/90add10d..67c399e1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2383&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2383&range=01-02 Stats: 32 lines in 4 files changed: 4 ins; 15 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/2383.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2383/head:pull/2383 PR: https://git.openjdk.java.net/jdk/pull/2383 From ayang at openjdk.java.net Sun Feb 7 09:47:10 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sun, 7 Feb 2021 09:47:10 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v2] In-Reply-To: References: <1t8CJEN8F3yCAYd6oO7OAvMJAeyLz4JG8krB-bl9oBI=.aa665edd-63b0-4111-9211-91cc257c927e@github.com> Message-ID: On Sun, 7 Feb 2021 05:49:21 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Changes requested by kbarrett (Reviewer). > I also think this change should be made. Done; I misunderstood it to be the caller of `all_tasks_completed`. PS: on revising the doc for `all_tasks_completed`, I realized the intention is to assert all tasks are *claimed*, not *completed* (some subtasks might still be running), so I make the rename for better accuracy. > src/hotspot/share/gc/shared/workgroup.cpp line 364: > >> 362: #ifdef ASSERT >> 363: void SubTasksDone::all_tasks_completed_impl(uint skipped[], size_t skipped_size) { >> 364: if (Atomic::cmpxchg(&_verification_done, false, true)) { > > This verification done check prevents detection of certain kinds of mistakes. For example, if the first thread did not claim a skipped task but a later one did, we'll miss that. 
Given that this function only does anything in a debug-build, and is usually pretty fast because the number of subtasks is small, I don't think there's a good reason to "optimize" it this way. (Assuming it even is an optimization, as a CAS operation may be relatively expensive.) It's not an optimization; it's for avoiding getting duplicate assertion failures. Considering all uses of `SubTasksDone` are based on task-based closure, all threads will run the identical closure. Therefore, it's unlikely that a subtask is skipped in one thread but claimed in another. Revised the comments to make the intention clearer. > src/hotspot/share/gc/shared/workgroup.hpp line 341: > >> 339: template> 340: ENABLE_IF(Conjunction...>::value)> >> 341: void all_tasks_completed(T0 first_skipped, Ts... more_skipped) { > > I think this overload should be treated as the primary that the documentation applies to, with the no-arg overload following and being commented as being the base case for the variadic function. Done. > src/hotspot/share/gc/shared/workgroup.cpp line 391: > >> 389: >> 390: bool SubTasksDone::valid() { >> 391: return _tasks != NULL; > > This function should can never return false, since _tasks is initialized in the constructor and the value deleted in the destructor. It should be removed and any callers fixed. Removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2383 From aph at openjdk.java.net Sun Feb 7 10:58:44 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sun, 7 Feb 2021 10:58:44 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 07:57:57 GMT, Dong Bo wrote: >> As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. >> >> In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. 
>> I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. >> The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. >> >> This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes few typos, e.g. `vor8B`, `vsrla4S_imm`. >> >> [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 >> [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 >> [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e >> [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 > > Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8258953: AArch64: move NEON instructions to aarch64_neon.ad That looks fine. I haven't been able to check that all this patch does is move code from aarch64.ad to aarch64_neon.ad, but I believe you. ------------- Marked as reviewed by aph (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2273 From github.com+9200663+quaffel at openjdk.java.net Sun Feb 7 22:02:04 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Sun, 7 Feb 2021 22:02:04 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v2] In-Reply-To: References: Message-ID: > Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. > > _This is a preparational change for the Shenandoah GC port to ppc. As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. 
This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: Apply Martin's feedback ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2432/files - new: https://git.openjdk.java.net/jdk/pull/2432/files/b6add58b..ab354b68 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2432&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2432&range=00-01 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2432.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2432/head:pull/2432 PR: https://git.openjdk.java.net/jdk/pull/2432 From njian at openjdk.java.net Mon Feb 8 02:08:41 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 8 Feb 2021 02:08:41 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 07:57:57 GMT, Dong Bo wrote: >> As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. >> >> In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. >> I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. >> The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. >> >> This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes few typos, e.g. `vor8B`, `vsrla4S_imm`. 
>> >> [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 >> [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 >> [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e >> [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 > > Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > 8258953: AArch64: move NEON instructions to aarch64_neon.ad I compared all-ad-src.ad with and without the patch, and it looked good to me. ------------- Marked as reviewed by njian (Committer). PR: https://git.openjdk.java.net/jdk/pull/2273 From dongbo at openjdk.java.net Mon Feb 8 02:15:44 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 8 Feb 2021 02:15:44 GMT Subject: RFR: 8258953: AArch64: move NEON instructions to aarch64_neon.ad [v2] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 02:05:59 GMT, Ningsheng Jian wrote: >> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> 8258953: AArch64: move NEON instructions to aarch64_neon.ad > > I compared all-ad-src.ad with and without the patch, and it looked good to me. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2273 From dongbo at openjdk.java.net Mon Feb 8 02:15:45 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Mon, 8 Feb 2021 02:15:45 GMT Subject: Integrated: 8258953: AArch64: move NEON instructions to aarch64_neon.ad In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 01:37:33 GMT, Dong Bo wrote: > As discussed in [1], all NEON instructions should be moved from `aarch64.ad` to `aarch64_neon.ad`. > > In the first commit [2] of this PR, the NEON instructions are deleted from `aarch64.ad` and appended to `aarch64_neon.ad`. 
> I compared the generated code in `aarch64_neon.ad` with original code in `aarch64.ad`, no suspicious differences found. > The last two commits just simply move code around in `aarch64_neon.ad` to put related instructions together, i.e. `LoadStore` [3], `Reduction` [4]. > > This also supports vector length 4 for `vsraa8B_imm` and `vsrla8B_imm`, vector length 2 for `vsraa4S_imm` and `vsrla4S_imm`, fixes few typos, e.g. `vor8B`, `vsrla4S_imm`. > > [1] https://github.com/openjdk/jdk/pull/1215#issuecomment-728186803 > [2] https://github.com/dgbo/jdk/commit/40cbe99e647cdf93712edf8f77ab3b5b30ea0a95 > [3] https://github.com/dgbo/jdk/commit/695fb8f8ef009b733a8f804e791347f4bfe2572e > [4] https://github.com/dgbo/jdk/commit/e0c38aa9aaa6af9925a3821328384b1e2b2c2070 This pull request has now been integrated. Changeset: aa5bc6ed Author: Dong Bo Committer: Fei Yang URL: https://git.openjdk.java.net/jdk/commit/aa5bc6ed Stats: 5661 lines in 3 files changed: 3216 ins; 2435 del; 10 mod 8258953: AArch64: move NEON instructions to aarch64_neon.ad Reviewed-by: njian, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/2273 From shade at openjdk.java.net Mon Feb 8 07:32:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 8 Feb 2021 07:32:40 GMT Subject: Integrated: 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register In-Reply-To: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com> References: <3kcU-WrscBpRvHSepNBZWu4CNsBaP6pdeYTyIUT2z1E=.180e2e1d-7dec-4c28-b9c4-058627b12419@github.com> Message-ID: On Tue, 2 Feb 2021 09:24:20 GMT, Aleksey Shipilev wrote: > $ CONF=linux-arm-server-fastdebug make run-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java > ... 
> > # Internal Error (/home/pi/jdk/src/hotspot/cpu/arm/register_arm.hpp:155), pid=3793, tid=3808 > # assert(is_valid()) failed: invalid register > # > # JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc.pi.jdk) > # Java VM: OpenJDK Server VM (fastdebug 17-internal+0-adhoc.pi.jdk, compiled mode, emulated-client, g1 gc, linux-arm) > # Problematic frame: > # V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4 > > Current CompileTask: > C1: 318 2 !b java.lang.Class::desiredAssertionStatus (54 bytes) > > Stack: [0x72580000,0x72600000], sp=0x725fe170, free space=504k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xe1a6a8] MacroAssembler::load_klass(RegisterImpl*, RegisterImpl*, AsmCondition)+0xa4 > V [libjvm.so+0x43b6b4] C1_MacroAssembler::lock_object(RegisterImpl*, RegisterImpl*, RegisterImpl*, RegisterImpl*, Label&)+0xcf8 > V [libjvm.so+0x3d731c] LIR_Assembler::emit_lock(LIR_OpLock*)+0x160 > > > The problem is in this code: > > if (DiagnoseSyncOnValueBasedClasses != 0) { > load_klass(tmp1, obj); <--- asserts > ldr_u32(tmp1, Address(tmp1, Klass::access_flags_offset())); > tst(tmp1, JVM_ACC_IS_VALUE_BASED_CLASS); > b(slow_case, ne); > } > > `tmp1` is `noreg` when `!BiasedLocking`, because `c1_LIRGenerator_arm.cpp` provides it only when `UseBiasedLocking` is enabled: > > void LIRGenerator::do_MonitorEnter(MonitorEnter* x) { > ... > // Need a scratch register for biased locking on arm > LIR_Opr scratch = LIR_OprFact::illegalOpr; > if(UseBiasedLocking) { > scratch = new_pointer_register(); > } else { > scratch = atomicLockOpr(); // <--- actually illegalOpr > } > > ... > > monitor_enter(obj.result(), lock, hdr, scratch, > x->monitor_no(), info_for_exception, info); > } > > The way out is to use `tmp2`, which is the alias for `Rtemp` and always available. 
> > Additional testing: > - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:+UseBiasedLocking` > - [x] Linux ARM32 `SyncOnValueBasedClassTest`, `-XX:-UseBiasedLocking` This pull request has now been integrated. Changeset: d45343ea Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/d45343ea Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8260899: ARM32: SyncOnValueBasedClassTest fails with assert(is_valid()) failed: invalid register Reviewed-by: dsamersoff ------------- PR: https://git.openjdk.java.net/jdk/pull/2349 From akozlov at openjdk.java.net Mon Feb 8 08:31:55 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 8 Feb 2021 08:31:55 GMT Subject: RFR: 8261072: AArch64: Fix MacroAssembler::get_thread convention Message-ID: Please review a fix in a special calling convention for aarch64_get_thread_helper for non-Linux platforms (windows/aarch64 for now). Preliminary review: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2021-January/011239.html ------------- Commit messages: - Fix MacroAssembler::get_thread convention Changes: https://git.openjdk.java.net/jdk/pull/2451/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2451&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261072 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2451.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2451/head:pull/2451 PR: https://git.openjdk.java.net/jdk/pull/2451 From ngasson at openjdk.java.net Mon Feb 8 08:49:51 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 8 Feb 2021 08:49:51 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 18:56:46 GMT, Andrew Haley wrote: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter. 
You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is mimimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions, sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. src/hotspot/cpu/aarch64/atomic_aarch64.S line 1: > 1: // Copyright (c) 2021, Red Hat Inc. All rights reserved. Does this file work with the Windows assembler? 
src/hotspot/cpu/aarch64/atomic_aarch64.S line 35: > 33: ret > 34: > 35: .globl aarch64_atomic_fetch_add_4_default_impl The N1 optimisation guide suggests aligning branch targets on 32 byte boundaries. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5579: > 5577: // > 5578: // If LSE is in use, generate LSE versions of all the stubs. The > 5579: // non-LSE versions are in atomic_aarch64.S. IMO it would be better for maintainability if the LSE versions were in atomic_aarch64.S too (with an explicit `.arch armv8-a+lse` directive). Is there any reason to generate them here, other than to support old toolchains? As far as I can tell GNU as supported LSE as far back as binutils 2.27. https://sourceware.org/binutils/docs-2.27/as/AArch64-Extensions.html ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From ngasson at openjdk.java.net Mon Feb 8 08:51:41 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 8 Feb 2021 08:51:41 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> Message-ID: On Wed, 3 Feb 2021 06:59:58 GMT, Nick Gasson wrote: >> `RegisterMap`-related changes look good. > > @theRealAph are the sharedRuntime_aarch64.cpp changes ok? @theRealAph Is this one ok to push now? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2279 From aph at openjdk.java.net Mon Feb 8 09:49:42 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Feb 2021 09:49:42 GMT Subject: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4] In-Reply-To: <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> <8uV_aS99ZXLKzfeqP9PnJOMqLDLqqDBAXgY1kShysUE=.458c9338-e47a-4371-aac7-8fe096ef19c4@github.com> Message-ID: On Tue, 2 Feb 2021 10:21:02 GMT, Nick Gasson wrote: >> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub >> doesn't save vector registers on x86". The problem is that a vector >> produced by the Vector API may be stored in a register when the deopt >> blob is called. Because the deopt blob only stores the lower half of >> vector registers, the full vector object cannot be rematerialized during >> deoptimization. So the following will crash on AArch64 with current JDK: >> >> make test TEST="jdk/incubator/vector" \ >> JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" >> >> The fix is to store the full vector registers by passing >> save_vectors=true to save_live_registers() in the deopt blob. Because >> save_live_registers() places the integer registers above the floating >> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs >> to calculate the SP offset based on whether full vectors were saved, and >> whether those vectors were NEON or SVE, rather than using a static >> offset as it does currently. >> >> The change to VectorSupport::allocate_vector_payload_helper() is >> required because we only store the lowest VMReg slot in the oop map. 
>> However unlike x86 the vector registers are always saved in a contiguous >> region of memory, so we can calculate the address of each vector element >> as an offset from the address of the first slot. X86 handles this in >> RegisterMap::pd_location() but that won't work on AArch64 because with >> SVE there isn't a unique VMReg corresponding to each four-byte physical >> slot in the vector (there are always exactly eight logical VMRegs >> regardless of the actual vector length). >> >> Tested hotspot_all_no_apps and jdk_core. > > Nick Gasson has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2279 From aph at openjdk.java.net Mon Feb 8 09:55:00 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Feb 2021 09:55:00 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 08:35:49 GMT, Nick Gasson wrote: >> Go back a few years, and there were simple atomic load/store exclusive >> instructions on Arm. Say you want to do an atomic increment of a >> counter. You'd do an atomic load to get the counter into your local cache >> in exclusive state, increment that counter locally, then write that >> incremented counter back to memory with an atomic store. All the time >> that cache line was in exclusive state, so you're guaranteed that >> no-one else changed anything on that cache line while you had it. >> >> This is hard to scale on a very large system (e.g. Fugaku) because if >> many processors are incrementing that counter you get a lot of cache >> line ping-ponging between cores. >> >> So, Arm decided to add a locked memory increment instruction that >> works without needing to load an entire line into local cache. It's a >> single instruction that loads, increments, and writes back. 
The secret >> is to send a cache control message to whichever processor owns the >> cache line containing the count, tell that processor to increment the >> counter and return the incremented value. That way cache coherency >> traffic is minimized. This new set of instructions is known as Large >> System Extensions, or LSE. >> >> Unfortunately, in recent processors, the "old" load/store exclusive >> instructions sometimes perform very badly. Therefore, it's now >> necessary for software to detect which version of Arm it's running >> on, and use the "new" LSE instructions if they're available. Otherwise >> performance can be very poor under heavy contention. >> >> GCC's -moutline-atomics does this by providing library calls which use >> LSE if it's available, but this option is only provided on newer >> versions of GCC. This is particularly problematic with older versions >> of OpenJDK, which build using old GCC versions. >> >> Also, I suspect that some other operating systems could use this. >> Perhaps not MacOS, given that all Apple CPUs support LSE, but >> maybe Windows. > > src/hotspot/cpu/aarch64/atomic_aarch64.S line 35: > >> 33: ret >> 34: >> 35: .globl aarch64_atomic_fetch_add_4_default_impl > > The N1 optimisation guide suggests aligning branch targets on 32 byte boundaries. OK. > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5579: > >> 5577: // >> 5578: // If LSE is in use, generate LSE versions of all the stubs. The >> 5579: // non-LSE versions are in atomic_aarch64.S. > > IMO it would be better for maintainability if the LSE versions were in atomic_aarch64.S too (with an explicit `.arch armv8-a+lse` directive). Is there any reason to generate them here, other than to support old toolchains? As far as I can tell GNU as supported LSE as far back as binutils 2.27.
> > https://sourceware.org/binutils/docs-2.27/as/AArch64-Extensions.html I can't see any reason to do this. There'd be no benefit to moving this stuff, and it would be harder to change in the future. I'd do the whole lot as runtime stubs if I could, but they're needed before VM startup. > src/hotspot/cpu/aarch64/atomic_aarch64.S line 1: > >> 1: // Copyright (c) 2021, Red Hat Inc. All rights reserved. > > Does this file work with the Windows assembler? I have no idea. If it doesn't, please tell me; I have no Windows system. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From ngasson at openjdk.java.net Mon Feb 8 10:08:43 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 8 Feb 2021 10:08:43 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 09:52:41 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/atomic_aarch64.S line 1: >> >>> 1: // Copyright (c) 2021, Red Hat Inc. All rights reserved. >> >> Does this file work with the Windows assembler? > > I have no idea. If it doesn't, please tell me; I have no Windows system. I don't either unfortunately. Maybe @mo-beck or @luhenry could help? The Windows build on GitHub Actions failed cryptically with: Creating support/modules_libs/java.base/server/jvm.dll from 1045 file(s) make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm/objs/atomic_aarch64.obj] Error 2 ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From ngasson at openjdk.java.net Mon Feb 8 10:14:39 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 8 Feb 2021 10:14:39 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 10:06:05 GMT, Nick Gasson wrote: >> I have no idea. If it doesn't, please tell me; I have no Windows system. > > I don't either unfortunately.
Maybe @mo-beck or @luhenry could help? The Windows build on GitHub Actions failed cryptically with: > > Creating support/modules_libs/java.base/server/jvm.dll from 1045 file(s) > > make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm/objs/atomic_aarch64.obj] Error 2 I suppose you could just move it to `os_cpu/linux_aarch64/` as these are only called from the Linux atomics? ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From shade at openjdk.java.net Mon Feb 8 10:18:01 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 8 Feb 2021 10:18:01 GMT Subject: [jdk16] RFR: 8261310: PPC64 Zero build fails with 'VMError::controlled_crash(int)::FunctionDescriptor functionDescriptor' has incomplete type and cannot be defined Message-ID: $ CONF=linux-ppc64-zero-fastdebug make hotspot 1799 | struct FunctionDescriptor functionDescriptor; | ^~~~~~~~~~~~~~~~~~ `FunctionDescriptor` is from `src/hotspot/cpu/ppc/assembler_ppc.hpp`, and obviously not available for Zero. The affected code was removed by JDK-8252148 in 17, so this issue affects versions below it. While not exactly the regression for 16, it would be nice to have this fixed for 16 and lower, to get clean builds on all platform configurations, including JDK 16 GA. 
The fix is trivial: diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp index 9b0dc413bcd..476fdc48e43 100644 --- a/src/hotspot/share/utilities/vmError.cpp +++ b/src/hotspot/share/utilities/vmError.cpp @@ -1795,7 +1795,7 @@ void VMError::controlled_crash(int how) { char * const dataPtr = NULL; // bad data pointer const void (*funcPtr)(void); // bad function pointer -#if defined(PPC64) && !defined(ABI_ELFv2) +#if defined(PPC64) && !defined(ABI_ELFv2) && !defined(ZERO) struct FunctionDescriptor functionDescriptor; functionDescriptor.set_entry((address) 0xF); Additional testing: - [x] Linux Zero PPC64 fastdebug build ------------- Commit messages: - Special-case Zero when accessing FunctionDescriptor for PPC64 Changes: https://git.openjdk.java.net/jdk16/pull/147/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=147&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261310 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/147.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/147/head:pull/147 PR: https://git.openjdk.java.net/jdk16/pull/147 From burban at openjdk.java.net Mon Feb 8 10:27:42 2021 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 8 Feb 2021 10:27:42 GMT Subject: RFR: 8261072: AArch64: Fix MacroAssembler::get_thread convention In-Reply-To: References: Message-ID: <4AGCX0R1EP_qxN7Uux7IsDQFFXN7VH_8Is8Yt8xYRc4=.e9eb6a11-9c4d-4384-97c9-25cc1d71561a@github.com> On Mon, 8 Feb 2021 08:26:41 GMT, Anton Kozlov wrote: > Please review a fix in a special calling convention for aarch64_get_thread_helper for non-Linux platforms (windows/aarch64 for now). > > Preliminary review: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2021-January/011239.html Marked as reviewed by burban (Author). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2451 From stuefe at openjdk.java.net Mon Feb 8 10:36:47 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 8 Feb 2021 10:36:47 GMT Subject: [jdk16] RFR: 8261310: PPC64 Zero build fails with 'VMError::controlled_crash(int)::FunctionDescriptor functionDescriptor' has incomplete type and cannot be defined In-Reply-To: References: Message-ID: <6rPJd6UdCDY-zqEcAeg65d8zgD7x0Qd1tkXb9WapmZk=.4402ec86-ae74-48dd-99b7-2949ed6a31b5@github.com> On Mon, 8 Feb 2021 10:12:31 GMT, Aleksey Shipilev wrote: > $ CONF=linux-ppc64-zero-fastdebug make hotspot > > > 1799 | struct FunctionDescriptor functionDescriptor; > | ^~~~~~~~~~~~~~~~~~ > > `FunctionDescriptor` is from `src/hotspot/cpu/ppc/assembler_ppc.hpp`, and obviously not available for Zero. > > The affected code was removed by JDK-8252148 in 17, so this issue affects versions below it. > While not exactly the regression for 16, it would be nice to have this fixed for 16 and lower, to get clean builds on all platform configurations, including JDK 16 GA. > > The fix is trivial: > > diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp > index 9b0dc413bcd..476fdc48e43 100644 > --- a/src/hotspot/share/utilities/vmError.cpp > +++ b/src/hotspot/share/utilities/vmError.cpp > @@ -1795,7 +1795,7 @@ void VMError::controlled_crash(int how) { > char * const dataPtr = NULL; // bad data pointer > const void (*funcPtr)(void); // bad function pointer > > -#if defined(PPC64) && !defined(ABI_ELFv2) > +#if defined(PPC64) && !defined(ABI_ELFv2) && !defined(ZERO) > struct FunctionDescriptor functionDescriptor; > > functionDescriptor.set_entry((address) 0xF); > > Additional testing: > - [x] Linux Zero PPC64 fastdebug build This is trivial and fine. The way it is fixed may trip error handler tests on Zero which test that the pc is correctly displayed. But I don't think we have such a test. We have one downstream, but I never brought it upstream.
It's also not terribly important. Also, function descriptors should be available with Zero too of course, but I'm not sure it's worth following up on that. Note that one could just do: void* fakefundes[2] = { 0xF, NULL }; funcPtr = (const void(*)()) &fakefundes; and it would probably work just fine. Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/147 From aph at openjdk.java.net Mon Feb 8 11:26:42 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Feb 2021 11:26:42 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 10:11:30 GMT, Nick Gasson wrote: >> I don't either unfortunately. Maybe @mo-beck or @luhenry could help? The Windows build on GitHub Actions failed cryptically with: >> >> Creating support/modules_libs/java.base/server/jvm.dll from 1045 file(s) >> >> make[3]: *** [lib/CompileJvm.gmk:143: /cygdrive/d/a/jdk/jdk/jdk/build/windows-aarch64/hotspot/variant-server/libjvm/objs/atomic_aarch64.obj] Error 2 > > I suppose you could just move it to `os_cpu/linux_aarch64/` as these are only called from the Linux atomics? They're probably needed on Windows, or I'd have put them in linux_aarch64. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From mdoerr at openjdk.java.net Mon Feb 8 11:26:44 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 8 Feb 2021 11:26:44 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v2] In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 22:02:04 GMT, Niklas Radomski wrote: >> Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. >> >> _This is a preparational change for the Shenandoah GC port to ppc.
As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ > > Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: > > Apply Martin's feedback Thanks for the update. I found an additional Big Endian bug (missing function descriptor). src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3556: > 3554: StubCodeMark mark(this, "StubRoutines", "nmethod_entry_barrier"); > 3555: > 3556: address stub_address = __ pc(); Please use __ function_entry(); instead of __ pc(); Otherwise it's broken on Big Endian which use ABI v1. ------------- Changes requested by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2432 From mdoerr at openjdk.java.net Mon Feb 8 11:31:44 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 8 Feb 2021 11:31:44 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v2] In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 22:02:04 GMT, Niklas Radomski wrote: >> Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. >> >> _This is a preparational change for the Shenandoah GC port to ppc. As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ > > Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: > > Apply Martin's feedback Marked as reviewed by mdoerr (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2432 From mdoerr at openjdk.java.net Mon Feb 8 11:31:47 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 8 Feb 2021 11:31:47 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v2] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 11:23:03 GMT, Martin Doerr wrote: >> Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply Martin's feedback > > src/hotspot/cpu/ppc/stubGenerator_ppc.cpp line 3556: > >> 3554: StubCodeMark mark(this, "StubRoutines", "nmethod_entry_barrier"); >> 3555: >> 3556: address stub_address = __ pc(); > > Please use __ function_entry(); instead of __ pc(); > Otherwise it's broken on Big Endian which use ABI v1. Sorry, __ pc() seems to be correct. It's directly called from nmethod_entry_barrier without using a function descriptor. ------------- PR: https://git.openjdk.java.net/jdk/pull/2432 From stefank at openjdk.java.net Mon Feb 8 13:17:52 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 8 Feb 2021 13:17:52 GMT Subject: RFR: 8260019: Move some Thread subtypes out of thread.hpp [v2] In-Reply-To: References: <-Nn8JnMM_HSkbTRFUt3BsUt56gB63ixaurC5zsJ4oGQ=.b7fcba36-2a7d-469c-bff1-9b134713282d@github.com> Message-ID: On Fri, 5 Feb 2021 02:57:20 GMT, Ioi Lam wrote: >> I think the rule has been: if it's a new file, it gets a new year. > > Hi David, you're correct that I just cut-and-pasted the code; the only exception was I made the `thread_entry` functions member functions. > > I don't have a better idea for the file names, either. I'll keep them as is in the patch. Maybe someone else with a better sense for naming could fix our file names in the future. > > I fixed the copyright year. > I think the rule has been: if it's a new file, it gets a new year. My understanding is that if code has been copied, then the old copyright dates should be kept. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2390 From hseigel at openjdk.java.net Mon Feb 8 13:22:43 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 8 Feb 2021 13:22:43 GMT Subject: Integrated: 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 14:48:25 GMT, Harold Seigel wrote: > Please review this change to clean up warnings, such as the following, in the vmTestbase tests. > > warning: [synchronization] attempt to synchronize on an instance of a value-based class > warning: [removal] Integer(int) in Integer has been deprecated and marked for removal > > This change cleans up the warnings by using a non-value based class to synchronize on, and replacing calls such as Integer(int) with Integer.valueOf(int). > > The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-8 on Linux x64. > > Thanks, Harold This pull request has now been integrated. Changeset: db0ca2b9 Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/db0ca2b9 Stats: 747 lines in 129 files changed: 2 ins; 0 del; 745 mod 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests Reviewed-by: lfoltan, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/2427 From hseigel at openjdk.java.net Mon Feb 8 13:22:42 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Mon, 8 Feb 2021 13:22:42 GMT Subject: RFR: 8261161: Clean up warnings in hotspot/jtreg/vmTestbase tests In-Reply-To: <1KH2oh0e-Qi8fsB4oxmJPN4AIdvEU2gYlhdGVZoPyjg=.1c2e2036-1fff-4ec5-97e4-616987dd7864@github.com> References: <1KH2oh0e-Qi8fsB4oxmJPN4AIdvEU2gYlhdGVZoPyjg=.1c2e2036-1fff-4ec5-97e4-616987dd7864@github.com> Message-ID: On Fri, 5 Feb 2021 16:53:00 GMT, Coleen Phillimore wrote: >> Please review this change to clean up warnings, such as the following, in the vmTestbase tests. 
>> >> warning: [synchronization] attempt to synchronize on an instance of a value-based class >> warning: [removal] Integer(int) in Integer has been deprecated and marked for removal >> >> This change cleans up the warnings by using a non-value based class to synchronize on, and replacing calls such as Integer(int) with Integer.valueOf(int). >> >> The change was tested by running Mach5 tiers 1-2 on Linux, Mac OS, and Windows, and Mach5 tiers 3-8 on Linux x64. >> >> Thanks, Harold > > Wow. thank you! Thanks Lois and Coleen for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/2427 From luhenry at openjdk.java.net Mon Feb 8 13:38:46 2021 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Mon, 8 Feb 2021 13:38:46 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 11:24:00 GMT, Andrew Haley wrote: >> I suppose you could just move it to `os_cpu/linux_aarch64/` as these are only called from the Linux atomics? > > They're probably needed on Windows, or I'd have put them in linux_aarch64. I can confirm that assembly code targeted at Linux generally won't compile out of the box on Windows. For `windows_aarch64`, could we have simple fallbacks in C++ code that simply call `InterlockedAdd`, `InterlockedExchange`, and `InterlockedCompareExchange`? It would be similar to [atomic_windows_aarch64.hpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/windows_aarch64/atomic_windows_aarch64.hpp) ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From adinn at openjdk.java.net Mon Feb 8 14:54:43 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 8 Feb 2021 14:54:43 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 18:56:46 GMT, Andrew Haley wrote: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. 
Say you want to do an atomic increment of a > counter. You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is minimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. It would help to change the name of old_value to value. Otherwise this is ok as is.
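[Editorial sketch, not part of the original message.] The naming point about old_value can be illustrated with a small portable example. It assumes `std::atomic` as a stand-in for the HotSpot primitives; the free functions `fetch_and_add` and `add_and_fetch` below are hypothetical simplifications, not the `Atomic::PlatformAdd` members under review.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the platform primitive: returns the value
// that was stored in dest *before* the addition took place.
template <typename T>
T fetch_and_add(std::atomic<T>& dest, T add_value) {
  return dest.fetch_add(add_value, std::memory_order_seq_cst);
}

// Returns the value stored in dest *after* the addition. The local holds
// "previous value + add_value", so it is named new_value, not old_value.
template <typename T>
T add_and_fetch(std::atomic<T>& dest, T add_value) {
  T new_value = fetch_and_add(dest, add_value) + add_value;
  return new_value;
}
```

With a counter that starts at 5, `fetch_and_add(counter, 3)` yields 5 and leaves 8 in the counter, while a following `add_and_fetch(counter, 2)` yields 10.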
src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 88: > 86: template > 87: D add_and_fetch(D volatile* dest, I add_value, atomic_memory_order order) const { > 88: D old_value = fetch_and_add(dest, add_value, order) + add_value; I'm not sure this should be called old_value. It is actually old_value + add_value, i.e. it really ought to be called either new_value (or just value?). src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 95: > 93: template<> > 94: template > 95: inline D Atomic::PlatformAdd<4>::fetch_and_add(D volatile* dest, I add_value, It may be possible to avoid the cut-and-paste involved in declaring almost exactly the same template body for byte_size == 4 and 8 with a generic template which includes a function type element supplemented with two auxiliary templates to instantiate that function element with either aarch64_atomic_fetch_add_4_impl or aarch64_atomic_fetch_add_8_impl. That would mean more templates but less repetition. On the whole I prefer less templates so perhaps this is best left as is. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2434 From adinn at openjdk.java.net Mon Feb 8 14:54:44 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Mon, 8 Feb 2021 14:54:44 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 12:21:22 GMT, Andrew Dinn wrote: >> Go back a few years, and there were simple atomic load/store exclusive >> instructions on Arm. Say you want to do an atomic increment of a >> counter. You'd do an atomic load to get the counter into your local cache >> in exclusive state, increment that counter locally, then write that >> incremented counter back to memory with an atomic store. All the time >> that cache line was in exclusive state, so you're guaranteed that >> no-one else changed anything on that cache line while you had it.
>> >> This is hard to scale on a very large system (e.g. Fugaku) because if >> many processors are incrementing that counter you get a lot of cache >> line ping-ponging between cores. >> >> So, Arm decided to add a locked memory increment instruction that >> works without needing to load an entire line into local cache. It's a >> single instruction that loads, increments, and writes back. The secret >> is to send a cache control message to whichever processor owns the >> cache line containing the count, tell that processor to increment the >> counter and return the incremented value. That way cache coherency >> traffic is minimized. This new set of instructions is known as Large >> System Extensions, or LSE. >> >> Unfortunately, in recent processors, the "old" load/store exclusive >> instructions sometimes perform very badly. Therefore, it's now >> necessary for software to detect which version of Arm it's running >> on, and use the "new" LSE instructions if they're available. Otherwise >> performance can be very poor under heavy contention. >> >> GCC's -moutline-atomics does this by providing library calls which use >> LSE if it's available, but this option is only provided on newer >> versions of GCC. This is particularly problematic with older versions >> of OpenJDK, which build using old GCC versions. >> >> Also, I suspect that some other operating systems could use this. >> Perhaps not MacOS, given that all Apple CPUs support LSE, but >> maybe Windows.
> > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 95: > >> 93: template<> >> 94: template >> 95: inline D Atomic::PlatformAdd<4>::fetch_and_add(D volatile* dest, I add_value, > > It may be possible to avoid the cut-and-paste involved in declaring almost exactly the same template body for byte_size == 4 and 8 with a generic template which includes a function type element supplemented with two auxiliary templates to instantiate that function element with either aarch64_atomic_fetch_add_4_impl or aarch64_atomic_fetch_add_8_impl. That would mean more templates but less repetition. On the whole I prefer less templates so perhaps this is best left as is. n.b. the same comment applies to the cut and paste for PlatformXchg and PlatformCmpxchg ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at redhat.com Mon Feb 8 17:36:35 2021 From: aph at redhat.com (Andrew Haley) Date: Mon, 8 Feb 2021 17:36:35 +0000 Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: <5496f9b5-1c36-51e1-e34c-4a754ad15957@redhat.com> On 2/8/21 1:38 PM, Ludovic Henry wrote: > On Mon, 8 Feb 2021 11:24:00 GMT, Andrew Haley wrote: > >>> I suppose you could just move it to `os_cpu/linux_aarch64/` as these are only called from the Linux atomics? >> >> They're probably needed on Windows, or I'd have put them in linux_aarch64. > > I can confirm that assembly code targeted at Linux generally won't compile out of the box on Windows. Generally, OK, but what's wrong with that specific file? It should run just fine. We're getting an incomprehensible error message, but what does it mean? > For `windows_aarch64`, could we have simple fallbacks in C++ code that simply call `InterlockedAdd`, `InterlockedExchange`, and `InterlockedCompareExchange`? It would be similar to [atomic_windows_aarch64.hpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/windows_aarch64/atomic_windows_aarch64.hpp) Well, I don't know.
Do InterlockedAdd and its friends generate good code? You've got the machines there, so you can have a look, but I can't. It's up to you to decide whether you care about code quality. I'm trying my best to help, but without your assistance there's nothing I can do. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at openjdk.java.net Mon Feb 8 18:02:52 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 8 Feb 2021 18:02:52 GMT Subject: [jdk16] Withdrawn: 8261310: PPC64 Zero build fails with 'VMError::controlled_crash(int)::FunctionDescriptor functionDescriptor' has incomplete type and cannot be defined In-Reply-To: References: Message-ID: <9dSE5wuU6fbXe1ZkJXcQmSd3yrbM4dCERpLT1ttt2Ac=.3cebb654-0db3-41c9-9f81-3a942d22d9bb@github.com> On Mon, 8 Feb 2021 10:12:31 GMT, Aleksey Shipilev wrote: > $ CONF=linux-ppc64-zero-fastdebug make hotspot > > > 1799 | struct FunctionDescriptor functionDescriptor; > | ^~~~~~~~~~~~~~~~~~ > > `FunctionDescriptor` is from `src/hotspot/cpu/ppc/assembler_ppc.hpp`, and obviously not available for Zero. > > The affected code was removed by JDK-8252148 in 17, so this issue affects versions below it. > While not exactly the regression for 16, it would be nice to have this fixed for 16 and lower, to get clean builds on all platform configurations, including JDK 16 GA. 
> > The fix is trivial: > > diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp > index 9b0dc413bcd..476fdc48e43 100644 > --- a/src/hotspot/share/utilities/vmError.cpp > +++ b/src/hotspot/share/utilities/vmError.cpp > @@ -1795,7 +1795,7 @@ void VMError::controlled_crash(int how) { > char * const dataPtr = NULL; // bad data pointer > const void (*funcPtr)(void); // bad function pointer > > -#if defined(PPC64) && !defined(ABI_ELFv2) > +#if defined(PPC64) && !defined(ABI_ELFv2) && !defined(ZERO) > struct FunctionDescriptor functionDescriptor; > > functionDescriptor.set_entry((address) 0xF); > > Additional testing: > - [x] Linux Zero PPC64 fastdebug build This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk16/pull/147 From aph at redhat.com Mon Feb 8 18:14:19 2021 From: aph at redhat.com (Andrew Haley) Date: Mon, 8 Feb 2021 18:14:19 +0000 Subject: Atomic operations: your thoughts are welocme Message-ID: I've been looking at the hottest Atomic operations in HotSpot, with a view to finding out if the default memory_order_conservative (which is very expensive on some architectures) can be weakened to something less. It's impossible to fix all of them, but perhaps we can fix some of the most frequent. These are the hottest compare-and-swap uses in HotSpot, with the count at the end of each line. : :: = 16406757 This one is already memory_order_relaxed, so no problem. ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this need to be memory_order_conservative, or would something weaker do? Even acq_rel or seq_cst would be better. : :: = 2376632 : :: = 2003895 I can't imagine that either of these actually need memory_order_conservative, they're just reference counts. : :: = 1719614 BitMap::par_set_bit again. 
, (MEMFLAGS)5>*)+432>: :: = 1617659 This one is GenericTaskQueue::pop_global calling cmpxchg_age(). Again, do we need conservative here? There is, I suppose, always a possibility that some code somewhere is taking advantage of the memory serializing properties of adjusting refcounts, I suppose. Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at openjdk.java.net Mon Feb 8 18:33:46 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 8 Feb 2021 18:33:46 GMT Subject: [jdk16] RFR: 8261310: PPC64 Zero build fails with 'VMError::controlled_crash(int)::FunctionDescriptor functionDescriptor' has incomplete type and cannot be defined In-Reply-To: <6rPJd6UdCDY-zqEcAeg65d8zgD7x0Qd1tkXb9WapmZk=.4402ec86-ae74-48dd-99b7-2949ed6a31b5@github.com> References: <6rPJd6UdCDY-zqEcAeg65d8zgD7x0Qd1tkXb9WapmZk=.4402ec86-ae74-48dd-99b7-2949ed6a31b5@github.com> Message-ID: On Mon, 8 Feb 2021 10:33:48 GMT, Thomas Stuefe wrote: >> $ CONF=linux-ppc64-zero-fastdebug make hotspot >> >> >> 1799 | struct FunctionDescriptor functionDescriptor; >> | ^~~~~~~~~~~~~~~~~~ >> >> `FunctionDescriptor` is from `src/hotspot/cpu/ppc/assembler_ppc.hpp`, and obviously not available for Zero. >> >> The affected code was removed by JDK-8252148 in 17, so this issue affects versions below it. >> While not exactly the regression for 16, it would be nice to have this fixed for 16 and lower, to get clean builds on all platform configurations, including JDK 16 GA. 
>> >> The fix is trivial: >> >> diff --git a/src/hotspot/share/utilities/vmError.cpp b/src/hotspot/share/utilities/vmError.cpp >> index 9b0dc413bcd..476fdc48e43 100644 >> --- a/src/hotspot/share/utilities/vmError.cpp >> +++ b/src/hotspot/share/utilities/vmError.cpp >> @@ -1795,7 +1795,7 @@ void VMError::controlled_crash(int how) { >> char * const dataPtr = NULL; // bad data pointer >> const void (*funcPtr)(void); // bad function pointer >> >> -#if defined(PPC64) && !defined(ABI_ELFv2) >> +#if defined(PPC64) && !defined(ABI_ELFv2) && !defined(ZERO) >> struct FunctionDescriptor functionDescriptor; >> >> functionDescriptor.set_entry((address) 0xF); >> >> Additional testing: >> - [x] Linux Zero PPC64 fastdebug build > > This is trivial and fine. > > The way it is fixed may trip error handler tests on Zero which test that the pc is correctly displayed. But I don't think we have such a test. We have one downstream but I never brought it upstream. It's also not terribly important. > > Also, function descriptors should be available with zero too of course, but I'm not sure it's worth following up on that. Note that one could just do: > void* fakefundes[2] = { 0xF, NULL }; > funcPtr = (const void(*)()) &fakefundes; > and it would probably work just fine. > > Cheers, Thomas Retargeted to 16u: https://github.com/openjdk/jdk16u/pull/23 ------------- PR: https://git.openjdk.java.net/jdk16/pull/147 From aph at openjdk.java.net Mon Feb 8 18:50:09 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Feb 2021 18:50:09 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter. You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store.
All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is minimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows.
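For illustration only, here is a minimal sketch of the two shapes of atomic increment described above, written with standard C++11 std::atomic rather than HotSpot's Atomic class. The names counter, increment_counter, increment_counter_retry_loop and check_increments are hypothetical and do not appear in the patch. As far as I know, a compiler targeting Armv8.1-A or later can typically turn the fetch_add into a single LSE instruction, while for baseline Armv8.0 it typically emits a load/store exclusive retry loop much like the second function; treat that as an assumption to verify against your own toolchain. The relaxed ordering is chosen only to keep the example small and is not a statement about what HotSpot needs.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared counter, for illustration only (not a HotSpot name).
static std::atomic<uint64_t> counter{0};

// One read-modify-write operation; returns the value before the increment.
uint64_t increment_counter() {
  return counter.fetch_add(1, std::memory_order_relaxed);
}

// The same increment written as an explicit retry loop, which is the
// general shape of a load-exclusive / store-exclusive sequence.
uint64_t increment_counter_retry_loop() {
  uint64_t observed = counter.load(std::memory_order_relaxed);
  while (!counter.compare_exchange_weak(observed, observed + 1,
                                        std::memory_order_relaxed)) {
    // On failure, observed has been refreshed with the current value,
    // so the loop simply retries with it.
  }
  return observed;
}

// Single-threaded sanity check: both variants add exactly one.
void check_increments() {
  uint64_t before = counter.load();
  increment_counter();
  increment_counter_retry_loop();
  assert(counter.load() == before + 2);
}
```

Both functions return the value seen before the increment, so callers can swap one for the other when comparing generated code.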
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Review changes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2434/files - new: https://git.openjdk.java.net/jdk/pull/2434/files/4f17903b..31f9c003 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=00-01 Stats: 16 lines in 3 files changed: 14 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Mon Feb 8 18:50:10 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Feb 2021 18:50:10 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 12:06:51 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review changes > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 88: > >> 86: template >> 87: D add_and_fetch(D volatile* dest, I add_value, atomic_memory_order order) const { >> 88: D old_value = fetch_and_add(dest, add_value, order) + add_value; > > I'm not sure this should be called old_value. It is actually old_value + add_value i.e. it really ought to be called either new_value (or just value?). Sure. This templated code is copied from os_cpu/linux_x86, so I kept the names the same as in that file. But you're right, that would make more sense.
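To make the naming point concrete, here is a small sketch of an add_and_fetch built on top of a fetch_and_add, with the result named for what it actually holds. It uses std::atomic as a stand-in; the two function templates below are hypothetical simplifications and are not the HotSpot signatures quoted above (there is no atomic_memory_order parameter and no PlatformAdd class here).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the reviewed helper: returns the value that
// was stored in dest *before* the addition.
template <typename D, typename I>
D fetch_and_add(std::atomic<D>& dest, I add_value) {
  return dest.fetch_add(static_cast<D>(add_value));
}

// Returns the value stored in dest *after* the addition. The local is
// called new_value because it is the old value plus add_value.
template <typename D, typename I>
D add_and_fetch(std::atomic<D>& dest, I add_value) {
  D new_value = fetch_and_add(dest, add_value) + static_cast<D>(add_value);
  return new_value;
}

// Single-threaded sanity check of the two return conventions.
void check_add_helpers() {
  std::atomic<int> v{5};
  assert(fetch_and_add(v, 3) == 5);   // previous value
  assert(add_and_fetch(v, 2) == 10);  // updated value
}
```

The only difference between the two helpers is which side of the addition the caller gets back, which is why the local variable name matters for readers.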
------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Mon Feb 8 18:50:10 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Feb 2021 18:50:10 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 12:23:31 GMT, Andrew Dinn wrote: >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 95: >> >>> 93: template<> >>> 94: template >>> 95: inline D Atomic::PlatformAdd<4>::fetch_and_add(D volatile* dest, I add_value, >> >> It may be possible to avoid the cut-and-paste involved in declaring almost exactly the same template body for byte_size == 4 and 8 with a generic template which includes a function type element supplemented with two auxiliary templates to instantiate that function element with either aarch64_atomic_fetch_add_4_impl or aarch64_atomic_fetch_add_8_impl. That would mean more templates but less repetition. On the whole I prefer fewer templates so perhaps this is best left as is. > > n.b. the same comment applies to the cut and paste for PlatformXchg and PlatformCmpxchg Lawks! Even Oracle didn't do that with Linux/x86. I think I'll leave it as it is. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Mon Feb 8 19:31:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 8 Feb 2021 19:31:43 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 14:52:00 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Review changes > > It would help to change the name of old_value to value. Otherwise this is ok as is. > _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-dev](mailto:hotspot-dev at openjdk.java.net):_ > Well, I don't know. Do InterlockedAdd and its friends generate good code?
> You've got the machines there, so you can have a look, but I can't. I'm sorry, that was unnecessarily sharp of me! It's entirely up to you, but you might find investigating this to be useful. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From github.com+9200663+quaffel at openjdk.java.net Mon Feb 8 23:57:10 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Mon, 8 Feb 2021 23:57:10 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v3] In-Reply-To: References: Message-ID: > Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. > > _This is a preparational change for the Shenandoah GC port to ppc. As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ Niklas Radomski has updated the pull request incrementally with two additional commits since the last revision: - Use toc offset in c2i entry barrier - Update comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2432/files - new: https://git.openjdk.java.net/jdk/pull/2432/files/ab354b68..202d0d35 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2432&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2432&range=01-02 Stats: 7 lines in 3 files changed: 1 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2432.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2432/head:pull/2432 PR: https://git.openjdk.java.net/jdk/pull/2432 From ngasson at openjdk.java.net Tue Feb 9 01:53:43 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Tue, 9 Feb 2021 01:53:43 GMT Subject: Integrated: 8260355: AArch64: deoptimization stub 
should save vector registers In-Reply-To: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> References: <5PbujtOhWB7uqNXu5vRMwYGNMNU78aLAkGpDLWUnQKM=.9d2206fe-9c15-424c-b08e-80eab468df2a@github.com> Message-ID: On Thu, 28 Jan 2021 08:27:22 GMT, Nick Gasson wrote: > This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub > doesn't save vector registers on x86". The problem is that a vector > produced by the Vector API may be stored in a register when the deopt > blob is called. Because the deopt blob only stores the lower half of > vector registers, the full vector object cannot be rematerialized during > deoptimization. So the following will crash on AArch64 with current JDK: > > make test TEST="jdk/incubator/vector" \ > JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0" > > The fix is to store the full vector registers by passing > save_vectors=true to save_live_registers() in the deopt blob. Because > save_live_registers() places the integer registers above the floating > registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs > to calculate the SP offset based on whether full vectors were saved, and > whether those vectors were NEON or SVE, rather than using a static > offset as it does currently. > > The change to VectorSupport::allocate_vector_payload_helper() is > required because we only store the lowest VMReg slot in the oop map. > However unlike x86 the vector registers are always saved in a contiguous > region of memory, so we can calculate the address of each vector element > as an offset from the address of the first slot. X86 handles this in > RegisterMap::pd_location() but that won't work on AArch64 because with > SVE there isn't a unique VMReg corresponding to each four-byte physical > slot in the vector (there are always exactly eight logical VMRegs > regardless of the actual vector length). > > Tested hotspot_all_no_apps and jdk_core. 
This pull request has now been integrated. Changeset: 5183d8ae Author: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/5183d8ae Stats: 205 lines in 12 files changed: 130 ins; 23 del; 52 mod 8260355: AArch64: deoptimization stub should save vector registers Reviewed-by: vlivanov, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/2279 From dholmes at openjdk.java.net Tue Feb 9 05:11:43 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 9 Feb 2021 05:11:43 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v5] In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 10:57:00 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. >> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. 
>> >> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 >> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 >> and additionally to not trip this warning here: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 >> >> ------ >> >> Changes in this patch: >> >> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. >> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. >> - I used that class to cover cases (1)..(3): >> - `chained_handlers` contains all information of chained handlers >> - `expected_handlers` contains a copy of the handlers the hotspot installed >> - `replaced_handlers` contains information about replaced handlers >> >> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. 
>> >> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. >> >> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. >> >> Output Before: >> 663 Signal Handlers: >> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO >> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, 
sa_flags=SA_RESTART|SA_SIGINFO >> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 675 SIGTRAP: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> >> Now: >> Signal Handlers: >> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO >> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGQUIT: UserHandler in 
libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> >> ----- >> Tests: GA, and the patch has been tested in our nightlies for over a month now. I manually executed the runtime/jni/checked tests too. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Further fixes Still seems okay. One query below. Thanks, David src/hotspot/os/posix/signals_posix.cpp line 154: > 152: static bool check_signals = true; > 153: static SavedSignalHandlers expected_handlers; > 154: static bool do_check_signal_periodically[NSIG]; Does this need explicit initialization to have false entries? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Tue Feb 9 05:35:03 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 9 Feb 2021 05:35:03 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified.
> > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
> > - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. > > Output Before: > 663 Signal Handlers: > 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO > 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > > Now: > Signal Handlers: > SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO > SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > > ----- > Tests: GA, and the patch has been tested in our nightlies for over a month now. I manually executed the runtime/jni/checked tests too.
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Use universal zero initializer for do_check_signal_periodically ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2251/files - new: https://git.openjdk.java.net/jdk/pull/2251/files/d2434ad2..06e1b030 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2251.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2251/head:pull/2251 PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Tue Feb 9 05:35:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 9 Feb 2021 05:35:04 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v5] In-Reply-To: References: Message-ID: <92nyRdUpWnUiOulpB_4psvhIKK9ukI8eiBQ3SByZBsg=.51fd4f86-fc4d-4b2e-89fb-daec38090239@github.com> On Tue, 9 Feb 2021 05:08:40 GMT, David Holmes wrote: > Still seems okay. > > One query below. > > Thanks, > David Thank you David. > src/hotspot/os/posix/signals_posix.cpp line 154: > >> 152: static bool check_signals = true; >> 153: static SavedSignalHandlers expected_handlers; >> 154: static bool do_check_signal_periodically[NSIG]; > > Does this need explicit initialization to have false entries? I think it does not since global static objects are zero-initialized and 0 = false; however, I made the initialization explicit. 
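As a side note on the initialization question, here is a minimal sketch showing that a static bool array without an initializer and one with an empty-brace initializer both start out with every entry false, because objects with static storage duration are zero-initialized. kNumSignals is a hypothetical stand-in for NSIG, and the array names are made up for the example; the exact initializer spelling used in the patch is not shown in this thread, so the empty braces below are only one assumed way of writing it.

```cpp
#include <cassert>

// Hypothetical stand-in for NSIG so the example does not depend on the
// platform's signal headers.
constexpr int kNumSignals = 65;

// Static storage duration, no initializer: zero-initialized before any
// other initialization takes place, so every element is false.
static bool implicit_flags[kNumSignals];

// Same result with an explicit initializer; the intent is visible to the
// reader without knowing the zero-initialization rule.
static bool explicit_flags[kNumSignals] = {};

// Returns true when no element of flags[0..count) is set.
bool all_false(const bool* flags, int count) {
  for (int i = 0; i < count; i++) {
    if (flags[i]) {
      return false;
    }
  }
  return true;
}

// Sanity check that both spellings give the same starting state.
void check_flags() {
  assert(all_false(implicit_flags, kNumSignals));
  assert(all_false(explicit_flags, kNumSignals));
}
```

The explicit form costs nothing at run time and mainly helps readers, which matches the reasoning given in the reply above.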
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From dholmes at openjdk.java.net Tue Feb 9 05:45:43 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 9 Feb 2021 05:45:43 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 05:35:03 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. >> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. >> >> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 >> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 >> and additionally to not trip this warning here: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 >> >> ------ >> >> Changes in this patch: >> >> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. >> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. >> - I used that class to cover cases (1)..(3): >> - `chained_handlers` contains all information of chained handlers >> - `expected_handlers` contains a copy of the handlers the hotspot installed >> - `replaced_handlers` contains information about replaced handlers >> >> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. >> >> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
>> >> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. >> >> Output Before: >> 663 Signal Handlers: >> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO >> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> >> Now: >> Signal Handlers: >> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO >> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> >> ----- >> Tests: GA, and the patch has been tested in our nighlies for over a month now. 
I manually executed the runtime/jni/checked tests too.
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use universal zero initializer for do_check_signal_periodically

Marked as reviewed by dholmes (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From dongbo at openjdk.java.net Tue Feb 9 07:03:58 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 9 Feb 2021 07:03:58 GMT
Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width
Message-ID:

In vectorAPI, when right-shifting a vector with a shift amount equal to the element width, the shift is transformed to zero,
see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:

    /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
    public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);

The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the Java code below on aarch64,
the assembler call is `__ ushr(dst, __ T8B, src, 0)`, and the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
According to local tests, the JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.

    ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
    vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);

The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en

This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.
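
The arithmetic behind the bad encoding can be sketched as below. This is an illustration only, not the HotSpot assembler: the function names are hypothetical, and it assumes the A64 rule that a SIMD right shift by immediate stores `(2 * esize) - shift` in the 7-bit `immh:immb` field, with the element size taken from the highest set bit of `immh`.

```cpp
#include <cstdint>

// Hypothetical helper: shift-count masking as documented for LSHR,
// n & (ESIZE*8 - 1), with the lane width given in bits.
int masked_shift(int n, int esize_bits) {
  return n & (esize_bits - 1);
}

// Hypothetical helper: value placed in immh:immb for a right shift
// (assumption: the field holds 2 * esize - shift).
int right_shift_imm_field(int esize_bits, int shift) {
  return 2 * esize_bits - shift;
}

struct Decoded {
  int esize_bits;  // element size the hardware will assume
  int shift;       // shift amount the hardware will apply
};

// Decode a 7-bit immh:immb field the way the instruction is interpreted.
// Assumes immh != 0 (otherwise the encoding is not a shift at all).
Decoded decode_right_shift(int immh_immb) {
  int immh = immh_immb >> 3;
  int esize = (immh & 8) ? 64 : (immh & 4) ? 32 : (immh & 2) ? 16 : 8;
  return Decoded{esize, 2 * esize - immh_immb};
}
```

Under these assumptions, a byte-lane shift by 8 masks to 0, the field for (8 bits, shift 0) becomes 16, and 16 decodes as 16-bit elements shifted by 16, which matches the `ushr dst.4H, src.4H, 16` reported above. A legal shift of 8 on byte lanes gives field value 8 and decodes back to (8 bits, shift 8).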
------------- Commit messages: - fix trailing whitespaces - 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width Changes: https://git.openjdk.java.net/jdk/pull/2472/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261142 Stats: 771 lines in 3 files changed: 641 ins; 17 del; 113 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From stuefe at openjdk.java.net Tue Feb 9 07:24:57 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 9 Feb 2021 07:24:57 GMT Subject: RFR: JDK-8261302: NMT: Improve malloc site table hashing Message-ID: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked whether this can be improved. The current hash algorithm uses the 32bit masked sum of all stack entries as hash. I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did actually not find a noticeable improvement over what NMT does now. It seems that using the raw code pointers as base for the hash gives us already enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried I was not able to get rid of the few longer chains. The biggest improvement brought an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. 
However, there is this comment in mallocSiteTable.hpp:

https://github.com/openjdk/jdk/blob/5183d8ae1eec86202eace2c4770f81edbc73cb68/src/hotspot/share/services/mallocSiteTable.hpp#L118

which claims that a load factor of 6 is what is aimed for and deemed acceptable. So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups).

---

With that out of the way, there are still small things we can improve about the hash function:

Since the vast majority of `NativeCallStack` objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make `NativeCallStack::hash()` a simple inline getter.

When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation.

See difference (linux x86):

Before:

Dump of assembler code for function NativeCallStack::hash() const:
=> 0x00007ffff68092f0 <+0>: mov 0x20(%rdi),%eax
0x00007ffff68092f3 <+3>: push %rbp
0x00007ffff68092f4 <+4>: mov %rsp,%rbp
0x00007ffff68092f7 <+7>: test %eax,%eax <0?->X
0x00007ffff68092f9 <+9>: jne 0x7ffff6809324
0x00007ffff68092fb <+11>: mov (%rdi),%rdx
0x00007ffff68092fe <+14>: test %rdx,%rdx
0x00007ffff6809301 <+17>: je 0x7ffff6809330
0x00007ffff6809303 <+19>: mov 0x8(%rdi),%rax
0x00007ffff6809307 <+23>: test %rax,%rax
0x00007ffff680930a <+26>: je 0x7ffff680931f
0x00007ffff680930c <+28>: add %rax,%rdx
0x00007ffff680930f <+31>: mov 0x10(%rdi),%rax
0x00007ffff6809313 <+35>: test %rax,%rax
0x00007ffff6809316 <+38>: je 0x7ffff680931f
0x00007ffff6809318 <+40>: add %rax,%rdx
0x00007ffff680931b <+43>: add 0x18(%rdi),%rdx
0x00007ffff680931f <+47>: mov %edx,%eax
0x00007ffff6809321 <+49>: mov %edx,0x20(%rdi)
0x00007ffff6809324 <+52>: pop %rbp
0x00007ffff6809325 <+53>: retq
0x00007ffff6809326 <+54>: nopw %cs:0x0(%rax,%rax,1)
0x00007ffff6809330 <+64>: xor %edx,%edx
0x00007ffff6809332
<+66>: jmp 0x7ffff680931f hash() getter is not inlined; it queries the hash code each time and, when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic. With this patch, the `NativeCallStack::hash()` gets inlined at the call sites to a simple load. The hash calculation gets now inlined into the constructor and uses 128bit xmm registers and packed integer adds: Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool): => 0x00007ffff67cbba0 <+0>: push %rbp 0x00007ffff67cbba1 <+1>: mov %rsp,%rbp 0x00007ffff67cbba4 <+4>: push %rbx 0x00007ffff67cbba5 <+5>: mov %rdi,%rbx 0x00007ffff67cbba8 <+8>: sub $0x8,%rsp 0x00007ffff67cbbac <+12>: test %dl,%dl 0x00007ffff67cbbae <+14>: movl $0x0,0x20(%rdi) 0x00007ffff67cbbb5 <+21>: jne 0x7ffff67cbc10 0x00007ffff67cbbb7 <+23>: movq $0x0,(%rdi) 0x00007ffff67cbbbe <+30>: movq $0x0,0x8(%rdi) 0x00007ffff67cbbc6 <+38>: movq $0x0,0x10(%rdi) 0x00007ffff67cbbce <+46>: movq $0x0,0x18(%rdi) 0x00007ffff67cbbd6 <+54>: movdqu (%rbx),%xmm0 <<<< 0x00007ffff67cbbda <+58>: movdqu 0x10(%rbx),%xmm2 <<<< 0x00007ffff67cbbdf <+63>: shufps $0x88,%xmm2,%xmm0 <<<< 0x00007ffff67cbbe3 <+67>: movdqa %xmm0,%xmm1 <<<< 0x00007ffff67cbbe7 <+71>: psrldq $0x8,%xmm1 <<<< 0x00007ffff67cbbec <+76>: paddd %xmm1,%xmm0 <<<< 0x00007ffff67cbbf0 <+80>: movdqa %xmm0,%xmm1 <<<< 0x00007ffff67cbbf4 <+84>: psrldq $0x4,%xmm1 <<<< 0x00007ffff67cbbf9 <+89>: paddd %xmm1,%xmm0 <<<< 0x00007ffff67cbbfd <+93>: movd %xmm0,0x20(%rbx) <<<< 0x00007ffff67cbc02 <+98>: add $0x8,%rsp 0x00007ffff67cbc06 <+102>: pop %rbx 0x00007ffff67cbc07 <+103>: pop %rbp 0x00007ffff67cbc08 <+104>: retq 0x00007ffff67cbc09 <+105>: nopl 0x0(%rax) 0x00007ffff67cbc10 <+112>: mov %esi,%edx 0x00007ffff67cbc12 <+114>: mov $0x4,%esi 0x00007ffff67cbc17 <+119>: callq 0x7ffff6824960 0x00007ffff67cbc1c <+124>: jmp 0x7ffff67cbbd6 Thanks, Thomas ------------- Commit messages: - Remove stray include - Start Changes: 
https://git.openjdk.java.net/jdk/pull/2473/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2473&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261302 Stats: 29 lines in 2 files changed: 9 ins; 16 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2473.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2473/head:pull/2473 PR: https://git.openjdk.java.net/jdk/pull/2473 From njian at openjdk.java.net Tue Feb 9 07:55:43 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Tue, 9 Feb 2021 07:55:43 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width In-Reply-To: References: Message-ID: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> On Tue, 9 Feb 2021 06:55:50 GMT, Dong Bo wrote: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. 
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Thanks for the fix. src/hotspot/cpu/aarch64/aarch64_neon.ad line 5285: > 5283: ins_encode %{ > 5284: int sh = (int)$shift$$constant; > 5285: if (sh == 0) { If src and dst are the same reg, no need to emit code. Or maybe c2 can even be improved to optimize this (sh=0 case) out? src/hotspot/cpu/aarch64/aarch64_neon.ad line 5271: > 5269: } else { > 5270: if (sh >= 8) sh = 7; > 5271: __ sshr(as_FloatRegister($dst$$reg), __ T8B, I think we should add an assert to make sure 0 is not passed to the assembler. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From stuefe at openjdk.java.net Tue Feb 9 08:05:59 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 9 Feb 2021 08:05:59 GMT Subject: RFR: JDK-8261302: NMT: Improve malloc site table hashing [v2] In-Reply-To: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> References: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> Message-ID: <8IRE-aXvXSOs_MRNdAplymbZHr8yT1zqWVoKlMX55ck=.0eb8db60-a204-4d80-96bf-742833965da2@github.com> > While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked whether this can be improved. > > The current hash algorithm uses the 32bit masked sum of all stack entries as hash. 
> > I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did actually not find a noticeable improvement over what NMT does now. It seems that using the raw code pointers as base for the hash gives us already enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried I was not able to get rid of the few longer chains. > > The biggest improvement brought an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. However, there is this comment in mallocSiteTable.hpp: > > https://github.com/openjdk/jdk/blob/5183d8ae1eec86202eace2c4770f81edbc73cb68/src/hotspot/share/services/mallocSiteTable.hpp#L118 > > wich claims that a load factor of 6 is what is aimed for and deemed acceptable. So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups. > > --- > > With that out of the way, there are still small things we can improve about the hash function: > > Since the vast majority of `NativeCallStack` objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make `NativeCallStack::hash()` a simple inline getter. > > When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation. 
> > See difference (linux x86): > > Before: > > Dump of assembler code for function NativeCallStack::hash() const: > => 0x00007ffff68092f0 <+0>: mov 0x20(%rdi),%eax 0x00007ffff68092f3 <+3>: push %rbp > 0x00007ffff68092f4 <+4>: mov %rsp,%rbp > 0x00007ffff68092f7 <+7>: test %eax,%eax <0?->X > 0x00007ffff68092f9 <+9>: jne 0x7ffff6809324 > 0x00007ffff68092fb <+11>: mov (%rdi),%rdx > 0x00007ffff68092fe <+14>: test %rdx,%rdx > 0x00007ffff6809301 <+17>: je 0x7ffff6809330 > 0x00007ffff6809303 <+19>: mov 0x8(%rdi),%rax > 0x00007ffff6809307 <+23>: test %rax,%rax > 0x00007ffff680930a <+26>: je 0x7ffff680931f > 0x00007ffff680930c <+28>: add %rax,%rdx > 0x00007ffff680930f <+31>: mov 0x10(%rdi),%rax > 0x00007ffff6809313 <+35>: test %rax,%rax > 0x00007ffff6809316 <+38>: je 0x7ffff680931f > 0x00007ffff6809318 <+40>: add %rax,%rdx > 0x00007ffff680931b <+43>: add 0x18(%rdi),%rdx > 0x00007ffff680931f <+47>: mov %edx,%eax > 0x00007ffff6809321 <+49>: mov %edx,0x20(%rdi) > 0x00007ffff6809324 <+52>: pop %rbp 0x00007ffff6809325 <+53>: retq > 0x00007ffff6809326 <+54>: nopw %cs:0x0(%rax,%rax,1) > 0x00007ffff6809330 <+64>: xor %edx,%edx > 0x00007ffff6809332 <+66>: jmp 0x7ffff680931f > > hash() getter is not inlined; it queries the hash code each time and, when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic. > > With this patch, the `NativeCallStack::hash()` gets inlined at the call sites to a simple load. 
> The hash calculation gets now inlined into the constructor and uses a series of simple adds now: > > Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool): > => 0x00007ffff67cc9a0 <+0>: push %rbp > 0x00007ffff67cc9a1 <+1>: mov %rsp,%rbp > 0x00007ffff67cc9a4 <+4>: push %rbx > 0x00007ffff67cc9a5 <+5>: mov %rdi,%rbx > 0x00007ffff67cc9a8 <+8>: sub $0x8,%rsp > 0x00007ffff67cc9ac <+12>: test %dl,%dl > 0x00007ffff67cc9ae <+14>: movl $0x0,0x20(%rdi) > 0x00007ffff67cc9b5 <+21>: jne 0x7ffff67cc9f0 > 0x00007ffff67cc9b7 <+23>: movq $0x0,(%rdi) > 0x00007ffff67cc9be <+30>: movq $0x0,0x8(%rdi) > 0x00007ffff67cc9c6 <+38>: movq $0x0,0x10(%rdi) > 0x00007ffff67cc9ce <+46>: movq $0x0,0x18(%rdi) > 0x00007ffff67cc9d6 <+54>: mov 0x10(%rbx),%rax <<< > 0x00007ffff67cc9da <+58>: add 0x8(%rbx),%rax <<< > 0x00007ffff67cc9de <+62>: add 0x18(%rbx),%rax <<< > 0x00007ffff67cc9e2 <+66>: add (%rbx),%rax <<< > 0x00007ffff67cc9e5 <+69>: mov %eax,0x20(%rbx) > 0x00007ffff67cc9e8 <+72>: add $0x8,%rsp > 0x00007ffff67cc9ec <+76>: pop %rbx > 0x00007ffff67cc9ed <+77>: pop %rbp > 0x00007ffff67cc9ee <+78>: retq > 0x00007ffff67cc9ef <+79>: nop > 0x00007ffff67cc9f0 <+80>: mov %esi,%edx > 0x00007ffff67cc9f2 <+82>: mov $0x4,%esi > 0x00007ffff67cc9f7 <+87>: callq 0x7ffff68250d0 > 0x00007ffff67cc9fc <+92>: jmp 0x7ffff67cc9d6 > End of assembler dump. > > Thanks, Thomas Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: Start ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2473/files - new: https://git.openjdk.java.net/jdk/pull/2473/files/b8e11afb..309d6c0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2473&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2473&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2473.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2473/head:pull/2473 PR: https://git.openjdk.java.net/jdk/pull/2473 From goetz at openjdk.java.net Tue Feb 9 08:33:43 2021 From: goetz at openjdk.java.net (Goetz Lindenmaier) Date: Tue, 9 Feb 2021 08:33:43 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v3] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 23:57:10 GMT, Niklas Radomski wrote: >> Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. >> >> _This is a preparational change for the Shenandoah GC port to ppc. As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ > > Niklas Radomski has updated the pull request incrementally with two additional commits since the last revision: > > - Use toc offset in c2i entry barrier > - Update comments Hi, I'm not that deep in this code, but looking at other platforms this looks good. Thanks for porting this! Feel free to fix the comment or push as-is ... if this is possible with this tooling :) src/hotspot/cpu/ppc/gc/shared/barrierSetAssembler_ppc.cpp line 184: > 182: __ bne(CCR0, skip_barrier); > 183: > 184: // Class loader is weak. 
Determine whether the holder still alive.

'is' in comment missing:
... whether the holder _is_ still alive.

-------------

Marked as reviewed by goetz (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2432

From stefank at openjdk.java.net Tue Feb 9 08:51:51 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Tue, 9 Feb 2021 08:51:51 GMT
Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 8 Feb 2021 17:40:27 GMT, Andrew Haley wrote:

>> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 88:
>>
>>> 86:   template<typename D, typename I>
>>> 87:   D add_and_fetch(D volatile* dest, I add_value, atomic_memory_order order) const {
>>> 88:     D old_value = fetch_and_add(dest, add_value, order) + add_value;
>>
>> I'm not sure this should be called old_value. It is actually old_value + add_value i.e. it really ought to be called either new_value (or just value?).
>
> Sure. This templated code is copied from os_cpu/linux_x86, so I kept the names the same as in that file. But you're right, that would make more sense.

FTR, the linux_x86 code doesn't use old_value for the add_and_fetch version:

  template<typename D, typename I>
  D add_and_fetch(D volatile* dest, I add_value, atomic_memory_order order) const {
    return fetch_and_add(dest, add_value, order) + add_value;
  }

only the fetch_and_add functions do. But I think that's correct.
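
The naming point discussed above can be illustrated with a small stand-alone sketch. It uses `std::atomic` rather than HotSpot's `Atomic::PlatformAdd`, and the `sketch_` function names are hypothetical; the only thing it is meant to show is that fetch-and-add yields the value before the addition, so add-and-fetch built on top of it yields the new value.

```cpp
#include <atomic>

// Hypothetical stand-in for fetch_and_add: returns the value that was
// stored in *dest before the addition.
template<typename D, typename I>
D sketch_fetch_and_add(std::atomic<D>* dest, I add_value) {
  return dest->fetch_add(add_value);
}

// Hypothetical stand-in for add_and_fetch: the old value plus add_value,
// i.e. the new value, so "old_value" would be a misleading local name.
template<typename D, typename I>
D sketch_add_and_fetch(std::atomic<D>* dest, I add_value) {
  return sketch_fetch_and_add(dest, add_value) + add_value;
}
```

With a counter holding 40, `sketch_fetch_and_add(&c, 2)` returns 40 and leaves 42 behind, while a following `sketch_add_and_fetch(&c, 1)` returns 43.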
-------------

PR: https://git.openjdk.java.net/jdk/pull/2434

From dongbo at openjdk.java.net Tue Feb 9 08:57:12 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 9 Feb 2021 08:57:12 GMT
Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width
In-Reply-To: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com>
References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com>
Message-ID:

On Tue, 9 Feb 2021 07:47:57 GMT, Ningsheng Jian wrote:

> If src and dst are the same reg, no need to emit code.

If we want to do this enhancement, I think we need to do it for left shifting and all SVE left/right shifting as well for completeness.

> Or maybe c2 can even be improved to optimize this (sh=0 case) out?

We can add code in `Ideal` to optimize it to ORR, but I'm not sure `orr` performs better than `shift` on other platforms.
It seems we would have to create a new generic node to do `vector move` here.

> src/hotspot/cpu/aarch64/aarch64_neon.ad line 5271:
>
>> 5269: } else {
>> 5270: if (sh >= 8) sh = 7;
>> 5271: __ sshr(as_FloatRegister($dst$$reg), __ T8B,
>
> I think we should add an assert to make sure 0 is not passed to the assembler.

Agree, I'll do this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472

From dongbo at openjdk.java.net Tue Feb 9 09:13:47 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Tue, 9 Feb 2021 09:13:47 GMT
Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2]
In-Reply-To:
References:
Message-ID:

> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
> /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only.
*/ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: add assertion in the assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/c44bebb0..8439f167 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From aph at openjdk.java.net Tue Feb 9 09:32:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 09:32:37 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: References: Message-ID: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> On Tue, 9 Feb 2021 09:13:47 GMT, Dong Bo wrote: >> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, >> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: >> /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ >> public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); >> >> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, >> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. >> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. 
>> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); >> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); >> >> The legal right shift amount should be in the range 1 to the element width in bits on aarch64: >> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en >> >> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. >> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. > > Dong Bo has updated the pull request incrementally with one additional commit since the last revision: > > add assertion in the assembler src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057: > 2055: as_FloatRegister($src$$reg), as_FloatRegister($src$$reg)); > 2056: } else {ifelse($4, B,` > 2057: if (sh >= 8) sh = 7; I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From stefank at openjdk.java.net Tue Feb 9 09:34:38 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 9 Feb 2021 09:34:38 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v12] In-Reply-To: References: Message-ID: <8MnBLkES1lapB4b01NDzU9nhOk8_9_V--NSCM5H_bg8=.7bdb576b-4acd-4e5b-be14-b363a2ef47bf@github.com> On Fri, 5 Feb 2021 16:07:09 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. 
>>
>> Major changes are in:
>> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818)
>> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819)
>> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough.
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941)
>
> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update signal handler part for debugger

Thanks for cleaning out WXWriteFromExecSetter.

src/hotspot/share/gc/shared/barrierSetNMethod.cpp line 52:

> 50:
> 51: int BarrierSetNMethod::nmethod_stub_entry_barrier(address* return_address_ptr) {
> 52: // Enable WXWrite: the function is called direclty from nmethod_entry_barrier

direclty -> directly

src/hotspot/share/runtime/threadWXSetters.hpp line 28:

> 26: #define SHARE_RUNTIME_THREADWXSETTERS_HPP
> 27:
> 28: #include "runtime/thread.inline.hpp"

This goes against our convention that forbids inline.hpp files from being included from .hpp files.
You need to rework this by either moving the implementation to a .cpp file, or convert this file into an .inline.hpp See the Source Files section in: https://htmlpreview.github.io/?https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.html src/hotspot/share/runtime/thread.hpp line 848: > 846: void init_wx(); > 847: WXMode enable_wx(WXMode new_state); > 848: #endif // __APPLE__ && AARCH64 Now that this is only compiled into macOS/AArch64, could this be moved over to thread_bsd_aarch64.hpp? The same goes for the associated functions. src/hotspot/share/runtime/thread.cpp line 2515: > 2513: void JavaThread::check_special_condition_for_native_trans(JavaThread *thread) { > 2514: // Enable WXWrite: called directly from interpreter native wrapper. > 2515: MACOS_AARCH64_ONLY(ThreadWXEnable wx(WXWrite, thread)); FWIW, I personally think that adding these MACOS_AARCH64_ONLY usages at the call sites increase the line-noise in the affected functions. I think I would have preferred a version: ThreadWXEnable(WXMode new_mode, Thread* thread = NULL) { MACOS_AARCH64_ONLY(initialize(new_mode, thread);) {} void initialize(...); // Implementation in thread_bsd_aarch64.cpp (alt. inline.hpp) With that said, I'm fine with taking this discussion as a follow-up. ------------- Changes requested by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2200 From simonis at openjdk.java.net Tue Feb 9 11:15:31 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 9 Feb 2021 11:15:31 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 18:50:09 GMT, Andrew Haley wrote: >> Go back a few years, and there were simple atomic load/store exclusive >> instructions on Arm. Say you want to do an atomic increment of a >> counter. 
You'd do an atomic load to get the counter into your local cache >> in exclusive state, increment that counter locally, then write that >> incremented counter back to memory with an atomic store. All the time >> that cache line was in exclusive state, so you're guaranteed that >> no-one else changed anything on that cache line while you had it. >> >> This is hard to scale on a very large system (e.g. Fugaku) because if >> many processors are incrementing that counter you get a lot of cache >> line ping-ponging between cores. >> >> So, Arm decided to add a locked memory increment instruction that >> works without needing to load an entire line into local cache. It's a >> single instruction that loads, increments, and writes back. The secret >> is to send a cache control message to whichever processor owns the >> cache line containing the count, tell that processor to increment the >> counter and return the incremented value. That way cache coherency >> traffic is minimized. This new set of instructions is known as Large >> System Extensions, or LSE. >> >> Unfortunately, in recent processors, the "old" load/store exclusive >> instructions sometimes perform very badly. Therefore, it's now >> necessary for software to detect which version of Arm it's running >> on, and use the "new" LSE instructions if they're available. Otherwise >> performance can be very poor under heavy contention. >> >> GCC's -moutline-atomics does this by providing library calls which use >> LSE if it's available, but this option is only provided on newer >> versions of GCC. This is particularly problematic with older versions >> of OpenJDK, which build using old GCC versions. >> >> Also, I suspect that some other operating systems could use this. >> Perhaps not MacOS, given that all Apple CPUs support LSE, but >> maybe Windows.
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review changes Hi Andrew, I'm happy to see this change after the [long](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039930.html) and [tedious](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039932.html) discussions we had about preferring C++ intrinsic over inline assembly :) In general I'm fine with the change. Some of the previous C++ intrinsics (e.g. `__atomic_exchange_n` and `__atomic_add_fetch`) were called with `__ATOMIC_RELEASE` semantics which has now been dropped in the new implementation. But I think that's safe and a nice "optimization" because the instructions are followed by a full membar anyway. One question I still have is why we need the default assembler implementations at all. As far as I can see, the MacroAssembler already dispatches based on LSE availability. So why can't we just use the generated stubs exclusively? This would also solve the platform problems with assembler code. Finally, I didn't fully understand how you've measured the `call..ret` overhead and what's the "*simple stright-line test*" you've posted performance numbers for. Other than that, the change looks fine to me. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From zgu at openjdk.java.net Tue Feb 9 13:23:59 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 9 Feb 2021 13:23:59 GMT Subject: RFR: JDK-8261302: NMT: Improve malloc site table hashing [v2] In-Reply-To: <8IRE-aXvXSOs_MRNdAplymbZHr8yT1zqWVoKlMX55ck=.0eb8db60-a204-4d80-96bf-742833965da2@github.com> References: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> <8IRE-aXvXSOs_MRNdAplymbZHr8yT1zqWVoKlMX55ck=.0eb8db60-a204-4d80-96bf-742833965da2@github.com> Message-ID: <3-fVQ0wRbhciN9MWY_lexUavzMXikSZLBkJlCWQKMfg=.ca878ca9-9471-4cd8-8748-6dc385b49131@github.com> On Tue, 9 Feb 2021 08:05:59 GMT, Thomas Stuefe wrote: >> While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked at whether this can be improved. >> >> The current hash algorithm uses the 32bit masked sum of all stack entries as hash. >> >> I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did not actually find a noticeable improvement over what NMT does now. It seems that using the raw code pointers as base for the hash already gives us enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried I was not able to get rid of the few longer chains. >> >> The biggest improvement came from an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. However, there is this comment in mallocSiteTable.hpp: >> >> https://github.com/openjdk/jdk/blob/5183d8ae1eec86202eace2c4770f81edbc73cb68/src/hotspot/share/services/mallocSiteTable.hpp#L118 >> >> which claims that a load factor of 6 is what is aimed for and deemed acceptable.
So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups. >> >> --- >> >> With that out of the way, there are still small things we can improve about the hash function: >> >> Since the vast majority of `NativeCallStack` objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make `NativeCallStack::hash()` a simple inline getter. >> >> When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation. >> >> See difference (linux x86): >> >> Before: >> >> Dump of assembler code for function NativeCallStack::hash() const: >> => 0x00007ffff68092f0 <+0>: mov 0x20(%rdi),%eax > 0x00007ffff68092f3 <+3>: push %rbp >> 0x00007ffff68092f4 <+4>: mov %rsp,%rbp >> 0x00007ffff68092f7 <+7>: test %eax,%eax <0?->X >> 0x00007ffff68092f9 <+9>: jne 0x7ffff6809324 >> 0x00007ffff68092fb <+11>: mov (%rdi),%rdx >> 0x00007ffff68092fe <+14>: test %rdx,%rdx >> 0x00007ffff6809301 <+17>: je 0x7ffff6809330 >> 0x00007ffff6809303 <+19>: mov 0x8(%rdi),%rax >> 0x00007ffff6809307 <+23>: test %rax,%rax >> 0x00007ffff680930a <+26>: je 0x7ffff680931f >> 0x00007ffff680930c <+28>: add %rax,%rdx >> 0x00007ffff680930f <+31>: mov 0x10(%rdi),%rax >> 0x00007ffff6809313 <+35>: test %rax,%rax >> 0x00007ffff6809316 <+38>: je 0x7ffff680931f >> 0x00007ffff6809318 <+40>: add %rax,%rdx >> 0x00007ffff680931b <+43>: add 0x18(%rdi),%rdx >> 0x00007ffff680931f <+47>: mov %edx,%eax >> 0x00007ffff6809321 <+49>: mov %edx,0x20(%rdi) >> 0x00007ffff6809324 <+52>: pop %rbp > 0x00007ffff6809325 <+53>: retq >> 0x00007ffff6809326 <+54>: nopw %cs:0x0(%rax,%rax,1) >> 0x00007ffff6809330 <+64>: xor %edx,%edx >> 0x00007ffff6809332 <+66>: jmp 0x7ffff680931f >> >> hash() getter is not inlined; it queries the hash code each time and, 
when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic. >> >> With this patch, the `NativeCallStack::hash()` gets inlined at the call sites to a simple load. >> The hash calculation gets now inlined into the constructor and uses a series of simple adds now: >> >> Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool): >> => 0x00007ffff67cc9a0 <+0>: push %rbp >> 0x00007ffff67cc9a1 <+1>: mov %rsp,%rbp >> 0x00007ffff67cc9a4 <+4>: push %rbx >> 0x00007ffff67cc9a5 <+5>: mov %rdi,%rbx >> 0x00007ffff67cc9a8 <+8>: sub $0x8,%rsp >> 0x00007ffff67cc9ac <+12>: test %dl,%dl >> 0x00007ffff67cc9ae <+14>: movl $0x0,0x20(%rdi) >> 0x00007ffff67cc9b5 <+21>: jne 0x7ffff67cc9f0 >> 0x00007ffff67cc9b7 <+23>: movq $0x0,(%rdi) >> 0x00007ffff67cc9be <+30>: movq $0x0,0x8(%rdi) >> 0x00007ffff67cc9c6 <+38>: movq $0x0,0x10(%rdi) >> 0x00007ffff67cc9ce <+46>: movq $0x0,0x18(%rdi) >> 0x00007ffff67cc9d6 <+54>: mov 0x10(%rbx),%rax <<< >> 0x00007ffff67cc9da <+58>: add 0x8(%rbx),%rax <<< >> 0x00007ffff67cc9de <+62>: add 0x18(%rbx),%rax <<< >> 0x00007ffff67cc9e2 <+66>: add (%rbx),%rax <<< >> 0x00007ffff67cc9e5 <+69>: mov %eax,0x20(%rbx) >> 0x00007ffff67cc9e8 <+72>: add $0x8,%rsp >> 0x00007ffff67cc9ec <+76>: pop %rbx >> 0x00007ffff67cc9ed <+77>: pop %rbp >> 0x00007ffff67cc9ee <+78>: retq >> 0x00007ffff67cc9ef <+79>: nop >> 0x00007ffff67cc9f0 <+80>: mov %esi,%edx >> 0x00007ffff67cc9f2 <+82>: mov $0x4,%esi >> 0x00007ffff67cc9f7 <+87>: callq 0x7ffff68250d0 >> 0x00007ffff67cc9fc <+92>: jmp 0x7ffff67cc9d6 >> End of assembler dump. >> >> Thanks, Thomas > > Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Looks good to me. ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2473 From github.com+9200663+quaffel at openjdk.java.net Tue Feb 9 13:30:27 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Tue, 9 Feb 2021 13:30:27 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v4] In-Reply-To: References: Message-ID: > Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or zGC. > > _This is a preparational change for the Shenandoah GC port to ppc. As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. This way, the scope of the upcoming change is limited to Shenandoah-specific functionality; making its review a little easier._ Niklas Radomski has updated the pull request incrementally with one additional commit since the last revision: Update comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2432/files - new: https://git.openjdk.java.net/jdk/pull/2432/files/202d0d35..60f884cc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2432&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2432&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2432.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2432/head:pull/2432 PR: https://git.openjdk.java.net/jdk/pull/2432 From github.com+9200663+quaffel at openjdk.java.net Tue Feb 9 13:30:30 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Tue, 9 Feb 2021 13:30:30 GMT Subject: RFR: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 [v3] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 08:28:59 GMT, Goetz Lindenmaier wrote: >> Niklas Radomski has updated the pull request incrementally with two additional commits 
since the last revision: >> >> - Use toc offset in c2i entry barrier >> - Update comments > > Hi, > I'm not that deep in this code, but looking at other platforms this looks good. > Thanks for porting this! > Feel free to fix the comment or push as-is ... if this is possible with this tooling :) Thank you for your reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/2432 From stuefe at openjdk.java.net Tue Feb 9 13:36:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 9 Feb 2021 13:36:04 GMT Subject: RFR: JDK-8261302: NMT: Improve malloc site table hashing [v2] In-Reply-To: <3-fVQ0wRbhciN9MWY_lexUavzMXikSZLBkJlCWQKMfg=.ca878ca9-9471-4cd8-8748-6dc385b49131@github.com> References: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> <8IRE-aXvXSOs_MRNdAplymbZHr8yT1zqWVoKlMX55ck=.0eb8db60-a204-4d80-96bf-742833965da2@github.com> <3-fVQ0wRbhciN9MWY_lexUavzMXikSZLBkJlCWQKMfg=.ca878ca9-9471-4cd8-8748-6dc385b49131@github.com> Message-ID: On Tue, 9 Feb 2021 13:21:11 GMT, Zhengyu Gu wrote: > Looks good to me. Thanks Zhengyu. ------------- PR: https://git.openjdk.java.net/jdk/pull/2473 From aph at openjdk.java.net Tue Feb 9 13:52:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 13:52:41 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 11:12:21 GMT, Volker Simonis wrote: > I'm happy to see this change after the [long](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039930.html) and [tedious](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039932.html) discussions we had about preferring C++ intrinsic over inline assembly :) > > In general I'm fine with the change. Some of the previous C++ intrinsics (e.g. `__atomic_exchange_n` and `__atomic_add_fetch`) were called with `__ATOMIC_RELEASE` semantics which has now been dropped in the new implementation. 
But I think that's safe and a nice "optimization" because the instructions are followed by a full membar anyway. None of these sequences is ideal, so I'll follow up with some higher-performance LSE versions in a new patch. > One question I still have is why we need the default assembler implementations at all. As far as I can see, the MacroAssembler already dispatches based on LSE availability. So why can't we just use the generated stubs exclusively? This would also solve the platform problems with assembler code. We'd need an instance of Assembler very early, before the JVM is initialized. It could be done, but it would also require a page of memory to be allocated early. I did try, but it was rather awful. That is why I ended up with these simple bootstrapping versions of the atomics. > Finally, I didn't fully understand how you've measured the `call..ret` overhead and what's the "_simple straight-line test_" you've posted performance numbers for. That was just a counter in a loop. It's not terribly important for this case, given that the current code is a long way from optimal, but I wanted to know if the call...ret overhead could be measured. It can't, because it's swamped by the cost of the barriers. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From github.com+9200663+quaffel at openjdk.java.net Tue Feb 9 15:05:49 2021 From: github.com+9200663+quaffel at openjdk.java.net (Niklas Radomski) Date: Tue, 9 Feb 2021 15:05:49 GMT Subject: Integrated: JDK-8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 18:04:34 GMT, Niklas Radomski wrote: > Introduces support for _nmethod entry barriers_ and _c2i entry barriers_ on the ppc platform. Those are required to enable concurrent class unloading for compatible garbage collectors, such as Shenandoah or ZGC. > > _This is a preparatory change for the Shenandoah GC port to ppc.
As such, it introduces features that the current version doesn't make use of, but that are required for the upcoming change. This way, the scope of the upcoming change is limited to Shenandoah-specific functionality, making its review a little easier._ This pull request has now been integrated. Changeset: 906facab Author: Quaffel Committer: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/906facab Stats: 264 lines in 10 files changed: 254 ins; 0 del; 10 mod 8260372: [PPC64] Add support for JDK-8210498 and JDK-8222841 Reviewed-by: mdoerr, goetz ------------- PR: https://git.openjdk.java.net/jdk/pull/2432 From aph at openjdk.java.net Tue Feb 9 15:07:05 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 15:07:05 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v3] In-Reply-To: References: Message-ID: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter. You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is minimized.
This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions, sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Properly align everything ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2434/files - new: https://git.openjdk.java.net/jdk/pull/2434/files/31f9c003..1cec2f54 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=01-02 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From simonis at openjdk.java.net Tue Feb 9 16:53:40 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 9 Feb 2021 16:53:40 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v3] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 15:07:05 GMT, Andrew Haley wrote: >> Go back a few years, and there were simple atomic load/store exclusive >> instructions on Arm. Say you want to do an atomic increment of a >> counter. 
You'd do an atomic load to get the counter into your local cache >> in exclusive state, increment that counter locally, then write that >> incremented counter back to memory with an atomic store. All the time >> that cache line was in exclusive state, so you're guaranteed that >> no-one else changed anything on that cache line while you had it. >> >> This is hard to scale on a very large system (e.g. Fugaku) because if >> many processors are incrementing that counter you get a lot of cache >> line ping-ponging between cores. >> >> So, Arm decided to add a locked memory increment instruction that >> works without needing to load an entire line into local cache. It's a >> single instruction that loads, increments, and writes back. The secret >> is to send a cache control message to whichever processor owns the >> cache line containing the count, tell that processor to increment the >> counter and return the incremented value. That way cache coherency >> traffic is minimized. This new set of instructions is known as Large >> System Extensions, or LSE. >> >> Unfortunately, in recent processors, the "old" load/store exclusive >> instructions sometimes perform very badly. Therefore, it's now >> necessary for software to detect which version of Arm it's running >> on, and use the "new" LSE instructions if they're available. Otherwise >> performance can be very poor under heavy contention. >> >> GCC's -moutline-atomics does this by providing library calls which use >> LSE if it's available, but this option is only provided on newer >> versions of GCC. This is particularly problematic with older versions >> of OpenJDK, which build using old GCC versions. >> >> Also, I suspect that some other operating systems could use this. >> Perhaps not MacOS, given that all Apple CPUs support LSE, but >> maybe Windows.
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Properly align everything src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 35: > 33: #include "interpreter/interpreter.hpp" > 34: #include "memory/universe.hpp" > 35: #include "atomic_aarch64.hpp" I think the convention is to put the includes in alphabetical order after `#include "precompiled.hpp"`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From simonis at openjdk.java.net Tue Feb 9 17:07:38 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 9 Feb 2021 17:07:38 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 13:49:45 GMT, Andrew Haley wrote: > > I'm happy to see this change after the [long](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039930.html) and [tedious](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039932.html) discussions we had about preferring C++ intrinsics over inline assembly :) > > In general I'm fine with the change. Some of the previous C++ intrinsics (e.g. `__atomic_exchange_n` and `__atomic_add_fetch`) were called with `__ATOMIC_RELEASE` semantics which has now been dropped in the new implementation. But I think that's safe and a nice "optimization" because the instructions are followed by a full membar anyway. > > None of these sequences is ideal, so I'll follow up with some higher-performance LSE versions in a new patch. > > > One question I still have is why we need the default assembler implementations at all. As far as I can see, the MacroAssembler already dispatches based on LSE availability. So why can't we just use the generated stubs exclusively? This would also solve the platform problems with assembler code. > > We'd need an instance of Assembler very early, before the JVM is initialized.
It could be done, but it would also require a page of memory to be allocated early. I did try, but it was rather awful. That is why I ended up with these simple bootstrapping versions of the atomics. > OK, I see. Bootstrapping is more complex than I thought :) But nevertheless I think implementing the default versions in native assembly isn't really simple and putting that Linux/gcc specific assembly code into the generic aarch64 directory `src/hotspot/cpu/aarch64` will break other aarch64 platforms like Windows and Mac. Why don't you leave the default implementation as simple wrappers for the C++ compiler intrinsics somewhere under `src/hotspot/os_cpu/linux_aarch64` and remove the: if (! UseLSE) { return; } in `generate_atomic_entry_points()`? In that case, the intrinsic versions would really only be used for bootstrapping until the stubs have been generated. I don't see a good reason why we should maintain two different assembler implementations of the non-LSE atomics for aarch64 - one in native assembler and one in generated assembler. > > Finally, I didn't fully understand how you've measured the `call..ret` overhead and what's the "_simple straight-line test_" you've posted performance numbers for. > > That was just a counter in a loop. It's not terribly important for this case, given that the current code is a long way from optimal, but I wanted to know if the call...ret overhead could be measured. It can't, because it's swamped by the cost of the barriers. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Tue Feb 9 17:27:59 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 17:27:59 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v4] In-Reply-To: References: Message-ID: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter.
You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is minimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows.
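As a rough illustration of the two styles described above, here is a minimal portable C++ sketch. It is not the HotSpot stub code: the function names are hypothetical, and `std::atomic` stands in for the hand-written assembly. The retry loop is similar in spirit to a load-exclusive/store-exclusive sequence, while the single `fetch_add` is the kind of operation a compiler can typically lower to one LSE instruction when the target allows it. What actually gets emitted depends on the compiler, target and flags.

```cpp
#include <atomic>
#include <cstdint>

// Retry-loop increment (hypothetical name): read the counter, compute the
// new value locally, then try to publish it. If another thread changed the
// counter in between, the exchange fails and we go around again.
// Returns the incremented value.
inline std::int64_t increment_with_retry(std::atomic<std::int64_t>& counter) {
  std::int64_t old_value = counter.load(std::memory_order_relaxed);
  while (!counter.compare_exchange_weak(old_value, old_value + 1,
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed)) {
    // compare_exchange_weak has refreshed old_value; retry with it.
  }
  return old_value + 1;
}

// Single-operation increment (hypothetical name): one atomic
// read-modify-write with no software retry loop.
// Returns the incremented value.
inline std::int64_t increment_single_op(std::atomic<std::int64_t>& counter) {
  return counter.fetch_add(1, std::memory_order_seq_cst) + 1;
}
```

Both functions give the same result; the difference the description is concerned with only shows up under heavy contention, where the retry-loop form can keep bouncing the cache line between cores.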
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Sort includes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2434/files - new: https://git.openjdk.java.net/jdk/pull/2434/files/1cec2f54..da004adc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Tue Feb 9 17:28:00 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 17:28:00 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v4] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 09:51:26 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5579: >> >>> 5577: // >>> 5578: // If LSE is in use, generate LSE versions of all the stubs. The >>> 5579: // non-LSE versions are in atomic_aarch64.S. >> >> IMO it would be better for maintainability if the LSE versions were in atomic_aarch64.S too (with an explicit `.arch armv8-a+lse` directive). Is there any reason to generate them here, other than to support old toolchains? As far as I can tell GNU as supported LSE as far back as binutils 2.27. >> >> https://sourceware.org/binutils/docs-2.27/as/AArch64-Extensions.html > > I can't see any reason to do this. There'd be no benefit to moving this stuff, and it would be harder to change in the future. I'd do the whole lot as runtime stubs if I could, but they're needed before VM startup. And I should also have said: I intend to do highly optimized versions of the LSE atomics in a subsequent PR, and I'd much prefer to do the work internally within HotSpot.
------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Tue Feb 9 17:28:02 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 17:28:02 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v3] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 16:50:35 GMT, Volker Simonis wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Properly align everything > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 35: > >> 33: #include "interpreter/interpreter.hpp" >> 34: #include "memory/universe.hpp" >> 35: #include "atomic_aarch64.hpp" > > I think the conventions is to put the includes in alphabetic order after `#include "precompiled.hpp"` Good catch, thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Tue Feb 9 17:35:38 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 9 Feb 2021 17:35:38 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 17:05:01 GMT, Volker Simonis wrote: > But nevertheless I think implementing the default versions in native assembly isn't really simple and putting that Linux/gcc specific assembly code into the generic aarch64 directory `src/hotspot/cpu/aarch64` will break other aarch64 platforms like Windows and Mac. OK. We can do Windows another way, and I will move the assembler stubs to Linux. > Why don't you leave the default implementation as simple wrappers for the C++ compiler intrinsics Because the atomic stubs use a non-standard calling convention that only clobbers a few registers, so they can't be written in C++ because we can't control which registers the C++ compiler uses. 
If we were to use the native calling convention to call stubs we'd need to save and restore a ton of registers somehow - and not just the integer registers but also the vectors. It wouldn't be any simpler. I do intend to provide lower-overhead versions of the Atomic functions in a later patch. This one does the LSE/non-LSE split without changing anything else. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From shade at openjdk.java.net Tue Feb 9 17:54:46 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 9 Feb 2021 17:54:46 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader Message-ID: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> `JVM_LatestUserDefinedLoader` is called normally from `ObjectInputStream.resolveClass` -> `VM.latestUserDefinedLoader0`. And it takes a measurable time to walk the stack. There is JDK-8173368 that wants to replace it with `StackWalker`, but we can tune up the `JVM_LatestUserDefinedLoader` itself without changing the semantics of it (thus providing the backportability, including the releases that do not have `StackWalker`) and improving performance (thus providing a more aggressive baseline for `StackWalker` rewrite). The key is to recognize that out of two checks: 1) checking for two special subclasses; 2) checking for user classloader -- the first one usually passes, and second one fails much more frequently. First check also requires traversing the superclasses upwards looking for match. Reversing the order of the checks, plus inlining the helper method improves performance without changing the semantics. Out of curiosity, my previous patch dropped the first check completely, replacing it by asserts, and we definitely run into situation where that check is needed on some tests. On my machine, `VM.latestUserDefinedLoader` invocation time drops from 115 to 100 ns/op. Single-threaded SPECjvm2008:serial improves about 3% with this patch. 
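The check reordering described above can be sketched in plain C++. This is only an illustration under stated assumptions, not the HotSpot code: `FrameClass`, the class names `SpecialA`/`SpecialB` and both `find_*` functions are hypothetical, and the sketch assumes the two checks are side-effect free and combined with a logical AND, so that swapping them cannot change the result.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the class of a stack frame; not a HotSpot type.
struct FrameClass {
  std::vector<std::string> super_chain;  // class name first, then superclasses
  bool user_loader;                      // loaded by a user-defined loader?
};

// Expensive check: walks the superclass chain looking for a special base
// class. chain_steps counts how many chain entries were visited.
inline bool is_special_subclass(const FrameClass& c, int& chain_steps) {
  for (const std::string& name : c.super_chain) {
    ++chain_steps;
    if (name == "SpecialA" || name == "SpecialB") return true;
  }
  return false;
}

// Cheap check: a single flag test that fails for most frames.
inline bool has_user_loader(const FrameClass& c) { return c.user_loader; }

// Expensive check first: every frame pays for the superclass walk.
inline int find_expensive_first(const std::vector<FrameClass>& frames,
                                int& chain_steps) {
  for (std::size_t i = 0; i < frames.size(); ++i) {
    if (!is_special_subclass(frames[i], chain_steps) &&
        has_user_loader(frames[i])) {
      return static_cast<int>(i);
    }
  }
  return -1;
}

// Cheap check first: short-circuit evaluation skips the superclass walk
// for every frame whose loader check already fails.
inline int find_cheap_first(const std::vector<FrameClass>& frames,
                            int& chain_steps) {
  for (std::size_t i = 0; i < frames.size(); ++i) {
    if (has_user_loader(frames[i]) &&
        !is_special_subclass(frames[i], chain_steps)) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

Both functions return the same frame index for any input; only the number of superclass-chain steps differs, which is where the saving in the sketch comes from.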
Additional testing: - [x] Ad-hoc benchmarks - [x] Linux x86_64 fastdebug, `tier1`, `tier2`, `tier3` --------- ### Progress - [x] Change must not contain extraneous whitespace - [x] Commit message must refer to an issue - [ ] Change must be properly reviewed ### Download `$ git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485` `$ git checkout pull/2485` ------------- Commit messages: - 8261449: Micro-optimize JVM_LatestUserDefinedLoader Changes: https://git.openjdk.java.net/jdk/pull/2485/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2485&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261449 Stats: 19 lines in 3 files changed: 3 ins; 13 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2485.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485 PR: https://git.openjdk.java.net/jdk/pull/2485 From simonis at openjdk.java.net Tue Feb 9 19:05:39 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 9 Feb 2021 19:05:39 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 17:33:17 GMT, Andrew Haley wrote: >>> > I'm happy to see this change after the [long](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039930.html) and [tedious](https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-November/039932.html) discussions we had about preferring C++ intrinsic over inline assembly :) >>> > In general I'm fine with the change. Some of the previous C++ intrinsics (e.g. `__atomic_exchange_n` and `__atomic_add_fetch`) were called with `__ATOMIC_RELEASE` semantics which has now been dropped in the new implementation. But I think that's safe and a nice "optimization" because the instructions are followed by a full membar anyway. >>> >>> None of these sequences is ideal, so I'll follow up with some higher-performance LSE versions in a new patch. 
>>> >>> > One question I still have is why we need the default assembler implementations at all. As far as I can see, the MacroAssembler already dispatches based on LSE availability. So why can't we just use the generated stubs exclusively? This would also solve the platform problems with assembler code. >>> >>> We'd need an instance of Assembler very early, before the JVM is initialized. It could be done, but it would also require a page of memory to be allocated early. I did try, but it was rather awful. That is why I ended up with these simple bootstrapping versions of the atomics. >>> >> >> OK, I see. Bootstrapping is more complex than I thought :) >> >> But nevertheless I think implementing the default versions in native assembly isn't really simple and putting that Linux/gcc specific assembly code into the generic aarch64 directory `src/hotspot/cpu/aarch64` will break other aarch64 platforms like Windows and Mac. Why don't you leave the default implementation as simple wrappers for the C++ compiler intrinsics somewhere under `src/hotspot/os_cpu/linux_aarch64` and remove the: >> if (! UseLSE) { >> return; >> } >> in `generate_atomic_entry_points()`? In that case, the intrinsic versions would really only be used for bootstrapping until the stubs have been generated. I don't see a good reason why we should maintain two different assembler implementations of the non-LSE atomics for aarch64 - one in native assembler and one in generated assembler. >> >>> > Finally, I didn't fully understand how you've measured the `call..ret` overhead and what's the "_simple straight-line test_" you've posted performance numbers for. >>> >>> That was just a counter in a loop. It's not terribly important for this case, given that the current code is a long way from optimal, but I wanted to know if the call...ret overhead could be measured. It can't, because it's swamped by the cost of the barriers.
> >> But nevertheless I think implementing the default versions in native assembly isn't really simple and putting that Linux/gcc specific assembly code into the generic aarch64 directory `src/hotspot/cpu/aarch64` will break other aarch64 platforms like Windows and Mac. > > OK. We can do Windows another way, and I will move the assembler stubs to Linux. > >> Why don't you leave the default implementation as simple wrappers for the C++ compiler intrinsics > > Because the atomic stubs use a non-standard calling convention that only clobbers a few registers, so they can't be written in C++ because we can't control which registers the C++ compiler uses. If we were to use the native calling convention to call stubs we'd need to save and restore a ton of registers somehow - and not just the integer registers but also the vectors. It wouldn't be any simpler. > > I do intend to provide lower-overhead versions of the Atomic functions in a later patch. This one does the LSE/non-LSE split without changing anything else. OK, got it. Then I'm fine with these changes modulo the upcoming move of the assembler stubs to Linux. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From njian at openjdk.java.net Wed Feb 10 01:38:39 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 10 Feb 2021 01:38:39 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 9 Feb 2021 08:53:14 GMT, Dong Bo wrote: >> src/hotspot/cpu/aarch64/aarch64_neon.ad line 5285: >> >>> 5283: ins_encode %{ >>> 5284: int sh = (int)$shift$$constant; >>> 5285: if (sh == 0) { >> >> If src and dst are the same reg, no need to emit code. Or maybe c2 can even be improved to optimize this (sh=0 case) out? > >> If src and dst are the same reg, no need to emit code. 
> > If we want to do this enhancement, I think we need do it for left shifting and all SVE left/right shifting as well for completeness. > >> Or maybe c2 can even be improved to optimize this (sh=0 case) out? > > We can add code in `Ideal` to optimize it to ORR, but I'm not sure `orr` performs better than `shift` on other platforms. > Seems we have to created a generic new node to do `vector move` here. I think with proper optimization, no move is required. But I agree it's beyond the scope of this patch. I will have a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From dholmes at openjdk.java.net Wed Feb 10 01:42:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 10 Feb 2021 01:42:39 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader In-Reply-To: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> Message-ID: <7ttxRIBoaZSM04YoncG5q6mH9fkNYO5lksu3O3WNObc=.14617a81-82d2-4b30-80ae-6ec7ba9744dd@github.com> On Tue, 9 Feb 2021 15:40:03 GMT, Aleksey Shipilev wrote: > `JVM_LatestUserDefinedLoader` is called normally from `ObjectInputStream.resolveClass` -> `VM.latestUserDefinedLoader0`. And it takes a measurable time to walk the stack. There is JDK-8173368 that wants to replace it with `StackWalker`, but we can tune up the `JVM_LatestUserDefinedLoader` itself without changing the semantics of it (thus providing the backportability, including the releases that do not have `StackWalker`) and improving performance (thus providing a more aggressive baseline for `StackWalker` rewrite). > > The key is to recognize that out of two checks: 1) checking for two special subclasses; 2) checking for user classloader -- the first one usually passes, and second one fails much more frequently. First check also requires traversing the superclasses upwards looking for match. 
Reversing the order of the checks, plus inlining the helper method improves performance without changing the semantics. > > Out of curiosity, my previous patch dropped the first check completely, replacing it by asserts, and we definitely run into situation where that check is needed on some tests. > > On my machine, `VM.latestUserDefinedLoader` invocation time drops from 115 to 100 ns/op. Single-threaded SPECjvm2008:serial improves about 3% with this patch. > > Additional testing: > - [x] Ad-hoc benchmarks > - [x] Linux x86_64 fastdebug, `tier1`, `tier2`, `tier3` > > --------- > ### Progress > - [x] Change must not contain extraneous whitespace > - [x] Commit message must refer to an issue > - [ ] Change must be properly reviewed > > > > ### Download > `$ git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485` > `$ git checkout pull/2485` Hi Aleksey, This seems reasonable to me. The generated reflection classes are loaded by a temporary loader (so they can be unloaded) and so have to be skipped. I've added core-libs to the PR as this is the VM side of their code and I want to make sure nothing has been overlooked. Thanks, David src/hotspot/share/prims/jvm.cpp line 3293: > 3291: oop loader = ik->class_loader(); > 3292: if (loader != NULL && !SystemDictionary::is_platform_class_loader(loader)) { > 3293: if (!ik->is_subclass_of(vmClasses::reflect_MethodAccessorImpl_klass()) && Please add a comment: // Skip reflection related frames ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2485 From dongbo at openjdk.java.net Wed Feb 10 02:56:57 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 10 Feb 2021 02:56:57 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v3] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: back out AD modifications and handle zero shift in assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/8439f167..af3f2a15 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=01-02 Stats: 284 lines in 3 files changed: 19 ins; 143 del; 122 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Wed Feb 10 03:02:41 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 10 Feb 2021 03:02:41 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> References: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> Message-ID: <0wUxJ4QUIzC-Hg4qSPtf8nFP0ov9J69nA3gjaoEJcWY=.ede23adb-4010-460d-8ac4-d560ace8ffc0@github.com> On Tue, 9 Feb 2021 09:29:50 GMT, Andrew Haley wrote: >> Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057: > >> 2055: as_FloatRegister($src$$reg), as_FloatRegister($src$$reg)); >> 2056: } else {ifelse($4, B,` >> 2057: if (sh >= 8) sh = 7; > > I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. 
I backed out the modifications of `aarch64_neon.ad` and `aarch64_neon_ad.m4`. The `shift == 0` case is handled by the assembler now. Verified with the regression tests. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From iklam at openjdk.java.net Wed Feb 10 05:17:51 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 10 Feb 2021 05:17:51 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions Message-ID: When CDS dumping is enabled, special initialization happens during VM init. However, many of these calls do not properly check for exceptions. Instead, they rely on the implicit knowledge that `metaspace::allocate()` will exit the VM when allocation fails during CDS dumping. This makes the code hard to understand and tightly coupled to `metaspace::allocate()`. The fix is: all code that allocates should use the CHECK macros, so each block of code can be individually understood without considering the behavior of `metaspace::allocate()`. I added `TRAPS` to a bunch of CDS-related functions that are called during VM init. In some cases, I changed `Thread* THREAD` to `TRAPS`. This also eliminated a few `Thread* THREAD = Thread::current()` calls.
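The idea of checking after every call, rather than relying on the allocator to exit, can be sketched with a toy model. This is a minimal illustration under stated assumptions and not HotSpot code: `ToyThread`, `TOY_CHECK_0`, `toy_allocate`, `toy_init_tables` and `toy_prepare` are hypothetical names that only imitate the shape of a pending-exception flag carried by a thread argument.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a thread that can carry a pending error.
struct ToyThread {
  bool has_pending = false;
  std::string message;
};

// Toy check macro: return 0 from the enclosing function if the
// previous call left a pending error on the thread.
#define TOY_CHECK_0(thread) \
  do { if ((thread)->has_pending) return 0; } while (0)

// A toy "allocation" that reports failure through the thread
// instead of terminating the process itself.
int toy_allocate(int size, ToyThread* thread) {
  if (size <= 0) {
    thread->has_pending = true;
    thread->message = "toy_allocate: bad size";
    return 0;
  }
  return size;
}

// Each call is followed by an explicit check, so this block can be read
// on its own without knowing how toy_allocate behaves on failure.
int toy_init_tables(int size, ToyThread* thread) {
  int a = toy_allocate(size, thread);
  TOY_CHECK_0(thread);
  int b = toy_allocate(size * 2, thread);
  TOY_CHECK_0(thread);
  return a + b;
}

// The "root" caller decides what a failure means. Here it just reports
// success or failure; a real root would print the error and exit.
bool toy_prepare(int size) {
  ToyThread thread;
  toy_init_tables(size, &thread);
  return !thread.has_pending;
}
```

In this sketch only the root knows the failure policy; every intermediate function simply stops and hands the pending error upwards.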
The "root" of these calls, such as `MetaspaceShared::prepare_for_dumping()`, now follows this pattern: EXCEPTION_MARK; ClassLoader::initialize_shared_path(THREAD); if (HAS_PENDING_EXCEPTION) { java_lang_Throwable::print(PENDING_EXCEPTION, tty); vm_exit_during_initialization("ClassLoader::initialize_shared_path() failed unexpectedly"); } ------------- Commit messages: - 8260341: CDS dump VM init code does not check exceptions Changes: https://git.openjdk.java.net/jdk/pull/2494/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2494&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260341 Stats: 97 lines in 12 files changed: 18 ins; 11 del; 68 mod Patch: https://git.openjdk.java.net/jdk/pull/2494.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2494/head:pull/2494 PR: https://git.openjdk.java.net/jdk/pull/2494 From ioi.lam at oracle.com Wed Feb 10 06:44:46 2021 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 9 Feb 2021 22:44:46 -0800 Subject: Atomic operations: your thoughts are welocme In-Reply-To: References: Message-ID: Just curious, which benchmark is this? Thanks - Ioi On 2/8/21 10:14 AM, Andrew Haley wrote: > I've been looking at the hottest Atomic operations in HotSpot, with a view to > finding out if the default memory_order_conservative (which is very expensive > on some architectures) can be weakened to something less. It's impossible to > fix all of them, but perhaps we can fix some of the most frequent. > > These are the hottest compare-and-swap uses in HotSpot, with the count > at the end of each line. > > : :: = 16406757 > > This one is already memory_order_relaxed, so no problem. > > ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 > > This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this > need to be memory_order_conservative, or would something weaker do? Even > acq_rel or seq_cst would be better. 
> > : :: = 2376632 > : :: = 2003895 > > I can't imagine that either of these actually need memory_order_conservative, > they're just reference counts. > > : :: = 1719614 > > BitMap::par_set_bit again. > > , (MEMFLAGS)5>*)+432>: :: = 1617659 > > This one is GenericTaskQueue::pop_global calling cmpxchg_age(). > Again, do we need conservative here? > > There is, I suppose, always a possibility that some code somewhere is taking > advantage of the memory serializing properties of adjusting refcounts, I suppose. > > Thanks, > From dongbo at openjdk.java.net Wed Feb 10 06:56:53 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 10 Feb 2021 06:56:53 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v4] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. 
If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request incrementally with one additional commit since the last revision: generate add if shift == 0 for accumulation and fix some test code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/af3f2a15..a7b72b0a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=02-03 Stats: 127 lines in 2 files changed: 27 ins; 0 del; 100 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From lucy at openjdk.java.net Wed Feb 10 07:31:40 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Wed, 10 Feb 2021 07:31:40 GMT Subject: RFR: JDK-8261302: NMT: Improve malloc site table hashing [v2] In-Reply-To: <8IRE-aXvXSOs_MRNdAplymbZHr8yT1zqWVoKlMX55ck=.0eb8db60-a204-4d80-96bf-742833965da2@github.com> References: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> <8IRE-aXvXSOs_MRNdAplymbZHr8yT1zqWVoKlMX55ck=.0eb8db60-a204-4d80-96bf-742833965da2@github.com> Message-ID: On Tue, 9 Feb 2021 08:05:59 GMT, Thomas Stuefe wrote: >> While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked whether this can be improved. >> >> The current hash algorithm uses the 32bit masked sum of all stack entries as hash. >> >> I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did actually not find a noticeable improvement over what NMT does now. 
It seems that using the raw code pointers as the base for the hash already gives us enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried, I was not able to get rid of the few longer chains. >> >> The biggest improvement came from an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. However, there is this comment in mallocSiteTable.hpp: >> >> https://github.com/openjdk/jdk/blob/5183d8ae1eec86202eace2c4770f81edbc73cb68/src/hotspot/share/services/mallocSiteTable.hpp#L118 >> >> which claims that a load factor of 6 is what is aimed for and deemed acceptable. So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups). >> >> --- >> >> With that out of the way, there are still small things we can improve about the hash function: >> >> Since the vast majority of `NativeCallStack` objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make `NativeCallStack::hash()` a simple inline getter. >> >> When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation.
>>
>> See difference (linux x86):
>>
>> Before:
>>
>> Dump of assembler code for function NativeCallStack::hash() const:
>> => 0x00007ffff68092f0 <+0>: mov 0x20(%rdi),%eax
>>    0x00007ffff68092f3 <+3>: push %rbp
>>    0x00007ffff68092f4 <+4>: mov %rsp,%rbp
>>    0x00007ffff68092f7 <+7>: test %eax,%eax <0?->X
>>    0x00007ffff68092f9 <+9>: jne 0x7ffff6809324
>>    0x00007ffff68092fb <+11>: mov (%rdi),%rdx
>>    0x00007ffff68092fe <+14>: test %rdx,%rdx
>>    0x00007ffff6809301 <+17>: je 0x7ffff6809330
>>    0x00007ffff6809303 <+19>: mov 0x8(%rdi),%rax
>>    0x00007ffff6809307 <+23>: test %rax,%rax
>>    0x00007ffff680930a <+26>: je 0x7ffff680931f
>>    0x00007ffff680930c <+28>: add %rax,%rdx
>>    0x00007ffff680930f <+31>: mov 0x10(%rdi),%rax
>>    0x00007ffff6809313 <+35>: test %rax,%rax
>>    0x00007ffff6809316 <+38>: je 0x7ffff680931f
>>    0x00007ffff6809318 <+40>: add %rax,%rdx
>>    0x00007ffff680931b <+43>: add 0x18(%rdi),%rdx
>>    0x00007ffff680931f <+47>: mov %edx,%eax
>>    0x00007ffff6809321 <+49>: mov %edx,0x20(%rdi)
>>    0x00007ffff6809324 <+52>: pop %rbp
>>    0x00007ffff6809325 <+53>: retq
>>    0x00007ffff6809326 <+54>: nopw %cs:0x0(%rax,%rax,1)
>>    0x00007ffff6809330 <+64>: xor %edx,%edx
>>    0x00007ffff6809332 <+66>: jmp 0x7ffff680931f
>>
>> hash() getter is not inlined; it queries the hash code each time and, when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic.
>>
>> With this patch, the `NativeCallStack::hash()` gets inlined at the call sites to a simple load.
>> The hash calculation gets now inlined into the constructor and uses a series of simple adds now: >> >> Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool): >> => 0x00007ffff67cc9a0 <+0>: push %rbp >> 0x00007ffff67cc9a1 <+1>: mov %rsp,%rbp >> 0x00007ffff67cc9a4 <+4>: push %rbx >> 0x00007ffff67cc9a5 <+5>: mov %rdi,%rbx >> 0x00007ffff67cc9a8 <+8>: sub $0x8,%rsp >> 0x00007ffff67cc9ac <+12>: test %dl,%dl >> 0x00007ffff67cc9ae <+14>: movl $0x0,0x20(%rdi) >> 0x00007ffff67cc9b5 <+21>: jne 0x7ffff67cc9f0 >> 0x00007ffff67cc9b7 <+23>: movq $0x0,(%rdi) >> 0x00007ffff67cc9be <+30>: movq $0x0,0x8(%rdi) >> 0x00007ffff67cc9c6 <+38>: movq $0x0,0x10(%rdi) >> 0x00007ffff67cc9ce <+46>: movq $0x0,0x18(%rdi) >> 0x00007ffff67cc9d6 <+54>: mov 0x10(%rbx),%rax <<< >> 0x00007ffff67cc9da <+58>: add 0x8(%rbx),%rax <<< >> 0x00007ffff67cc9de <+62>: add 0x18(%rbx),%rax <<< >> 0x00007ffff67cc9e2 <+66>: add (%rbx),%rax <<< >> 0x00007ffff67cc9e5 <+69>: mov %eax,0x20(%rbx) >> 0x00007ffff67cc9e8 <+72>: add $0x8,%rsp >> 0x00007ffff67cc9ec <+76>: pop %rbx >> 0x00007ffff67cc9ed <+77>: pop %rbp >> 0x00007ffff67cc9ee <+78>: retq >> 0x00007ffff67cc9ef <+79>: nop >> 0x00007ffff67cc9f0 <+80>: mov %esi,%edx >> 0x00007ffff67cc9f2 <+82>: mov $0x4,%esi >> 0x00007ffff67cc9f7 <+87>: callq 0x7ffff68250d0 >> 0x00007ffff67cc9fc <+92>: jmp 0x7ffff67cc9d6 >> End of assembler dump. >> >> Thanks, Thomas > > Thomas Stuefe has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. LGTM, except for the one comment I added (it's not a "must", it's a "please"). src/hotspot/share/utilities/nativeCallStack.cpp line 31: > 29: #include "utilities/nativeCallStack.hpp" > 30: > 31: static unsigned calculate_hash(address stack[NMT_TrackingStackDepth]) { I know it doesn't change anything semantically, but I'd like to see the int type specifier. 
------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2473 From shade at openjdk.java.net Wed Feb 10 07:34:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 07:34:59 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader [v2] In-Reply-To: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> Message-ID: > `JVM_LatestUserDefinedLoader` is called normally from `ObjectInputStream.resolveClass` -> `VM.latestUserDefinedLoader0`. And it takes a measurable time to walk the stack. There is JDK-8173368 that wants to replace it with `StackWalker`, but we can tune up the `JVM_LatestUserDefinedLoader` itself without changing the semantics of it (thus providing the backportability, including the releases that do not have `StackWalker`) and improving performance (thus providing a more aggressive baseline for `StackWalker` rewrite). > > The key is to recognize that out of two checks: 1) checking for two special subclasses; 2) checking for user classloader -- the first one usually passes, and second one fails much more frequently. First check also requires traversing the superclasses upwards looking for match. Reversing the order of the checks, plus inlining the helper method improves performance without changing the semantics. > > Out of curiosity, my previous patch dropped the first check completely, replacing it by asserts, and we definitely run into situation where that check is needed on some tests. > > On my machine, `VM.latestUserDefinedLoader` invocation time drops from 115 to 100 ns/op. Single-threaded SPECjvm2008:serial improves about 3% with this patch. 
> > Additional testing: > - [x] Ad-hoc benchmarks > - [x] Linux x86_64 fastdebug, `tier1`, `tier2`, `tier3` > > --------- > ### Progress > - [x] Change must not contain extraneous whitespace > - [x] Commit message must refer to an issue > - [ ] Change must be properly reviewed > > > > ### Download > `$ git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485` > `$ git checkout pull/2485` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Added a comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2485/files - new: https://git.openjdk.java.net/jdk/pull/2485/files/fc333037..72e830a8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2485&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2485&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2485.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485 PR: https://git.openjdk.java.net/jdk/pull/2485 From shade at openjdk.java.net Wed Feb 10 07:35:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 07:35:00 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader [v2] In-Reply-To: <7ttxRIBoaZSM04YoncG5q6mH9fkNYO5lksu3O3WNObc=.14617a81-82d2-4b30-80ae-6ec7ba9744dd@github.com> References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> <7ttxRIBoaZSM04YoncG5q6mH9fkNYO5lksu3O3WNObc=.14617a81-82d2-4b30-80ae-6ec7ba9744dd@github.com> Message-ID: On Wed, 10 Feb 2021 01:29:51 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Added a comment > > src/hotspot/share/prims/jvm.cpp line 3293: > >> 3291: oop loader = ik->class_loader(); >> 3292: if (loader != NULL && !SystemDictionary::is_platform_class_loader(loader)) { >> 3293: if 
(!ik->is_subclass_of(vmClasses::reflect_MethodAccessorImpl_klass()) && > > Please add a comment: > // Skip reflection related frames Added! ------------- PR: https://git.openjdk.java.net/jdk/pull/2485 From stuefe at openjdk.java.net Wed Feb 10 07:42:57 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 10 Feb 2021 07:42:57 GMT Subject: RFR: JDK-8261302: NMT: Improve malloc site table hashing [v3] In-Reply-To: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> References: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> Message-ID: > While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked into whether this could be improved. > > The current hash algorithm uses the 32bit masked sum of all stack entries as hash. > > I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did not actually find a noticeable improvement over what NMT does now. It seems that using the raw code pointers as the base for the hash already gives us enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried, I was not able to get rid of the few longer chains. > > The biggest improvement came from an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. However, there is this comment in mallocSiteTable.hpp: > > https://github.com/openjdk/jdk/blob/5183d8ae1eec86202eace2c4770f81edbc73cb68/src/hotspot/share/services/mallocSiteTable.hpp#L118 > > which claims that a load factor of 6 is what is aimed for and deemed acceptable. So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups).
> > --- > > With that out of the way, there are still small things we can improve about the hash function: > > Since the vast majority of `NativeCallStack` objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make `NativeCallStack::hash()` a simple inline getter. > > When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation. > > See difference (linux x86): > > Before: > > Dump of assembler code for function NativeCallStack::hash() const: > => 0x00007ffff68092f0 <+0>: mov 0x20(%rdi),%eax 0x00007ffff68092f3 <+3>: push %rbp > 0x00007ffff68092f4 <+4>: mov %rsp,%rbp > 0x00007ffff68092f7 <+7>: test %eax,%eax <0?->X > 0x00007ffff68092f9 <+9>: jne 0x7ffff6809324 > 0x00007ffff68092fb <+11>: mov (%rdi),%rdx > 0x00007ffff68092fe <+14>: test %rdx,%rdx > 0x00007ffff6809301 <+17>: je 0x7ffff6809330 > 0x00007ffff6809303 <+19>: mov 0x8(%rdi),%rax > 0x00007ffff6809307 <+23>: test %rax,%rax > 0x00007ffff680930a <+26>: je 0x7ffff680931f > 0x00007ffff680930c <+28>: add %rax,%rdx > 0x00007ffff680930f <+31>: mov 0x10(%rdi),%rax > 0x00007ffff6809313 <+35>: test %rax,%rax > 0x00007ffff6809316 <+38>: je 0x7ffff680931f > 0x00007ffff6809318 <+40>: add %rax,%rdx > 0x00007ffff680931b <+43>: add 0x18(%rdi),%rdx > 0x00007ffff680931f <+47>: mov %edx,%eax > 0x00007ffff6809321 <+49>: mov %edx,0x20(%rdi) > 0x00007ffff6809324 <+52>: pop %rbp 0x00007ffff6809325 <+53>: retq > 0x00007ffff6809326 <+54>: nopw %cs:0x0(%rax,%rax,1) > 0x00007ffff6809330 <+64>: xor %edx,%edx > 0x00007ffff6809332 <+66>: jmp 0x7ffff680931f > > hash() getter is not inlined; it queries the hash code each time and, when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic. 
> > With this patch, the `NativeCallStack::hash()` gets inlined at the call sites to a simple load. > The hash calculation gets now inlined into the constructor and uses a series of simple adds now: > > Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool): > => 0x00007ffff67cc9a0 <+0>: push %rbp > 0x00007ffff67cc9a1 <+1>: mov %rsp,%rbp > 0x00007ffff67cc9a4 <+4>: push %rbx > 0x00007ffff67cc9a5 <+5>: mov %rdi,%rbx > 0x00007ffff67cc9a8 <+8>: sub $0x8,%rsp > 0x00007ffff67cc9ac <+12>: test %dl,%dl > 0x00007ffff67cc9ae <+14>: movl $0x0,0x20(%rdi) > 0x00007ffff67cc9b5 <+21>: jne 0x7ffff67cc9f0 > 0x00007ffff67cc9b7 <+23>: movq $0x0,(%rdi) > 0x00007ffff67cc9be <+30>: movq $0x0,0x8(%rdi) > 0x00007ffff67cc9c6 <+38>: movq $0x0,0x10(%rdi) > 0x00007ffff67cc9ce <+46>: movq $0x0,0x18(%rdi) > 0x00007ffff67cc9d6 <+54>: mov 0x10(%rbx),%rax <<< > 0x00007ffff67cc9da <+58>: add 0x8(%rbx),%rax <<< > 0x00007ffff67cc9de <+62>: add 0x18(%rbx),%rax <<< > 0x00007ffff67cc9e2 <+66>: add (%rbx),%rax <<< > 0x00007ffff67cc9e5 <+69>: mov %eax,0x20(%rbx) > 0x00007ffff67cc9e8 <+72>: add $0x8,%rsp > 0x00007ffff67cc9ec <+76>: pop %rbx > 0x00007ffff67cc9ed <+77>: pop %rbp > 0x00007ffff67cc9ee <+78>: retq > 0x00007ffff67cc9ef <+79>: nop > 0x00007ffff67cc9f0 <+80>: mov %esi,%edx > 0x00007ffff67cc9f2 <+82>: mov $0x4,%esi > 0x00007ffff67cc9f7 <+87>: callq 0x7ffff68250d0 > 0x00007ffff67cc9fc <+92>: jmp 0x7ffff67cc9d6 > End of assembler dump. 
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  unsigned -> unsigned int

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2473/files
  - new: https://git.openjdk.java.net/jdk/pull/2473/files/309d6c0a..e56b85d3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2473&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2473&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2473.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2473/head:pull/2473

PR: https://git.openjdk.java.net/jdk/pull/2473

From stuefe at openjdk.java.net Wed Feb 10 07:42:58 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 10 Feb 2021 07:42:58 GMT
Subject: RFR: JDK-8261302: NMT: Improve malloc site table hashing [v2]
In-Reply-To:
References: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com> <8IRE-aXvXSOs_MRNdAplymbZHr8yT1zqWVoKlMX55ck=.0eb8db60-a204-4d80-96bf-742833965da2@github.com>
Message-ID:

On Wed, 10 Feb 2021 07:28:33 GMT, Lutz Schmidt wrote:

> LGTM, except for the one comment I added (it's not a "must", it's a "please").

Thanks Lucy!

> src/hotspot/share/utilities/nativeCallStack.cpp line 31:
>
>> 29: #include "utilities/nativeCallStack.hpp"
>> 30:
>> 31: static unsigned calculate_hash(address stack[NMT_TrackingStackDepth]) {
>
> I know it doesn't change anything semantically, but I'd like to see the int type specifier.

Sure, I'll change it before pushing.
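A minimal sketch of the eager-hash idea discussed in this thread, written as standalone C++ rather than HotSpot code. `CallStackSketch`, `kStackDepth` and `frame_addr` are hypothetical stand-ins for `NativeCallStack`, `NMT_TrackingStackDepth` and `address`. The hash is the 32-bit masked sum of all frames with no early exit on a zero entry, and it is computed once in the constructor so that `hash()` is a plain getter.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for NMT_TrackingStackDepth and HotSpot's address type.
constexpr std::size_t kStackDepth = 4;
using frame_addr = std::uint64_t;

class CallStackSketch {
 public:
  explicit CallStackSketch(const frame_addr (&frames)[kStackDepth]) {
    for (std::size_t i = 0; i < kStackDepth; ++i) {
      _stack[i] = frames[i];
    }
    // Eager calculation: every instance pays for the hash exactly once.
    _hash = calculate_hash(_stack);
  }

  // Trivial getter; small enough for a compiler to inline into a single load.
  unsigned int hash() const { return _hash; }

 private:
  // 32-bit masked sum of all entries; zero entries are simply added as zero.
  static unsigned int calculate_hash(const frame_addr (&stack)[kStackDepth]) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < kStackDepth; ++i) {
      sum += stack[i];
    }
    return static_cast<unsigned int>(sum & 0xFFFFFFFFu);
  }

  frame_addr _stack[kStackDepth];
  unsigned int _hash;
};
```

The trade-off is the one named in the description: the constructor does a few unconditional adds even for stacks that are never hashed, in exchange for a branch-free getter.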
-------------

PR: https://git.openjdk.java.net/jdk/pull/2473

From stuefe at openjdk.java.net Wed Feb 10 07:49:36 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 10 Feb 2021 07:49:36 GMT
Subject: Integrated: JDK-8261302: NMT: Improve malloc site table hashing
In-Reply-To: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com>
References: <3vDpj4wrk0G9gSbO8IvCZ-9N8S9LZBEUujqdvL1Ucbs=.903d98bc-3cef-425c-b65b-29c5dc9d2d4c@github.com>
Message-ID:

On Tue, 9 Feb 2021 07:17:00 GMT, Thomas Stuefe wrote:

> While looking at NMT tuning statistics, I saw longish bucket chains in the malloc site table and looked at whether this can be improved.
>
> The current hash algorithm uses the 32bit masked sum of all stack entries as hash.
>
> I first experimented with different hash algorithms on different platforms (x86 and ppc, the latter because it has uniform op sizes) and did actually not find a noticeable improvement over what NMT does now. It seems that using the raw code pointers as base for the hash gives us already enough entropy. Avg load factor of the table always hovered around what was expected. Regardless of the hash I tried I was not able to get rid of the few longer chains.
>
> The biggest improvement came from an experimental table size increase: currently the table size is 511 pointer slots (~4K). Quadrupling the size would bring the load factor down to 1-2. However, there is this comment in mallocSiteTable.hpp:
>
> https://github.com/openjdk/jdk/blob/5183d8ae1eec86202eace2c4770f81edbc73cb68/src/hotspot/share/services/mallocSiteTable.hpp#L118
>
> which claims that a load factor of 6 is what is aimed for and deemed acceptable. So I am not going to touch that here (even though 4 or 12K more may be an okay price to pay for more efficient lookups).
>
> ---
>
> With that out of the way, there are still small things we can improve about the hash function:
>
> Since the vast majority of `NativeCallStack` objects will always need a hash code, it makes no sense to delay its calculation. By doing the hash code calculation in the constructor we can make `NativeCallStack::hash()` a simple inline getter.
>
> When calculating the hash code, I also omitted the "if stack address is 0 stop" logic. The vast majority of call stacks have the full size and nothing much is gained from omitting those 0 values from the hash code calculation.
>
> See difference (linux x86):
>
> Before:
>
> Dump of assembler code for function NativeCallStack::hash() const:
> => 0x00007ffff68092f0 <+0>: mov 0x20(%rdi),%eax
>    0x00007ffff68092f3 <+3>: push %rbp
>    0x00007ffff68092f4 <+4>: mov %rsp,%rbp
>    0x00007ffff68092f7 <+7>: test %eax,%eax <0?->X
>    0x00007ffff68092f9 <+9>: jne 0x7ffff6809324
>    0x00007ffff68092fb <+11>: mov (%rdi),%rdx
>    0x00007ffff68092fe <+14>: test %rdx,%rdx
>    0x00007ffff6809301 <+17>: je 0x7ffff6809330
>    0x00007ffff6809303 <+19>: mov 0x8(%rdi),%rax
>    0x00007ffff6809307 <+23>: test %rax,%rax
>    0x00007ffff680930a <+26>: je 0x7ffff680931f
>    0x00007ffff680930c <+28>: add %rax,%rdx
>    0x00007ffff680930f <+31>: mov 0x10(%rdi),%rax
>    0x00007ffff6809313 <+35>: test %rax,%rax
>    0x00007ffff6809316 <+38>: je 0x7ffff680931f
>    0x00007ffff6809318 <+40>: add %rax,%rdx
>    0x00007ffff680931b <+43>: add 0x18(%rdi),%rdx
>    0x00007ffff680931f <+47>: mov %edx,%eax
>    0x00007ffff6809321 <+49>: mov %edx,0x20(%rdi)
>    0x00007ffff6809324 <+52>: pop %rbp
>    0x00007ffff6809325 <+53>: retq
>    0x00007ffff6809326 <+54>: nopw %cs:0x0(%rax,%rax,1)
>    0x00007ffff6809330 <+64>: xor %edx,%edx
>    0x00007ffff6809332 <+66>: jmp 0x7ffff680931f
>
> hash() getter is not inlined; it queries the hash code each time and, when calculating it, uses simple adds interspersed with conditional jumps because of the "if stack address is 0 stop" logic.
>
> With this patch, `NativeCallStack::hash()` gets inlined at the call sites to a simple load.
> The hash calculation now gets inlined into the constructor and uses a series of simple adds:
>
> Dump of assembler code for function NativeCallStack::NativeCallStack(int, bool):
> => 0x00007ffff67cc9a0 <+0>: push %rbp
>    0x00007ffff67cc9a1 <+1>: mov %rsp,%rbp
>    0x00007ffff67cc9a4 <+4>: push %rbx
>    0x00007ffff67cc9a5 <+5>: mov %rdi,%rbx
>    0x00007ffff67cc9a8 <+8>: sub $0x8,%rsp
>    0x00007ffff67cc9ac <+12>: test %dl,%dl
>    0x00007ffff67cc9ae <+14>: movl $0x0,0x20(%rdi)
>    0x00007ffff67cc9b5 <+21>: jne 0x7ffff67cc9f0
>    0x00007ffff67cc9b7 <+23>: movq $0x0,(%rdi)
>    0x00007ffff67cc9be <+30>: movq $0x0,0x8(%rdi)
>    0x00007ffff67cc9c6 <+38>: movq $0x0,0x10(%rdi)
>    0x00007ffff67cc9ce <+46>: movq $0x0,0x18(%rdi)
>    0x00007ffff67cc9d6 <+54>: mov 0x10(%rbx),%rax <<<
>    0x00007ffff67cc9da <+58>: add 0x8(%rbx),%rax <<<
>    0x00007ffff67cc9de <+62>: add 0x18(%rbx),%rax <<<
>    0x00007ffff67cc9e2 <+66>: add (%rbx),%rax <<<
>    0x00007ffff67cc9e5 <+69>: mov %eax,0x20(%rbx)
>    0x00007ffff67cc9e8 <+72>: add $0x8,%rsp
>    0x00007ffff67cc9ec <+76>: pop %rbx
>    0x00007ffff67cc9ed <+77>: pop %rbp
>    0x00007ffff67cc9ee <+78>: retq
>    0x00007ffff67cc9ef <+79>: nop
>    0x00007ffff67cc9f0 <+80>: mov %esi,%edx
>    0x00007ffff67cc9f2 <+82>: mov $0x4,%esi
>    0x00007ffff67cc9f7 <+87>: callq 0x7ffff68250d0
>    0x00007ffff67cc9fc <+92>: jmp 0x7ffff67cc9d6
> End of assembler dump.
>
> Thanks, Thomas

This pull request has now been integrated.
Changeset: a3d6e371
Author:    Thomas Stuefe
URL:       https://git.openjdk.java.net/jdk/commit/a3d6e371
Stats:     29 lines in 2 files changed: 9 ins; 16 del; 4 mod

8261302: NMT: Improve malloc site table hashing

Reviewed-by: zgu, lucy

-------------

PR: https://git.openjdk.java.net/jdk/pull/2473

From ayang at openjdk.java.net Wed Feb 10 09:44:39 2021
From: ayang at openjdk.java.net (Albert Mingkun Yang)
Date: Wed, 10 Feb 2021 09:44:39 GMT
Subject: RFR: 8260941: Remove the conc_scan parameter for CardTable
In-Reply-To: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com>
References: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com>
Message-ID:

On Fri, 5 Feb 2021 09:52:25 GMT, Thomas Schatzl wrote:

> Hi,
>
> can I have reviews for this removal of the last(?) CMS-specific code in CardTable, namely some provision to indicate that cards are being scanned concurrently in Serial/Parallel GC barrier code?
>
> The change simply follows the predicate into Serial/Parallel GC code which always returns false for them and removes that code.
>
> In the review for JDK-8234534 I mentioned that I split this out due to unexplainable errors; testing tier1-5 three times showed none of that any more (after updating to latest code).
>
> This change has only been built on Oracle-platforms and linux-x86 via github actions (https://github.com/tschatzl/jdk/actions/runs/539993964), so I would like to kindly ask maintainers of the others to compile and report issues (32 bit ARM, PPC etc).
>
> Testing: tier1-5

A side note: it seems that `G1BarrierSet` is the only subclass of `CardTableBarrierSet`. Maybe it makes sense to merge them into one.

-------------

Marked as reviewed by ayang (Author).
PR: https://git.openjdk.java.net/jdk/pull/2425

From dongbo at openjdk.java.net Wed Feb 10 09:59:55 2021
From: dongbo at openjdk.java.net (Dong Bo)
Date: Wed, 10 Feb 2021 09:59:55 GMT
Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v5]
In-Reply-To:
References:
Message-ID:

> In vectorAPI, when right-shifting a vector with a shift equal to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the Java code below on aarch64,
> the assembler call is `__ ushr(dst, __ T8B, src, 0)`, and the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, the JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
>     ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
>     vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.
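The arithmetic behind the wrong encoding can be worked through in a few lines. This is only a sketch under a stated assumption: that A64 vector right shifts store the amount in a 7-bit `immh:immb` field as `2 * element_bits - shift`, with the highest set bit of `immh` selecting the element size. The helper names are hypothetical and this is not the HotSpot assembler code.

```cpp
#include <cstdint>

// Java-side masking quoted above: n & (ESIZE*8 - 1), with ESIZE in bytes.
inline unsigned masked_shift(unsigned n, unsigned esize_bytes) {
  return n & (esize_bytes * 8 - 1);
}

// Assumed encoding of a vector right-shift immediate: immh:immb = 2*esize - shift.
inline unsigned encode_immh_immb(unsigned esize_bits, unsigned shift) {
  return 2 * esize_bits - shift;
}

struct DecodedShift {
  unsigned esize_bits;  // element width the hardware would infer
  unsigned shift;       // shift amount the hardware would apply
};

// Decode a 7-bit immh:immb field; assumes immh (bits 6..3) is non-zero.
inline DecodedShift decode_immh_immb(unsigned field) {
  unsigned esize = 64;
  // Bit 6 -> 64-bit lanes, bit 5 -> 32, bit 4 -> 16, bit 3 -> 8.
  while (esize > 8 && (field & esize) == 0) {
    esize /= 2;
  }
  return DecodedShift{esize, 2 * esize - field};
}
```

Under that assumption, a byte shift by 8 is masked to 0, the field becomes 16 - 0 = 16, and 16 decodes as 16-bit lanes shifted by 16, which matches the `ushr dst.4H, src.4H, 16` reported above. A shift in the legal range round-trips unchanged.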
Dong Bo has updated the pull request incrementally with one additional commit since the last revision:

  fix windows build failure

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/a7b72b0a..d75ee99e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472

From stuefe at openjdk.java.net Wed Feb 10 10:07:41 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 10 Feb 2021 10:07:41 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Feb 2021 05:43:05 GMT, David Holmes wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Use universal zero initializer for do_check_signal_periodically
>
> Marked as reviewed by dholmes (Reviewer).

Gentle ping..
-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From rkennke at openjdk.java.net Wed Feb 10 10:12:49 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Feb 2021 10:12:49 GMT
Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk
Message-ID:

I am observing the following assert:

# Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
# assert(is_frame_safe(f)) failed: Frame must be safe

(see issue for full hs_err)

In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe), where we could safepoint for GC, which could reset the stack watermark.

This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.

Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
Testing:
 - [x] StackWalk tests with Shenandoah/aggressive
 - [x] StackWalk tests with ZGC/aggressive
 - [ ] tier1 (+Shenandoah/ZGC)
 - [ ] tier2 (+Shenandoah/ZGC)

-------------

Commit messages:
 - 8261448: Preserve GC stack watermark across safepoints in StackWalk

Changes: https://git.openjdk.java.net/jdk/pull/2500/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261448
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2500.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500

PR: https://git.openjdk.java.net/jdk/pull/2500

From sjohanss at openjdk.java.net Wed Feb 10 10:32:44 2021
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Wed, 10 Feb 2021 10:32:44 GMT
Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages
Message-ID:

When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.

The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.

A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.

The proposed sanity check consists of two parts, where the first is just trying to create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages.
If this fails there is no point in trying to use SHM to get large pages.

The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.

This change needs two tests to be updated to handle that large pages can now be disabled even when run with +UseLargePages. One of these tests is also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

-------------

Commit messages:
 - 8261401-check-effective
 - 8261401-self-review
 - 8261401-test-fixes
 - Only UseSHM if sanity check pass

Changes: https://git.openjdk.java.net/jdk/pull/2488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261401
  Stats: 92 lines in 4 files changed: 86 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488

PR: https://git.openjdk.java.net/jdk/pull/2488

From shade at openjdk.java.net Wed Feb 10 11:16:49 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 10 Feb 2021 11:16:49 GMT
Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering
Message-ID:

Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.

For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.

For the forwardee load side, we need to guarantee "acquire".
We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.

Sample run with `aggressive` (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:

# Baseline
[135.357s][info][gc,stats] Concurrent Evacuation = 17.459 s (a = 66132 us) (n = 264)
    (lvls, us = 152, 2402, 74414, 123047, 142021)
[135.357s][info][gc,stats] Concurrent Update Refs = 72.774 s (a = 276708 us) (n = 263)
    (lvls, us = 354, 3281, 259766, 548828, 720417)

# Patched
[135.923s][info][gc,stats] Concurrent Evacuation = 17.266 s (a = 61444 us) (n = 281)
    (lvls, us = 137, 2754, 42773, 119141, 142979)
[135.923s][info][gc,stats] Concurrent Update Refs = 74.234 s (a = 265121 us) (n = 280)
    (lvls, us = 354, 3672, 132812, 558594, 748677)

Average time goes down, the number of GC cycles goes up, since the cycles are shorter.
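The release/acquire pairing described above can be sketched with standard C++ atomics. This is a toy model, not Shenandoah code: it uses `std::atomic` rather than HotSpot's `Atomic` API, `ObjectSketch`, `try_forward` and `load_forwardee` are hypothetical names, and a mark value of 0 stands for "not forwarded".

```cpp
#include <atomic>
#include <cstdint>

// Toy object: the mark word doubles as the forwarding pointer.
struct ObjectSketch {
  std::atomic<std::uintptr_t> mark{0};  // 0 means "not forwarded" in this model
  int payload = 0;
};

// Reader side: an acquire load, so the contents of the copy that were written
// before the forwardee was installed are visible after this load.
inline ObjectSketch* load_forwardee(const ObjectSketch& from) {
  return reinterpret_cast<ObjectSketch*>(from.mark.load(std::memory_order_acquire));
}

// Writer side: publish the copy with a release CAS instead of a full two-way fence.
// Returns the forwardee that won the race.
inline ObjectSketch* try_forward(ObjectSketch& from, ObjectSketch* copy) {
  std::uintptr_t expected = 0;
  const std::uintptr_t desired = reinterpret_cast<std::uintptr_t>(copy);
  if (from.mark.compare_exchange_strong(expected, desired,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
    return copy;
  }
  // Lost the race: re-read with acquire before touching the winner's copy.
  return load_forwardee(from);
}
```

The point of the sketch is only the ordering choice: the winner's writes to the copy happen before the release CAS, and any thread that observes the forwardee through the acquire load may safely read that copy.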
Additional testing:
 - [x] Linux x86_64 `hotspot_gc_shenandoah`
 - [x] Linux AArch64 `hotspot_gc_shenandoah`
 - [x] Linux AArch64 `tier1` with Shenandoah

-------------

Commit messages:
 - 8261492: Shenandoah: reconsider forwardee accesses memory ordering

Changes: https://git.openjdk.java.net/jdk/pull/2496/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261492
  Stats: 17 lines in 3 files changed: 11 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2496.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496

PR: https://git.openjdk.java.net/jdk/pull/2496

From aph at openjdk.java.net Wed Feb 10 11:34:04 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 10 Feb 2021 11:34:04 GMT
Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v5]
In-Reply-To:
References:
Message-ID:

> Go back a few years, and there were simple atomic load/store exclusive
> instructions on Arm. Say you want to do an atomic increment of a
> counter. You'd do an atomic load to get the counter into your local cache
> in exclusive state, increment that counter locally, then write that
> incremented counter back to memory with an atomic store. All the time
> that cache line was in exclusive state, so you're guaranteed that
> no-one else changed anything on that cache line while you had it.
>
> This is hard to scale on a very large system (e.g. Fugaku) because if
> many processors are incrementing that counter you get a lot of cache
> line ping-ponging between cores.
>
> So, Arm decided to add a locked memory increment instruction that
> works without needing to load an entire line into local cache. It's a
> single instruction that loads, increments, and writes back. The secret
> is to send a cache control message to whichever processor owns the
> cache line containing the count, tell that processor to increment the
> counter and return the incremented value. That way cache coherency
> traffic is minimized. This new set of instructions is known as Large
> System Extensions, or LSE.
>
> Unfortunately, in recent processors, the "old" load/store exclusive
> instructions sometimes perform very badly. Therefore, it's now
> necessary for software to detect which version of Arm it's running
> on, and use the "new" LSE instructions if they're available. Otherwise
> performance can be very poor under heavy contention.
>
> GCC's -moutline-atomics does this by providing library calls which use
> LSE if it's available, but this option is only provided on newer
> versions of GCC. This is particularly problematic with older versions
> of OpenJDK, which build using old GCC versions.
>
> Also, I suspect that some other operating systems could use this.
> Perhaps not MacOS, given that all Apple CPUs support LSE, but
> maybe Windows.

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Move assembler stubs to linux-aarch64 dir.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2434/files
  - new: https://git.openjdk.java.net/jdk/pull/2434/files/da004adc..42926e4b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=03-04

  Stats: 37 lines in 2 files changed: 37 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2434.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434

PR: https://git.openjdk.java.net/jdk/pull/2434

From rkennke at openjdk.java.net Wed Feb 10 11:53:37 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Feb 2021 11:53:37 GMT
Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering
In-Reply-To:
References:
Message-ID:

On Wed, 10 Feb 2021 08:55:39 GMT, Aleksey Shipilev wrote:

> Shenandoah carries forwardee information in object's mark word.
> Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>
> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>
> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>
> Sample run with `aggressive` (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:
>
> # Baseline
> [135.357s][info][gc,stats] Concurrent Evacuation = 17.459 s (a = 66132 us) (n = 264)
>     (lvls, us = 152, 2402, 74414, 123047, 142021)
> [135.357s][info][gc,stats] Concurrent Update Refs = 72.774 s (a = 276708 us) (n = 263)
>     (lvls, us = 354, 3281, 259766, 548828, 720417)
>
> # Patched
> [135.923s][info][gc,stats] Concurrent Evacuation = 17.266 s (a = 61444 us) (n = 281)
>     (lvls, us = 137, 2754, 42773, 119141, 142979)
> [135.923s][info][gc,stats] Concurrent Update Refs = 74.234 s (a = 265121 us) (n = 280)
>     (lvls, us = 354, 3672, 132812, 558594, 748677)
>
> Average time goes down, the number of GC cycles goes up, since the cycles are shorter.
>
> Additional testing:
> - [x] Linux x86_64 `hotspot_gc_shenandoah`
> - [x] Linux AArch64 `hotspot_gc_shenandoah`
> - [x] Linux AArch64 `tier1` with Shenandoah

Nice! Looks good to me!

-------------

Marked as reviewed by rkennke (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2496

From rkennke at openjdk.java.net Wed Feb 10 12:40:38 2021
From: rkennke at openjdk.java.net (Roman Kennke)
Date: Wed, 10 Feb 2021 12:40:38 GMT
Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk
In-Reply-To:
References:
Message-ID: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com>

On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke wrote:

> I am observing the following assert:
>
> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
> # assert(is_frame_safe(f)) failed: Frame must be safe
>
> (see issue for full hs_err)
>
> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe), where we could safepoint for GC, which could reset the stack watermark.
>
> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>
> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>
> Testing:
> - [x] StackWalk tests with Shenandoah/aggressive
> - [x] StackWalk tests with ZGC/aggressive
> - [ ] tier1 (+Shenandoah/ZGC)
> - [ ] tier2 (+Shenandoah/ZGC)

I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant.
-------------

PR: https://git.openjdk.java.net/jdk/pull/2500

From hseigel at openjdk.java.net Wed Feb 10 14:36:36 2021
From: hseigel at openjdk.java.net (Harold Seigel)
Date: Wed, 10 Feb 2021 14:36:36 GMT
Subject: RFR: 8260341: CDS dump VM init code does not check exceptions
In-Reply-To:
References:
Message-ID: <_-r6tekItyqxQA5WKZFBj-b0IQRC6R0Nw7QBMZbBPw0=.6c253bd5-697b-4060-9026-3cf015a6dc09@github.com>

On Wed, 10 Feb 2021 05:00:43 GMT, Ioi Lam wrote:

> When CDS dumping is enabled, special initialization happens during VM init. However, many of these calls do not properly check for exceptions. Instead, they rely on the implicit knowledge that `metaspace::allocate()` will exit the VM when allocation fails during CDS dumping. This makes the code hard to understand and tightly coupled to `metaspace::allocate()`.
>
> The fix is: all code that makes allocations should be using CHECK macros, so each block of code can be individually understood without considering the behavior of `metaspace::allocate()`.
>
> I added `TRAPS` to a bunch of CDS-related functions that are called during VM init. In some cases, I changed `Thread* THREAD` to `TRAPS`. This also eliminated a few `Thread* THREAD = Thread::current()` calls.
>
> The "root" of these calls, such as `MetaspaceShared::prepare_for_dumping()`, now follows this pattern:
>
>     EXCEPTION_MARK;
>     ClassLoader::initialize_shared_path(THREAD);
>     if (HAS_PENDING_EXCEPTION) {
>       java_lang_Throwable::print(PENDING_EXCEPTION, tty);
>       vm_exit_during_initialization("ClassLoader::initialize_shared_path() failed unexpectedly");
>     }

Hi Ioi,

Do you avoid using CHECK if it's the last line of a function? For example, why is THREAD used instead of CHECK at line 1506?
Thanks, Harold

1503 void ClassLoader::initialize_module_path(TRAPS) {
1504   if (Arguments::is_dumping_archive()) {
1505     ClassLoaderExt::setup_module_paths(CHECK);
1506     FileMapInfo::allocate_shared_path_table(THREAD);
1507   }
1508 }

-------------

PR: https://git.openjdk.java.net/jdk/pull/2494

From burban at openjdk.java.net Wed Feb 10 14:37:42 2021
From: burban at openjdk.java.net (Bernhard Urban-Forster)
Date: Wed, 10 Feb 2021 14:37:42 GMT
Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 10 Feb 2021 11:34:04 GMT, Andrew Haley wrote:

>> Go back a few years, and there were simple atomic load/store exclusive
>> instructions on Arm. Say you want to do an atomic increment of a
>> counter. You'd do an atomic load to get the counter into your local cache
>> in exclusive state, increment that counter locally, then write that
>> incremented counter back to memory with an atomic store. All the time
>> that cache line was in exclusive state, so you're guaranteed that
>> no-one else changed anything on that cache line while you had it.
>>
>> This is hard to scale on a very large system (e.g. Fugaku) because if
>> many processors are incrementing that counter you get a lot of cache
>> line ping-ponging between cores.
>>
>> So, Arm decided to add a locked memory increment instruction that
>> works without needing to load an entire line into local cache. It's a
>> single instruction that loads, increments, and writes back. The secret
>> is to send a cache control message to whichever processor owns the
>> cache line containing the count, tell that processor to increment the
>> counter and return the incremented value. That way cache coherency
>> traffic is minimized. This new set of instructions is known as Large
>> System Extensions, or LSE.
>>
>> Unfortunately, in recent processors, the "old" load/store exclusive
>> instructions sometimes perform very badly. Therefore, it's now
>> necessary for software to detect which version of Arm it's running
>> on, and use the "new" LSE instructions if they're available. Otherwise
>> performance can be very poor under heavy contention.
>>
>> GCC's -moutline-atomics does this by providing library calls which use
>> LSE if it's available, but this option is only provided on newer
>> versions of GCC. This is particularly problematic with older versions
>> of OpenJDK, which build using old GCC versions.
>>
>> Also, I suspect that some other operating systems could use this.
>> Perhaps not MacOS, given that all Apple CPUs support LSE, but
>> maybe Windows.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move assembler stubs to linux-aarch64 dir.

src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 257:

> 255:
> 256: #ifndef CPU_AARCH64_ATOMIC_AARCH64_HPP
> 257: #define CPU_AARCH64_ATOMIC_AARCH64_HPP

something went wrong here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2434

From stuefe at openjdk.java.net Wed Feb 10 14:44:40 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 10 Feb 2021 14:44:40 GMT
Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages
In-Reply-To:
References:
Message-ID:

On Tue, 9 Feb 2021 20:50:25 GMT, Stefan Johansson wrote:

> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory.
So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. > > A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. > > The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. > > The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. > > This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. Hi Stefan, Testing UseSHM makes definitely sense. Since SysV shm has more restrictions than the mmap API I am not sure that the former works where the latter does not. About your patch: The shmget test is good, I am not sure about the rest. You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says: EPERM The SHM_HUGETLB flag was specified, but the caller was not privileged (did not have the CAP_IPC_LOCK capability). So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory(). 
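
The `shmget()` probe discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `probe_shm_huge_pages` and `explain_probe_result` are hypothetical, the permission bits `0600` are an assumption, and the readings attached to each errno value are only the ones suggested in this thread and in `man shmget`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000  // fallback for older headers (Linux value)
#endif

// Hypothetical helper: try to create (and immediately remove) one SysV
// shared memory segment backed by large pages.
// Returns 0 on success, otherwise the errno reported by shmget().
int probe_shm_huge_pages(size_t large_page_size) {
  assert(large_page_size > 0);
  int shmid = shmget(IPC_PRIVATE, large_page_size,
                     SHM_HUGETLB | IPC_CREAT | 0600);
  if (shmid == -1) {
    return errno;  // e.g. EPERM, ENOMEM, EINVAL
  }
  // The segment was only a probe, so mark it for removal right away.
  shmctl(shmid, IPC_RMID, nullptr);
  return 0;
}

// Hypothetical helper: turn the probe result into a hint for the user.
// The interpretations are assumptions, not guarantees about kernel behavior.
const char* explain_probe_result(int err) {
  switch (err) {
    case 0:      return "SHM large pages look usable";
    case EPERM:  return "caller not privileged for SHM_HUGETLB";
    case ENOMEM: return "kernel could not allocate the segment";
    default:     return "shmget failed for another reason";
  }
}
```

A caller would pass the large page size (for example 2 MB on many x86_64 systems) and disable large pages if the result is non-zero. Whether `EPERM` alone reliably detects a missing capability is exactly the open question in this review.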
About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place. I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense. Cheers, Thomas src/hotspot/os/linux/os_linux.cpp line 3569: > 3567: // The capability needed to lock memory CAP_IPC_LOCK > 3568: #define CAP_IPC_LOCK 14 > 3569: #define CAP_IPC_LOCK_BIT (1 << CAP_IPC_LOCK) Can we not just include linux/capabilities.h ? ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From aph at openjdk.java.net Wed Feb 10 14:52:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 10 Feb 2021 14:52:41 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v5] In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 14:34:37 GMT, Bernhard Urban-Forster wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Move assembler stubs to linux-aarch64 dir. > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 257: > >> 255: >> 256: #ifndef CPU_AARCH64_ATOMIC_AARCH64_HPP >> 257: #define CPU_AARCH64_ATOMIC_AARCH64_HPP > > something went wrong here. I moved it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Wed Feb 10 15:12:08 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 10 Feb 2021 15:12:08 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v6] In-Reply-To: References: Message-ID: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter. You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is mimimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions, sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. 
> > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: #ifdef LINUX for now. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2434/files - new: https://git.openjdk.java.net/jdk/pull/2434/files/42926e4b..d156d852 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=04-05 Stats: 34 lines in 2 files changed: 6 ins; 27 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Wed Feb 10 15:12:08 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 10 Feb 2021 15:12:08 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v5] In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 14:49:42 GMT, Andrew Haley wrote: >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 257: >> >>> 255: >>> 256: #ifndef CPU_AARCH64_ATOMIC_AARCH64_HPP >>> 257: #define CPU_AARCH64_ATOMIC_AARCH64_HPP >> >> something went wrong here. > > I moved it. I've "ifdef LINUX'ed" everything for now. If you want to take any of this into Windows, you can. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From aph at openjdk.java.net Wed Feb 10 15:20:02 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 10 Feb 2021 15:20:02 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v7] In-Reply-To: References: Message-ID: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter. You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is mimimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions, sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. 
> > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: #ifdef LINUX for now. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2434/files - new: https://git.openjdk.java.net/jdk/pull/2434/files/d156d852..81d35fe7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=05-06 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From stuefe at openjdk.java.net Wed Feb 10 15:29:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 10 Feb 2021 15:29:40 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader [v2] In-Reply-To: References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> Message-ID: On Wed, 10 Feb 2021 07:34:59 GMT, Aleksey Shipilev wrote: >> `JVM_LatestUserDefinedLoader` is called normally from `ObjectInputStream.resolveClass` -> `VM.latestUserDefinedLoader0`. And it takes a measurable time to walk the stack. 
There is JDK-8173368 that wants to replace it with `StackWalker`, but we can tune up the `JVM_LatestUserDefinedLoader` itself without changing the semantics of it (thus providing the backportability, including the releases that do not have `StackWalker`) and improving performance (thus providing a more aggressive baseline for `StackWalker` rewrite). >> >> The key is to recognize that out of two checks: 1) checking for two special subclasses; 2) checking for user classloader -- the first one usually passes, and second one fails much more frequently. First check also requires traversing the superclasses upwards looking for match. Reversing the order of the checks, plus inlining the helper method improves performance without changing the semantics. >> >> Out of curiosity, my previous patch dropped the first check completely, replacing it by asserts, and we definitely run into situation where that check is needed on some tests. >> >> On my machine, `VM.latestUserDefinedLoader` invocation time drops from 115 to 100 ns/op. Single-threaded SPECjvm2008:serial improves about 3% with this patch. >> >> Additional testing: >> - [x] Ad-hoc benchmarks >> - [x] Linux x86_64 fastdebug, `tier1`, `tier2`, `tier3` >> >> --------- >> ### Progress >> - [x] Change must not contain extraneous whitespace >> - [x] Commit message must refer to an issue >> - [ ] Change must be properly reviewed >> >> >> >> ### Download >> `$ git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485` >> `$ git checkout pull/2485` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Added a comment Looks good. I find it simpler too. You could run the tests with sun.reflect.inflationThreshold=0. Should increase the chance of encountering reflection delegator loaders a lot. Cheers, Thomas ------------- Marked as reviewed by stuefe (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2485 From aph at redhat.com Wed Feb 10 16:07:16 2021 From: aph at redhat.com (Andrew Haley) Date: Wed, 10 Feb 2021 16:07:16 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: References: Message-ID: Oh, sorry. This is my favourite benchmark, javac all of java.base. I'm mostly using that because it's easy to run without any external dependencies, and it loads a lot of classes. It's no better or worse than any other random program. On 2/10/21 6:44 AM, Ioi Lam wrote: > Just curious, which benchmark is this? > > Thanks > - Ioi > > On 2/8/21 10:14 AM, Andrew Haley wrote: >> I've been looking at the hottest Atomic operations in HotSpot, with a view to >> finding out if the default memory_order_conservative (which is very expensive >> on some architectures) can be weakened to something less. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sjohanss at openjdk.java.net Wed Feb 10 16:22:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 10 Feb 2021 16:22:40 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 13:56:11 GMT, Thomas Stuefe wrote: >> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. >> >> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. 
So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. >> >> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. >> >> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. >> >> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. >> >> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. > > src/hotspot/os/linux/os_linux.cpp line 3569: > >> 3567: // The capability needed to lock memory CAP_IPC_LOCK >> 3568: #define CAP_IPC_LOCK 14 >> 3569: #define CAP_IPC_LOCK_BIT (1 << CAP_IPC_LOCK) > > Can we not just include linux/capabilities.h ? We can, I'll change it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From burban at openjdk.java.net Wed Feb 10 16:23:38 2021 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Wed, 10 Feb 2021 16:23:38 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v5] In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 15:07:35 GMT, Andrew Haley wrote: >> I moved it. > > I've "ifdef LINUX'ed" everything for now. If you want to take any of this into Windows, you can. Yes, we'll have a look. 
For now I suggest going with the Linux ifdef guard, I don't want us to be a blocker for your PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From lucy at openjdk.java.net Wed Feb 10 16:32:45 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Wed, 10 Feb 2021 16:32:45 GMT Subject: RFR: JDK-8261447: MethodInvocationCounters frequently run into overflow Message-ID: Dear community, may I please request reviews for this fix, improving the usefulness of method invocation counters. - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. - before/after sample output is attached to the bug description. Thank you!
Lutz ------------- Commit messages: - JDK-8261447: MethodInvocationCounters frequently run into overflow Changes: https://git.openjdk.java.net/jdk/pull/2511/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261447 Stats: 184 lines in 8 files changed: 89 ins; 3 del; 92 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From simonis at openjdk.java.net Wed Feb 10 17:23:39 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Wed, 10 Feb 2021 17:23:39 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v7] In-Reply-To: References: Message-ID: <_cEQfdVoOj40NE90FFatdV721LmYlXfW1Es-36CKXF4=.a24cbfeb-bc8b-449c-9c8b-ac9491efdcf8@github.com> On Wed, 10 Feb 2021 15:20:02 GMT, Andrew Haley wrote: >> Go back a few years, and there were simple atomic load/store exclusive >> instructions on Arm. Say you want to do an atomic increment of a >> counter. You'd do an atomic load to get the counter into your local cache >> in exclusive state, increment that counter locally, then write that >> incremented counter back to memory with an atomic store. All the time >> that cache line was in exclusive state, so you're guaranteed that >> no-one else changed anything on that cache line while you had it. >> >> This is hard to scale on a very large system (e.g. Fugaku) because if >> many processors are incrementing that counter you get a lot of cache >> line ping-ponging between cores. >> >> So, Arm decided to add a locked memory increment instruction that >> works without needing to load an entire line into local cache. It's a >> single instruction that loads, increments, and writes back. 
The secret >> is to send a cache control message to whichever processor owns the >> cache line containing the count, tell that processor to increment the >> counter and return the incremented value. That way cache coherency >> traffic is mimimized. This new set of instructions is known as Large >> System Extensions, or LSE. >> >> Unfortunately, in recent processors, the "old" load/store exclusive >> instructions, sometimes perform very badly. Therefore, it's now >> necessary for software to detect which version of Arm it's running >> on, and use the "new" LSE instructions if they're available. Otherwise >> performance can be very poor under heavy contention. >> >> GCC's -moutline-atomics does this by providing library calls which use >> LSE if it's available, but this option is only provided on newer >> versions of GCC. This is particularly problematic with older versions >> of OpenJDK, which build using old GCC versions. >> >> Also, I suspect that some other operating systems could use this. >> Perhaps not MacOS, given that all Apple CPUs support LSE, but >> maybe Windows. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > #ifdef LINUX for now. Changes requested by simonis (Reviewer). src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 30: > 28: > 29: #include "runtime/vm_version.hpp" > 30: #include "atomic_aarch64.hpp" Should be sorted before `#include "runtime/vm_version.hpp"` src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 52: > 50: extern aarch64_atomic_stub_t aarch64_atomic_cmpxchg_4_impl; > 51: extern aarch64_atomic_stub_t aarch64_atomic_cmpxchg_8_impl; > 52: I don't think you need to duplicate all these declarations here if you include `"atomic_aarch64.hpp"` which already declares all these types and variables. 
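
The idea of routing atomics through replaceable `..._impl` stubs, as discussed in this review, can be sketched in portable C++. This is only an illustration under stated assumptions: the type `fetch_add_stub_t` and the functions below are hypothetical and do not match the signatures in `atomic_aarch64.hpp`, and `std::atomic` stands in for the hand-written assembly stubs. The compare-and-swap loop plays the role of a load/store-exclusive sequence, and the single `fetch_add` plays the role of an LSE instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stub type; not the signature used in the PR.
using fetch_add_stub_t = uint64_t (*)(std::atomic<uint64_t>* ptr, uint64_t inc);

// Default implementation: a retry loop, standing in for LDXR/STXR.
static uint64_t fetch_add_retry_loop(std::atomic<uint64_t>* ptr, uint64_t inc) {
  uint64_t old_value = ptr->load(std::memory_order_relaxed);
  // On failure compare_exchange_weak reloads old_value, so just retry.
  while (!ptr->compare_exchange_weak(old_value, old_value + inc)) {
  }
  return old_value;
}

// Alternative implementation: one read-modify-write, standing in for LSE.
static uint64_t fetch_add_single_rmw(std::atomic<uint64_t>* ptr, uint64_t inc) {
  return ptr->fetch_add(inc);
}

// All callers go through this pointer; it starts out on the default stub.
static fetch_add_stub_t fetch_add_impl = fetch_add_retry_loop;

// Called once at startup, after CPU feature detection (hypothetical hook).
void select_atomic_stubs(bool cpu_has_lse) {
  if (cpu_has_lse) {
    fetch_add_impl = fetch_add_single_rmw;
  }
}

// Returns the value held before the addition, whichever stub is installed.
uint64_t atomic_fetch_add(std::atomic<uint64_t>* ptr, uint64_t inc) {
  assert(ptr != nullptr);
  return fetch_add_impl(ptr, inc);
}
```

The design choice illustrated here is that the call site never changes: only the pointer is swapped once the running CPU is known, which is why the declarations need to live in a single shared header.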
------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From nhe at activeviam.com Wed Feb 10 17:24:27 2021 From: nhe at activeviam.com (Nicolas Heutte) Date: Wed, 10 Feb 2021 18:24:27 +0100 Subject: SuperWord loop optimization lost after method inlining Message-ID: Hi all, I am encountering a performance issue caused by the interaction between method inlining and automatic vectorization. Our application aggregates arrays intensively using a method named ArrayFloatToArrayFloatVectorBinding.plus() with the following code: for (int i = 0; i < srcLen; ++i) { dstArray[i] += srcArray[i]; } When we microbenchmark this method we observe fast performance close to the practical memory bandwidth and when we print the assembly code we observe loop unrolling and automatic vectorization with SIMD instructions. 0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0 0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0 0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4) 0x000001ef4600ac05: movslq %r13d,%r11 0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0 0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0 0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4) 0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0 0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0 0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4) 0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0 0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0 0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4) 0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0 0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0 0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4) 0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0 0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0 0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4) 0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0 0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0 0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4) 0x000001ef4600aca1: 
vmovdqu 0xf0(%r14,%r11,4),%ymm0 0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0 0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4) ;*fastore {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 (line 41) 0x000001ef4600acbf: add $0x40,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 (line 40) 0x000001ef4600acc3: cmp %eax,%r13d 0x000001ef4600acc6: jl 0x000001ef4600abf0 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 (line 40) In the real application, this method is actually inlined in a higher level method named AVector.plus(). Unfortunately, the inlined version of the aggregation code is not vectorized anymore: 0x000001ef460180a0: cmp %ebx,%r11d 0x000001ef460180a3: jae 0x000001ef460180e6 0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1 ;*faload {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 (line 41) ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) 0x000001ef460180ac: cmp %ecx,%r11d 0x000001ef460180af: jae 0x000001ef46018104 0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1 0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4) ;*fastore {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 (line 41) ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) 0x000001ef460180bf: inc %r11d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 (line 40) ; - com.qfs.vector.impl.AVector::plus at 17 (line 204) 0x000001ef460180c2: cmp %r10d,%r11d 0x000001ef460180c5: jl 0x000001ef460180a0 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 (line 40) ; - 
com.qfs.vector.impl.AVector::plus at 17 (line 204) This causes a significant performance drop, compared to a run where we explicitly disable the inlining and observe automatically vectorized code again ( -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus ). How do you guys explain that behavior of the JIT compiler? Is this a known and tracked issue, could it be fixed in the JVM? Can we do something in the Java code to prevent this from happening? Best regards, Nicolas Heutte From sjohanss at openjdk.java.net Wed Feb 10 17:53:36 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 10 Feb 2021 17:53:36 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 14:41:23 GMT, Thomas Stuefe wrote: > Hi Stefan, > > Testing UseSHM makes definitely sense. Since SysV shm has more restrictions than the mmap API I am not sure that the former works where the latter does not. > Nice to hear that you agree Thomas :) > About your patch: The shmget test is good, I am not sure about the rest. > > You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says: > > ``` > EPERM The SHM_HUGETLB flag was specified, but the caller was not > privileged (did not have the CAP_IPC_LOCK capability). > ``` > > So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory(). > This was my initial thought as well, but it turns out that the capability is only needed to "lock" more memory than the MEMLOCK-limit defines.
So the first `shmget()` will succeed as long as there is at least one large page available (and the MEMLOCK-limit is not smaller than the large page size). I could of course design the other check to once again call `shmget()`, but with a size larger than the limit. If that check fails with `EPERM` we don't have the needed capability. > About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place. > In old kernels this was true, but according to `man setrlimit` it is no longer the case: RLIMIT_MEMLOCK The maximum number of bytes of memory that may be locked into RAM ... In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of memory that a privileged process may lock, and this limit instead governs the amount of memory that an unprivileged process may lock. So if we have the capability the limit will never be a limiting factor. > I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense. > I agree that this warning is not covering all bases, but I thought it was a good addition compared to just being silent about it. But I like your idea about just doing the simplest sanity check here and then, when a mapping goes bad, trying to figure out why it did.
In that case I would only include the first `shmget()` in the sanity check and file a separate issue for improving the warning. ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From iklam at openjdk.java.net Wed Feb 10 18:13:43 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 10 Feb 2021 18:13:43 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions In-Reply-To: <_-r6tekItyqxQA5WKZFBj-b0IQRC6R0Nw7QBMZbBPw0=.6c253bd5-697b-4060-9026-3cf015a6dc09@github.com> References: <_-r6tekItyqxQA5WKZFBj-b0IQRC6R0Nw7QBMZbBPw0=.6c253bd5-697b-4060-9026-3cf015a6dc09@github.com> Message-ID: On Wed, 10 Feb 2021 14:33:50 GMT, Harold Seigel wrote: > Hi Ioi, > Do you avoid using CHECK if it's the last line of a function? For example, why is THREAD used instead of CHECK at line 1506? > Thanks, Harold > > 1503 void ClassLoader::initialize_module_path(TRAPS) { > 1504 if (Arguments::is_dumping_archive()) { > 1505 ClassLoaderExt::setup_module_paths(CHECK); > 1506 FileMapInfo::allocate_shared_path_table(THREAD); > 1507 } > 1508 } I thought it was a commonly used coding convention, but I could only find a few cases where the code wasn't written by me :-( - https://github.com/openjdk/jdk/blame/b9d4211bc1aa92e257ddfe86c7a2b4e4e60598a0/src/hotspot/share/prims/jvm.cpp#L311 - https://github.com/openjdk/jdk/blame/f03e839e481f905358ce7d95a5d1f5179e7f46fe/src/hotspot/share/classfile/javaClasses.cpp#L2415 I will go back to `CHECK`, since the C++ compiler will elide the last `CHECK` anyway: in both cases, gcc compiles the last call to a direct branch to FileMapInfo::allocate_shared_path_table (i.e., a tail call). Using `CHECK` makes the code easier to maintain (if you add new code below it). 
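
The difference between passing `THREAD` and using `CHECK` on the last call can be seen in a small stand-alone model. This is a sketch, not HotSpot code: the `Thread` struct, the `trace` vector and the `fail` parameter are hypothetical, and the macros are simplified stand-ins modelled on the general shape of HotSpot's `TRAPS`/`CHECK` macros.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a VM thread that can carry a pending exception.
struct Thread { bool pending_exception = false; };

#define TRAPS Thread* THREAD
#define HAS_PENDING_EXCEPTION (THREAD->pending_exception)
// Passes THREAD as the last argument, then returns early if the callee
// left an exception pending. The trailing (void)(0 is closed by the
// call's own ')'.
#define CHECK THREAD); if (HAS_PENDING_EXCEPTION) return; (void)(0

static std::vector<std::string> trace;  // records which steps actually ran

void setup_module_paths(bool fail, TRAPS) {
  trace.push_back("setup_module_paths");
  if (fail) THREAD->pending_exception = true;  // simulate a thrown exception
}

void allocate_shared_path_table(TRAPS) {
  assert(THREAD != nullptr);
  trace.push_back("allocate_shared_path_table");
}

void initialize_module_path(bool fail_first, TRAPS) {
  setup_module_paths(fail_first, CHECK);  // returns here if an exception is pending
  allocate_shared_path_table(CHECK);      // check after the last call is a no-op today
  // Any statement added below later is still skipped on a pending exception.
}
```

With `CHECK` on the final call the early return is redundant at the moment, but it keeps working unchanged if someone appends more code to the function, which is the maintenance argument made above.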
------------- PR: https://git.openjdk.java.net/jdk/pull/2494 From sjohanss at openjdk.java.net Wed Feb 10 18:28:51 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 10 Feb 2021 18:28:51 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2] In-Reply-To: References: Message-ID: > When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. > > The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. > > A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. > > The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. > > The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. > > This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. 
One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Thomas review Removed check for IPC_LOCK capability. If a large page mapping fails the errno is already present in the warning printed out. We could look at improving this to better explain when EPERM vs ENOMEM occurs. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2488/files - new: https://git.openjdk.java.net/jdk/pull/2488/files/376b2235..bf6cbce1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=00-01 Stats: 38 lines in 1 file changed: 0 ins; 38 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488 PR: https://git.openjdk.java.net/jdk/pull/2488 From aph at openjdk.java.net Wed Feb 10 18:31:40 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 10 Feb 2021 18:31:40 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v7] In-Reply-To: <_cEQfdVoOj40NE90FFatdV721LmYlXfW1Es-36CKXF4=.a24cbfeb-bc8b-449c-9c8b-ac9491efdcf8@github.com> References: <_cEQfdVoOj40NE90FFatdV721LmYlXfW1Es-36CKXF4=.a24cbfeb-bc8b-449c-9c8b-ac9491efdcf8@github.com> Message-ID: On Wed, 10 Feb 2021 17:19:04 GMT, Volker Simonis wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> #ifdef LINUX for now. > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 30: > >> 28: >> 29: #include "runtime/vm_version.hpp" >> 30: #include "atomic_aarch64.hpp" > > Should be sorted before `#include "runtime/vm_version.hpp"` Thanks. 
> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp line 52: > >> 50: extern aarch64_atomic_stub_t aarch64_atomic_cmpxchg_4_impl; >> 51: extern aarch64_atomic_stub_t aarch64_atomic_cmpxchg_8_impl; >> 52: > > I don't think you need to duplicate all these declarations here if you include `"atomic_aarch64.hpp"` which already declares all these types and variables. Thanks, done. ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From vladimir.kozlov at oracle.com Wed Feb 10 18:35:53 2021 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 10 Feb 2021 10:35:53 -0800 Subject: SuperWord loop optimization lost after method inlining In-Reply-To: References: Message-ID: Hi, Nicolas Looks like, when inlined, the loop from ArrayFloatToArrayFloatVectorBinding::plus() was not optimized at all: it is not unrolled and has range checks. Such loops are not vectorized (you need unrolling and no checks). What Java version you are running? What HotSpot VM flags you are using when running application? Run application with -XX:+LogCompilation and look on compilation data in hotspot_pid.log file for caller AVector::plus(). VM also has several flags to trace loop optimizations but they are only available in debug VM build. If you have access to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts flags. Thanks, Vladimir K On 2/10/21 9:24 AM, Nicolas Heutte wrote: > Hi all, > > I am encountering a performance issue caused by the interaction between > method inlining and automatic vectorization. > > Our application aggregates arrays intensively using a method named > ArrayFloatToArrayFloatVectorBinding.plus() with the following code: > > for (int i = 0; i < srcLen; ++i) { > > dstArray[i] += srcArray[i]; > > } > > When we microbenchmark this method we observe fast performance close to the > practical memory bandwidth and when we print the assembly code we observe > loop unrolling and automatic vectorization with SIMD instructions. 
> > 0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0 > > 0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0 > > 0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4) > > 0x000001ef4600ac05: movslq %r13d,%r11 > > 0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0 > > 0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0 > > 0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4) > > 0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0 > > 0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0 > > 0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4) > > 0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0 > > 0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0 > > 0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4) > > 0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0 > > 0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0 > > 0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4) > > 0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0 > > 0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0 > > 0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4) > > 0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0 > > 0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > > 0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4) > > 0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0 > > 0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > > 0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4) ;*fastore > {reexecute=0 rethrow=0 return_oop=0} > > ; - > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > (line 41) > > 0x000001ef4600acbf: add $0x40,%r13d ;*iinc {reexecute=0 > rethrow=0 return_oop=0} > > ; - > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > (line 40) > > 0x000001ef4600acc3: cmp %eax,%r13d > > 0x000001ef4600acc6: jl 0x000001ef4600abf0 ;*goto {reexecute=0 > rethrow=0 return_oop=0} > > ; - > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > (line 40) > > > > In 
the real application, this method is actually inlined in a higher level > method named AVector.plus(). Unfortunately, the inlined version of the > aggregation code is not vectorized anymore: > > > > 0x000001ef460180a0: cmp %ebx,%r11d > > 0x000001ef460180a3: jae 0x000001ef460180e6 > > 0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1 ;*faload {reexecute=0 > rethrow=0 return_oop=0} > > ; - > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > (line 41) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > 0x000001ef460180ac: cmp %ecx,%r11d > > 0x000001ef460180af: jae 0x000001ef46018104 > > 0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1 > > 0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4) ;*fastore {reexecute=0 > rethrow=0 return_oop=0} > > ; - > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > (line 41) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > 0x000001ef460180bf: inc %r11d ;*iinc {reexecute=0 > rethrow=0 return_oop=0} > > ; - > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > (line 40) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > 0x000001ef460180c2: cmp %r10d,%r11d > > 0x000001ef460180c5: jl 0x000001ef460180a0 ;*goto {reexecute=0 > rethrow=0 return_oop=0} > > ; - > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > (line 40) > > ; - > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > This causes a significant performance drop, compared to a run where we > explicitly disable the inlining and observe automatically vectorized code > again ( > -XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > ). > > > How do you guys explain that behavior of the JIT compiler? Is this a known > and tracked issue, could it be fixed in the JVM? Can we do something in the > java code to prevent this from happening? 
> > > Best regards, > > Nicolas Heutte > From stuefe at openjdk.java.net Wed Feb 10 18:47:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 10 Feb 2021 18:47:41 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2] In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 18:28:51 GMT, Stefan Johansson wrote: >> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large page is available and if so we disable UseSHM. >> >> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. >> >> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. >> >> The proposed sanity check consists of two parts, where the first just tries to create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no point in trying to use SHM to get large pages. >> >> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. >> >> This change needs two tests to be updated to handle that large pages can now be disabled even when run with +UseLargePages. 
One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review > > Removed check for IPC_LOCK capability. If a large page mapping fails > the errno is already present in the warning printed out. We could > look at improving this to better explain when EPERM vs ENOMEM occurs. Looks almost good. Minor nit below. src/hotspot/os/linux/os_linux.cpp line 3781: > 3779: } > 3780: > 3781: log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system."); IIUC here we end up if UseLargePages=true (by default or not) and we were unable to get any of our APIs to work? Should this be an unconditional printout? At least make it only unconditional if UseLargePages==true is not default? I know its only a theoretical problem now since default is false. ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From stuefe at openjdk.java.net Wed Feb 10 18:47:42 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 10 Feb 2021 18:47:42 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2] In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 18:44:26 GMT, Thomas Stuefe wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Thomas review >> >> Removed check for IPC_LOCK capability. If a large page mapping fails >> the errno is already present in the warning printed out. We could >> look at improving this to better explain when EPERM vs ENOMEM occurs. > > Looks almost good. Minor nit below. > > Hi Stefan, > > Testing UseSHM makes definitely sense. Since SysV shm has more restrictions than the mmap API I am not sure that the former works where the latter does not. 
> > Nice to hear that you agree Thomas :) > > > About your patch: The shmget test is good, I am not sure about the rest. > > You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says: > > ``` > > EPERM The SHM_HUGETLB flag was specified, but the caller was not > > privileged (did not have the CAP_IPC_LOCK capability). > > ``` > > > > > > So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory(). > > This was my initial thought as well, but it turns out that the capability is only needed to "lock" more memory than the MEMLOCK-limit defines. So the first `shmget()` will succeed as long as there is at least one large page available (and the MEMLOCK-limit is not smaller than the large page size). > > I could of course design the other check to once again call `shmget()`, but with a size larger than the limit. If that check fails with `EPERM` we don't have the needed capability. Okay, I see what you did now. > > > About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place. > > In old kernels this was the case but according to `man setrlimit` this is no longer the case: > > ``` > RLIMIT_MEMLOCK > The maximum number of bytes of memory that may be locked into RAM ... > In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be > locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of > memory that a privileged process may lock, and this limit instead governs the amount > of memory that an unprivileged process may lock. 
> ``` > > So if we have the capability the limit will never be a limiting factor. > > > I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense. > > I agree that this warning is not covering all bases, but I thought it was a good addition compared to just being silent about it. But I like your idea about just doing the simplest sanity check here and then when a mapping goes bad try to figure out why it did. > > In that case I would only include the first `shmget()` in the sanity check and file a separate issue for improving the warning. This makes sense, especially since you could add that warning to both SysV and mmap paths since they share at least some of the restrictions. Like huge page pool size. It also would mean you don't pay for some of these tests unless needed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From iklam at openjdk.java.net Wed Feb 10 18:51:53 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 10 Feb 2021 18:51:53 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: References: Message-ID: > When CDS dumping is enabled, special initialization happens during VM init. However, many of these calls do not properly check for exception. Instead, they rely on the implicit knowledge that `metaspace::allocate()` will exit the VM when allocation fails during CDS dumping. This makes the code hard to understand and tightly coupled to `metaspace::allocate()`. 
> > The fix is: all code that makes allocation should be using CHECK macros, so each block of code can be individually understood without considering the behavior of `metaspace::allocate()`. > > I added `TRAPS` to a bunch of CDS-related functions that are called during VM init. In some cases, I changed `Thread* THREAD` to `TRAPS`. This also eliminated a few `Thread* THREAD = Thread::current()` calls. > > The "root" of these calls, such as `MetaspaceShared::prepare_for_dumping()`, now follows this pattern: > > EXCEPTION_MARK; > ClassLoader::initialize_shared_path(THREAD); > if (HAS_PENDING_EXCEPTION) { > java_lang_Throwable::print(PENDING_EXCEPTION, tty); > vm_exit_during_initialization("ClassLoader::initialize_shared_path() failed unexpectedly"); > } Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Changed THREAD in tail calls to CHECK ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2494/files - new: https://git.openjdk.java.net/jdk/pull/2494/files/54c251c8..218af58f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2494&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2494&range=00-01 Stats: 14 lines in 3 files changed: 1 ins; 0 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/2494.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2494/head:pull/2494 PR: https://git.openjdk.java.net/jdk/pull/2494 From sjohanss at openjdk.java.net Wed Feb 10 19:33:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 10 Feb 2021 19:33:38 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2] In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 18:36:28 GMT, Thomas Stuefe wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Thomas review >> >> Removed check for IPC_LOCK capability. 
If a large page mapping fails >> the errno is already present in the warning printed out. We could >> look at improving this to better explain when EPERM vs ENOMEM occurs. > > src/hotspot/os/linux/os_linux.cpp line 3781: > >> 3779: } >> 3780: >> 3781: log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system."); > > IIUC here we end up if UseLargePages=true (by default or not) and we were unable to get any of our APIs to work? Should this be an unconditional printout? At least make it only unconditional if UseLargePages==true is not default? I know its only a theoretical problem now since default is false. You are correct we only end up here if `UseLargePages` is true. Making it conditional like that would be consistent with the other warnings. So only warn if the user explicitly set `+UseLargePages`. So something like? Suggestion: if (!FLAG_IS_DEFAULT(UseLargePages)) { log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system."); } ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From coleenp at openjdk.java.net Wed Feb 10 21:13:38 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 10 Feb 2021 21:13:38 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 18:51:53 GMT, Ioi Lam wrote: >> When CDS dumping is enabled, special initialization happens during VM init. However, many of these calls do not properly check for exception. Instead, they rely on the implicit knowledge that `metaspace::allocate()` will exit the VM when allocation fails during CDS dumping. This makes the code hard to understand and tightly coupled to `metaspace::allocate()`. >> >> The fix is: all code that makes allocation should be using CHECK macros, so each block of code can be individually understood without considering the behavior of `metaspace::allocate()`. 
>> >> I added `TRAPS` to a bunch of CDS-related functions that are called during VM init. In some cases, I changed `Thread* THREAD` to `TRAPS`. This also eliminated a few `Thread* THREAD = Thread::current()` calls. >> >> The "root" of these calls, such as `MetaspaceShared::prepare_for_dumping()`, now follows this pattern: >> >> EXCEPTION_MARK; >> ClassLoader::initialize_shared_path(THREAD); >> if (HAS_PENDING_EXCEPTION) { >> java_lang_Throwable::print(PENDING_EXCEPTION, tty); >> vm_exit_during_initialization("ClassLoader::initialize_shared_path() failed unexpectedly"); >> } > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Changed THREAD in tail calls to CHECK What Harold said. We use THREAD if the call is the last call in the function because I thought there was a tail call problem with one of the compilers once. I think this was the bug I was thinking about and it's only in the return statement: https://bugs.openjdk.java.net/browse/JDK-6889002 If you verified that the HAS_PENDING_EXCEPTION check evaporates, that's fine then, and better to have CHECK. As you say, someone might add some code after it. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2494 From hseigel at openjdk.java.net Wed Feb 10 21:17:39 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 10 Feb 2021 21:17:39 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: References: Message-ID: <5LkYdA9nKXzL1AX4Mpn90DCOsAazDuX0wNhj2LlIc28=.08da1aa9-caf0-4130-b837-04aff29db1ae@github.com> On Wed, 10 Feb 2021 18:51:53 GMT, Ioi Lam wrote: >> When CDS dumping is enabled, special initialization happens during VM init. However, many of these calls do not properly check for exception. Instead, they rely on the implicit knowledge that `metaspace::allocate()` will exit the VM when allocation fails during CDS dumping. 
This makes the code hard to understand and tightly coupled to `metaspace::allocate()`. >> >> The fix is: all code that makes allocation should be using CHECK macros, so each block of code can be individually understood without considering the behavior of `metaspace::allocate()`. >> >> I added `TRAPS` to a bunch of CDS-related functions that are called during VM init. In some cases, I changed `Thread* THREAD` to `TRAPS`. This also eliminated a few `Thread* THREAD = Thread::current()` calls. >> >> The "root" of these calls, such as `MetaspaceShared::prepare_for_dumping()`, now follows this pattern: >> >> EXCEPTION_MARK; >> ClassLoader::initialize_shared_path(THREAD); >> if (HAS_PENDING_EXCEPTION) { >> java_lang_Throwable::print(PENDING_EXCEPTION, tty); >> vm_exit_during_initialization("ClassLoader::initialize_shared_path() failed unexpectedly"); >> } > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Changed THREAD in tail calls to CHECK The changes look good. Thanks! Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2494 From iklam at openjdk.java.net Wed Feb 10 22:24:37 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 10 Feb 2021 22:24:37 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: References: Message-ID: <5aFoeQQAN76tmm9VgsXgM_37hcSRymZ4VgWVDHTEprg=.53224025-9386-4eb6-8633-aff6504bdbda@github.com> On Wed, 10 Feb 2021 21:11:12 GMT, Coleen Phillimore wrote: > What Harold said. We use THREAD if the call is the last call in the function because I thought there was a tail call problem with one of the compilers once. I think this was the bug I was thinking about and it's only in the return statement: > https://bugs.openjdk.java.net/browse/JDK-6889002 > If you verified that the HAS_PENDING_EXCEPTION check evaporates, that's fine then, and better to have CHECK. 
As you say, someone might add some code after it. I think the problem in JDK-6889002 is only with using `CHECK` in a return statement like this that produces unreachable source code. return foobar(1, 2, 3, CHECK_0); -> return foobar(1, 2, 3, THREAD); if (HAS_PENDING_EXCEPTION) return 0; The cases that I have changed in this PR are functions with `void` returns. void f(TRAPS) { g(1, 2, 3, CHECK); } would be expanded to void f(TRAPS) { g(1, 2, 3, THREAD); if (HAS_PENDING_EXCEPTION) return; } I suppose any C++ compiler that can compile HotSpot will elide the `if` line. ------------- PR: https://git.openjdk.java.net/jdk/pull/2494 From kim.barrett at oracle.com Thu Feb 11 03:34:11 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 10 Feb 2021 22:34:11 -0500 Subject: RFR: 8260941: Remove the conc_scan parameter for CardTable In-Reply-To: References: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> Message-ID: <4DE49E71-2886-414C-85CB-4BC1E4478E48@oracle.com> > On Feb 10, 2021, at 4:44 AM, Albert Mingkun Yang wrote: > > On Fri, 5 Feb 2021 09:52:25 GMT, Thomas Schatzl wrote: > >> Hi, >> >> can I have reviews for this removal of the last(?) CMS-specific code in CardTable, namely some provision to indicate that cards are being scanned concurrently in Serial/Parallel GC barrier code? >> >> The change simply follows the predicate into Serial/Parallel GC code which always returns false for them and removes that code. >> >> In the review for JDK-8234534 I mentioned that I split this out due to unexplainable errors; testing tier1-5 three times showed none of that any more (after updating to latest code). >> >> This change has only been built on Oracle-platforms and linux-x86 via github actions (https://github.com/tschatzl/jdk/actions/runs/539993964), so I would like to kindly ask maintainers of the others to compile and report issues (32 bit ARM, PPC etc). 
>> >> Testing: tier1-5 > > A side note: it seems that `G1BarrierSet` is the only subclass of `CardTableBarrierSet`. Maybe it makes sense to merge them into one. GenCollectedHeap (i.e. SerialGC) directly instantiates CardTableBarrierSet. From iklam at openjdk.java.net Thu Feb 11 03:56:56 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 03:56:56 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp [v2] In-Reply-To: References: Message-ID: <7NTd4QcyE-O95plU98DtOA8v8NoO1ArbeT0i1AzK0-w=.47ecf860-07f6-4ead-a0b1-4f0127853e4d@github.com> > vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. > > Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. > > So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). > > After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: interfaceSupport.inline.hpp needs VM_Exit from vmOperations.hpp for JVM_LEAF macro ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2398/files - new: https://git.openjdk.java.net/jdk/pull/2398/files/f4637f70..01633cda Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2398&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2398&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2398.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2398/head:pull/2398 PR: https://git.openjdk.java.net/jdk/pull/2398 From kim.barrett at oracle.com Thu Feb 11 03:59:27 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 10 Feb 2021 22:59:27 -0500 Subject: Atomic operations: your thoughts are welocme In-Reply-To: References: Message-ID: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> > On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: > > I've been looking at the hottest Atomic operations in HotSpot, with a view to > finding out if the default memory_order_conservative (which is very expensive > on some architectures) can be weakened to something less. It's impossible to > fix all of them, but perhaps we can fix some of the most frequent. Is there any information about the possible performance improvement from such changes? 1.5-3M occurrences doesn't mean much without context. We don't presently have support for sequentially consistent semantics, only "conservative". My recollection is that this is in part because there might be code that is assuming the possibly stronger "conservative" semantics, and in part because there are different and incompatible approaches to implementing sequentially consistent semantics on some hardware platforms and we didn't want to make assumptions there. 
We also don't presently have any cmpxchg implementation that really supports anything between conservative and relaxed, nor do we support different order constraints for the success vs failure cases. Things can be complicated enough as is; while we *could* fill some of that in, I'm not sure we should. > These are the hottest compare-and-swap uses in HotSpot, with the count > at the end of each line. > > : :: = 16406757 > > This one is already memory_order_relaxed, so no problem. Right. Although I'm now wondering why this doesn't need to do anything on the failure side, similar to what is needed in the similar place in ParallelGC when that was changed to use a relaxed cmpxchg. > ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 > > This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this > need to be memory_order_conservative, or would something weaker do? Even > acq_rel or seq_cst would be better. I think for setting bits in a bitmap the thing to do would be to identify places that are safe and useful (impacts performance) to do so first. Then add a weaker variant for use in those places, assuming any are found. > : :: = 2376632 > : :: = 2003895 > > I can't imagine that either of these actually need memory_order_conservative, > they're just reference counts. The "usual" refcount implementation involves relaxed increment and stronger ordering for decrement. (If I'm remembering correctly, dec-acquire and a release fence on the zero value path before deleting. But I've not thought about what one might want for this CAS-based variant that handles boundary cases specially.) And as you say, whether any of these could be weakened depends on whether there is any code surrounding a use that depends on the stronger ordering semantics. At a guess I suspect increment could be changed to relaxed, but I've not looked. This one is probably a question for runtime folks. > : :: = 1719614 > > BitMap::par_set_bit again. 
> > , (MEMFLAGS)5>*)+432>: :: = 1617659 > > This one is GenericTaskQueue::pop_global calling cmpxchg_age(). > Again, do we need conservative here? This needs at least sequentially consistent semantics on the success path. See the referenced paper by Le, et al. There is also a cmpxchg_age in pop_local_slow. The Le, et al paper doesn't deal with that path. But it's also not in your list, which is good since this is supposed to be infrequently taken. From iklam at openjdk.java.net Thu Feb 11 03:59:37 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 03:59:37 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp [v2] In-Reply-To: References: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com> <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> Message-ID: <54Eg-olaP7adJuc9TR2Rj58smez-FpUWkYcf1Hf1onA=.0137514d-c491-4701-8db4-2fa3fd6ac51d@github.com> On Fri, 5 Feb 2021 17:34:06 GMT, Coleen Phillimore wrote: > I'm fine with leaving vmOperations.hpp and vmOperation.hpp. It's not a big deal. commonVMOperations.hpp - too much noise! > I agree with David. interfaceSupport.inline.hpp imports a lot of things so importing vmOperations.hpp is not a big deal. vmOperations.hpp imports #include "runtime/threadSMR.hpp" otherwise it has all the same imports as interfaceSupport.inline.hpp anyway. > All these files are going to increase compilation time too. > I stand by my check mark above! OK, since no one is fond of further splitting the headers, and no one has vetoed the vmOperation.hpp name, I'll keep everything as originally proposed. Will do more testing and integrate. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2398 From iklam at openjdk.java.net Thu Feb 11 04:03:53 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 04:03:53 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v3] In-Reply-To: References: Message-ID: > When CDS dumping is enabled, special initialization happens during VM init. However, many of these calls do not properly check for exception. Instead, they rely on the implicit knowledge that `metaspace::allocate()` will exit the VM when allocation fails during CDS dumping. This makes the code hard to understand and tightly coupled to `metaspace::allocate()`. > > The fix is: all code that makes allocation should be using CHECK macros, so each block of code can be individually understood without considering the behavior of `metaspace::allocate()`. > > I added `TRAPS` to a bunch of CDS-related functions that are called during VM init. In some cases, I changed `Thread* THREAD` to `TRAPS`. This also eliminated a few `Thread* THREAD = Thread::current()` calls. > > The "root" of these calls, such as `MetaspaceShared::prepare_for_dumping()`, now follows this pattern: > > EXCEPTION_MARK; > ClassLoader::initialize_shared_path(THREAD); > if (HAS_PENDING_EXCEPTION) { > java_lang_Throwable::print(PENDING_EXCEPTION, tty); > vm_exit_during_initialization("ClassLoader::initialize_shared_path() failed unexpectedly"); > } Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8260341-cds-dump-vm-init-does-not-check-exceptions - Changed THREAD in tail calls to CHECK - 8260341: CDS dump VM init code does not check exceptions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2494/files - new: https://git.openjdk.java.net/jdk/pull/2494/files/218af58f..dd3c7e08 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2494&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2494&range=01-02 Stats: 1017 lines in 43 files changed: 425 ins; 347 del; 245 mod Patch: https://git.openjdk.java.net/jdk/pull/2494.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2494/head:pull/2494 PR: https://git.openjdk.java.net/jdk/pull/2494 From kim.barrett at oracle.com Thu Feb 11 04:09:56 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 10 Feb 2021 23:09:56 -0500 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> Message-ID: <74CD1B2A-E99A-4A97-BBE3-3DF6ED506A11@oracle.com> > On Feb 10, 2021, at 10:59 PM, Kim Barrett wrote: > We also don't presently have any cmpxchg implementation that really supports > anything between conservative and relaxed, nor do we support different order > constraints for the success vs failure cases. Things can be complicated > enough as is; while we *could* fill some of that in, I'm not sure we should. I forgot that the linux-ppc port tries harder in this area. This was so a release-cmpxchg could be used in ParallelGC's PSPromotionManager::copy_to_survivor_space and benefit from that. The initial proposal was to use relaxed-cmpxchg, but that was shown to be insufficient. 
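For readers less familiar with split success/failure orderings, here is a minimal sketch in standard C++ of a release-cmpxchg used to install a forwarding pointer. It uses `std::atomic`, not HotSpot's `Atomic::cmpxchg`; the `Obj` type and `forward_to` function are hypothetical names made up for illustration, and nothing here is taken from `PSPromotionManager::copy_to_survivor_space`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object with a forwarding slot.
// This is not HotSpot's oopDesc or mark word.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  int payload = 0;
};

// Try to install `copy` as the forwarding pointer of `from`.
// Returns the winning copy: ours on success, the other thread's on failure.
inline Obj* forward_to(Obj* from, Obj* copy) {
  Obj* expected = nullptr;
  // Success order is release, so writes that initialized `copy` are
  // published together with the pointer. Failure order is relaxed.
  if (from->forwardee.compare_exchange_strong(expected, copy,
                                              std::memory_order_release,
                                              std::memory_order_relaxed)) {
    return copy;
  }
  // Lost the race: `expected` now holds the winner's pointer. The acquire
  // fence pairs with the winner's release, so reading through the returned
  // pointer sees the winner's initialized contents.
  std::atomic_thread_fence(std::memory_order_acquire);
  assert(expected != nullptr);
  return expected;
}
```

A fully relaxed CAS would drop both the release on success and the fence on failure, which removes the ordering that makes the copied contents visible to other threads. Whether that is exactly the reason relaxed was found insufficient in ParallelGC is an assumption here, not something stated in this thread.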
From kbarrett at openjdk.java.net Thu Feb 11 04:46:39 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 11 Feb 2021 04:46:39 GMT Subject: RFR: 8260941: Remove the conc_scan parameter for CardTable In-Reply-To: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> References: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> Message-ID: On Fri, 5 Feb 2021 09:52:25 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this removal of the last(?) CMS-specific code in CardTable, namely some provision to indicate that cards are being scanned concurrently in Serial/Parallel GC barrier code? > > The change simply follows the predicate into Serial/Parallel GC code which always returns false for them and removes that code. > > In the review for JDK-8234534 I mentioned that I split this out due to unexplainable errors; testing tier1-5 three times showed none of that any more (after updating to latest code). > > This change has only been built on Oracle-platforms and linux-x86 via github actions (https://github.com/tschatzl/jdk/actions/runs/539993964), so I would like to kindly ask maintainers of the others to compile and report issues (32 bit ARM, PPC etc). > > Testing: tier1-5 Looks good. Just the one minor nit about "virtual". src/hotspot/share/gc/g1/g1BarrierSet.hpp line 56: > 54: ~G1BarrierSet() { } > 55: > 56: bool card_mark_must_follow_store() const { Should be "virtual". (Probably should be "override", but nobody's gotten around to vetting and proposing that change to the style guide.) ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2425 From iklam at openjdk.java.net Thu Feb 11 05:14:39 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 05:14:39 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: <5LkYdA9nKXzL1AX4Mpn90DCOsAazDuX0wNhj2LlIc28=.08da1aa9-caf0-4130-b837-04aff29db1ae@github.com> References: <5LkYdA9nKXzL1AX4Mpn90DCOsAazDuX0wNhj2LlIc28=.08da1aa9-caf0-4130-b837-04aff29db1ae@github.com> Message-ID: On Wed, 10 Feb 2021 21:14:25 GMT, Harold Seigel wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Changed THREAD in tail calls to CHECK > > The changes look good. Thanks! > Harold Thanks @hseigel and @coleenp for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/2494 From iklam at openjdk.java.net Thu Feb 11 05:14:39 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 05:14:39 GMT Subject: Integrated: 8260341: CDS dump VM init code does not check exceptions In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 05:00:43 GMT, Ioi Lam wrote: > When CDS dumping is enabled, special initialization happens during VM init. However, many of these calls do not properly check for exception. Instead, they rely on the implicit knowledge that `metaspace::allocate()` will exit the VM when allocation fails during CDS dumping. This makes the code hard to understand and tightly coupled to `metaspace::allocate()`. > > The fix is: all code that makes allocation should be using CHECK macros, so each block of code can be individually understood without considering the behavior of `metaspace::allocate()`. > > I added `TRAPS` to a bunch of CDS-related functions that are called during VM init. In some cases, I changed `Thread* THREAD` to `TRAPS`. This also eliminated a few `Thread* THREAD = Thread::current()` calls. 
> > The "root" of these calls, such as `MetaspaceShared::prepare_for_dumping()`, now follows this pattern: > > EXCEPTION_MARK; > ClassLoader::initialize_shared_path(THREAD); > if (HAS_PENDING_EXCEPTION) { > java_lang_Throwable::print(PENDING_EXCEPTION, tty); > vm_exit_during_initialization("ClassLoader::initialize_shared_path() failed unexpectedly"); > } This pull request has now been integrated. Changeset: adca84cc Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/adca84cc Stats: 104 lines in 12 files changed: 19 ins; 11 del; 74 mod 8260341: CDS dump VM init code does not check exceptions Reviewed-by: coleenp, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/2494 From stuefe at openjdk.java.net Thu Feb 11 05:37:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 11 Feb 2021 05:37:39 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: References: <5LkYdA9nKXzL1AX4Mpn90DCOsAazDuX0wNhj2LlIc28=.08da1aa9-caf0-4130-b837-04aff29db1ae@github.com> Message-ID: On Thu, 11 Feb 2021 05:11:25 GMT, Ioi Lam wrote: >> The changes look good. Thanks! >> Harold > > Thanks @hseigel and @coleenp for the review! Could we then get rid of the special CDS handling in `Metaspace::allocate()`: https://github.com/openjdk/jdk/blob/837bd8930d0a010110f1318b947c036609d3aa33/src/hotspot/share/memory/metaspace.cpp#L825-L836 ? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2494 From stuefe at openjdk.java.net Thu Feb 11 05:45:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 11 Feb 2021 05:45:40 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2] In-Reply-To: References: Message-ID: <2JY0mge1p7foK5kwDx3JnvxzLql_Dsv1WY3BTO42jN4=.1ebcc7f7-3d64-4d2a-ab62-455fbdff79b3@github.com> On Wed, 10 Feb 2021 19:30:58 GMT, Stefan Johansson wrote: >> src/hotspot/os/linux/os_linux.cpp line 3781: >> >>> 3779: } >>> 3780: >>> 3781: log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system."); >> >> IIUC here we end up if UseLargePages=true (by default or not) and we were unable to get any of our APIs to work? Should this be an unconditional printout? At least make it only unconditional if UseLargePages==true is not default? I know its only a theoretical problem now since default is false. > > You are correct we only end up here if `UseLargePages` is true. Making it conditional like that would be consistent with the other warnings. So only warn if the user explicitly set `+UseLargePages`. So something like? > Suggestion: > > if (!FLAG_IS_DEFAULT(UseLargePages)) { > log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system."); > } Seems reasonable and in line with what we do otherwise. As a side note, I'm not a big fan of unconditional error printouts for non-fatal errors. Can mess with parsers (see eg JDK-8260902). The argument "we trace only if flag is explicitly set" is valid oc, but even so customers tend to bury flags in scripts and then forget them. But I know this is common procedure so this would be a separate discussion. 
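The "warn only if the flag was set explicitly" idea discussed above can be sketched roughly as follows. This is a simplified model, not HotSpot code: `BoolFlag`, `set_by_user` and `disable_large_pages` are hypothetical names standing in for the real flag machinery and the `FLAG_IS_DEFAULT` check, and the log is modelled as a plain vector of strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a boolean VM flag; the real flag machinery differs.
struct BoolFlag {
  bool value;
  bool set_by_user;  // false while the flag still carries its default
};

// Turn large pages off after a failed sanity check. The warning is recorded
// only when the user asked for large pages explicitly, so a default-on
// setting would not produce unsolicited output.
inline void disable_large_pages(BoolFlag& use_large_pages,
                                std::vector<std::string>& messages) {
  assert(use_large_pages.value);  // only reached while the flag is on
  if (use_large_pages.set_by_user) {
    messages.push_back("UseLargePages disabled, no large pages configured "
                       "and available on the system.");
  }
  use_large_pages.value = false;
}
```

The flag is cleared in both cases; only the diagnostic is conditional.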
------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From iklam at openjdk.java.net Thu Feb 11 05:50:36 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 05:50:36 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: References: <5LkYdA9nKXzL1AX4Mpn90DCOsAazDuX0wNhj2LlIc28=.08da1aa9-caf0-4130-b837-04aff29db1ae@github.com> Message-ID: <3rr7O83wZ5AcOuTutdInpTk48FsVyezRUlHMysc8MuM=.f40b215a-163e-42b0-b0be-8755ca78e2dc@github.com> On Thu, 11 Feb 2021 05:34:35 GMT, Thomas Stuefe wrote: > Could we then get rid of the special CDS handling in `Metaspace::allocate()`: > https://github.com/openjdk/jdk/blob/837bd8930d0a010110f1318b947c036609d3aa33/src/hotspot/share/memory/metaspace.cpp#L825-L836 > I filed https://bugs.openjdk.java.net/browse/JDK-8261551 . We need to fix JDK-8261480 first, since we still have some inappropriate use of THREAD during dumping. I split the work into several steps so that the review can be easier. ------------- PR: https://git.openjdk.java.net/jdk/pull/2494 From stuefe at openjdk.java.net Thu Feb 11 06:02:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 11 Feb 2021 06:02:39 GMT Subject: RFR: 8260341: CDS dump VM init code does not check exceptions [v2] In-Reply-To: <3rr7O83wZ5AcOuTutdInpTk48FsVyezRUlHMysc8MuM=.f40b215a-163e-42b0-b0be-8755ca78e2dc@github.com> References: <5LkYdA9nKXzL1AX4Mpn90DCOsAazDuX0wNhj2LlIc28=.08da1aa9-caf0-4130-b837-04aff29db1ae@github.com> <3rr7O83wZ5AcOuTutdInpTk48FsVyezRUlHMysc8MuM=.f40b215a-163e-42b0-b0be-8755ca78e2dc@github.com> Message-ID: On Thu, 11 Feb 2021 05:47:12 GMT, Ioi Lam wrote: > > Could we then get rid of the special CDS handling in `Metaspace::allocate()`: > > https://github.com/openjdk/jdk/blob/837bd8930d0a010110f1318b947c036609d3aa33/src/hotspot/share/memory/metaspace.cpp#L825-L836 > > I filed https://bugs.openjdk.java.net/browse/JDK-8261551 . 
We need to fix JDK-8261480 first, since we still have some inappropriate use of THREAD during dumping. I split the work into several steps so that the review can be easier. Okay, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2494 From iklam at openjdk.java.net Thu Feb 11 06:36:57 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 06:36:57 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp [v3] In-Reply-To: References: Message-ID: > vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. > > Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. > > So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). > > After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8261125-move-VM_Operation-to-vmOperation.hpp - interfaceSupport.inline.hpp needs VM_Exit from vmOperations.hpp for JVM_LEAF macro - 8261125: Move VM_Operation to vmOperation.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2398/files - new: https://git.openjdk.java.net/jdk/pull/2398/files/01633cda..bab79154 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2398&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2398&range=01-02 Stats: 19114 lines in 561 files changed: 11182 ins; 5559 del; 2373 mod Patch: https://git.openjdk.java.net/jdk/pull/2398.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2398/head:pull/2398 PR: https://git.openjdk.java.net/jdk/pull/2398 From iklam at openjdk.java.net Thu Feb 11 06:57:39 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 06:57:39 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp [v3] In-Reply-To: <54Eg-olaP7adJuc9TR2Rj58smez-FpUWkYcf1Hf1onA=.0137514d-c491-4701-8db4-2fa3fd6ac51d@github.com> References: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com> <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> <54Eg-olaP7adJuc9TR2Rj58smez-FpUWkYcf1Hf1onA=.0137514d-c491-4701-8db4-2fa3fd6ac51d@github.com> Message-ID: On Thu, 11 Feb 2021 03:56:44 GMT, Ioi Lam wrote: >> I'm fine with leaving vmOperations.hpp and vmOperation.hpp. It's not a big deal. commonVMOperations.hpp - too much noise! >> I agree with David. interfaceSupport.inline.hpp imports a lot of things so importing vmOperations.hpp is not a big deal. vmOperations.hpp imports #include "runtime/threadSMR.hpp" otherwise it has all the same imports as interfaceSupport.inline.hpp anyway. >> All these files are going to increase compilation time too. >> I stand by my check mark above! 
> >> I'm fine with leaving vmOperations.hpp and vmOperation.hpp. It's not a big deal. commonVMOperations.hpp - too much noise! >> I agree with David. interfaceSupport.inline.hpp imports a lot of things so importing vmOperations.hpp is not a big deal. vmOperations.hpp imports #include "runtime/threadSMR.hpp" otherwise it has all the same imports as interfaceSupport.inline.hpp anyway. >> All these files are going to increase compilation time too. >> I stand by my check mark above! > > OK, since no one is fond of further splitting the headers, and no one has vetoed the vmOperation.hpp name, I'll keep everything as originally proposed. Will do more testing and integrate. BTW, I found a few existing singular/plural pairs of hpp files. Admittedly I created ciSymbols.hpp recently, but the other 3 pairs have been there for quite some time. share/ci/ciSymbol.hpp share/ci/ciSymbols.hpp share/interpreter/bytecode.hpp share/interpreter/bytecodes.hpp share/jfr/recorder/service/jfrEvent.hpp share/jfr/jfrEvents.hpp share/jfr/recorder/checkpoint/types/jfrType.hpp share/jfr/utilities/jfrTypes.hpp ------------- PR: https://git.openjdk.java.net/jdk/pull/2398 From shade at openjdk.java.net Thu Feb 11 07:18:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 07:18:39 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader [v2] In-Reply-To: References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> Message-ID: On Wed, 10 Feb 2021 15:26:50 GMT, Thomas Stuefe wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Added a comment > > Looks good. I find it simpler too. > > You could run the tests with sun.reflect.inflationThreshold=0. Should increase the chance of encountering reflection delegator loaders a lot. > > Cheers, Thomas Thanks! Any reviewers from JDK side? @AlanBateman? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2485 From sjohanss at openjdk.java.net Thu Feb 11 07:48:53 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 07:48:53 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v3] In-Reply-To: References: Message-ID: > When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. > > The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. > > A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. > > The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. > > The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. > > This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. 
One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Only warn if UseLargePages was explicitly set. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2488/files - new: https://git.openjdk.java.net/jdk/pull/2488/files/bf6cbce1..6a2606be Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488 PR: https://git.openjdk.java.net/jdk/pull/2488 From anleonar at redhat.com Thu Feb 11 07:58:57 2021 From: anleonar at redhat.com (Andrew Leonard) Date: Thu, 11 Feb 2021 07:58:57 +0000 Subject: JEP395: spec clarify isRecord() verification? Message-ID: Hi, Is someone able to clarify please the spec with regards isRecord() A record class is implicitly final, and cannot be abstract <----- Whether this should be enforced at static verification or in isRecord()? The JTreg testcase https://github.com/openjdk/jdk16/blob/4de3a6be9e60b9676f2199cd18eadb54a9d6e3fe/test/jdk/java/lang/reflect/records/IsRecordTest.java#L77 infers the later? Thanks Andrew From alanb at openjdk.java.net Thu Feb 11 08:07:38 2021 From: alanb at openjdk.java.net (Alan Bateman) Date: Thu, 11 Feb 2021 08:07:38 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader [v2] In-Reply-To: References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> Message-ID: On Wed, 10 Feb 2021 07:34:59 GMT, Aleksey Shipilev wrote: >> `JVM_LatestUserDefinedLoader` is called normally from `ObjectInputStream.resolveClass` -> `VM.latestUserDefinedLoader0`. 
And it takes a measurable time to walk the stack. There is JDK-8173368 that wants to replace it with `StackWalker`, but we can tune up the `JVM_LatestUserDefinedLoader` itself without changing the semantics of it (thus providing the backportability, including the releases that do not have `StackWalker`) and improving performance (thus providing a more aggressive baseline for `StackWalker` rewrite). >> >> The key is to recognize that out of two checks: 1) checking for two special subclasses; 2) checking for user classloader -- the first one usually passes, and second one fails much more frequently. First check also requires traversing the superclasses upwards looking for match. Reversing the order of the checks, plus inlining the helper method improves performance without changing the semantics. >> >> Out of curiosity, my previous patch dropped the first check completely, replacing it by asserts, and we definitely run into situation where that check is needed on some tests. >> >> On my machine, `VM.latestUserDefinedLoader` invocation time drops from 115 to 100 ns/op. Single-threaded SPECjvm2008:serial improves about 3% with this patch. >> >> Additional testing: >> - [x] Ad-hoc benchmarks >> - [x] Linux x86_64 fastdebug, `tier1`, `tier2`, `tier3` >> >> --------- >> ### Progress >> - [x] Change must not contain extraneous whitespace >> - [x] Commit message must refer to an issue >> - [ ] Change must be properly reviewed >> >> >> >> ### Download >> `$ git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485` >> `$ git checkout pull/2485` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Added a comment Marked as reviewed by alanb (Reviewer). 
src/hotspot/share/prims/jvm.cpp line 3297: > 3295: !ik->is_subclass_of(vmClasses::reflect_ConstructorAccessorImpl_klass())) { > 3296: return JNIHandles::make_local(THREAD, loader); > 3297: } This looks okay, but I'm surprised it makes a measurable (or micro) difference. There have been several proposals over the years to improve latestUserDefinedLoader in the common case that OIS.resolveClass is not overridden. It may need to be looked at again. Ideally JVM_LatestUserDefinedLoader would go away and there would be a solution based on StackWalker (but work would be required there to match the current performance). ------------- PR: https://git.openjdk.java.net/jdk/pull/2485 From thartmann at openjdk.java.net Thu Feb 11 08:36:39 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 11 Feb 2021 08:36:39 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 16:28:29 GMT, Lutz Schmidt wrote: > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! > Lutz Changes requested by thartmann (Reviewer).
src/hotspot/share/oops/method.hpp line 728: > 726: #ifndef PRODUCT > 727: static ByteSize compiled_invocation_counter_offset() { return byte_offset_of(Method, _compiled_invocation_count); } > 728: static ByteSize compiled_invocation_counter_offset64() { return byte_offset_of(Method, _compiled_invocation_count64); } `compiled_invocation_counter_offset()` looks unused. src/hotspot/share/oops/method.hpp line 459: > 457: #else > 458: // for PrintMethodData in a product build > 459: int compiled_invocation_count() const { return 0; } `compiled_invocation_count()` looks unused. src/hotspot/share/oops/method.hpp line 106: > 104: struct { > 105: int _compiled_invocation_count; // Number of nmethod invocations so far (for perf. debugging) > 106: // Must preserve this as int. Is used outside the jdk by SA. Why not update the SA code to access 64 bit? ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From jbhateja at openjdk.java.net Thu Feb 11 08:38:49 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 08:38:49 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction Message-ID: The BMI2 BZHI instruction can be used to optimize the instruction sequence used for mask generation at various places in the array copy stubs and in partial inlining for small copy operations. ------------- Commit messages: - 8261553: Efficient mask generation using BMI2 BZHI instruction.
Changes: https://git.openjdk.java.net/jdk/pull/2522/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261553 Stats: 30 lines in 5 files changed: 8 ins; 11 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/2522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2522/head:pull/2522 PR: https://git.openjdk.java.net/jdk/pull/2522 From stuefe at openjdk.java.net Thu Feb 11 08:41:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 11 Feb 2021 08:41:40 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v3] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 07:48:53 GMT, Stefan Johansson wrote: >> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. >> >> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. >> >> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. >> >> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. 
>> >> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. >> >> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Only warn if UseLargePages was explicitly set. Looks good now. Thank you! ------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2488 From shade at openjdk.java.net Thu Feb 11 08:56:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 08:56:40 GMT Subject: Integrated: 8261449: Micro-optimize JVM_LatestUserDefinedLoader In-Reply-To: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> Message-ID: <9KfR8SbKeGMyUykV1nNHXH1zL7OItaqkcXtPZBuBuz4=.55b0b979-b3d9-4b06-997d-115bc3dccaf2@github.com> On Tue, 9 Feb 2021 15:40:03 GMT, Aleksey Shipilev wrote: > `JVM_LatestUserDefinedLoader` is called normally from `ObjectInputStream.resolveClass` -> `VM.latestUserDefinedLoader0`. And it takes a measurable time to walk the stack. 
There is JDK-8173368 that wants to replace it with `StackWalker`, but we can tune up the `JVM_LatestUserDefinedLoader` itself without changing the semantics of it (thus providing the backportability, including the releases that do not have `StackWalker`) and improving performance (thus providing a more aggressive baseline for `StackWalker` rewrite). > > The key is to recognize that out of two checks: 1) checking for two special subclasses; 2) checking for user classloader -- the first one usually passes, and second one fails much more frequently. First check also requires traversing the superclasses upwards looking for match. Reversing the order of the checks, plus inlining the helper method improves performance without changing the semantics. > > Out of curiosity, my previous patch dropped the first check completely, replacing it by asserts, and we definitely run into situation where that check is needed on some tests. > > On my machine, `VM.latestUserDefinedLoader` invocation time drops from 115 to 100 ns/op. Single-threaded SPECjvm2008:serial improves about 3% with this patch. > > Additional testing: > - [x] Ad-hoc benchmarks > - [x] Linux x86_64 fastdebug, `tier1`, `tier2`, `tier3` > > --------- > ### Progress > - [x] Change must not contain extraneous whitespace > - [x] Commit message must refer to an issue > - [ ] Change must be properly reviewed > > > > ### Download > `$ git fetch https://git.openjdk.java.net/jdk pull/2485/head:pull/2485` > `$ git checkout pull/2485` This pull request has now been integrated. 
Changeset: 49cf13d2 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/49cf13d2 Stats: 20 lines in 3 files changed: 4 ins; 13 del; 3 mod 8261449: Micro-optimize JVM_LatestUserDefinedLoader Reviewed-by: dholmes, stuefe, alanb ------------- PR: https://git.openjdk.java.net/jdk/pull/2485 From shade at openjdk.java.net Thu Feb 11 08:56:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 08:56:39 GMT Subject: RFR: 8261449: Micro-optimize JVM_LatestUserDefinedLoader [v2] In-Reply-To: References: <3EpwZwOAE3WA0NyFG6b2UcZLdKcBihJ82eKMj0gunBE=.3435d3cc-f9f4-4994-855b-ce1ce6e6f93f@github.com> Message-ID: On Thu, 11 Feb 2021 08:04:55 GMT, Alan Bateman wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Added a comment > > Marked as reviewed by alanb (Reviewer). Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/2485 From redestad at openjdk.java.net Thu Feb 11 10:30:38 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 11 Feb 2021 10:30:38 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 08:31:40 GMT, Jatin Bhateja wrote: > The BMI2 BZHI instruction can be used to optimize the instruction sequence > used for mask generation at various places in the array copy stubs and in partial inlining for small copy operations. - Rather than removing the old code, I believe the code calling bzhiq needs to be in a branch checking `VM_Version::supports_bmi2`. Otherwise you'll hit asserts on older hardware without this extension. - Some demonstration of the performance benefit would be nice - either a new microbenchmark or a statistically significant result running some existing ones, e.g. `make test TEST=micro:ArrayCopy` ------------- Changes requested by redestad (Reviewer).
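To illustrate the review comment about keeping the old sequence behind a feature check, here is a portable C++ sketch. It does not emit any instruction: `bzhi_model` only models what BZHI computes on a 64-bit operand, `low_mask_fallback` stands in for the older shift-based sequence, and the `has_bmi2` parameter is a hypothetical stand-in for a runtime check such as `VM_Version::supports_bmi2`. None of these names come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Portable model of BZHI on a 64-bit operand: clear every bit at position
// >= n. For n >= 64 the source comes back unchanged. (The instruction only
// looks at the low 8 bits of the index; callers here pass n <= 64.)
inline std::uint64_t bzhi_model(std::uint64_t src, unsigned n) {
  return n >= 64 ? src : src & ((std::uint64_t{1} << n) - 1);
}

// Shift-and-subtract fallback for hardware without BMI2. The n == 64 case
// is handled separately because shifting a 64-bit value by 64 is undefined.
inline std::uint64_t low_mask_fallback(unsigned n) {
  assert(n <= 64);
  return n == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << n) - 1;
}

// Mask with the low n bits set, choosing the path by feature check.
inline std::uint64_t low_mask(unsigned n, bool has_bmi2) {
  return has_bmi2 ? bzhi_model(~std::uint64_t{0}, n) : low_mask_fallback(n);
}
```

Both paths are meant to produce the same mask for every n from 0 to 64, which is the property a branch on the feature check has to preserve.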
PR: https://git.openjdk.java.net/jdk/pull/2522 From aph at openjdk.java.net Thu Feb 11 10:52:01 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 11 Feb 2021 10:52:01 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v8] In-Reply-To: References: Message-ID: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter. You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is minimized. This new set of instructions is known as Large > System Extensions, or LSE. > > Unfortunately, in recent processors, the "old" load/store exclusive > instructions sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC.
This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Cleanup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2434/files - new: https://git.openjdk.java.net/jdk/pull/2434/files/81d35fe7..239dd7c5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=06-07 Stats: 22 lines in 2 files changed: 1 ins; 20 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From redestad at openjdk.java.net Thu Feb 11 11:41:06 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 11 Feb 2021 11:41:06 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM [v2] In-Reply-To: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: <4ddL2LcWLk7PdgPXMPITpcLkje5n9g8QyYlKw0HPovg=.1383d9b2-c76c-4df3-8656-a64f28873cb6@github.com> > This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into checkName - Merge branch 'master' into checkName - Copyrights - Move class name checking for findBootstrapClass/findLoadedClass into native/VM ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2378/files - new: https://git.openjdk.java.net/jdk/pull/2378/files/f2fd1d1c..727b2b37 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2378&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2378&range=00-01 Stats: 28701 lines in 954 files changed: 16489 ins; 8085 del; 4127 mod Patch: https://git.openjdk.java.net/jdk/pull/2378.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2378/head:pull/2378 PR: https://git.openjdk.java.net/jdk/pull/2378 From chris.hegarty at oracle.com Thu Feb 11 12:11:34 2021 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Thu, 11 Feb 2021 12:11:34 +0000 Subject: JEP395: spec clarify isRecord() verification? In-Reply-To: References: Message-ID: Hi Andrew, > On 11 Feb 2021, at 07:58, Andrew Leonard wrote: > > Hi, > Is someone able to clarify the spec with regard to isRecord(), please? > > A record class is implicitly final, and cannot be abstract <----- > > Should this be enforced at static verification or in isRecord()? > > The JTreg testcase > https://github.com/openjdk/jdk16/blob/4de3a6be9e60b9676f2199cd18eadb54a9d6e3fe/test/jdk/java/lang/reflect/records/IsRecordTest.java#L77 > implies the latter? I think what you are looking for is in this CSR: https://bugs.openjdk.java.net/browse/JDK-8257520 -Chris.
From jbhateja at openjdk.java.net Thu Feb 11 12:25:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 12:25:53 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> > The BMI2 BZHI instruction can be used to optimize the instruction sequence > used for mask generation at various places in array copy stubs and partial in-lining for small copy operations. Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8261553: Adding BMI2 missing check for partial in-lining. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2522/files - new: https://git.openjdk.java.net/jdk/pull/2522/files/38495aec..84c9c2da Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2522/head:pull/2522 PR: https://git.openjdk.java.net/jdk/pull/2522 From jbhateja at openjdk.java.net Thu Feb 11 12:25:53 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 12:25:53 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 10:28:05 GMT, Claes Redestad wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261553: Adding BMI2 missing check for partial in-lining. > > - Rather than removing the old code, I believe the code calling bzhiq needs to be in a branch checking `VM_Version::supports_bmi2`.
Otherwise you'll hit asserts on older hardware without this extension > - Some demonstration of the performance benefit would be nice - either a new microbenchmark or a statistically significant result running some existing ones, e.g. `make test TEST=micro:ArrayCopy` Hi Claes, Here is the JMH performance data over CLX for arraycopy benchmarks: http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS.txt Regards, Jatin ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From jbhateja at openjdk.java.net Thu Feb 11 12:33:38 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 12:33:38 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 10:28:05 GMT, Claes Redestad wrote: > * Rather than removing the old code, I believe the code calling bzhiq needs to be in a branch checking `VM_Version::supports_bmi2`. Otherwise you'll hit asserts on older hardware without this extension Hi Claes, I added the missing safety check for BMI2; in general it is rare that a target supporting AVX-512 does not support BMI2. > * Some demonstration of the performance benefit would be nice - either a new microbenchmark or a statistically significant result running some existing ones, e.g.
`make test TEST=micro:ArrayCopy` ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From redestad at openjdk.java.net Thu Feb 11 12:44:54 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 11 Feb 2021 12:44:54 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM [v3] In-Reply-To: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: > This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Consolidate verifyClassname and verifyFixClassname ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2378/files - new: https://git.openjdk.java.net/jdk/pull/2378/files/727b2b37..6b8305e9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2378&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2378&range=01-02 Stats: 77 lines in 4 files changed: 13 ins; 40 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/2378.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2378/head:pull/2378 PR: https://git.openjdk.java.net/jdk/pull/2378 From redestad at openjdk.java.net Thu Feb 11 12:44:54 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 11 Feb 2021 12:44:54 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM [v3] In-Reply-To: References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: On Thu, 4 Feb 2021 13:14:16 GMT, Coleen Phillimore wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Consolidate verifyClassname and verifyFixClassname > > Changes requested by coleenp (Reviewer). 
I tried to consolidate all name checking into the native layer for the remaining methods, but there are places where we are calling the JNI code with internalized names directly through `JavaLangAccess.defineClass`, so we'd need a way to differentiate these. Seems simpler to leave the `checkName` in `preDefineClass` for now. For the JNI code consolidating verifyFixClassname and verifyClassname into a single method seems to be the most straightforward simplification possible, since these are currently called back to back. Since ASCII like `/` is never a component of a multibyte character in UTF-8 we can do the fix-up pass without validation, then do the full verification. This simplifies the code and might speed it up marginally. Also added some cleanup to the cleanup code as suggested by @tstuefe in #2407 ------------- PR: https://git.openjdk.java.net/jdk/pull/2378 From redestad at openjdk.java.net Thu Feb 11 12:59:39 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 11 Feb 2021 12:59:39 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 12:22:29 GMT, Jatin Bhateja wrote: > Hi Claes, > > Here is the JMH performance data over CLX for arraycopy benchmarks: > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS.txt > > Regards, > Jatin Thanks! Eyeballing the results it looks like a mixed bag. 
There even seems to be a few regressions such as this: o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 61.663 ns/op --> o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 74.160 ns/op ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From shade at redhat.com Thu Feb 11 13:13:39 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 11 Feb 2021 14:13:39 +0100 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> Message-ID: On 2/11/21 4:59 AM, Kim Barrett wrote: >> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: >> >> I've been looking at the hottest Atomic operations in HotSpot, with a view to >> finding out if the default memory_order_conservative (which is very expensive >> on some architectures) can be weakened to something less. It's impossible to >> fix all of them, but perhaps we can fix some of the most frequent. > > Is there any information about the possible performance improvement from > such changes? 1.5-3M occurrences doesn't mean much without context. I am going through the exercise of relaxing some of the memory orders in Shenandoah code, and AArch64 benefits greatly from it (= two-way barriers are bad in hot code). 
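As a rough illustration of what relaxing a memory order means for a plain counter, the sketch below uses `std::atomic` as a stand-in for HotSpot's own `Atomic` API, so none of the names here come from the Shenandoah patches. `LiveDataCounter` is a hypothetical class; the assumption is that nothing else is published through the counter, which is what makes relaxed ordering sufficient.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical statistics counter. The update must be atomic so that
// concurrent increments are not lost, but it does not need to order any
// other memory accesses, so relaxed ordering is assumed to be enough.
class LiveDataCounter {
    std::atomic<std::size_t> _words{0};

public:
    // Relaxed read-modify-write: atomic, but implies no fences around it.
    void add(std::size_t n) {
        _words.fetch_add(n, std::memory_order_relaxed);
    }

    // Relaxed read: returns some value the counter actually held, with no
    // guarantee about the visibility of unrelated data.
    std::size_t get() const {
        return _words.load(std::memory_order_relaxed);
    }
};
```

A default `fetch_add` without an explicit order would be sequentially consistent, which is the C++ analogue of paying for ordering the counter does not need.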
There are obvious things like relaxing counter updates: JDK-8261503: Shenandoah: reconsider verifier memory ordering JDK-8261501: Shenandoah: reconsider heap statistics memory ordering JDK-8261500: Shenandoah: reconsider region live data memory ordering JDK-8261496: Shenandoah: reconsider pacing updates memory ordering There are more interesting things like relaxing accesses to marking bitmap (which is a large counter array in disguise) -- which effectively implies a CAS (and thus two FULL_MEM_BARRIER-s on AArch64) per marked object: JDK-8261493: Shenandoah: reconsider bitmap access memory ordering These five relaxations above cut down marking phase time on AArch64 for about 10..15%. And there is more advanced stuff where relaxed is not enough, but conservative is too conservative. There, acq/rel should be enough -- but we cannot yet test it, because AArch64 cmpxchg does not do anything except relaxed/conservative (JDK-8261579): JDK-8261492: Shenandoah: reconsider forwardee accesses memory ordering JDK-8261495: Shenandoah: reconsider update references memory ordering These two (along with experimental 8261579 fix) cut down evacuation and update-references phase times for about 25..30% and 10..15%, respectively. All in all, this cuts down Shenandoah GC cycle times on AArch64 for about 15..20%! So, I believe this shows enough benefit to invest our time. Heavy-duty GC code is where I expect the most benefit. -- Thanks, -Aleksey From lucy at openjdk.java.net Thu Feb 11 13:30:44 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 11 Feb 2021 13:30:44 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow In-Reply-To: References: Message-ID: <0gau8CsVF4CqY9mzy6CmQzC1WUw4kiJ0ASFoZUP-4RU=.b51ee307-1c1c-4569-9fb8-9bdd8b4dea32@github.com> On Thu, 11 Feb 2021 08:28:47 GMT, Tobias Hartmann wrote: >> Dear community, >> may I please request reviews for this fix, improving the usefulness of method invocation counters. 
>> - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). >> - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. >> - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. >> - before/after sample output is attached to the bug description. >> >> Thank you! >> Lutz > > src/hotspot/share/oops/method.hpp line 728: > >> 726: #ifndef PRODUCT >> 727: static ByteSize compiled_invocation_counter_offset() { return byte_offset_of(Method, _compiled_invocation_count); } >> 728: static ByteSize compiled_invocation_counter_offset64() { return byte_offset_of(Method, _compiled_invocation_count64); } > > `compiled_invocation_counter_offset()` looks unused. Correct. Removed. > src/hotspot/share/oops/method.hpp line 459: > >> 457: #else >> 458: // for PrintMethodData in a product build >> 459: int compiled_invocation_count() const { return 0; } > > `compiled_invocation_count()` looks unused. compiled_invocation_count() was used in compare_methods() and collect_invoked_methods(). I have converted those call sites to compiled_invocation_count64(). ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From enikitin at openjdk.java.net Thu Feb 11 13:32:47 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Thu, 11 Feb 2021 13:32:47 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion Message-ID: Another approach to the JDK-8058176 and #2440 - never allowing the tests hit CodeCache limits. The most significant consumer is the MH graph builder (the MHTransformationGen), whose consumption is now controlled. 
List of changes: * Code cache size getters are added to WhiteBox; * MH sequences are now built with remaining Code cache size in mind (always let 2M clearance); * Dependencies on WhiteBox added for all affected tests; * The test cases in question un-problemlisted. Testing: the whole vmTestbase/vm/mlvm/ in win-lin-mac x86. ------------- Commit messages: - Un-problemlist the OOME tests - Add CodeCache methods to the WhiteBox - 8058176: [mlvm] tests should not allow code cache exhaustion Changes: https://git.openjdk.java.net/jdk/pull/2523/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2523&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8058176 Stats: 102 lines in 13 files changed: 88 ins; 6 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2523.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2523/head:pull/2523 PR: https://git.openjdk.java.net/jdk/pull/2523 From aph at redhat.com Thu Feb 11 13:33:30 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 11 Feb 2021 13:33:30 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> Message-ID: <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> On 11/02/2021 03:59, Kim Barrett wrote: >> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: >> >> I've been looking at the hottest Atomic operations in HotSpot, with a view to >> finding out if the default memory_order_conservative (which is very expensive >> on some architectures) can be weakened to something less. It's impossible to >> fix all of them, but perhaps we can fix some of the most frequent. > > Is there any information about the possible performance improvement from > such changes? 1.5-3M occurrences doesn't mean much without context. > > We don't presently have support for sequentially consistent semantics, only > "conservative". 
My recollection is that this is in part because there might > be code that is assuming the possibly stronger "conservative" semantics, and > in part because there are different and incompatible approaches to > implementing sequentially consistent semantics on some hardware platforms > and we didn't want to make assumptions there. > > We also don't presently have any cmpxchg implementation that really supports > anything between conservative and relaxed, nor do we support different order > constraints for the success vs failure cases. Things can be complicated > enough as is; while we *could* fill some of that in, I'm not sure we should. OK. However, even though we don't implement any of them, we do have an API that includes acq, rel, and seq_cst. The fact that we don't have anything behind them is, I thought, To Be Done rather than Won't Do. >> ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 >> >> This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this >> need to be memory_order_conservative, or would something weaker do? Even >> acq_rel or seq_cst would be better. > > I think for setting bits in a bitmap the thing to do would be to identify > places that are safe and useful (impacts performance) to do so first. Then > add a weaker variant for use in those places, assuming any are found. I see. I'm assuming that frequency of use is a useful proxy for impact. Aleksey has already, very helpfully, measured how significant these are for Shenandoah, and I suspect all concurrent GCs would benefit in a similar fashion. >> : :: = 2376632 >> : :: = 2003895 >> >> I can't imagine that either of these actually need memory_order_conservative, >> they're just reference counts. > > The "usual" refcount implementation involves relaxed increment and stronger > ordering for decrement. (If I'm remembering correctly, dec-acquire and a > release fence on the zero value path before deleting. 
But I've not thought > about what one might want for this CAS-based variant that handles boundary > cases specially.) And as you say, whether any of these could be weakened > depends on whether there is any code surrounding a use that depends on the > stronger ordering semantics. At a guess I suspect increment could be changed > to relaxed, but I've not looked. This one is probably a question for runtime > folks. OK, this makes sense. I'm thinking of the long road to getting this stuff documented so that we can see what side effects of atomic operations are actually required. >> : :: = 1719614 >> >> BitMap::par_set_bit again. >> >> , (MEMFLAGS)5>*)+432>: :: = 1617659 >> >> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). >> Again, do we need conservative here? > > This needs at least sequentially consistent semantics on the success path. Yep. That's easy, it's the full barrier in the failure path that I'd love to eliminate. > See the referenced paper by Le, et al. > > There is also a cmpxchg_age in pop_local_slow. The Le, et al paper doesn't > deal with that path. But it's also not in your list, which is good since > this is supposed to be infrequently taken. Right. I'm trying to concentrate on the low-hanging fruit. Thank you for the very detailed and informative reply. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From zgu at openjdk.java.net Thu Feb 11 13:34:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 13:34:38 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 08:55:39 GMT, Aleksey Shipilev wrote: > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. 
> > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. > > For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah Changes requested by zgu (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89: > 87: // (would be paired with acquire in forwardee accessors). Acquire on failed update > 88: // would get the updated object after the forwardee load. > 89: markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel); You have obj->mark_acquire() above, I don't think you need leading acquire here. src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 43: > 41: // fwdptr. That object is still not forwarded, and we need to return > 42: // the object itself. > 43: markWord mark = obj->mark_acquire(); We also need the acquire barrier in fast path in generated code, right? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From lucy at openjdk.java.net Thu Feb 11 13:35:43 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 11 Feb 2021 13:35:43 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 08:33:40 GMT, Tobias Hartmann wrote: >> Dear community, >> may I please request reviews for this fix, improving the usefulness of method invocation counters. >> - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). >> - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. >> - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. >> - before/after sample output is attached to the bug description. >> >> Thank you! >> Lutz > > src/hotspot/share/oops/method.hpp line 106: > >> 104: struct { >> 105: int _compiled_invocation_count; // Number of nmethod invocations so far (for perf. debugging) >> 106: // Must preserve this as int. Is used outside the jdk by SA. > > Why not update the SA code to access 64 bit? Two reasons: scope limitation and lack of expertise in SA code. Don't we need the 32bit decl anyway because it is contained in vmstructs.cpp? ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From lucy at openjdk.java.net Thu Feb 11 13:41:39 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 11 Feb 2021 13:41:39 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 08:34:09 GMT, Tobias Hartmann wrote: >> Dear community, >> may I please request reviews for this fix, improving the usefulness of method invocation counters. 
>> - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). >> - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. >> - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. >> - before/after sample output is attached to the bug description. >> >> Thank you! >> Lutz > > Changes requested by thartmann (Reviewer). Thank you Tobias for having a look. I'll post an update asap. ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From shade at openjdk.java.net Thu Feb 11 13:43:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 13:43:40 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 13:32:00 GMT, Zhengyu Gu wrote: >> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. >> >> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. >> >> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises. 
>> >> Additional testing: >> - [x] Linux x86_64 `hotspot_gc_shenandoah` >> - [x] Linux AArch64 `hotspot_gc_shenandoah` >> - [x] Linux AArch64 `tier1` with Shenandoah > > src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 43: > >> 41: // fwdptr. That object is still not forwarded, and we need to return >> 42: // the object itself. >> 43: markWord mark = obj->mark_acquire(); > > We also need the acquire barrier in fast path in generated code, right? Dang. I thought the beauty of self-fixing barriers is that we moved all fwdptr accesses to C++ (either GC or LRB), and all of them end up in this file. But there is `ShenandoahBarrierSetAssembler::cmpxchg_oop` that accesses the fwdptr directly. I shall see what can be done there. > src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89: > >> 87: // (would be paired with acquire in forwardee accessors). Acquire on failed update >> 88: // would get the updated object after the forwardee load. >> 89: markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel); > > You have obj->mark_acquire() above, I don't think you need leading acquire here. A bit different: we want to acquire `prev_mark` (failure witness) here, if we lose the update race. So the `old_mark` acquire does not really help us. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From jbhateja at openjdk.java.net Thu Feb 11 13:54:38 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Thu, 11 Feb 2021 13:54:38 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 12:56:24 GMT, Claes Redestad wrote: > > Hi Claes, > > Here is the JMH performance data over CLX for arraycopy benchmarks: > > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt > > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS.txt > > Regards, > > Jatin > > Thanks! 
Eyeballing the results it looks like a mixed bag. There even seems to be a few regressions such as this: > > ``` > o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 61.663 ns/op > --> > o.o.b.java.lang.ArrayCopyUnalignedSrc.testLong 1200 N/A avgt 2 74.160 ns/op > ``` Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From shade at openjdk.java.net Thu Feb 11 14:07:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 14:07:59 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 13:39:42 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 43: >> >>> 41: // fwdptr. That object is still not forwarded, and we need to return >>> 42: // the object itself. >>> 43: markWord mark = obj->mark_acquire(); >> >> We also need the acquire barrier in fast path in generated code, right? > > Dang. I thought the beauty of self-fixing barriers is that we moved all fwdptr accesses to C++ (either GC or LRB), and all of them end up in this file. But there is `ShenandoahBarrierSetAssembler::cmpxchg_oop` that accesses the fwdptr directly. I shall see what can be done there. I believe only AArch64 needs a fix. x86_64 already has a strong semantics. See new commit. 
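The release/acquire pairing discussed in this thread can be sketched in portable C++. This is only a model under stated assumptions: `std::atomic` stands in for HotSpot's mark word accessors, the forwardee is modelled as a plain pointer field rather than an encoded mark word, and `Object`, `try_forward` and `resolve` are hypothetical names.

```cpp
#include <atomic>

// Hypothetical object with a forwarding pointer; nullptr means "not forwarded".
struct Object {
    int payload = 0;
    std::atomic<Object*> forwardee{nullptr};
};

// Try to install `copy` as the forwardee of `obj`; returns the copy that won.
// On success the CAS releases the contents of our copy to later acquiring
// readers. On failure the acquire load of the witness value lets us safely
// use the copy that another thread installed.
inline Object* try_forward(Object& obj, Object* copy) {
    Object* expected = nullptr;
    if (obj.forwardee.compare_exchange_strong(expected, copy,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
        return copy;
    }
    return expected;  // failure witness: the previously installed forwardee
}

// Load the forwardee with acquire semantics so that reads of the copy's
// contents cannot be ordered before the load of the pointer itself.
inline Object* resolve(Object& obj) {
    Object* fwd = obj.forwardee.load(std::memory_order_acquire);
    return fwd != nullptr ? fwd : &obj;
}
```

The acquire on the failure path corresponds to the "failure witness" argument made earlier in the thread: an acquire on the old value read before the CAS says nothing about the value observed when the CAS loses the race.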
------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From shade at openjdk.java.net Thu Feb 11 14:07:58 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 14:07:58 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. > > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. > > For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises. 
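The release/acquire pairing described above can be illustrated outside HotSpot with standard C++ atomics. This is only a sketch under assumptions: `ToyObject`, `forwardee_slot`, `install_forwardee`, `resolve_forwardee` and `publish_and_read` are hypothetical names, and a plain `std::atomic` pointer stands in for the mark word. It is not the Shenandoah code.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for an object copy; not a HotSpot type.
struct ToyObject {
    int payload = 0;
};

// Hypothetical stand-in for the mark word that carries the forwarding pointer.
std::atomic<ToyObject*> forwardee_slot{nullptr};

// "Releasing" side: finish writing the copy, then publish the pointer.
// The release store orders the payload write before the pointer store.
void install_forwardee(ToyObject* copy, int value) {
    copy->payload = value;                                  // object copy stores
    forwardee_slot.store(copy, std::memory_order_release);  // forwardee installation
}

// "Acquiring" side: load the pointer with acquire before touching the copy.
// Returns nullptr while nothing has been installed.
const ToyObject* resolve_forwardee() {
    return forwardee_slot.load(std::memory_order_acquire);  // forwardee discovery
}

// Runs both sides on separate threads; the reader spins until it sees the copy.
int publish_and_read(int value) {
    ToyObject copy;
    forwardee_slot.store(nullptr, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] {
        const ToyObject* fwd = nullptr;
        while ((fwd = resolve_forwardee()) == nullptr) {
            std::this_thread::yield();
        }
        seen = fwd->payload;  // object copy access, ordered after the acquire load
    });
    std::thread writer([&copy, value] { install_forwardee(&copy, value); });
    writer.join();
    reader.join();
    forwardee_slot.store(nullptr, std::memory_order_relaxed);  // do not leave a dangling pointer
    return seen;
}
```

With a relaxed load in `resolve_forwardee`, the C++ memory model would no longer guarantee that the reader sees the payload written before the pointer was published, which is the kind of surprise the description wants to rule out.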
>
> Additional testing:
> - [x] Linux x86_64 `hotspot_gc_shenandoah`
> - [x] Linux AArch64 `hotspot_gc_shenandoah`
> - [x] Linux AArch64 `tier1` with Shenandoah

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Make sure to access fwdptr with acquire semantics in assembler code

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/2496/files
 - new: https://git.openjdk.java.net/jdk/pull/2496/files/b6ac8e4c..49626781

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=00-01

Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/2496.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496

PR: https://git.openjdk.java.net/jdk/pull/2496

From sviswanathan at openjdk.java.net Thu Feb 11 14:26:45 2021
From: sviswanathan at openjdk.java.net (Sandhya Viswanathan)
Date: Thu, 11 Feb 2021 14:26:45 GMT
Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors
Message-ID:

The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions.

JBS: https://bugs.openjdk.java.net/browse/JDK-8261542

The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform.

Before:
Benchmark                                 (size)   Mode  Cnt      Score     Error   Units
PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5     18.887 ±   1.128  ops/ms
PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5      9.374 ±   0.370  ops/ms

After:
Benchmark                                 (size)   Mode  Cnt      Score     Error   Units
PerfSliceOrigin.vectorSliceOrigin           1024  thrpt    5  13861.420 ±  19.071  ops/ms
PerfSliceOrigin.vectorSliceUnsliceOrigin    1024  thrpt    5   7895.199 ± 142.580  ops/ms

-------------

Commit messages:
 - 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors

Changes: https://git.openjdk.java.net/jdk/pull/2520/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8261542
Stats: 119 lines in 7 files changed: 99 ins; 5 del; 15 mod
Patch: https://git.openjdk.java.net/jdk/pull/2520.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520

PR: https://git.openjdk.java.net/jdk/pull/2520

From nhe at activeviam.com Thu Feb 11 14:28:58 2021
From: nhe at activeviam.com (Nicolas Heutte)
Date: Thu, 11 Feb 2021 15:28:58 +0100
Subject: SuperWord loop optimization lost after method inlining
In-Reply-To: References: Message-ID:

Hi Vladimir,

Thank you for your help. I'm currently running Java 11.0.9, and I did not use any VM flag of note.

I checked the content of the compilation log, and it seems that ArrayFloatToArrayFloatVectorBinding::plus() was deoptimized in order to allow AVector::plus() to be compiled:

The last compilation entry for AVector::plus() is:

@ 1 com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) inline (hot) \-> TypeProfile (14552/14552 counts) = com/qfs/vector/array/impl/ArrayFloatVector @ 7 com.qfs.vector.array.impl.ArrayFloatVector::getBindingId (4 bytes) inline (hot) \-> TypeProfile (14150/14150 counts) = com/qfs/vector/array/impl/ArrayFloatVector @ 10 com.qfs.vector.binding.impl.VectorBindings::getBinding (9 bytes) inline (hot) @ 5 com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::getBinding (22 bytes) inline (hot) @ 3 com.qfs.vector.binding.impl.VectorBindings$VectorBindingsProvider::hasBinding (34 bytes) inline (hot) @ 17 com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus (69 bytes) inline (hot) \-> TypeProfile (14054/14054 counts) = com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding @ 12 com.qfs.vector.array.impl.ArrayFloatVector::size (6
bytes) inline (hot) @ 22 com.qfs.vector.impl.AVector::checkIndex (37 bytes) inline (hot) @ 6 com.qfs.vector.array.impl.ArrayFloatVector::size (6 bytes) inline (hot) @ 27 com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) accessor @ 34 com.qfs.vector.array.impl.ArrayFloatVector::getUnderlying (5 bytes) accessor Unfortunately, I do not have access to a debug VM build, so I cannot run the second test you recommend. Best regards, Nicolas Heutte On Wed, Feb 10, 2021 at 7:36 PM Vladimir Kozlov wrote: > Hi, Nicolas > > Looks like, when inlined, the loop from > ArrayFloatToArrayFloatVectorBinding::plus() was not optimized at all: it is > not > unrolled and has range checks. Such loops are not vectorized (you need > unrolling and no checks). > > What Java version you are running? What HotSpot VM flags you are using > when running application? > > Run application with -XX:+LogCompilation and look on compilation data in > hotspot_pid.log file for caller > AVector::plus(). > > VM also has several flags to trace loop optimizations but they are only > available in debug VM build. If you have access > to such build run with -XX:+PrintCompilation -XX:+TraceLoopOpts flags. > > Thanks, > Vladimir K > > On 2/10/21 9:24 AM, Nicolas Heutte wrote: > > Hi all, > > > > I am encountering a performance issue caused by the interaction between > > method inlining and automatic vectorization. > > > > Our application aggregates arrays intensively using a method named > > ArrayFloatToArrayFloatVectorBinding.plus() with the following code: > > > > for (int i = 0; i < srcLen; ++i) { > > > > dstArray[i] += srcArray[i]; > > > > } > > > > When we microbenchmark this method we observe fast performance close to > the > > practical memory bandwidth and when we print the assembly code we observe > > loop unrolling and automatic vectorization with SIMD instructions. 
> > > > 0x000001ef4600abf0: vmovdqu 0x10(%r14,%r13,4),%ymm0 > > > > 0x000001ef4600abf7: vaddps 0x10(%rcx,%r13,4),%ymm0,%ymm0 > > > > 0x000001ef4600abfe: vmovdqu %ymm0,0x10(%r14,%r13,4) > > > > 0x000001ef4600ac05: movslq %r13d,%r11 > > > > 0x000001ef4600ac08: vmovdqu 0x30(%r14,%r11,4),%ymm0 > > > > 0x000001ef4600ac0f: vaddps 0x30(%rcx,%r11,4),%ymm0,%ymm0 > > > > 0x000001ef4600ac16: vmovdqu %ymm0,0x30(%r14,%r11,4) > > > > 0x000001ef4600ac1d: vmovdqu 0x50(%r14,%r11,4),%ymm0 > > > > 0x000001ef4600ac24: vaddps 0x50(%rcx,%r11,4),%ymm0,%ymm0 > > > > 0x000001ef4600ac2b: vmovdqu %ymm0,0x50(%r14,%r11,4) > > > > 0x000001ef4600ac32: vmovdqu 0x70(%r14,%r11,4),%ymm0 > > > > 0x000001ef4600ac39: vaddps 0x70(%rcx,%r11,4),%ymm0,%ymm0 > > > > 0x000001ef4600ac40: vmovdqu %ymm0,0x70(%r14,%r11,4) > > > > 0x000001ef4600ac47: vmovdqu 0x90(%r14,%r11,4),%ymm0 > > > > 0x000001ef4600ac51: vaddps 0x90(%rcx,%r11,4),%ymm0,%ymm0 > > > > 0x000001ef4600ac5b: vmovdqu %ymm0,0x90(%r14,%r11,4) > > > > 0x000001ef4600ac65: vmovdqu 0xb0(%r14,%r11,4),%ymm0 > > > > 0x000001ef4600ac6f: vaddps 0xb0(%rcx,%r11,4),%ymm0,%ymm0 > > > > 0x000001ef4600ac79: vmovdqu %ymm0,0xb0(%r14,%r11,4) > > > > 0x000001ef4600ac83: vmovdqu 0xd0(%r14,%r11,4),%ymm0 > > > > 0x000001ef4600ac8d: vaddps 0xd0(%rcx,%r11,4),%ymm0,%ymm0 > > > > 0x000001ef4600ac97: vmovdqu %ymm0,0xd0(%r14,%r11,4) > > > > 0x000001ef4600aca1: vmovdqu 0xf0(%r14,%r11,4),%ymm0 > > > > 0x000001ef4600acab: vaddps 0xf0(%rcx,%r11,4),%ymm0,%ymm0 > > > > 0x000001ef4600acb5: vmovdqu %ymm0,0xf0(%r14,%r11,4) ;*fastore > > {reexecute=0 rethrow=0 return_oop=0} > > > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > (line 41) > > > > 0x000001ef4600acbf: add $0x40,%r13d ;*iinc {reexecute=0 > > rethrow=0 return_oop=0} > > > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > (line 40) > > > > 0x000001ef4600acc3: cmp %eax,%r13d > > > > 0x000001ef4600acc6: jl 0x000001ef4600abf0 ;*goto {reexecute=0 > > 
rethrow=0 return_oop=0} > > > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > (line 40) > > > > > > > > In the real application, this method is actually inlined in a higher > level > > method named AVector.plus(). Unfortunately, the inlined version of the > > aggregation code is not vectorized anymore: > > > > > > > > 0x000001ef460180a0: cmp %ebx,%r11d > > > > 0x000001ef460180a3: jae 0x000001ef460180e6 > > > > 0x000001ef460180a5: vmovss 0x10(%r8,%r11,4),%xmm1 ;*faload > {reexecute=0 > > rethrow=0 return_oop=0} > > > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 54 > > (line 41) > > > > ; - > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > 0x000001ef460180ac: cmp %ecx,%r11d > > > > 0x000001ef460180af: jae 0x000001ef46018104 > > > > 0x000001ef460180b1: vaddss 0x10(%r9,%r11,4),%xmm1,%xmm1 > > > > 0x000001ef460180b8: vmovss %xmm1,0x10(%r8,%r11,4) ;*fastore > {reexecute=0 > > rethrow=0 return_oop=0} > > > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 61 > > (line 41) > > > > ; - > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > 0x000001ef460180bf: inc %r11d ;*iinc {reexecute=0 > > rethrow=0 return_oop=0} > > > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 62 > > (line 40) > > > > ; - > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > 0x000001ef460180c2: cmp %r10d,%r11d > > > > 0x000001ef460180c5: jl 0x000001ef460180a0 ;*goto {reexecute=0 > > rethrow=0 return_oop=0} > > > > ; - > > com.qfs.vector.binding.impl.ArrayFloatToArrayFloatVectorBinding::plus at 65 > > (line 40) > > > > ; - > > com.qfs.vector.impl.AVector::plus at 17 (line 204) > > > > > > > > This causes a significant performance drop, compared to a run where we > > explicitly disable the inlining and observe automatically vectorized code > > again ( > > > 
-XX:CompileCommand=dontinline,com/qfs/vector/binding/impl/ArrayFloatToArrayFloatVectorBinding.plus > > ). > > > > > > How do you guys explain that behavior of the JIT compiler? Is this a > known > > and tracked issue, could it be fixed in the JVM? Can we do something in > the > > java code to prevent this from happening? > > > > > > Best regards, > > > > Nicolas Heutte > > > From redestad at openjdk.java.net Thu Feb 11 14:30:37 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Thu, 11 Feb 2021 14:30:37 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 13:52:09 GMT, Jatin Bhateja wrote: > Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity. ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From zgu at openjdk.java.net Thu Feb 11 14:31:41 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 14:31:41 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: <-mMs6CWp1r2GKMwtamARlpAfWQnUI-I3sSCfpmaRfPI=.9aa2cb72-1370-47ee-8de0-ce4b8a946661@github.com> On Thu, 11 Feb 2021 14:07:58 GMT, Aleksey Shipilev wrote: >> Shenandoah carries forwardee information in object's mark word. 
Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. >> >> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. >> >> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises. >> >> Additional testing: >> - [x] Linux x86_64 `hotspot_gc_shenandoah` >> - [x] Linux AArch64 `hotspot_gc_shenandoah` >> - [x] Linux AArch64 `tier1` with Shenandoah > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Make sure to access fwdptr with acquire semantics in assembler code src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 58: > 56: assert(Thread::current()->is_Java_thread(), "Must be a mutator thread"); > 57: > 58: markWord mark = obj->mark_acquire(); Actually, I would argue that you don't need acquire here, since you don't touch anything other than mark word, so that there is no order that needs to be enforced. src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89: > 87: // (would be paired with acquire in forwardee accessors). Acquire on failed update > 88: // would get the updated object after the forwardee load. > 89: markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel); Same here, CAS guarantees you to see latest mark word. 
memory_order_release should be sufficient here, paired with memory_order_acquire in the resolve_forward barrier, to ensure safe publication of the new object.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496

From hshi at openjdk.java.net Thu Feb 11 14:47:52 2021
From: hshi at openjdk.java.net (Hui Shi)
Date: Thu, 11 Feb 2021 14:47:52 GMT
Subject: RFR: 8261585: Restore HandleArea used in Deoptimization::uncommon_trap
Message-ID:

Add a HandleMark in Deoptimization::uncommon_trap before the call to Deoptimization::fetch_unroll_info_helper, so that references held in the HandleArea do not extend object lifetimes. Object lifetime is then consistent with and without an uncommon trap.

For the test case in this commit, the WeakReference is expected to be cleared after GC, but the test fails with the options "-XX:-Inline -XX:-TieredCompilation -XX:CompileCommand=compileonly,UncommonTrapLeak.foo -XX:CompileThreshold=100 -XX:-BackgroundCompilation". The Reference's referent object is still alive after "foo" finishes because, with an uncommon trap, oops are recorded in the HandleArea and the HandleArea is not popped when uncommon trap processing finishes.

When Deoptimization::fetch_unroll_info_helper returns, all oops in the deoptimized frames have been saved in Deoptimization::UnrollBlock or in Thread data structures, so the HandleArea can be popped safely:
1. Local and expression oops: the raw address is stored in vframeArrayElement _locals/_expressions as intptr.
2. Return value restore: the raw oop is recorded in the frame // (oop *)map->location(rax->as_VMReg());
3. Exception object: the raw oop is recorded in Thread._exception_oop.

In the deoptimize blob entry, JRT_BLOCK_ENTRY(Deoptimization::fetch_unroll_info) has a HandleMarkCleaner, and the HandleArea is restored after Deoptimization::fetch_unroll_info_helper finishes. So it is also safe to add a HandleMark in Deoptimization::uncommon_trap before fetch_unroll_info_helper.
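The scoping effect described above can be sketched with a toy RAII mark in plain C++. Everything here is hypothetical and much simpler than HotSpot: `ToyHandleArea`, `ToyHandleMark` and `do_work` are illustration-only names, and the "area" is just a vector of recorded references that a collector would treat as roots.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy arena; HotSpot's HandleArea is considerably more involved.
class ToyHandleArea {
public:
    // Record a reference; while it is recorded, a collector would keep it alive.
    void push(const void* ref) { handles_.push_back(ref); }
    std::size_t size() const { return handles_.size(); }
    // Drop every handle recorded after the first n.
    void truncate(std::size_t n) { handles_.resize(n); }
private:
    std::vector<const void*> handles_;
};

// RAII mark: remembers the arena size on entry and pops back to it on exit.
class ToyHandleMark {
public:
    explicit ToyHandleMark(ToyHandleArea& area) : area_(area), saved_(area.size()) {}
    ~ToyHandleMark() { area_.truncate(saved_); }
    ToyHandleMark(const ToyHandleMark&) = delete;
    ToyHandleMark& operator=(const ToyHandleMark&) = delete;
private:
    ToyHandleArea& area_;
    std::size_t saved_;
};

// Stand-in for work that creates handles; returns how many are still recorded afterwards.
std::size_t do_work(ToyHandleArea& area, bool with_mark) {
    static int dummy = 0;
    if (with_mark) {
        ToyHandleMark mark(area);
        area.push(&dummy);  // popped when `mark` goes out of scope
    } else {
        area.push(&dummy);  // stays recorded until some outer mark pops it
    }
    return area.size();
}
```

Without the mark, the recorded reference outlives the work that created it, which mirrors how the referent in the test case stays reachable after "foo" finishes.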
-------------

Commit messages:
 - 8261585: Restore HandleArea used in Deoptimization::uncommon_trap

Changes: https://git.openjdk.java.net/jdk/pull/2526/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2526&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8261585
Stats: 64 lines in 2 files changed: 64 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/2526.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2526/head:pull/2526

PR: https://git.openjdk.java.net/jdk/pull/2526

From aph at openjdk.java.net Thu Feb 11 15:36:58 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 11 Feb 2021 15:36:58 GMT
Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v9]
In-Reply-To: References: Message-ID:

> Go back a few years, and there were simple atomic load/store exclusive
> instructions on Arm. Say you want to do an atomic increment of a
> counter. You'd do an atomic load to get the counter into your local cache
> in exclusive state, increment that counter locally, then write that
> incremented counter back to memory with an atomic store. All the time
> that cache line was in exclusive state, so you're guaranteed that
> no-one else changed anything on that cache line while you had it.
>
> This is hard to scale on a very large system (e.g. Fugaku) because if
> many processors are incrementing that counter you get a lot of cache
> line ping-ponging between cores.
>
> So, Arm decided to add a locked memory increment instruction that
> works without needing to load an entire line into local cache. It's a
> single instruction that loads, increments, and writes back. The secret
> is to send a cache control message to whichever processor owns the
> cache line containing the count, tell that processor to increment the
> counter and return the incremented value. That way cache coherency
> traffic is minimized. This new set of instructions is known as Large
> System Extensions, or LSE.
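For readers who want something concrete, here is a minimal C++ sketch of the contended counter increment described above. `bump` and `count_with_threads` are hypothetical names used only for illustration. How `fetch_add` is lowered depends on the compiler and target flags: on AArch64 it may become a load/store-exclusive retry loop, a single LSE instruction, or a call to an out-of-line helper. The source is the same in each case.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Increment the shared counter `times` times with an atomic read-modify-write.
void bump(std::atomic<long>& counter, int times) {
    for (int i = 0; i < times; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
}

// Contended case: several threads increment the same counter concurrently.
long count_with_threads(int threads, int times_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(bump, std::ref(counter), times_per_thread);
    }
    for (std::thread& th : pool) {
        th.join();
    }
    return counter.load();
}
```

Because every increment is an atomic read-modify-write, `count_with_threads(n, k)` returns `n * k` regardless of interleaving; what differs between instruction sequences is how much cache-line traffic the contention generates.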
> > Unfortunately, in recent processors, the "old" load/store exclusive > instructions, sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Spillchuck ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2434/files - new: https://git.openjdk.java.net/jdk/pull/2434/files/239dd7c5..747fab11 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2434&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2434.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2434/head:pull/2434 PR: https://git.openjdk.java.net/jdk/pull/2434 From shade at openjdk.java.net Thu Feb 11 16:08:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 16:08:40 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: <-mMs6CWp1r2GKMwtamARlpAfWQnUI-I3sSCfpmaRfPI=.9aa2cb72-1370-47ee-8de0-ce4b8a946661@github.com> References: <-mMs6CWp1r2GKMwtamARlpAfWQnUI-I3sSCfpmaRfPI=.9aa2cb72-1370-47ee-8de0-ce4b8a946661@github.com> Message-ID: <6xIXW7WXJtqZVPASy4StAzwMhh6AHgYibgX6NRHoOF8=.85ffde9d-662c-4f3a-9410-de5ca9def6a7@github.com> On Thu, 11 Feb 2021 14:25:21 GMT, Zhengyu Gu 
wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Make sure to access fwdptr with acquire semantics in assembler code > > src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 58: > >> 56: assert(Thread::current()->is_Java_thread(), "Must be a mutator thread"); >> 57: >> 58: markWord mark = obj->mark_acquire(); > > Actually, I would argue that you don't need acquire here, since you don't touch anything other than mark word, so that there is no order that needs to be enforced. It is not about the mark word itself, we access the forwardee afterwards. It is about transitive dependency: object copy stores -> forwardee installation (markword store, release) -> forwardee discovery (markword load, acquire) --> object copy access. Remember this: https://github.com/openjdk/jdk/pull/2498#discussion_r574498830 > src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89: > >> 87: // (would be paired with acquire in forwardee accessors). Acquire on failed update >> 88: // would get the updated object after the forwardee load. >> 89: markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel); > > Same here, CAS guarantees you to see latest mark word. memory_order_release should be sufficient here, paired with memory_order_acquire in resolve_forward barrier to ensure safe publishing the new object. Again, not really about the mark word. The transitive load of the object from that fwdptr is what we are after. 
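The "failure witness" point from the `cas_set_mark` discussion can be sketched with a standard C++ compare-and-exchange. This is an illustration under assumptions, not the Shenandoah code: `ToyCopy` and `try_install` are hypothetical names, and a `std::atomic` pointer stands in for the mark word.

```cpp
#include <atomic>

// Hypothetical stand-in for an evacuated object copy.
struct ToyCopy {
    int payload;
};

// Try to install `mine` as the forwardee. Returns the copy that ended up
// installed: `mine` if this CAS won, or the winner's copy if it lost.
ToyCopy* try_install(std::atomic<ToyCopy*>& slot, ToyCopy* mine) {
    ToyCopy* witness = nullptr;  // expected value: not forwarded yet
    if (slot.compare_exchange_strong(witness, mine,
                                     std::memory_order_acq_rel,     // success: publish our copy
                                     std::memory_order_acquire)) {  // failure: acquire the winner's copy
        return mine;
    }
    // Lost the race: `witness` now holds the winner's pointer. The acquire on
    // the failure path is what makes reading the winner's contents safe.
    return witness;
}
```

A caller that loses the race would typically discard its own copy and continue with the returned one; the acquire ordering on failure covers exactly that transitive load through the witnessed pointer.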
------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From zgu at openjdk.java.net Thu Feb 11 16:16:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 16:16:38 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: <6xIXW7WXJtqZVPASy4StAzwMhh6AHgYibgX6NRHoOF8=.85ffde9d-662c-4f3a-9410-de5ca9def6a7@github.com> References: <-mMs6CWp1r2GKMwtamARlpAfWQnUI-I3sSCfpmaRfPI=.9aa2cb72-1370-47ee-8de0-ce4b8a946661@github.com> <6xIXW7WXJtqZVPASy4StAzwMhh6AHgYibgX6NRHoOF8=.85ffde9d-662c-4f3a-9410-de5ca9def6a7@github.com> Message-ID: On Thu, 11 Feb 2021 16:04:38 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 58: >> >>> 56: assert(Thread::current()->is_Java_thread(), "Must be a mutator thread"); >>> 57: >>> 58: markWord mark = obj->mark_acquire(); >> >> Actually, I would argue that you don't need acquire here, since you don't touch anything other than mark word, so that there is no order that needs to be enforced. > > It is not about the mark word itself, we access the forwardee afterwards. It is about transitive dependency: object copy stores -> forwardee installation (markword store, release) -> forwardee discovery (markword load, acquire) --> object copy access. Remember this: https://github.com/openjdk/jdk/pull/2498#discussion_r574498830 Sorry. I commented on wrong place. I actually meant line #78 in try_update_forwardee(). ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From zgu at openjdk.java.net Thu Feb 11 16:21:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 16:21:38 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 14:07:58 GMT, Aleksey Shipilev wrote: >> Shenandoah carries forwardee information in object's mark word. 
Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. >> >> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. >> >> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises. >> >> Additional testing: >> - [x] Linux x86_64 `hotspot_gc_shenandoah` >> - [x] Linux AArch64 `hotspot_gc_shenandoah` >> - [x] Linux AArch64 `tier1` with Shenandoah > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Make sure to access fwdptr with acquire semantics in assembler code Good to me. ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2496 From zgu at openjdk.java.net Thu Feb 11 16:21:40 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 16:21:40 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: <6xIXW7WXJtqZVPASy4StAzwMhh6AHgYibgX6NRHoOF8=.85ffde9d-662c-4f3a-9410-de5ca9def6a7@github.com> References: <-mMs6CWp1r2GKMwtamARlpAfWQnUI-I3sSCfpmaRfPI=.9aa2cb72-1370-47ee-8de0-ce4b8a946661@github.com> <6xIXW7WXJtqZVPASy4StAzwMhh6AHgYibgX6NRHoOF8=.85ffde9d-662c-4f3a-9410-de5ca9def6a7@github.com> Message-ID: On Thu, 11 Feb 2021 16:06:37 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89: >> >>> 87: // (would be paired with acquire in forwardee accessors). Acquire on failed update >>> 88: // would get the updated object after the forwardee load. >>> 89: markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel); >> >> Same here, CAS guarantees you to see latest mark word. memory_order_release should be sufficient here, paired with memory_order_acquire in resolve_forward barrier to ensure safe publishing the new object. > > Again, not really about the mark word. The transitive load of the object from that fwdptr is what we are after. I kind of seeing what you meant, if there is a load after, we do need acquire. So good to me. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From shade at openjdk.java.net Thu Feb 11 16:21:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 16:21:41 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: <-mMs6CWp1r2GKMwtamARlpAfWQnUI-I3sSCfpmaRfPI=.9aa2cb72-1370-47ee-8de0-ce4b8a946661@github.com> <6xIXW7WXJtqZVPASy4StAzwMhh6AHgYibgX6NRHoOF8=.85ffde9d-662c-4f3a-9410-de5ca9def6a7@github.com> Message-ID: On Thu, 11 Feb 2021 16:14:00 GMT, Zhengyu Gu wrote: >> It is not about the mark word itself, we access the forwardee afterwards. It is about transitive dependency: object copy stores -> forwardee installation (markword store, release) -> forwardee discovery (markword load, acquire) --> object copy access. Remember this: https://github.com/openjdk/jdk/pull/2498#discussion_r574498830 > > Sorry. I commented on wrong place. I actually meant line #78 in try_update_forwardee(). You have to do acquire there as well, because you can exit from the next block, the one with `old_mark.is_marked()`, never getting further. Then the whole thing unfolds: for transitive visibility of the object contents, you have to load the mark word with acquire before accessing forwardee contents. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From zgu at openjdk.java.net Thu Feb 11 16:30:40 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 16:30:40 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: <-mMs6CWp1r2GKMwtamARlpAfWQnUI-I3sSCfpmaRfPI=.9aa2cb72-1370-47ee-8de0-ce4b8a946661@github.com> <6xIXW7WXJtqZVPASy4StAzwMhh6AHgYibgX6NRHoOF8=.85ffde9d-662c-4f3a-9410-de5ca9def6a7@github.com> Message-ID: On Thu, 11 Feb 2021 16:17:18 GMT, Aleksey Shipilev wrote: >> Sorry. I commented on wrong place. I actually meant line #78 in try_update_forwardee(). 
> > You have to do acquire there as well, because you can exit from the next block, the one with `old_mark.is_marked()`, never getting further. Then the whole thing unfolds: for transitive visibility of the object contents, you have to load the mark word with acquire before accessing forwardee contents. Yes. I wasn't thinking of the use of oop content after. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From simonis at openjdk.java.net Thu Feb 11 17:24:41 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 11 Feb 2021 17:24:41 GMT Subject: RFR: 8261027: AArch64: Support for LSE atomics C++ HotSpot code [v9] In-Reply-To: References: Message-ID: <4pe9O_EVHRNO0yAOImaUV3GcTaKhZ1tBEnw-Vr1mABo=.63dff376-5413-46ef-b36e-0c7165c566ad@github.com> On Thu, 11 Feb 2021 15:36:58 GMT, Andrew Haley wrote: >> Go back a few years, and there were simple atomic load/store exclusive >> instructions on Arm. Say you want to do an atomic increment of a >> counter. You'd do an atomic load to get the counter into your local cache >> in exclusive state, increment that counter locally, then write that >> incremented counter back to memory with an atomic store. All the time >> that cache line was in exclusive state, so you're guaranteed that >> no-one else changed anything on that cache line while you had it. >> >> This is hard to scale on a very large system (e.g. Fugaku) because if >> many processors are incrementing that counter you get a lot of cache >> line ping-ponging between cores. >> >> So, Arm decided to add a locked memory increment instruction that >> works without needing to load an entire line into local cache. It's a >> single instruction that loads, increments, and writes back. The secret >> is to send a cache control message to whichever processor owns the >> cache line containing the count, tell that processor to increment the >> counter and return the incremented value. That way cache coherency >> traffic is mimimized. 
This new set of instructions is known as Large >> System Extensions, or LSE. >> >> Unfortunately, in recent processors, the "old" load/store exclusive >> instructions, sometimes perform very badly. Therefore, it's now >> necessary for software to detect which version of Arm it's running >> on, and use the "new" LSE instructions if they're available. Otherwise >> performance can be very poor under heavy contention. >> >> GCC's -moutline-atomics does this by providing library calls which use >> LSE if it's available, but this option is only provided on newer >> versions of GCC. This is particularly problematic with older versions >> of OpenJDK, which build using old GCC versions. >> >> Also, I suspect that some other operating systems could use this. >> Perhaps not MacOS, given that all Apple CPUs support LSE, but >> maybe Windows. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Spillchuck Marked as reviewed by simonis (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From lucy at openjdk.java.net Thu Feb 11 17:47:54 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 11 Feb 2021 17:47:54 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v2] In-Reply-To: References: Message-ID: > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! 
> Lutz Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: 8261447: requested changes by TobiHartmann ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2511/files - new: https://git.openjdk.java.net/jdk/pull/2511/files/868ff38b..bfd60a3c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=00-01 Stats: 8 lines in 2 files changed: 1 ins; 3 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From mdoerr at openjdk.java.net Thu Feb 11 18:11:43 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 11 Feb 2021 18:11:43 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 16:18:30 GMT, Zhengyu Gu wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Make sure to access fwdptr with acquire semantics in assembler code > > Good to me. Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp? Address dependency ensures the ordering. Or is there anything missing? ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From coleenp at openjdk.java.net Thu Feb 11 18:13:39 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 11 Feb 2021 18:13:39 GMT Subject: RFR: 8261585: Restore HandleArea used in Deoptimization::uncommon_trap In-Reply-To: References: Message-ID: <8QAC28j6VeP8xx3mJ6vl6qmW8SnlasvFYdyDYNigaMc=.765f8d4e-aee9-4cfd-bd4b-88507cffcf48@github.com> On Thu, 11 Feb 2021 14:42:59 GMT, Hui Shi wrote: > Add HandleMark in Deoptimization::uncommon_trap before Deoptimization::fetch_unroll_info_helper, avoid reference hold in HandleArea increase object lifetime. 
Then object lifetime will be consistent with/without uncommon trap. > > For test case in commit, WeakReference is expected to be cleared after GC, but it fails with option "-XX:-Inline -XX:-TieredCompilation -XX:CompileCommand=compileonly,UncommonTrapLeak.foo -XX:CompileThreshold=100 -XX:-BackgroundCompilation". Reference's referent object is still alive after "foo" finishes, because with uncommon trap, oops are recorded in HandleArea and HandleArea is not popped when uncommon trap processing finishes. > > When Deoptimization::fetch_unroll_info_helper returns, all oops in deoptimized frames are saved in Deoptimization::UnrollBlock or Thread data structure, so HandleArea can be popped safely. > 1. local and expression oops raw address is stored in vframeArrayElement _locals/_expressions as intptr > 2. return value restore, raw oop recorded in frame // (oop *)map->location(rax->as_VMReg()); > 3. exception object, raw oop recorded on Thread._exception_oop > > In deoptimize blob entry, JRT_BLOCK_ENTRY(Deoptimization::fetch_unroll_info) has HandleMarkCleaner, HandleArea is restored after Deoptimization::fetch_unroll_info_helper finishes. So it's also safe to add HandleMark in Deoptimization::uncommon_trap before fetch_unroll_info_helper. Marked as reviewed by coleenp (Reviewer). src/hotspot/share/runtime/deoptimization.cpp line 2468: > 2466: uncommon_trap_inner(thread, trap_request); > 2467: } > 2468: HandleMark hm(thread); This is fine. I generally prefer the HandleMark closer to the lifetimes of the Handles that it protects. If you put this HandleMark in fetch_unroll_info_helper() would it be needed before the line: Handle exceptionObject; Or are there Handles created before there? Given how complicated this code is, where you have the HandleMark is fine. Well done finding this bug and writing a test for it! 
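For readers less familiar with the mark/release idiom discussed here, the following is only a toy model of it, not the HotSpot classes. `ToyHandleArea`, `ToyHandleMark` and `handles_left_after_trap` are made-up names for illustration; the real `HandleArea` and `HandleMark` are more involved. It shows why handles allocated without an enclosing mark stay in the area (and keep their referents reachable) after the work finishes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a per-thread handle area: every slot that is still
// present keeps its referent reachable for the garbage collector.
struct ToyHandleArea {
  std::vector<const void*> slots;
  std::size_t live() const { return slots.size(); }
};

// Toy stand-in for a handle mark: remembers the top of the area on
// construction and pops everything allocated since then on destruction.
class ToyHandleMark {
  ToyHandleArea& _area;
  std::size_t _saved_top;
 public:
  explicit ToyHandleMark(ToyHandleArea& area)
      : _area(area), _saved_top(area.slots.size()) {}
  ~ToyHandleMark() {
    assert(_area.slots.size() >= _saved_top);
    _area.slots.resize(_saved_top);  // release handles created under this mark
  }
  ToyHandleMark(const ToyHandleMark&) = delete;
  ToyHandleMark& operator=(const ToyHandleMark&) = delete;
};

// Hypothetical "trap handler" that allocates two handles while it runs.
// Returns how many handles are still in the area after it has finished.
inline std::size_t handles_left_after_trap(bool use_mark) {
  static const int referent = 0;
  ToyHandleArea area;
  auto body = [&area] {
    area.slots.push_back(&referent);
    area.slots.push_back(&referent);
  };
  if (use_mark) {
    ToyHandleMark hm(area);  // scope of the mark covers the whole body
    body();
  } else {
    body();
  }
  return area.live();
}
```

In this model `handles_left_after_trap(false)` leaves both handles behind, which corresponds to the leak described in the RFR, while `handles_left_after_trap(true)` leaves none.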
------------- PR: https://git.openjdk.java.net/jdk/pull/2526 From kvn at openjdk.java.net Thu Feb 11 18:25:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 11 Feb 2021 18:25:39 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 13:39:15 GMT, Lutz Schmidt wrote: >> Changes requested by thartmann (Reviewer). > > Thank you Tobias for having a look. I'll post an update asap. @veresov please review these changes ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From martin.doerr at sap.com Thu Feb 11 18:29:42 2021 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 11 Feb 2021 18:29:42 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> Message-ID: Hi, I appreciate this investigation. PPC64 has optimized versions for _relaxed, _acquire, _release and _acq_rel which are substantially faster than the other memory order modes. So we should be able to observe performance improvements when any of these ones are used in hot code. Best regards, Martin > -----Original Message----- > From: hotspot-dev On Behalf Of > Andrew Haley > Sent: Donnerstag, 11. Februar 2021 14:34 > To: Kim Barrett > Cc: hotspot-gc-dev openjdk.java.net ; > hotspot-dev at openjdk.java.net > Subject: Re: Atomic operations: your thoughts are welocme > > On 11/02/2021 03:59, Kim Barrett wrote: > >> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: > >> > >> I've been looking at the hottest Atomic operations in HotSpot, with a view > to > >> finding out if the default memory_order_conservative (which is very > expensive > >> on some architectures) can be weakened to something less. It's > impossible to > >> fix all of them, but perhaps we can fix some of the most frequent. 
> > > > Is there any information about the possible performance improvement > from > > such changes? 1.5-3M occurrences doesn't mean much without context. > > > > We don't presently have support for sequentially consistent semantics, > only > > "conservative". My recollection is that this is in part because there might > > be code that is assuming the possibly stronger "conservative" semantics, > and > > in part because there are different and incompatible approaches to > > implementing sequentially consistent semantics on some hardware > platforms > > and we didn't want to make assumptions there. > > > > We also don't presently have any cmpxchg implementation that really > supports > > anything between conservative and relaxed, nor do we support different > order > > constraints for the success vs failure cases. Things can be complicated > > enough as is; while we *could* fill some of that in, I'm not sure we should. > > OK. However, even though we don't implement any of them, we do have an > API that includes acq, rel, and seq_cst. The fact that we don't have > anything behind them is, I thought, To Be Done rather than Won't Do. > > >> > ::Table::oop_oop_iterate anceKlass, narrowOop>(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = > 3903178 > >> > >> This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does > this > >> need to be memory_order_conservative, or would something weaker > do? Even > >> acq_rel or seq_cst would be better. > > > > I think for setting bits in a bitmap the thing to do would be to identify > > places that are safe and useful (impacts performance) to do so first. Then > > add a weaker variant for use in those places, assuming any are found. > > I see. I'm assuming that frequency of use is a useful proxy for impact. > Aleksey has already, very helpfully, measured how significant these are > for Shenandoah, and I suspect all concurrent GCs would benefit in a > similar fashion. 
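As a rough illustration of the "weaker variant" idea for bitmap bits, here is a sketch in standard C++ atomics rather than HotSpot's `Atomic` API. `par_set_bit_sketch` is a hypothetical name, the real `BitMap::par_set_bit` is not reproduced here, and `std::memory_order_seq_cst` is used only as the nearest standard stand-in for the default ordering, which is an assumption since `memory_order_conservative` has no direct standard equivalent.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Atomically sets one bit in an array of 64-bit words.
// Returns true if this call changed the bit from 0 to 1.
// The caller chooses the ordering, so call sites that are known to be
// safe can pass a weaker order than the default.
inline bool par_set_bit_sketch(std::atomic<uint64_t>* words, std::size_t bit,
                               std::memory_order order = std::memory_order_seq_cst) {
  assert(words != nullptr);
  const uint64_t mask = uint64_t{1} << (bit % 64);
  const uint64_t old_value = words[bit / 64].fetch_or(mask, order);
  return (old_value & mask) == 0;
}
```

A call site that only needs atomicity of the bit update, and no ordering with surrounding accesses, could pass `std::memory_order_relaxed`; whether any given HotSpot call site is such a place is exactly the open question in this thread.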
> > >> : :: = 2376632 > >> : :: = 2003895 > >> > >> I can't imagine that either of these actually need > memory_order_conservative, > >> they're just reference counts. > > > > The "usual" refcount implementation involves relaxed increment and > stronger > > ordering for decrement. (If I'm remembering correctly, dec-acquire and a > > release fence on the zero value path before deleting. But I've not thought > > about what one might want for this CAS-based variant that handles > boundary > > cases specially.) And as you say, whether any of these could be weakened > > depends on whether there is any code surrounding a use that depends on > the > > stronger ordering semantics. At a guess I suspect increment could be > changed > > to relaxed, but I've not looked. This one is probably a question for runtime > > folks. > > OK, this makes sense. I'm thinking of the long road to getting this stuff > documented so that we can see what side effects of atomic operations are > actually required. > > >> : :: = > 1719614 > >> > >> BitMap::par_set_bit again. > >> > >> > erflowTaskQueue, > (MEMFLAGS)5>*)+432>: :: = 1617659 > >> > >> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). > >> Again, do we need conservative here? > > > > This needs at least sequentially consistent semantics on the success path. > > Yep. That's easy, it's the full barrier in the failure path that > I'd love to eliminate. > > > See the referenced paper by Le, et al. > > > > There is also a cmpxchg_age in pop_local_slow. The Le, et al paper doesn't > > deal with that path. But it's also not in your list, which is good since > > this is supposed to be infrequently taken. > > Right. I'm trying to concentrate on the low-hanging fruit. > > Thank you for the very detailed and informative reply. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. 
> https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From gziemski at openjdk.java.net Thu Feb 11 20:07:40 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 11 Feb 2021 20:07:40 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: <6f45qE_D_iGPVwyKMU4y5ifw3gtVVKwVz-OkjLGsJQc=.5e68053a-7203-46ca-9710-3768ac22e019@github.com> On Wed, 10 Feb 2021 10:04:31 GMT, Thomas Stuefe wrote: >> Marked as reviewed by dholmes (Reviewer). > > Gentle ping.. hi Thomas, I'm interested in reviewing your fix and will work on it as soon as I'm done with https://github.com/openjdk/jdk/pull/2403 (tomorrow?) ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From iveresov at openjdk.java.net Thu Feb 11 20:48:38 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Thu, 11 Feb 2021 20:48:38 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 18:22:38 GMT, Vladimir Kozlov wrote: >> Thank you Tobias for having a look. I'll post an update asap. > > @veresov please review these changes I don't really like these .*64 suffixes. Can we just make the counter 64 bit, update the SA as Tobias suggested and keep the existing method names? Or is there a reason for doing this that eludes me? ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From iklam at openjdk.java.net Thu Feb 11 23:51:48 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 23:51:48 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp Message-ID: This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). 
We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. Refactor the code so they are also printed for dynamic dump ------------- Commit messages: - 8261608: Move common CDS archive building code to archiveBuilder.cpp Changes: https://git.openjdk.java.net/jdk/pull/2536/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2536&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261608 Stats: 943 lines in 31 files changed: 327 ins; 464 del; 152 mod Patch: https://git.openjdk.java.net/jdk/pull/2536.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2536/head:pull/2536 PR: https://git.openjdk.java.net/jdk/pull/2536 From iklam at openjdk.java.net Thu Feb 11 23:51:48 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 11 Feb 2021 23:51:48 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 23:41:34 GMT, Ioi Lam wrote: > This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: > > - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp > > - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up > > - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. 
Refactor the code so they are also printed for dynamic dump Example of detailed stats that are now available for dynamic dump: [info ][cds] Shared file region (mc ) 0: 24 bytes, addr 0x0000000800c30000 file offset 0x00001000 crc 0xa3c1ca20 [info ][cds] Shared file region (rw ) 1: 1016 bytes, addr 0x0000000800c31000 file offset 0x00002000 crc 0xb7970a35 [info ][cds] Shared file region (ro ) 2: 1632 bytes, addr 0x0000000800c32000 file offset 0x00003000 crc 0x4edeacfa [info ][cds] Shared file region (bm ) 3: 160 bytes, addr 0x0000000000000000 file offset 0x00004000 crc 0x27cf167e [debug][cds] mc space: 24 [ 0.1% of total] out of 4096 bytes [ 0.6% used] at 0x0000000800c30000 [debug][cds] rw space: 1016 [ 6.2% of total] out of 4096 bytes [ 24.8% used] at 0x0000000800c31000 [debug][cds] ro space: 1632 [ 10.0% of total] out of 4096 bytes [ 39.8% used] at 0x0000000800c32000 [debug][cds] bm space: 160 [ 1.0% of total] out of 160 bytes [100.0% used] [debug][cds] total : 2832 [100.0% of total] out of 16384 bytes [ 17.3% used] [debug][cds] Detailed metadata info (excluding heap regions; rw stats include mc regions): [debug][cds] ro_cnt ro_bytes % | rw_cnt rw_bytes % | all_cnt all_bytes % [debug][cds] --------------------+---------------------------+---------------------------+-------------------------- [debug][cds] Class : 0 0 0.0 | 1 544 52.3 | 1 544 20.4 [debug][cds] Symbol : 3 64 3.9 | 0 0 0.0 | 3 64 2.4 [debug][cds] TypeArrayU1 : 3 168 10.3 | 1 40 3.8 | 4 208 7.8 [debug][cds] TypeArrayU2 : 2 16 1.0 | 0 0 0.0 | 2 16 0.6 [debug][cds] TypeArrayU4 : 1 16 1.0 | 0 0 0.0 | 1 16 0.6 [debug][cds] TypeArrayU8 : 2 672 41.2 | 1 40 3.8 | 3 712 26.6 [debug][cds] TypeArrayOther : 0 0 0.0 | 0 0 0.0 | 0 0 0.0 [debug][cds] Method : 0 0 0.0 | 2 192 18.5 | 2 192 7.2 [debug][cds] ConstMethod : 2 144 8.8 | 0 0 0.0 | 2 144 5.4 [debug][cds] MethodData : 0 0 0.0 | 0 0 0.0 | 0 0 0.0 [debug][cds] ConstantPool : 1 312 19.1 | 0 0 0.0 | 1 312 11.7 [debug][cds] ConstantPoolCache : 0 0 0.0 | 1 136 13.1 | 
1 136 5.1 [debug][cds] Annotations : 0 0 0.0 | 0 0 0.0 | 0 0 0.0 [debug][cds] MethodCounters : 0 0 0.0 | 1 64 6.2 | 1 64 2.4 [debug][cds] RecordComponent : 0 0 0.0 | 0 0 0.0 | 0 0 0.0 [debug][cds] SymbolHashentry : 3 32 2.0 | 0 0 0.0 | 3 32 1.2 [debug][cds] SymbolBucket : 2 16 1.0 | 0 0 0.0 | 2 16 0.6 [debug][cds] StringHashentry : 0 0 0.0 | 0 0 0.0 | 0 0 0.0 [debug][cds] StringBucket : 0 0 0.0 | 0 0 0.0 | 0 0 0.0 [debug][cds] ModulesNatives : 0 0 0.0 | 0 0 0.0 | 0 0 0.0 [debug][cds] Other : 0 192 11.8 | 0 24 2.3 | 0 216 8.1 [debug][cds] --------------------+---------------------------+---------------------------+-------------------------- [debug][cds] Total : 19 1632 100.0 | 7 1040 100.0 | 26 2672 100.0 ------------- PR: https://git.openjdk.java.net/jdk/pull/2536 From coleenp at openjdk.java.net Fri Feb 12 00:49:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Feb 2021 00:49:44 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 23:41:34 GMT, Ioi Lam wrote: > This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: > > - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp > > - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up > > - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. Refactor the code so they are also printed for dynamic dump Looks like a good change. src/hotspot/share/classfile/moduleEntry.cpp line 33: > 31: #include "logging/log.hpp" > 32: #include "memory/archiveBuilder.hpp" > 33: #include "memory/archiveUtils.hpp" Do these files need archiveUtils.hpp or does archiveBuilder.hpp need it? 
src/hotspot/share/memory/archiveBuilder.cpp line 178: > 176: _num_instance_klasses = 0; > 177: _num_obj_array_klasses = 0; > 178: _num_type_array_klasses = 0; Should these be initializers also? src/hotspot/share/memory/archiveUtils.hpp line 44: > 42: class ArchivePtrMarker : AllStatic { > 43: static CHeapBitMap* _ptrmap; > 44: static VirtualSpace* _vs; I think this is a good change. src/hotspot/share/memory/metaspaceShared.inline.hpp line 34: > 32: > 33: #if INCLUDE_CDS_JAVA_HEAP > 34: bool MetaspaceShared::is_archive_object(oop p) { Was this unused? I don't find 'is_archive_object' in the diff. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2536 From coleenp at openjdk.java.net Fri Feb 12 02:12:43 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Feb 2021 02:12:43 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM [v3] In-Reply-To: References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: On Thu, 11 Feb 2021 12:44:54 GMT, Claes Redestad wrote: >> This patch moves some sanity checking done in ClassLoader.java to the corresponding endpoints in native or VM code. > > Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: > > Consolidate verifyClassname and verifyFixClassname This more limited cleanup looks good. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2378 From hshi at openjdk.java.net Fri Feb 12 02:18:39 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Fri, 12 Feb 2021 02:18:39 GMT Subject: RFR: 8261585: Restore HandleArea used in Deoptimization::uncommon_trap In-Reply-To: <8QAC28j6VeP8xx3mJ6vl6qmW8SnlasvFYdyDYNigaMc=.765f8d4e-aee9-4cfd-bd4b-88507cffcf48@github.com> References: <8QAC28j6VeP8xx3mJ6vl6qmW8SnlasvFYdyDYNigaMc=.765f8d4e-aee9-4cfd-bd4b-88507cffcf48@github.com> Message-ID: <9Jw5hyBW0pnk1L2qt1ZOT8jNTAGRcgLSyhIP7XXDrqM=.efbb860f-5dd1-40c2-8970-762a448f5b45@github.com> On Thu, 11 Feb 2021 18:10:35 GMT, Coleen Phillimore wrote: >> Add HandleMark in Deoptimization::uncommon_trap before Deoptimization::fetch_unroll_info_helper, avoid reference hold in HandleArea increase object lifetime. Then object lifetime will be consistent with/without uncommon trap. >> >> For test case in commit, WeakReference is expected to be cleared after GC, but it fails with option "-XX:-Inline -XX:-TieredCompilation -XX:CompileCommand=compileonly,UncommonTrapLeak.foo -XX:CompileThreshold=100 -XX:-BackgroundCompilation". Reference's referent object is still alive after "foo" finishes, because with uncommon trap, oops are recorded in HandleArea and HandleArea is not popped when uncommon trap processing finishes. >> >> When Deoptimization::fetch_unroll_info_helper returns, all oops in deoptimized frames are saved in Deoptimization::UnrollBlock or Thread data structure, so HandleArea can be popped safely. >> 1. local and expression oops raw address is stored in vframeArrayElement _locals/_expressions as intptr >> 2. return value restore, raw oop recorded in frame // (oop *)map->location(rax->as_VMReg()); >> 3. exception object, raw oop recorded on Thread._exception_oop >> >> In deoptimize blob entry, JRT_BLOCK_ENTRY(Deoptimization::fetch_unroll_info) has HandleMarkCleaner, HandleArea is restored after Deoptimization::fetch_unroll_info_helper finishes. 
So it's also safe to add HandleMark in Deoptimization::uncommon_trap before fetch_unroll_info_helper. > > src/hotspot/share/runtime/deoptimization.cpp line 2468: > >> 2466: uncommon_trap_inner(thread, trap_request); >> 2467: } >> 2468: HandleMark hm(thread); > > This is fine. I generally prefer the HandleMark closer to the lifetimes of the Handles that it protects. If you put this HandleMark in fetch_unroll_info_helper() would it be needed before the line: > Handle exceptionObject; > Or are there Handles created before there? > Given how complicated this code is, where you have the HandleMark is fine. Well done finding this bug and writing a test for it! Thanks! There are also Handles used in static method eliminate_allocations and eliminate_locks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2526 From iklam at openjdk.java.net Fri Feb 12 04:08:03 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 12 Feb 2021 04:08:03 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v2] In-Reply-To: References: Message-ID: <7g0L5oYcLEqm2b-mg_7heIQFeOj0atCRN-3Y_gy0UzE=.efc411e4-736b-4f9e-8d47-a814bfacf08c@github.com> > This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: > > - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp > > - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up > > - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. 
Refactor the code so they are also printed for dynamic dump Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - fixed spaces - use member initializer list; clean up log message ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2536/files - new: https://git.openjdk.java.net/jdk/pull/2536/files/f32c40b6..9582e40f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2536&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2536&range=00-01 Stats: 48 lines in 1 file changed: 20 ins; 17 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/2536.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2536/head:pull/2536 PR: https://git.openjdk.java.net/jdk/pull/2536 From iklam at openjdk.java.net Fri Feb 12 04:08:05 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 12 Feb 2021 04:08:05 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 00:46:04 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixed spaces >> - use member initializer list; clean up log message > > src/hotspot/share/memory/metaspaceShared.inline.hpp line 34: > >> 32: >> 33: #if INCLUDE_CDS_JAVA_HEAP >> 34: bool MetaspaceShared::is_archive_object(oop p) { > > Was this unused? I don't find 'is_archive_object' in the diff. This whole file was unused (and the function was not even declared in the MetaspaceShared class) so I removed it. > src/hotspot/share/memory/archiveBuilder.cpp line 178: > >> 176: _num_instance_klasses = 0; >> 177: _num_obj_array_klasses = 0; >> 178: _num_type_array_klasses = 0; > > Should these be initializers also? I changed them to initializers. 
> src/hotspot/share/classfile/moduleEntry.cpp line 33: > >> 31: #include "logging/log.hpp" >> 32: #include "memory/archiveBuilder.hpp" >> 33: #include "memory/archiveUtils.hpp" > > Do these files need archiveUtils.hpp or does archiveBuilder.hpp need it? Both packageEntry.cpp and moduleEntry.cpp use ArchivePtrMarker which is defined in archiveUtils.hpp. ------------- PR: https://git.openjdk.java.net/jdk/pull/2536 From jbhateja at openjdk.java.net Fri Feb 12 06:01:37 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Fri, 12 Feb 2021 06:01:37 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 14:28:01 GMT, Claes Redestad wrote: > > Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. > > Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity. 
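For context on the two mask computations being compared, here is a portable sketch. It is a model in plain C++, not the stub generator code and not compiler intrinsics; the function names are hypothetical. The first function mirrors a shift-then-decrement sequence, the second models what BZHI produces when its source operand is all ones.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-decrement form: (1 << n) - 1 yields a mask with the low n bits set.
// Only valid for n < 64 in C++.
inline uint64_t low_mask_shift_dec(unsigned n) {
  assert(n < 64 && "a 64-bit shift by 64 or more is undefined in C++");
  return (uint64_t{1} << n) - 1;
}

// Model of BZHI applied to an all-ones source: bits at positions >= index are
// cleared. The instruction reads only the low 8 bits of the index register,
// and leaves the source unchanged when that index is >= the operand size.
inline uint64_t low_mask_bzhi_model(unsigned n) {
  const uint64_t all_ones = ~uint64_t{0};
  const unsigned index = n & 0xffu;
  return index >= 64 ? all_ones : (all_ones & ((uint64_t{1} << index) - 1));
}
```

For every n below 64 the two functions return the same mask; the only inputs where this model differs are n >= 64, which the shift form does not accept.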
BASELINE: Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": 61.037 ns/op Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:·perf": Perf stats: -------------------------------------------------- 19,739.21 msec task-clock # 0.389 CPUs utilized 646 context-switches # 0.033 K/sec 12 cpu-migrations # 0.001 K/sec 150 page-faults # 0.008 K/sec 74,59,83,59,139 cycles # 3.779 GHz (30.73%) 1,78,78,79,19,117 instructions # 2.40 insn per cycle (38.48%) 24,79,81,63,651 branches # 1256.289 M/sec (38.55%) 32,24,89,924 branch-misses # 1.30% of all branches (38.62%) 52,56,88,28,472 L1-dcache-loads # 2663.167 M/sec (38.65%) 39,00,969 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.57%) 3,74,131 LLC-loads # 0.019 M/sec (30.77%) 22,315 LLC-load-misses # 5.96% of all LL-cache hits (30.72%) L1-icache-loads 17,49,997 L1-icache-load-misses (30.72%) 52,91,41,70,636 dTLB-loads # 2680.663 M/sec (30.69%) 3,315 dTLB-load-misses # 0.00% of all dTLB cache hits (30.67%) 4,674 iTLB-loads # 0.237 K/sec (30.65%) 33,746 iTLB-load-misses # 721.99% of all iTLB cache hits (30.63%) L1-dcache-prefetches L1-dcache-prefetch-misses 50.723759146 seconds time elapsed 51.447054000 seconds user 0.189949000 seconds sys WITH OPT: Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": 74.356 ns/op Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:·perf": Perf stats: -------------------------------------------------- 19,741.09 msec task-clock # 0.389 CPUs utilized 641 context-switches # 0.032 K/sec 17 cpu-migrations # 0.001 K/sec 164 page-faults # 0.008 K/sec 74,40,40,48,513 cycles # 3.769 GHz (30.81%) 1,45,66,22,06,797 instructions # 1.96 insn per cycle (38.56%) 20,31,28,43,577 branches # 1028.963 M/sec (38.65%) 14,11,419 branch-misses # 0.01% of all branches (38.69%) 43,07,86,33,662 L1-dcache-loads # 2182.182 M/sec (38.72%) 37,06,744 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.56%) 1,34,292 LLC-loads # 
0.007 M/sec (30.72%) 30,627 LLC-load-misses # 22.81% of all LL-cache hits (30.68%) L1-icache-loads 14,49,145 L1-icache-load-misses (30.65%) 43,44,86,27,516 dTLB-loads # 2200.924 M/sec (30.63%) 218 dTLB-load-misses # 0.00% of all dTLB cache hits (30.63%) 2,445 iTLB-loads # 0.124 K/sec (30.63%) 28,624 iTLB-load-misses # 1170.72% of all iTLB cache hits (30.63%) L1-dcache-prefetches L1-dcache-prefetch-misses 50.716083931 seconds time elapsed 51.467300000 seconds user 0.200390000 seconds sys JMH perf data for ArrayCopyUnalignedSrc.testLong with copy length of 1200 shows degradation in L1D accesses; it seems the benchmark got displaced from its sweet spot. But, there is a significant reduction in instruction count and cycles are almost comparable. We are saving one shift per mask computation. OLD Sequence: 0x00007f7fc1030ead: movabs $0x1,%rax 0x00007f7fc1030eb7: shlx %r8,%rax,%rax 0x00007f7fc1030ebc: dec %rax 0x00007f7fc1030ebf: kmovq %rax,%k2 NEW Sequence: 0x00007f775d030d51: movabs $0xffffffffffffffff,%rax 0x00007f775d030d5b: bzhi %r8,%rax,%rax 0x00007f775d030d60: kmovq %rax,%k2 ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From shade at openjdk.java.net Fri Feb 12 07:29:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 12 Feb 2021 07:29:43 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 18:08:49 GMT, Martin Doerr wrote: > Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp? > Address dependency ensures the ordering. Or is there anything missing? True, we could use address dependency for the ordering. This thing is more or less load-consume of the newly promoted object. But, I think reasoning with (stronger) acquires is more versatile for the code we do not control (C++ generally discourages using `memory_order_consume`; that is why our C++ code acquires mark word). 
While we control the assembler insns sequence, it seems good to match that. It does not seem worth over-relaxing the rare path: the CAS failure path under running GC. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From lucy at openjdk.java.net Fri Feb 12 08:32:41 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Fri, 12 Feb 2021 08:32:41 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v2] In-Reply-To: References: Message-ID: <7S5jdlFpZ5m2xtFinD92jQEQm6hgbQXjHR5N-3XbXkc=.fe75978e-860a-4bcc-b5b4-3e4b4246d706@github.com> On Thu, 11 Feb 2021 20:45:30 GMT, Igor Veresov wrote: >> @veresov please review these changes > > I don't really like these .*64 suffixes. Can we just make the counter 64 bit, update the SA as Tobias suggested and keep the existing method names? Or is there a reason for doing this that eludes me? I introduced the *64 suffixes to not break anything that still uses the old calls. As old uses disappear step by step, I'm more than happy to remove the suffixes. I will have a look into SA and try to make it 64bit counter ready. There may be no new version before the weekend is over. ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From tschatzl at openjdk.java.net Fri Feb 12 08:46:55 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 12 Feb 2021 08:46:55 GMT Subject: RFR: 8260941: Remove the conc_scan parameter for CardTable [v2] In-Reply-To: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> References: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> Message-ID: <-tP8GTrmtg1g4zqRNqiGiVrp24kqMtNxPY0R7tAi_qg=.f8793b05-cb3e-4416-b65a-a57a4be8f2c3@github.com> > Hi, > > can I have reviews for this removal of the last(?) CMS-specific code in CardTable, namely some provision to indicate that cards are being scanned concurrently in Serial/Parallel GC barrier code? 
> > The change simply follows the predicate into Serial/Parallel GC code which always returns false for them and removes that code. > > In the review for JDK-8234534 I mentioned that I split this out due to unexplainable errors; testing tier1-5 three times showed none of that any more (after updating to latest code). > > This change has only been built on Oracle-platforms and linux-x86 via github actions (https://github.com/tschatzl/jdk/actions/runs/539993964), so I would like to kindly ask maintainers of the others to compile and report issues (32 bit ARM, PPC etc). > > Testing: tier1-5 Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: kbarrett review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2425/files - new: https://git.openjdk.java.net/jdk/pull/2425/files/5db8cbe6..9c0129d3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2425&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2425&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2425.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2425/head:pull/2425 PR: https://git.openjdk.java.net/jdk/pull/2425 From tschatzl at openjdk.java.net Fri Feb 12 08:46:56 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 12 Feb 2021 08:46:56 GMT Subject: RFR: 8260941: Remove the conc_scan parameter for CardTable [v2] In-Reply-To: References: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> Message-ID: <5YoESAAlhRf7kasKg4Q8B-qjNLZ-dL1tJz3g94E8aJE=.29f5940d-facc-4ea0-8fca-d240d174038a@github.com> On Wed, 10 Feb 2021 09:42:15 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> kbarrett review > > A side note: it seems that `G1BarrierSet` is the only subclass of `CardTableBarrierSet`. 
Maybe it makes sense to merge them into one. Thanks for your reviews @albertnetymk @kimbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2425 From tschatzl at openjdk.java.net Fri Feb 12 08:46:57 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 12 Feb 2021 08:46:57 GMT Subject: Integrated: 8260941: Remove the conc_scan parameter for CardTable In-Reply-To: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> References: <1sGB_hdxutVE55IriJ2XK3krb4vsffzX3OKavt1UwBE=.53483ccd-9923-4ec3-a0d9-82e619c440f5@github.com> Message-ID: On Fri, 5 Feb 2021 09:52:25 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this removal of the last(?) CMS-specific code in CardTable, namely some provision to indicate that cards are being scanned concurrently in Serial/Parallel GC barrier code? > > The change simply follows the predicate into Serial/Parallel GC code which always returns false for them and removes that code. > > In the review for JDK-8234534 I mentioned that I split this out due to unexplainable errors; testing tier1-5 three times showed none of that any more (after updating to latest code). > > This change has only been built on Oracle-platforms and linux-x86 via github actions (https://github.com/tschatzl/jdk/actions/runs/539993964), so I would like to kindly ask maintainers of the others to compile and report issues (32 bit ARM, PPC etc). > > Testing: tier1-5 This pull request has now been integrated.
Changeset: 9c0ec8d8 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/9c0ec8d8 Stats: 81 lines in 17 files changed: 4 ins; 66 del; 11 mod 8260941: Remove the conc_scan parameter for CardTable Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2425 From tschatzl at openjdk.java.net Fri Feb 12 09:33:52 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 12 Feb 2021 09:33:52 GMT Subject: RFR: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC Message-ID: Hi all, can I have reviews for this (tiny) change that removes the last (unconditional) StoreLoad memory barrier for Serial/Parallel GC that has apparently been forgotten to be made conditional on `CardTable::scanned_concurrently()` just removed in [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) ? Thanks, Thomas Testing: automatic compilation via github actions, but this is a quite straightforward removal of a single line... 
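[Editorial sketch, not part of the original mail.] For readers unfamiliar with the barrier under discussion: with `UseCondCardMark`, the post-write barrier loads the card first and stores to it only if it is not already dirty; when cards could be scanned concurrently, a StoreLoad fence had to sit between the reference store and that card load. The toy model below only illustrates that shape. `ToyCardTable`, the card encoding, the 512-byte card size and the `scanned_concurrently` flag are assumptions made for illustration, not HotSpot's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical card encoding and card size, chosen for this sketch only.
constexpr uint8_t kCleanCard = 0xff;
constexpr uint8_t kDirtyCard = 0x00;
constexpr int kCardShift = 9;  // assumed 512-byte cards

struct ToyCardTable {
  static constexpr size_t kCards = 64;
  std::atomic<uint8_t> cards[kCards];
  size_t card_stores = 0;  // counts actual stores to the table (single-threaded demo)

  ToyCardTable() {
    for (auto& c : cards) c.store(kCleanCard, std::memory_order_relaxed);
  }

  // Conditional card mark: called after a reference store at heap_offset.
  void post_write_barrier(size_t heap_offset, bool scanned_concurrently) {
    std::atomic<uint8_t>& card = cards[(heap_offset >> kCardShift) % kCards];
    if (scanned_concurrently) {
      // StoreLoad: order the preceding reference store before the card load.
      // With no concurrent scanner this fence is pure overhead.
      std::atomic_thread_fence(std::memory_order_seq_cst);
    }
    if (card.load(std::memory_order_relaxed) != kDirtyCard) {
      card.store(kDirtyCard, std::memory_order_relaxed);
      ++card_stores;
    }
  }
};
```

Two writes that hit the same card cause one store to the table; once nothing scans cards concurrently, the fenced branch is never needed.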
------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/2541/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2541&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261309 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2541.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2541/head:pull/2541 PR: https://git.openjdk.java.net/jdk/pull/2541 From kim.barrett at oracle.com Fri Feb 12 09:58:37 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 12 Feb 2021 09:58:37 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> Message-ID: <032D7C47-8862-4FDE-9B88-CE209D64C46F@oracle.com> > On Feb 11, 2021, at 8:33 AM, Andrew Haley wrote: > > On 11/02/2021 03:59, Kim Barrett wrote: >>> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: >>> >>> I've been looking at the hottest Atomic operations in HotSpot, with a view to >>> finding out if the default memory_order_conservative (which is very expensive >>> on some architectures) can be weakened to something less. It's impossible to >>> fix all of them, but perhaps we can fix some of the most frequent. >> >> Is there any information about the possible performance improvement from >> such changes? 1.5-3M occurrences doesn't mean much without context. >> >> We don't presently have support for sequentially consistent semantics, only >> "conservative". My recollection is that this is in part because there might >> be code that is assuming the possibly stronger "conservative" semantics, and >> in part because there are different and incompatible approaches to >> implementing sequentially consistent semantics on some hardware platforms >> and we didn't want to make assumptions there. 
>> >> We also don't presently have any cmpxchg implementation that really supports >> anything between conservative and relaxed, nor do we support different order >> constraints for the success vs failure cases. Things can be complicated >> enough as is; while we *could* fill some of that in, I'm not sure we should. > > OK. However, even though we don't implement any of them, we do have an > API that includes acq, rel, and seq_cst. The fact that we don't have > anything behind them is, I thought, To Be Done rather than Won't Do. My inclination is to be pretty conservative in this area. (No pun intended.) I'm not eager to have a lot of reviews like that for JDK-8154736. (And in looking back at that, I see we ended up not addressing non-ppc platforms, even though there was specific concern at the time that by not dealing with them (particularly arm/aarch64) that we might be fobbing off some really hard debugging on some poor future person.) >>> ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 >>> >>> This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this >>> need to be memory_order_conservative, or would something weaker do? Even >>> acq_rel or seq_cst would be better. >> >> I think for setting bits in a bitmap the thing to do would be to identify >> places that are safe and useful (impacts performance) to do so first. Then >> add a weaker variant for use in those places, assuming any are found. > > I see. I'm assuming that frequency of use is a useful proxy for impact. > Aleksey has already, very helpfully, measured how significant these are > for Shenandoah, and I suspect all concurrent GCs would benefit in a > similar fashion. Absolute counts don't say much without context. So what if there are a million of these, if they are swamped by the 100 bazillion not-these? Aleksey's measurements turned out to be less informative to me than they seemed at first reading. 
Many of the proposed changes involve simple counters or accumulators. Changing such to use relaxed atomic addition operations is likely an easy improvement. But even that can suffer badly from contention. If one is serious about reducing the cost of multi-threaded accumulators, much better would be something like http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0261r4.html >>> , (MEMFLAGS)5>*)+432>: :: = 1617659 >>> >>> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). >>> Again, do we need conservative here? >> >> This needs at least sequentially consistent semantics on the success path. > > Yep. That's easy, it's the full barrier in the failure path that > I'd love to eliminate. Why does the failure path matter here? It should be rare [*], since it only fails when either there is contention between a thief and the owner for the sole entry in the queue, or there is contention between multiple thieves. The former should be rare because non-empty queues usually contain more than one element. The latter should be rare because of the random selection of queues to steal from. And in both cases a losing thief will look for a new queue to steal from. [*] The age/top (where pop_global takes from) and bottom (where push adds and pop_local takes from) used to be adjacent members, so local operations might induce false-sharing failures for the age/top CAS. These members were separated in JDK 15. From akozlov at openjdk.java.net Fri Feb 12 09:58:47 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 12 Feb 2021 09:58:47 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation Message-ID: Hi, Please review a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. CC @dcubed-ojdk Thanks!
------------- Commit messages: - Update copyrights - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline - Extract SafeFetch32/N to stubRoutines.inline.hpp Changes: https://git.openjdk.java.net/jdk/pull/2542/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2542&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261075 Stats: 86 lines in 11 files changed: 53 ins; 20 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/2542.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2542/head:pull/2542 PR: https://git.openjdk.java.net/jdk/pull/2542 From mdoerr at openjdk.java.net Fri Feb 12 10:09:40 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 12 Feb 2021 10:09:40 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 07:26:49 GMT, Aleksey Shipilev wrote: >> Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp? >> Address dependency ensures the ordering. Or is there anything missing? > >> Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp? >> Address dependency ensures the ordering. Or is there anything missing? > > True, we could use address dependency for the ordering. This thing is more or less load-consume of the newly promoted object. But, I think reasoning with (stronger) acquires is more versatile for the code we do not control (C++ generally discourages using `memory_order_consume`; that is why our C++ code acquires mark word). While we control the assembler insns sequence, it seems good to match that. It does not seem worth over-relaxing the rare path: the CAS failure path under running GC. Thanks for your reply. Yeah, I got your point regarding C++. But we use load-consume a lot in our self-made assembly code which should be ok. 
I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very performance sensitive? ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From aph at redhat.com Fri Feb 12 10:25:42 2021 From: aph at redhat.com (Andrew Haley) Date: Fri, 12 Feb 2021 10:25:42 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <032D7C47-8862-4FDE-9B88-CE209D64C46F@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> <032D7C47-8862-4FDE-9B88-CE209D64C46F@oracle.com> Message-ID: <28181731-2880-5f85-80db-354881440295@redhat.com> On 12/02/2021 09:58, Kim Barrett wrote: >> On Feb 11, 2021, at 8:33 AM, Andrew Haley wrote: >> >> On 11/02/2021 03:59, Kim Barrett wrote: >>> >>> We also don't presently have any cmpxchg implementation that really supports >>> anything between conservative and relaxed, nor do we support different order >>> constraints for the success vs failure cases. Things can be complicated >>> enough as is; while we *could* fill some of that in, I'm not sure we should. >> >> OK. However, even though we don't implement any of them, we do have an >> API that includes acq, rel, and seq_cst. The fact that we don't have >> anything behind them is, I thought, To Be Done rather than Won't Do. > > My inclination is to be pretty conservative in this area. (No pun intended.) > I'm not eager to have a lot of reviews like that for JDK-8154736. (And in > looking back at that, I see we ended up not addressing non-ppc platforms, > even though there was specific concern at the time that by not dealing with > them (particularly arm/aarch64) that we might be fobbing off some really > hard debugging on some poor future person.) Sure, and as you are probably aware I've had to do that, more than once, on dusty old GC code that didn't follow the memory model. IMVHO, there are not many places where seq_cst won't be adequate. >> I see.
I'm assuming that frequency of use is a useful proxy for impact. >> Aleksey has already, very helpfully, measured how significant these are >> for Shenandoah, and I suspect all concurrent GCs would benefit in a >> similar fashion. > > Absolute counts don't say much without context. So what if there are a > million of these, if they are swamped by the 100 bazillion not-these? > > Aleksey's measurements turned out to be less informative to me than they > seemed at first reading. Many of the proposed changes involve simple > counters or accumulators. Changing such to use relaxed atomic addition > operations is likely an easy improvement. But even that can suffer badly > from contention. If one is serious about reducing the cost of multi-threaded > accumulators, much better would be something like > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0261r4.html I very strongly disagree. Aleksey managed to prove a substantial gain with only a couple of hours' work. We're talking about low- hanging fruit here. >>>> , (MEMFLAGS)5>*)+432>: :: = 1617659 >>>> >>>> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). >>>> Again, do we need conservative here? >>> >>> This needs at least sequentially consistent semantics on the success path. >> >> Yep. That's easy, it's the full barrier in the failure path that >> I'd love to eliminate. > > Why does the failure path matter here? > > It should be rare [*], since it only fails when either there is contention > between a thief and the owner for the sole entry in the queue, or there is > contention between multiple thieves. OK, so that's useful guidance for an implementer: full barriers for CAS failures should be wrapped in a conditional. That is a pain, because it complexifies the code, but OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From akozlov at openjdk.java.net Fri Feb 12 10:30:41 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 12 Feb 2021 10:30:41 GMT Subject: Integrated: 8261071: AArch64: Refactor interpreter native wrappers In-Reply-To: References: Message-ID: <1wVsScSxD4_wEMBQtNp6yLDYJ-CE3GZeIheiYD99PY8=.9775b0a8-0d52-4c24-809b-2bfdf04b5524@github.com> On Thu, 4 Feb 2021 22:01:34 GMT, Anton Kozlov wrote: > Please review refactoring of interpreter signature handlers on aarch64. The main objective is to prepare for the new calling convention of macOS/AArch64, although this patch brings nothing from the new convention. > > Tested with signature stress tests and tier1 on Linux/AArch64. > > I started with a single function implementing SlowSignatureHandler (https://github.com/openjdk/jdk/commit/5ef1bd15c3bb174f4aed5e358d1ce2fff2846858#diff-1ff58ce70aeea7e9842d34e8d8fd9c94dd91182999d455618b2a171efd8f742cR164). The single function was compact but obscure. I kept reshuffling it until I eventually came to something similar to the initial approach, with a few pieces abstracted away. > > The most notable changes in the final version should be > * we count only parameters passed in registers > * ldrw/strw are used to pass via stack in SignatureHandlerGenerator::pass_int This pull request has now been integrated.
Changeset: 682e78e8 Author: Anton Kozlov Committer: Vladimir Kempik URL: https://git.openjdk.java.net/jdk/commit/682e78e8 Stats: 266 lines in 2 files changed: 39 ins; 147 del; 80 mod 8261071: AArch64: Refactor interpreter native wrappers Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/2413 From shade at openjdk.java.net Fri Feb 12 10:59:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 12 Feb 2021 10:59:38 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 10:06:52 GMT, Martin Doerr wrote: > I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very perfomance sensitive? Yes, it is not supposed to be: CAS failure path when GC is relocating the objects. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From vkempik at openjdk.java.net Fri Feb 12 11:45:49 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 11:45:49 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: <9Nasu4m7orJoGYjX4EYCuz5-aevYNno3Ru3jPHgwkvc=.168cfdf0-648b-46e4-9cb4-b24956eeba7d@github.com> References: <9Nasu4m7orJoGYjX4EYCuz5-aevYNno3Ru3jPHgwkvc=.168cfdf0-648b-46e4-9cb4-b24956eeba7d@github.com> Message-ID: On Thu, 4 Feb 2021 21:59:02 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos >> - Add comments to WX transitions >> >> + minor change of placements >> - Use macro conditionals instead of empty functions >> - Add W^X to tests >> - Do not require known W^X state >> - Revert w^x in gtests > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 194: > >> 192: // may get turned off by -fomit-frame-pointer. 
>> 193: frame os::get_sender_for_C_frame(frame* fr) { >> 194: return frame(fr->link(), fr->link(), fr->sender_pc()); > > Why is it > > return frame(fr->link(), fr->link(), fr->sender_pc()); > > and not > > return frame(fr->sender_sp(), fr->link(), fr->sender_pc()); > > like in the bsd-x86 counterpart? bsd_aarch64 was based on linux_aarch64, with the addition of bsd-specific things from bsd_x86. Do you think the bsd-x86 version is better here? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Fri Feb 12 11:58:38 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 11:58:38 GMT Subject: RFR: 8261072: AArch64: Fix MacroAssembler::get_thread convention In-Reply-To: <4AGCX0R1EP_qxN7Uux7IsDQFFXN7VH_8Is8Yt8xYRc4=.e9eb6a11-9c4d-4384-97c9-25cc1d71561a@github.com> References: <4AGCX0R1EP_qxN7Uux7IsDQFFXN7VH_8Is8Yt8xYRc4=.e9eb6a11-9c4d-4384-97c9-25cc1d71561a@github.com> Message-ID: <6eMLRh3HAuk4sgcP90QwDzopiel75i-VuG1dvoYL8Ds=.6a478045-e34b-45bf-90d5-cd3f50b05a8f@github.com> On Mon, 8 Feb 2021 10:25:19 GMT, Bernhard Urban-Forster wrote: >> Please review a fix in a special calling convention for aarch64_get_thread_helper for non-Linux platforms (windows/aarch64 for now). >> >> Preliminary review: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2021-January/011239.html > > Marked as reviewed by burban (Author).
Looks good, will make other non-linux platform work without the need to change this ever again ------------- PR: https://git.openjdk.java.net/jdk/pull/2451 From lucy at openjdk.java.net Fri Feb 12 12:01:38 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Fri, 12 Feb 2021 12:01:38 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v2] In-Reply-To: <7S5jdlFpZ5m2xtFinD92jQEQm6hgbQXjHR5N-3XbXkc=.fe75978e-860a-4bcc-b5b4-3e4b4246d706@github.com> References: <7S5jdlFpZ5m2xtFinD92jQEQm6hgbQXjHR5N-3XbXkc=.fe75978e-860a-4bcc-b5b4-3e4b4246d706@github.com> Message-ID: On Fri, 12 Feb 2021 08:29:37 GMT, Lutz Schmidt wrote: >> I don't really like these .*64 suffixes. Can we just make the counter 64 bit, update the SA as Tobias suggested and keep the existing method names? Or is there a reason for doing this that eludes me? > > I introduced the *64 suffixes to not break anything that still uses the old calls. As old uses disappear step by step, I'm more than happy to remove the suffixes. I will have a look into SA and try to make it 64bit counter ready. There may be no new version before the weekend is over. This is a request for help. Could someone with SA knowledge please check if my assumption is correct? In hotspot code, the field Method::_compiled_invocation_count is annotated with a comment that it is used by SA. The field is also exposed via vmStructs.cpp to enable such use. I have scanned SA code in OpenJDK11 and OpenJDK head but found no evidence that this particular field is accessed. Is this finding/assumption correct? If so, I could just stop exposing the field, making my life easier. Thanks! 
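[Editorial sketch, not part of the original mail.] The overflow under discussion can be shown with two toy counters, one 32 bits wide and one 64 bits wide. `bump32` and `bump64` are hypothetical helpers, not HotSpot functions, and unsigned types are used here so that the wrap-around is well defined in C++ (signed overflow would be undefined behaviour).

```cpp
#include <cassert>
#include <cstdint>

// Adds n events to a 32-bit counter; the result wraps modulo 2^32.
uint32_t bump32(uint32_t counter, uint64_t n) {
  return counter + static_cast<uint32_t>(n);
}

// Adds n events to a 64-bit counter; no wrap for any realistic event count.
uint64_t bump64(uint64_t counter, uint64_t n) {
  return counter + n;
}
```

After 2^32 + 5 events the narrow counter reads 5 while the wide one reads 4294967301, which is why widening the field (and teaching every reader of it, such as the SA, about the new width) is the fix being proposed.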
------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From vkempik at openjdk.java.net Fri Feb 12 12:07:52 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 12:07:52 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 22:07:15 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 195: > >> 193: frame os::get_sender_for_C_frame(frame* fr) { >> 194: return frame(fr->link(), fr->link(), fr->sender_pc()); >> 195: } > > Is this file going to be built by GCC or just macOS compilers? there is no support for compiling java with gcc on macos since about jdk11, only clang. considering this and the absence of gcc for macos_m1, the answer is - just macOS compilers. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Fri Feb 12 12:25:47 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 12:25:47 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 22:54:42 GMT, Gerard Ziemski wrote: >> src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 363: >> >>> 361: address pc = os::Posix::ucontext_get_pc(uc); >>> 362: >>> 363: if (pc != addr && uc->context_esr == 0x9200004F) { //TODO: figure out what this value means >> >> Is this TODO going to be resolved by this port? > > Where did this come from - some snippet/example/tech note code? Maybe other people can help figure it out if we provide more info. This is the version of w^x on-demand switch implemented by microsoft guys. This is enabled only for debug builds. 
@lewurm, could you comment here, please? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Fri Feb 12 12:25:48 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 12:25:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 22:12:07 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 435: > >> 433: // | |\ Java thread created by VM does not have glibc >> 434: // | glibc guard page | - guard, attached Java thread usually has >> 435: // | |/ 1 glibc guard page. > > Is this code going to be built by GCC (with glibc) or will only > macOS compilers and libraries be used? Only macOS compilers. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From stuefe at openjdk.java.net Fri Feb 12 12:33:49 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 12 Feb 2021 12:33:49 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups Message-ID: Hi, may I please have reviews for this RFE? While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixes and instead put them into this cleanup RFE. - de-templatize `AllocationSite` since E was used as a simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`.
In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too but this would cause larger ripples so I stopped there. - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user - made `AllocationSite` immutable - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` since they were not needed - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` - There was a subtle incorrectness where `AllocationSite::equals()` would only compare callstack and disregard the MEMFLAGS member. Theoretically, if two callstacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (e.g. skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. It's true in almost all cases. - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that coding does not need a walker object anymore and becomes a lot simpler. In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and was vulnerable to subtle errors. New code is more compact and simpler.
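[Editorial sketch, not part of the original mail.] The two central ideas in the list above, replacing the templated data holder with plain inheritance and asserting that a matching call stack also carries matching flags, might look roughly like this. All `Toy*` names, the `MemFlag` enum and `same_site()` are hypothetical stand-ins and do not reproduce the NMT sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class MemFlag : uint8_t { Internal, Thread, GC };  // stand-in for MEMFLAGS

struct ToyCallStack {
  std::array<uintptr_t, 4> frames{};
  bool equals(const ToyCallStack& other) const { return frames == other.frames; }
};

// Immutable base class: no template parameter for the payload.
class ToyAllocationSite {
  const ToyCallStack _stack;
  const MemFlag _flag;
 public:
  ToyAllocationSite(const ToyCallStack& stack, MemFlag flag)
      : _stack(stack), _flag(flag) {}
  bool equals(const ToyCallStack& stack) const { return _stack.equals(stack); }
  MemFlag flag() const { return _flag; }
};

// The payload that used to live in the template argument is now a member
// of the child class.
class ToyMallocSite : public ToyAllocationSite {
  size_t _count = 0;
  size_t _size = 0;
 public:
  ToyMallocSite(const ToyCallStack& stack, MemFlag flag)
      : ToyAllocationSite(stack, flag) {}
  void allocate(size_t bytes) { ++_count; _size += bytes; }
  size_t count() const { return _count; }
  size_t size() const { return _size; }
};

// Lookup helper: the call stack decides the match; the flag is only asserted,
// which would catch imprecise stack capture lumping unrelated sites together.
bool same_site(const ToyAllocationSite& site, const ToyCallStack& stack, MemFlag flag) {
  if (!site.equals(stack)) return false;
  assert(site.flag() == flag && "same call stack recorded with different flags");
  return true;
}
```

Keeping the comparison on the call stack alone preserves the existing lookup behaviour, while the assert documents the assumption that one stack maps to one flag.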
------------- Commit messages: - start Changes: https://git.openjdk.java.net/jdk/pull/2539/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2539&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261644 Stats: 325 lines in 10 files changed: 98 ins; 177 del; 50 mod Patch: https://git.openjdk.java.net/jdk/pull/2539.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2539/head:pull/2539 PR: https://git.openjdk.java.net/jdk/pull/2539 From fweimer at openjdk.java.net Fri Feb 12 12:42:48 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Fri, 12 Feb 2021 12:42:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 12:22:44 GMT, Vladimir Kempik wrote: >> src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 435: >> >>> 433: // | |\ Java thread created by VM does not have glibc >>> 434: // | glibc guard page | - guard, attached Java thread usually has >>> 435: // | |/ 1 glibc guard page. >> >> Is this code going to be built by GCC (with glibc) or will only >> macOS compilers and libraries be used? > > only macos comiplers The comment is also wrong for glibc: The AArch64 ABI requires a 64 KiB guard region independently of page size, otherwise `-fstack-clash-protection` is not reliable. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From aph at openjdk.java.net Fri Feb 12 13:13:42 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 12 Feb 2021 13:13:42 GMT Subject: RFR: 8261072: AArch64: Fix MacroAssembler::get_thread convention In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 08:26:41 GMT, Anton Kozlov wrote: > Please review a fix in a special calling convention for aarch64_get_thread_helper for non-Linux platforms (windows/aarch64 for now). 
> > Preliminary review: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2021-January/011239.html src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5270: > 5268: // > 5269: // On Linux, aarch64_get_thread_helper() clobbers only r0, r1, and flags. > 5270: // On Windows, the helper is a usual C function. This should say "other systems", not "Windows". Otherwise OK. ------------- PR: https://git.openjdk.java.net/jdk/pull/2451 From aph at openjdk.java.net Fri Feb 12 13:15:46 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 12 Feb 2021 13:15:46 GMT Subject: Integrated: 8261027: AArch64: Support for LSE atomics C++ HotSpot code In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 18:56:46 GMT, Andrew Haley wrote: > Go back a few years, and there were simple atomic load/store exclusive > instructions on Arm. Say you want to do an atomic increment of a > counter. You'd do an atomic load to get the counter into your local cache > in exclusive state, increment that counter locally, then write that > incremented counter back to memory with an atomic store. All the time > that cache line was in exclusive state, so you're guaranteed that > no-one else changed anything on that cache line while you had it. > > This is hard to scale on a very large system (e.g. Fugaku) because if > many processors are incrementing that counter you get a lot of cache > line ping-ponging between cores. > > So, Arm decided to add a locked memory increment instruction that > works without needing to load an entire line into local cache. It's a > single instruction that loads, increments, and writes back. The secret > is to send a cache control message to whichever processor owns the > cache line containing the count, tell that processor to increment the > counter and return the incremented value. That way cache coherency > traffic is mimimized. This new set of instructions is known as Large > System Extensions, or LSE. 
> > Unfortunately, in recent processors, the "old" load/store exclusive > instructions, sometimes perform very badly. Therefore, it's now > necessary for software to detect which version of Arm it's running > on, and use the "new" LSE instructions if they're available. Otherwise > performance can be very poor under heavy contention. > > GCC's -moutline-atomics does this by providing library calls which use > LSE if it's available, but this option is only provided on newer > versions of GCC. This is particularly problematic with older versions > of OpenJDK, which build using old GCC versions. > > Also, I suspect that some other operating systems could use this. > Perhaps not MacOS, given that all Apple CPUs support LSE, but > maybe Windows. This pull request has now been integrated. Changeset: 40ae9937 Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/40ae9937 Stats: 411 lines in 6 files changed: 377 ins; 6 del; 28 mod 8261027: AArch64: Support for LSE atomics C++ HotSpot code Reviewed-by: adinn, simonis ------------- PR: https://git.openjdk.java.net/jdk/pull/2434 From akozlov at openjdk.java.net Fri Feb 12 13:19:54 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 12 Feb 2021 13:19:54 GMT Subject: RFR: 8261072: AArch64: Fix MacroAssembler::get_thread convention [v2] In-Reply-To: References: Message-ID: > Please review a fix in a special calling convention for aarch64_get_thread_helper for non-Linux platforms (windows/aarch64 for now). 
> > Preliminary review: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2021-January/011239.html Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: Change comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2451/files - new: https://git.openjdk.java.net/jdk/pull/2451/files/217d3c23..3b111ae5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2451&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2451&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2451.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2451/head:pull/2451 PR: https://git.openjdk.java.net/jdk/pull/2451 From akozlov at openjdk.java.net Fri Feb 12 13:19:55 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 12 Feb 2021 13:19:55 GMT Subject: RFR: 8261072: AArch64: Fix MacroAssembler::get_thread convention [v2] In-Reply-To: References: Message-ID: <-F-3Q84KjYHr7UAniwoJLSxFiY35DMu5mwlFX1icde8=.e95caa89-7fda-40b7-b06c-aa11c6026457@github.com> On Fri, 12 Feb 2021 13:10:25 GMT, Andrew Haley wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Change comment > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5270: > >> 5268: // >> 5269: // On Linux, aarch64_get_thread_helper() clobbers only r0, r1, and flags. >> 5270: // On Windows, the helper is a usual C function. > > This should say "other systems", not "Windows". Otherwise OK. Thanks, fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2451 From vkempik at openjdk.java.net Fri Feb 12 13:35:48 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 13:35:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 22:14:42 GMT, Daniel D. 
Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 486: > >> 484: } >> 485: } >> 486: } > > This appears to be a mix for Mavericks (10.9) and 10.12 > work arounds. Is this code needed by this project? I wasn't able to replicate JDK-8020753 and JDK-8186286. So will remove these workaround Gerard, 8020753 was originally your fix, do you know if it still needed on intel-mac ? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Fri Feb 12 13:46:09 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 12 Feb 2021 13:46:09 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v13] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. 
> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: JDK-8257882: oops, fixed 7fe50a996b6f436932452d220b351c73153ed945 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/0d0e9baf..ad4e4c65 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=12 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Fri Feb 12 13:50:49 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 12 Feb 2021 13:50:49 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 09:11:50 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 323: >> >>> 321: str(zr, Address(rthread, JavaThread::last_Java_pc_offset())); >>> 322: >>> 323: str(zr, Address(rthread, JavaFrameAnchor::saved_fp_address_offset())); >> >> I don't think this switch from `JavaThread::saved_fp_address_offset()` >> to `JavaFrameAnchor::saved_fp_address_offset()` is correct since >> `rthread` is still used and is a JavaThread*. The new code will give you: >> >> `rthread` + offset of the `saved_fp_address` field in a JavaFrameAnchor >> >> The old code gave you: >> >> `rthread` + offset of the `saved_fp_address` field in the JavaFrameAnchor field in the JavaThread >> >> Those are not the same things. > > I agree, I don't understand why this change was made. Wow, this is scary. I don't understand how I've merged JDK-8257882 like this. 
I've reviewed cpu/aarch64 changes again, there is nothing suspicious besides this. Thank you very much for catching, fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Fri Feb 12 14:07:51 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 14:07:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 22:08:14 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 221: > >> 219: assert(sig == info->si_signo, "bad siginfo"); >> 220: } >> 221: */ > > Should this code be deleted? From here and from where it was copied > from if it is also commented out there... Thanks, will fix in bsd_aarch64 soon, as for bsd_x86 I've filled new bug and pr - https://github.com/openjdk/jdk/pull/2547 ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From coleenp at openjdk.java.net Fri Feb 12 14:12:40 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Feb 2021 14:12:40 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v2] In-Reply-To: <7g0L5oYcLEqm2b-mg_7heIQFeOj0atCRN-3Y_gy0UzE=.efc411e4-736b-4f9e-8d47-a814bfacf08c@github.com> References: <7g0L5oYcLEqm2b-mg_7heIQFeOj0atCRN-3Y_gy0UzE=.efc411e4-736b-4f9e-8d47-a814bfacf08c@github.com> Message-ID: On Fri, 12 Feb 2021 04:08:03 GMT, Ioi Lam wrote: >> This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: >> >> - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp >> >> - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. 
These should be consolidated in archiveBuilder.cpp and the API should be cleaned up >> >> - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. Refactor the code so they are also printed for dynamic dump > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - fixed spaces > - use member initializer list; clean up log message Looks good! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2536 From aph at openjdk.java.net Fri Feb 12 15:00:42 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 12 Feb 2021 15:00:42 GMT Subject: RFR: 8261072: AArch64: Fix MacroAssembler::get_thread convention [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 13:19:54 GMT, Anton Kozlov wrote: >> Please review a fix in a special calling convention for aarch64_get_thread_helper for non-Linux platforms (windows/aarch64 for now). >> >> Preliminary review: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2021-January/011239.html > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Change comment Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2451 From mdoerr at openjdk.java.net Fri Feb 12 15:03:51 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 12 Feb 2021 15:03:51 GMT Subject: RFR: 8261655: [PPC64] Build broken after JDK-8260941 Message-ID: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> We have to add back `CardTable* ct = ctbs->card_table();` after it was removed with JDK-8260941. 
------------- Commit messages: - 8261655: [PPC64] Build broken after JDK-8260941 Changes: https://git.openjdk.java.net/jdk/pull/2548/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2548&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261655 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2548.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2548/head:pull/2548 PR: https://git.openjdk.java.net/jdk/pull/2548 From akozlov at openjdk.java.net Fri Feb 12 15:14:38 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 12 Feb 2021 15:14:38 GMT Subject: Integrated: 8261072: AArch64: Fix MacroAssembler::get_thread convention In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 08:26:41 GMT, Anton Kozlov wrote: > Please review a fix in a special calling convention for aarch64_get_thread_helper for non-Linux platforms (windows/aarch64 for now). > > Preliminary review: https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2021-January/011239.html This pull request has now been integrated. Changeset: b670efd8 Author: Anton Kozlov Committer: Vladimir Kempik URL: https://git.openjdk.java.net/jdk/commit/b670efd8 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod 8261072: AArch64: Fix MacroAssembler::get_thread convention Reviewed-by: burban, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/2451 From dcubed at openjdk.java.net Fri Feb 12 15:18:44 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 15:18:44 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 07:16:45 GMT, Thomas Stuefe wrote: > Hi, > > may I please have reviews for this RFE? > > While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixed and instead put them into this cleanup RFE. 
> > - de-templatize `AllocationSite` since E was used as simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) > > - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. > > - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`. In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too but this would cause larger ripples so I stopped there. > > - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user > > - made `AllocationSite` immutable > > - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` since they were not needed > > - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` > > - There was a subtle incorrectness where `AllocationSite::equals()` would only compare callstack and disregard the MEMFLAGS member. Theoretically, if two callstacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (eg skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. > > - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. Its true in almost all cases. > > - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. 
> > - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that coding does not need a walker object anymore and becomes a lot simpler. In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and was vulnerable against subtle errors. New code is more compact and simpler. @tstuefe - Thanks for keeping cleanups separated from bug fixes. That always makes the PR reviews easier. I don't see any information about what testing was done on this PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/2539 From vkempik at openjdk.java.net Fri Feb 12 15:25:54 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 12 Feb 2021 15:25:54 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: <-PhzrEcgREcbXuZ5GrxAfVa6Uwil9YoOkZULt1154rw=.9689a79e-cf61-4f79-9b36-a3295fecab7b@github.com> On Thu, 4 Feb 2021 22:49:23 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos >> - Add comments to WX transitions >> >> + minor change of placements >> - Use macro conditionals instead of empty functions >> - Add W^X to tests >> - Do not require known W^X state >> - Revert w^x in gtests > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 297: > >> 295: stub = SharedRuntime::handle_unsafe_access(thread, next_pc); >> 296: } >> 297: } else if (sig == SIGILL && nativeInstruction_at(pc)->is_stop()) { > > Can we add a comment here describing what this case means? this arm64 specific part came as is from linux_aarch64 and I can't add any meaning comments here. 
> src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 302: > >> 300: const uint64_t *detail_msg_ptr >> 301: = (uint64_t*)(pc + NativeInstruction::instruction_size); >> 302: const char *detail_msg = (const char *)*detail_msg_ptr; > > Where is `detail_msg` used? Came from linux_arm64. was used in os_linux_aarch64.cpp on line 246 in report_and_die But became unused on bsd_arm64. I agree this needs to be removed > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 403: > >> 401: } >> 402: >> 403: return false; // Mute compiler > > Is this comment needed? this part came as is from linux_aarch64 as well and was supposed to mean something there. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From shade at openjdk.java.net Fri Feb 12 15:39:48 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 12 Feb 2021 15:39:48 GMT Subject: RFR: 8261655: [PPC64] Build broken after JDK-8260941 In-Reply-To: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> References: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> Message-ID: On Fri, 12 Feb 2021 14:59:13 GMT, Martin Doerr wrote: > We have to add back `CardTable* ct = ctbs->card_table();` after it was removed with JDK-8260941. Looks good and trivial. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2548 From clanger at openjdk.java.net Fri Feb 12 15:43:39 2021 From: clanger at openjdk.java.net (Christoph Langer) Date: Fri, 12 Feb 2021 15:43:39 GMT Subject: RFR: 8261655: [PPC64] Build broken after JDK-8260941 In-Reply-To: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> References: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> Message-ID: <46eASIG9GRrQA4QQBeo_UmCcBugVqwYgBEeXu2MKAMw=.3316afee-086d-4b14-b0f0-d343f9ed34c0@github.com> On Fri, 12 Feb 2021 14:59:13 GMT, Martin Doerr wrote: > We have to add back `CardTable* ct = ctbs->card_table();` after it was removed with JDK-8260941. Marked as reviewed by clanger (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2548 From mdoerr at openjdk.java.net Fri Feb 12 15:51:40 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 12 Feb 2021 15:51:40 GMT Subject: RFR: 8261655: [PPC64] Build broken after JDK-8260941 In-Reply-To: <46eASIG9GRrQA4QQBeo_UmCcBugVqwYgBEeXu2MKAMw=.3316afee-086d-4b14-b0f0-d343f9ed34c0@github.com> References: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> <46eASIG9GRrQA4QQBeo_UmCcBugVqwYgBEeXu2MKAMw=.3316afee-086d-4b14-b0f0-d343f9ed34c0@github.com> Message-ID: On Fri, 12 Feb 2021 15:41:14 GMT, Christoph Langer wrote: >> We have to add back `CardTable* ct = ctbs->card_table();` after it was removed with JDK-8260941. > > Marked as reviewed by clanger (Reviewer). Thanks for the reviews! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2548 From mdoerr at openjdk.java.net Fri Feb 12 15:51:42 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Fri, 12 Feb 2021 15:51:42 GMT Subject: Integrated: 8261655: [PPC64] Build broken after JDK-8260941 In-Reply-To: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> References: <5YuAUGgXGkrfi3W9EfqVfYxaaT-B-gQbzahRA--fmHo=.3edf75da-92dc-4fb0-a939-0c2a2d0f654c@github.com> Message-ID: On Fri, 12 Feb 2021 14:59:13 GMT, Martin Doerr wrote: > We have to add back `CardTable* ct = ctbs->card_table();` after it was removed with JDK-8260941. This pull request has now been integrated. Changeset: 6475d477 Author: Martin Doerr URL: https://git.openjdk.java.net/jdk/commit/6475d477 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8261655: [PPC64] Build broken after JDK-8260941 Reviewed-by: shade, clanger ------------- PR: https://git.openjdk.java.net/jdk/pull/2548 From aph at openjdk.java.net Fri Feb 12 16:32:48 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 12 Feb 2021 16:32:48 GMT Subject: RFR: 8261660: AArch64: Race condition in stub code generation for LSE Atomics Message-ID: Temporary fix for race condition. There's a narrow race condition in the code which generates LSE Atomic stubs and enables them for use by the runtime. Disable LSE stub generation for now.
------------- Commit messages: - Committed Changes: https://git.openjdk.java.net/jdk/pull/2553/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2553&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261660 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2553.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2553/head:pull/2553 PR: https://git.openjdk.java.net/jdk/pull/2553 From dcubed at openjdk.java.net Fri Feb 12 16:45:40 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 16:45:40 GMT Subject: RFR: 8261660: AArch64: Race condition in stub code generation for LSE Atomics In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:27:09 GMT, Andrew Haley wrote: > Temporary fix for race condition. > > There's a narrow race condition in the code which generates LSE Atomic stubs and enables them for use by the runtime. DIsable LSE stub generation for now. Looks okay, but I have to wonder if the rest of the template changes in src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.hpp that were made by https://bugs.openjdk.java.net/browse/JDK-8261027 also need to be backed out? Please clarify. ------------- Changes requested by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2553 From stuefe at openjdk.java.net Fri Feb 12 16:56:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 12 Feb 2021 16:56:40 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups In-Reply-To: References: Message-ID: <8M2dRv2ylTX837yvn9xj_pF01wXMumXi3LLo8Ka3Ua0=.4fe0585d-811d-4862-94da-09cd26d25636@github.com> On Fri, 12 Feb 2021 07:16:45 GMT, Thomas Stuefe wrote: > Hi, > > may I please have reviews for this RFE? > > While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixed and instead put them into this cleanup RFE. 
> > - de-templatize `AllocationSite` since E was used as simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) > > - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. > > - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`. In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too but this would cause larger ripples so I stopped there. > > - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user > > - made `AllocationSite` immutable > > - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` since they were not needed > > - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` > > - There was a subtle incorrectness where `AllocationSite::equals()` would only compare callstack and disregard the MEMFLAGS member. Theoretically, if two callstacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (eg skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. > > - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. Its true in almost all cases. > > - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. 
> > - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that coding does not need a walker object anymore and becomes a lot simpler. In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and was vulnerable against subtle errors. New code is more compact and simpler. > ---- > Tests: > - github GA > - manual NMT jtreg tests (including the currently disabled runtime/NMT/CheckForProperDetailStackTrace.java) > - Full nightlies at SAP are scheduled > @tstuefe - Thanks for keeping cleanups separated from bug fixes. That always > makes the PR reviews easier. I don't see any information about what testing was > done on this PR. Thanks Dan. I updated the description. I'm very careful after JDK-8261520. Manual tests went through, nightlies at SAP are scheduled. BTW this patch is a preparation for the fix for JDK-8261520. ------------- PR: https://git.openjdk.java.net/jdk/pull/2539 From adinn at openjdk.java.net Fri Feb 12 16:58:40 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 12 Feb 2021 16:58:40 GMT Subject: RFR: 8261660: AArch64: Race condition in stub code generation for LSE Atomics In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:27:09 GMT, Andrew Haley wrote: > Temporary fix for race condition. > > There's a narrow race condition in the code which generates LSE Atomic stubs and enables them for use by the runtime. DIsable LSE stub generation for now. Yes, this will be fine for now until the code generation and update of the code pointers can be corrected to happen in the right order. ------------- Marked as reviewed by adinn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2553 From dcubed at openjdk.java.net Fri Feb 12 16:58:41 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 16:58:41 GMT Subject: RFR: 8261660: AArch64: Race condition in stub code generation for LSE Atomics In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:27:09 GMT, Andrew Haley wrote: > Temporary fix for race condition. > > There's a narrow race condition in the code which generates LSE Atomic stubs and enables them for use by the runtime. Disable LSE stub generation for now. Thanks for the explanation. Thumbs up. This is a trivial fix and does not have to wait for 24 hours. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2553 From adinn at openjdk.java.net Fri Feb 12 16:58:41 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Fri, 12 Feb 2021 16:58:41 GMT Subject: RFR: 8261660: AArch64: Race condition in stub code generation for LSE Atomics In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:55:01 GMT, Daniel D. Daugherty wrote: >> Temporary fix for race condition. >> >> There's a narrow race condition in the code which generates LSE Atomic stubs and enables them for use by the runtime. Disable LSE stub generation for now. > > Thanks for the explanation. Thumbs up. > > This is a trivial fix and does not have to wait for 24 hours. @dcubed-ojdk A complete fix merely requires writing the generated stub code before updating the stub function pointers. With this change all that happens is that those pointers continue pointing to the default routines. There is no point backing out the rest of the code given that it will almost all be needed.
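
The ordering described above (finish writing the stub first, only then update the function pointer) might be sketched roughly as follows. This is an illustration under stated assumptions, not the HotSpot code: `fetch_add_fn`, `default_fetch_add`, `generated_fetch_add`, `fetch_add_impl`, `stub_code_written`, `install_stub` and `atomic_fetch_add` are all hypothetical names, and an ordinary C++ function stands in for the generated machine code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical signature for an atomic helper; not HotSpot's actual type.
using fetch_add_fn = uint64_t (*)(std::atomic<uint64_t>* ptr, uint64_t incr);

// Default routine: always valid, used until a stub has been published.
static uint64_t default_fetch_add(std::atomic<uint64_t>* ptr, uint64_t incr) {
  return ptr->fetch_add(incr, std::memory_order_seq_cst);
}

// Models "the stub's code buffer is completely written" (assumption: a plain
// flag stands in for the generated instructions).
static bool stub_code_written = false;

// Stand-in for the generated stub. Calling it before the code is complete is
// the race being discussed, so it asserts on that.
static uint64_t generated_fetch_add(std::atomic<uint64_t>* ptr, uint64_t incr) {
  assert(stub_code_written && "stub called before its code was written");
  return ptr->fetch_add(incr, std::memory_order_seq_cst);
}

// The pointer callers go through. It starts out at the default routine.
static std::atomic<fetch_add_fn> fetch_add_impl{default_fetch_add};

// Step 1: write the stub. Step 2: publish the pointer with release semantics,
// so a thread that observes the new pointer also observes the finished stub.
void install_stub() {
  stub_code_written = true;
  fetch_add_impl.store(generated_fetch_add, std::memory_order_release);
}

// Callers load the pointer with acquire semantics and call whatever is there.
uint64_t atomic_fetch_add(std::atomic<uint64_t>* ptr, uint64_t incr) {
  fetch_add_fn f = fetch_add_impl.load(std::memory_order_acquire);
  return f(ptr, incr);
}
```

Until `install_stub()` runs, every caller keeps using `default_fetch_add`, which matches the remark above that with the temporary change the pointers simply continue pointing to the default routines.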
------------- PR: https://git.openjdk.java.net/jdk/pull/2553 From dcubed at openjdk.java.net Fri Feb 12 17:06:41 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 17:06:41 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 09:52:44 GMT, Anton Kozlov wrote: > Hi, > > Please review a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. > > CC @dcubed-ojdk > > Thanks! Thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2542 From stuefe at openjdk.java.net Fri Feb 12 17:32:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 12 Feb 2021 17:32:41 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation In-Reply-To: References: Message-ID: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> On Fri, 12 Feb 2021 09:52:44 GMT, Anton Kozlov wrote: > Hi, > > Please review a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. > > CC @dcubed-ojdk > > Thanks! I would rename the new header safefetch.hpp, just because these functions were never part of the StubRoutines namespace, and the naming is much clearer. ------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From aph at openjdk.java.net Fri Feb 12 17:37:38 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 12 Feb 2021 17:37:38 GMT Subject: Integrated: 8261660: AArch64: Race condition in stub code generation for LSE Atomics In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 16:27:09 GMT, Andrew Haley wrote: > Temporary fix for race condition.
> > There's a narrow race condition in the code which generates LSE Atomic stubs and enables them for use by the runtime. DIsable LSE stub generation for now. This pull request has now been integrated. Changeset: a305743c Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/a305743c Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8261660: AArch64: Race condition in stub code generation for LSE Atomics Reviewed-by: dcubed, adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/2553 From enikitin at openjdk.java.net Fri Feb 12 18:22:52 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 12 Feb 2021 18:22:52 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2] In-Reply-To: References: Message-ID: > Another approach to the JDK-8058176 and #2440 - never allowing the tests hit CodeCache limits. The most significant consumer is the MH graph builder (the MHTransformationGen), whose consumption is now controlled. List of changes: > > * Code cache size getters are added to WhiteBox; > * MH sequences are now built with remaining Code cache size in mind (always let 2M clearance); > * Dependencies on WhiteBox added for all affected tests; > * The test cases in question un-problemlisted. > > Testing: the whole vmTestbase/vm/mlvm/ in win-lin-mac x86. 
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Switch to ManagementBeans approach instead of the WhiteBox one ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2523/files - new: https://git.openjdk.java.net/jdk/pull/2523/files/4153edb1..71af7185 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2523&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2523&range=00-01 Stats: 100 lines in 12 files changed: 12 ins; 80 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2523.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2523/head:pull/2523 PR: https://git.openjdk.java.net/jdk/pull/2523 From enikitin at openjdk.java.net Fri Feb 12 19:47:39 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Fri, 12 Feb 2021 19:47:39 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 13:27:52 GMT, Evgeny Nikitin wrote: > Another approach to the JDK-8058176 and #2440 - never allowing the tests hit CodeCache limits. The most significant consumer is the MH graph builder (the MHTransformationGen), whose consumption is now controlled. List of changes: > > * Code cache size getters are added to WhiteBox; > * MH sequences are now built with remaining Code cache size in mind (always let 2M clearance); > * Dependencies on WhiteBox added for all affected tests; > * The test cases in question un-problemlisted. > > Testing: the whole vmTestbase/vm/mlvm/ in win-lin-mac x86. As suggested by @iignatev, cleaned off WhiteBox changes in favour of management JMX beans, resulting in much cleaner solution. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2523 From iklam at openjdk.java.net Fri Feb 12 19:53:52 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 12 Feb 2021 19:53:52 GMT Subject: RFR: 8261672: Reduce inclusion of classLoaderData.hpp Message-ID: classLoaderData.hpp is included by about 700 out of 1000 .o files in HotSpot. Most of these are transitively included through klass.hpp, typeArrayKlass.hpp and instanceKlass.hpp. These headers can be refactored by moving inline functions that depend on ClassLoaderData to xxx.inline.hpp. This reduces the .o files that include classLoaderData.hpp to about 260. (I also removed a bunch of unnecessary inclusion of classLoader.hpp from a few C files). Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. ------------- Commit messages: - 8261672: Reduce inclusion of classLoaderData.hpp - reduce classLoader.hpp Changes: https://git.openjdk.java.net/jdk/pull/2555/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2555&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261672 Stats: 160 lines in 59 files changed: 108 ins; 47 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2555.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2555/head:pull/2555 PR: https://git.openjdk.java.net/jdk/pull/2555 From iignatyev at openjdk.java.net Fri Feb 12 20:05:40 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 12 Feb 2021 20:05:40 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2] In-Reply-To: References: Message-ID: <2_Gpraz6NaY17HPfRDW-LD-sQrrPQ4dpIVP8vikpdXM=.d425cd8b-aea5-43be-865e-72229db81e6e@github.com> On Fri, 12 Feb 2021 18:22:52 GMT, Evgeny Nikitin wrote: >> Another approach to the JDK-8058176 and #2440 - never allowing the tests hit CodeCache limits. 
The most significant consumer is the MH graph builder (the MHTransformationGen), whose consumption is now controlled. List of changes: >> >> * Code cache size getters are added to WhiteBox; >> * MH sequences are now built with remaining Code cache size in mind (always let 2M clearance); >> * Dependencies on WhiteBox added for all affected tests; >> * The test cases in question un-problemlisted. >> >> Testing: the whole vmTestbase/vm/mlvm/ in win-lin-mac x86. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Switch to ManagementBeans approach instead of the WhiteBox one test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/share/MHTransformationGen.java line 69: > 67: private static final boolean USE_THROW_CATCH = false; // Test bugs > 68: > 69: private static final MemoryPoolMXBean CODE_CACHE_MX_BEAN = ManagementFactory does it work w/ both `-XX:+SegmentedCodeCache` and `-XX:-SegmentedCodeCache`? If I remember correctly (@TobiHartmann , please correct me if I'm wrong), `CodeCache` pool exists when `SegmentedCodeCache` is disabled, when it's enabled, you will have 3 different pools (one for each "CodeHeap"), and here we would need to use one for `non-nmethod` codeheap. -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/2523 From iignatyev at openjdk.java.net Fri Feb 12 20:08:40 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 12 Feb 2021 20:08:40 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 18:22:52 GMT, Evgeny Nikitin wrote: >> Another approach to the JDK-8058176 and #2440 - never allowing the tests hit CodeCache limits. The most significant consumer is the MH graph builder (the MHTransformationGen), whose consumption is now controlled. 
List of changes: >> >> * Code cache size getters are added to WhiteBox; >> * MH sequences are now built with remaining Code cache size in mind (always let 2M clearance); >> * Dependencies on WhiteBox added for all affected tests; >> * The test cases in question un-problemlisted. >> >> Testing: the whole vmTestbase/vm/mlvm/ in win-lin-mac x86. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Switch to ManagementBeans approach instead of the WhiteBox one test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/share/MHTransformationGen.java line 107: > 105: if (isCodeCacheEffectivelyFull()) { > 106: Env.traceNormal("Not enought code cache to build up MH sequences anymore. " + > 107: " Has only been able to achieve " + (MAX_CYCLES - i) + " out of " + MAX_CYCLES); given `nextInt(x)` returns a random number from `[0; x]`, we might have achieved more (or less) `MAX_CYCLES - i`, i.e. that part of the message is incorrect, I'd just remove it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2523 From lfoltan at openjdk.java.net Fri Feb 12 20:13:39 2021 From: lfoltan at openjdk.java.net (Lois Foltan) Date: Fri, 12 Feb 2021 20:13:39 GMT Subject: RFR: 8261672: Reduce inclusion of classLoaderData.hpp In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 19:41:54 GMT, Ioi Lam wrote: > classLoaderData.hpp is included by about 700 out of 1000 .o files in HotSpot. Most of these are transitively included through klass.hpp, typeArrayKlass.hpp and instanceKlass.hpp. > > These headers can be refactored by moving inline functions that depend on ClassLoaderData to xxx.inline.hpp. This reduces the .o files that include classLoaderData.hpp to about 260. > > (I also removed a bunch of unnecessary inclusion of classLoader.hpp from a few C files). > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. LGTM. 
Lois ------------- Marked as reviewed by lfoltan (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2555 From coleenp at openjdk.java.net Fri Feb 12 21:38:39 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Feb 2021 21:38:39 GMT Subject: RFR: 8261672: Reduce inclusion of classLoaderData.hpp In-Reply-To: References: Message-ID: <0EjXR8-41EJ8zdXKlpkVMn0yMvFmDFUS1dKoNK35zDs=.1a74c451-7e08-455b-bd54-63c7651da5ee@github.com> On Fri, 12 Feb 2021 19:41:54 GMT, Ioi Lam wrote: > classLoaderData.hpp is included by about 700 out of 1000 .o files in HotSpot. Most of these are transitively included through klass.hpp, typeArrayKlass.hpp and instanceKlass.hpp. > > These headers can be refactored by moving inline functions that depend on ClassLoaderData to xxx.inline.hpp. This reduces the .o files that include classLoaderData.hpp to about 260. > > (I also removed a bunch of unnecessary inclusion of classLoader.hpp from a few C files). > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Looks good. I think adding compiledICHolder.inline.hpp is worth doing because performance matters to its caller in compiledMethod.cpp. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2555 From rkennke at openjdk.java.net Fri Feb 12 21:45:57 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 12 Feb 2021 21:45:57 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk In-Reply-To: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com> References: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com> Message-ID: On Wed, 10 Feb 2021 12:38:10 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > I'm converting back to draft. 
The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though. ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From rkennke at openjdk.java.net Fri Feb 12 21:45:57 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 12 Feb 2021 21:45:57 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: > I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. 
> > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2500/files - new: https://git.openjdk.java.net/jdk/pull/2500/files/72f20e13..6946499c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00-01 Stats: 12 lines in 4 files changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 From coleenp at openjdk.java.net Fri Feb 12 22:25:40 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 12 Feb 2021 22:25:40 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 07:16:45 GMT, Thomas Stuefe wrote: > Hi, > > may I please have reviews for this RFE? > > While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixed and instead put them into this cleanup RFE. > > - de-templatize `AllocationSite` since E was used as simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) > > - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. 
> > - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`. In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too but this would cause larger ripples so I stopped there. > > - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user > > - made `AllocationSite` immutable > > - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` since they were not needed > > - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` > > - There was a subtle incorrectness where `AllocationSite::equals()` would only compare callstack and disregard the MEMFLAGS member. Theoretically, if two callstacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (eg skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. > > - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. Its true in almost all cases. > > - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. > > - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that coding does not need a walker object anymore and becomes a lot simpler. In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and was vulnerable against subtle errors. 
New code is more compact and simpler. Before removing the old implementation, I ran both statistics side by side for a couple of scenarios (e.g. really bad hash code implementations) and the output was identical. > > ---- > Tests: > - github GA > - manual NMT jtreg tests (including the currently disabled runtime/NMT/CheckForProperDetailStackTrace.java) > - Full nightlies at SAP are scheduled These changes look great. Thank you! src/hotspot/share/services/mallocSiteTable.cpp line 141: > 139: while (head != NULL && (*pos_idx) <= MAX_BUCKET_LENGTH) { > 140: MallocSite* site = head->data(); > 141: if (site->equals(key, flags)) { Does this now assert where it used to just return false if the memflags didn't match? src/hotspot/share/services/threadStackTracker.hpp line 44: > 42: _base(base), > 43: _size(size) > 44: {} nit: can you put {} on the same line as _size? ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2539 From mchung at openjdk.java.net Fri Feb 12 22:51:46 2021 From: mchung at openjdk.java.net (Mandy Chung) Date: Fri, 12 Feb 2021 22:51:46 GMT Subject: RFR: 8261031: Move some ClassLoader name checking to native/VM [v3] In-Reply-To: References: <3fZUkpucpgdhZyyWDQ7Hp1oKthgl1ckXBq942wMNwxI=.7a3db0ca-03c0-44f9-ade9-3b4443cc6666@github.com> Message-ID: On Fri, 12 Feb 2021 02:10:02 GMT, Coleen Phillimore wrote: >> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: >> >> Consolidate verifyClassname and verifyFixClassname > > This more limited cleanup looks good. This patch changes the `JVM_FindLoadedClass` interface to only accept a binary name. It used to accept both a binary name and internal form. Most, if not all, JVM entry points take the name in internal form. So this change makes this JVM entry point inconsistent with the others.
Looking more closely at each API that involves `fixClassName` or `verifyXXXClassName`, the JVM entry points they call expect the internal form, except `JVM_FindLoadedClass` (see details below). I think a better change is to make the native `JVM_FindLoadedClass` accept the internal form only and have the `findLoadedClass0` method detect whether the name contains '/' or '['. The ClassLoader API does not allow loading an array type, whereas `Class::forName` allows finding an array type. Perhaps `verifyFixClassName` should be renamed to something like `binaryNameToInternalForm`. I think we don't need `fixClassname`? ClassLoader::defineClass - `preDefineClass` checks the name and throws if it contains '/' or '[' - no name check in `JVM_DefineClassWithSource` and `JVM_LookupDefineClass`, which expect the name to be in internal form native Class::forName0 - converts the binary name to internal form (i.e. replaces '.' with '/') - throws if the name contains '/' - no explicit name check in `JVM_FindClassFromCaller` ClassLoader::loadClass - calls native `findLoadedClass0`, which calls `JVM_FindLoadedClass`, which accepts the binary form and converts '.' to '/', but the current implementation accepts both binary name and internal form - calls `native findBootstrapClass`, which converts '.' to '/' and passes the internal form to `JVM_FindBootstrapClass`. It'd be helpful to document clearly what the internal APIs and JVM entry points expect (for example, binary name vs. internal form) and where the validation is done, e.g. Class::forName0 allows array types and the native library methods do the name validation. ------------- PR: https://git.openjdk.java.net/jdk/pull/2378 From mseledtsov at openjdk.java.net Fri Feb 12 23:03:53 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 12 Feb 2021 23:03:53 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest Message-ID: This is a preliminary review.
I would like to get the initial feedback before I proceed with conversion of the remaining tests. Here is what I did so far: - created a UnitTestThread and a main test runner, based on gtests with similar needs - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) to the newly created gtest while wrapping it into a TestReservedSpace class. I did not change the code of the test. - removed invocations from whitebox.cpp Testing: - ran GTestWrapper on usual platforms - All PASS - ensured that ReservedSpaceConcurrent is in the logs and passed After gathering the feedback my plan is: Plan: - move the remaining internal Memory/VirtualSpace tests into a gTest - I am thinking about using separate files for each test - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code ------------- Commit messages: - Removing RunUnitTestsConcurrently.java from TEST groups - Using original duration and concurrency factor - Removing test reference from whitebox and the original jtreg test - Cleanup - Fixed build issue, minor cleanup - Moved reserve_memory_special for Windows to GTest - Removed empty test from AIX and BSD - Linux variant of reserve_memory_special_concurrent to GTest - Factored concurrent test runner helpers into its own file, for inclusion from multiple files - Converted TestVirtualSpace_test to gtest - ... 
and 9 more: https://git.openjdk.java.net/jdk/compare/2be60e37...4076519f Changes: https://git.openjdk.java.net/jdk/pull/2436/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2436&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8213269 Stats: 1301 lines in 13 files changed: 632 ins; 664 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2436.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2436/head:pull/2436 PR: https://git.openjdk.java.net/jdk/pull/2436 From iignatyev at openjdk.java.net Fri Feb 12 23:03:54 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 12 Feb 2021 23:03:54 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 20:35:23 GMT, Mikhailo Seledtsov wrote: > This is a preliminary review. I would like to get the initial feedback before I proceed with conversion of the remaining tests. > > Here is what I did so far: > - created a UnitTestThread and a main test runner, based on gtests with similar needs > - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) > to the newly created gtest while wrapping it into a TestReservedSpace class. I did not change the code of the test. > - removed invocations from whitebox.cpp > > Testing: > - ran GTestWrapper on usual platforms - All PASS > - ensured that ReservedSpaceConcurrent is in the logs and passed > > After gathering the feedback my plan is: > Plan: > - move the remaining internal Memory/VirtualSpace tests into a gTest > - I am thinking about using separate files for each test > - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code hi Misha, I haven't finished (pre)reviewing it yet, however, I have found a few things that, I belive, should be changed. 
test/hotspot/gtest/runtime/test_reservedSpaceConcurrent.cpp line 1: > 1: /* test files should follow the same naming scheme as the product files that contain the tested functionality, so in this case, it should be `test/hotspot/gtest/runtime/memory/test_virtualspace.cpp` test/hotspot/gtest/runtime/test_reservedSpaceConcurrent.cpp line 30: > 28: > 29: #define TEST_THREAD_COUNT 10 // TODO: update to original value of 30 > 30: #define TEST_DURATION 15000 // milliSeconds it's 2021, there are (almost) no good reasons to use `#define` to define constants. test/hotspot/gtest/runtime/test_reservedSpaceConcurrent.cpp line 208: > 206: void run_unit_test() { > 207: TestReservedSpace::test_reserved_space(); > 208: } what's the point of having this extra member function? can't we just call `TestReservedSpace::test_reserved_space` directly from main_run? test/hotspot/gtest/runtime/test_reservedSpaceConcurrent.cpp line 46: > 44: static void release_memory_for_test(ReservedSpace rs) { > 45: if (rs.special()) { > 46: guarantee(os::release_memory_special(rs.base(), rs.size()), "Shouldn't fail"); you shouldn't use hotspot's `guarantee`/`assert`; use the gtest-provided `ASSERT`/`EXPECT` macros instead. ------------- Changes requested by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2436 From mseledtsov at openjdk.java.net Fri Feb 12 23:03:54 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 12 Feb 2021 23:03:54 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 20:35:23 GMT, Mikhailo Seledtsov wrote: > This is a preliminary review. I would like to get the initial feedback before I proceed with conversion of the remaining tests.
> > Here is what I did so far: > - created a UnitTestThread and a main test runner, based on gtests with similar needs > - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) > to the newly created gtest while wrapping it into a TestReservedSpace class. I did not change the code of the test. > - removed invocations from whitebox.cpp > > Testing: > - ran GTestWrapper on usual platforms - All PASS > - ensured that ReservedSpaceConcurrent is in the logs and passed > > After gathering the feedback my plan is: > Plan: > - move the remaining internal Memory/VirtualSpace tests into a gTest > - I am thinking about using separate files for each test > - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code @iignatev @hseigel Could you, please, take a look when you have a chance? This is a preliminary review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2436 From mseledtsov at openjdk.java.net Fri Feb 12 23:03:55 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 12 Feb 2021 23:03:55 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest In-Reply-To: References: Message-ID: <2_d7FcyjPSzsWeknIjsJa6qfjaQuEC_CT4blN20daDQ=.0fb1e0a9-5321-476e-953a-b09f07ae1a3c@github.com> On Fri, 5 Feb 2021 21:03:56 GMT, Igor Ignatyev wrote: >> This is a preliminary review. I would like to get the initial feedback before I proceed with conversion of the remaining tests. >> >> Here is what I did so far: >> - created a UnitTestThread and a main test runner, based on gtests with similar needs >> - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) >> to the newly created gtest while wrapping it into a TestReservedSpace class. I did not change the code of the test. 
>> - removed invocations from whitebox.cpp >> >> Testing: >> - ran GTestWrapper on usual platforms - All PASS >> - ensured that ReservedSpaceConcurrent is in the logs and passed >> >> After gathering the feedback my plan is: >> Plan: >> - move the remaining internal Memory/VirtualSpace tests into a gTest >> - I am thinking about using separate files for each test >> - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code > > hi Misha, > > I haven't finished (pre)reviewing it yet, however, I have found a few things that, I belive, should be changed. I have updated the changes with recommendations from Igor. I also have: - converted all 3 tests, which also includes platforms-specific tests - using GTest EXPECT/ASSERT macros - removed remnants of the original tests - general cleanup - ran build-and-test on these tests, plus a number of additional builds Please review when you have a chance: @iignatev @hseigel > test/hotspot/gtest/runtime/test_reservedSpaceConcurrent.cpp line 46: > >> 44: static void release_memory_for_test(ReservedSpace rs) { >> 45: if (rs.special()) { >> 46: guarantee(os::release_memory_special(rs.base(), rs.size()), "Shouldn't fail"); > > you shouldn't use hotspot's `gurantee/assert`, and use gtest provided `ASSERT`/`EXPECT` instead. Will do. > test/hotspot/gtest/runtime/test_reservedSpaceConcurrent.cpp line 208: > >> 206: void run_unit_test() { >> 207: TestReservedSpace::test_reserved_space(); >> 208: } > > what's the point of having this extra member function? can't we just call `TestReservedSpace::test_reserved_space` directly from main_run? I reworked the classes to be able to reuse the concurrency driver and threads code. 
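For readers following the review points above (typed constants instead of `#define`, and a concurrency driver that several tests can share), here is a minimal sketch in plain C++. It is not the code from the PR: `kThreadCount`, `kDuration` and `run_concurrently` are hypothetical names, and the sketch uses `std::thread` rather than HotSpot's own thread classes, which the actual `UnitTestThread` and `MultiThreadTestRunner` presumably build on.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Typed, scoped constants instead of #define (hypothetical values).
constexpr int kThreadCount = 4;
constexpr std::chrono::milliseconds kDuration{50};

// Hypothetical reusable driver: runs `body` repeatedly on `threads` threads
// until `duration` has elapsed, and returns the total number of iterations.
// Each thread runs `body` at least once.
inline long run_concurrently(const std::function<void()>& body,
                             int threads = kThreadCount,
                             std::chrono::milliseconds duration = kDuration) {
  assert(threads > 0);
  std::atomic<long> iterations{0};
  const auto deadline = std::chrono::steady_clock::now() + duration;

  std::vector<std::thread> workers;
  workers.reserve(threads);
  for (int i = 0; i < threads; i++) {
    workers.emplace_back([&] {
      do {
        body();
        iterations.fetch_add(1, std::memory_order_relaxed);
      } while (std::chrono::steady_clock::now() < deadline);
    });
  }
  // All workers are joined before the captured locals go out of scope.
  for (auto& w : workers) {
    w.join();
  }
  return iterations.load();
}
```

In a gtest, the test body would be passed as the callable and the returned count (or any state the body updates) would be checked with the gtest `EXPECT`/`ASSERT` macros rather than HotSpot's `guarantee`.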
------------- PR: https://git.openjdk.java.net/jdk/pull/2436 From eosterlund at openjdk.java.net Fri Feb 12 23:17:39 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 12 Feb 2021 23:17:39 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk In-Reply-To: References: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com> Message-ID: On Fri, 12 Feb 2021 21:43:20 GMT, Roman Kennke wrote: >> I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. > > I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though. I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread. Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting? 
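To illustrate the nesting concern above: one common way to make a scoped mark reentrant is a depth counter, so that only the outermost scope sets the state and only the outermost scope clears it. The sketch below is not taken from the PR and does not use HotSpot types; `ThreadState` and `KeepMark` are hypothetical stand-ins, and whether a counter fits the real `KeepStackGCProcessedMark` is an open question in this thread.

```cpp
#include <cassert>

// Hypothetical per-thread state; not HotSpot's JavaThread or StackWatermark.
struct ThreadState {
  int keep_depth = 0;            // number of live KeepMark scopes
  bool processing_kept = false;  // true while the stack must stay processed
};

// Reentrant RAII guard: the outermost scope turns the state on, nested
// scopes only bump the counter, and the state is turned off again when
// the outermost scope exits.
class KeepMark {
  ThreadState& _ts;

 public:
  explicit KeepMark(ThreadState& ts) : _ts(ts) {
    if (_ts.keep_depth++ == 0) {
      _ts.processing_kept = true;
    }
  }

  ~KeepMark() {
    assert(_ts.keep_depth > 0 && "unbalanced KeepMark");
    if (--_ts.keep_depth == 0) {
      _ts.processing_kept = false;
    }
  }

  KeepMark(const KeepMark&) = delete;
  KeepMark& operator=(const KeepMark&) = delete;
};

// Exercises nested scopes; returns true if the state stayed set while any
// scope was live and was cleared once the outermost scope exited.
inline bool nested_scopes_behave() {
  ThreadState ts;
  bool ok = true;
  {
    KeepMark outer(ts);
    {
      KeepMark inner(ts);
      ok = ok && ts.processing_kept && ts.keep_depth == 2;
    }
    // Leaving the inner scope must not clear the state.
    ok = ok && ts.processing_kept && ts.keep_depth == 1;
  }
  return ok && !ts.processing_kept && ts.keep_depth == 0;
}
```

The alternative raised above, placing the mark close to the allocation so that scopes never nest, avoids the counter entirely.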
------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From eosterlund at openjdk.java.net Fri Feb 12 23:17:38 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 12 Feb 2021 23:17:38 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 21:45:57 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() Nesting code looks wrong. ------------- Changes requested by eosterlund (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2500 From ccheung at openjdk.java.net Fri Feb 12 23:18:42 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Fri, 12 Feb 2021 23:18:42 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v2] In-Reply-To: <7g0L5oYcLEqm2b-mg_7heIQFeOj0atCRN-3Y_gy0UzE=.efc411e4-736b-4f9e-8d47-a814bfacf08c@github.com> References: <7g0L5oYcLEqm2b-mg_7heIQFeOj0atCRN-3Y_gy0UzE=.efc411e4-736b-4f9e-8d47-a814bfacf08c@github.com> Message-ID: <6lGG0UDjQ30S0swXhcYFVYh7sS3n-ncIqtGy1YV7yGU=.24f05b0b-e765-43ee-9213-b8d63f084664@github.com> On Fri, 12 Feb 2021 04:08:03 GMT, Ioi Lam wrote: >> This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: >> >> - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp >> >> - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up >> >> - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. Refactor the code so they are also printed for dynamic dump > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - fixed spaces > - use member initializer list; clean up log message Looks good overall. Couple of minor comments. src/hotspot/share/memory/archiveBuilder.cpp line 197: > 195: > 196: assert(_current == NULL, "must be"); > 197: _current = this; These lines used to be at the beginning of the function. Any reasons why they are moved? test/hotspot/jtreg/runtime/cds/appcds/LotsOfClasses.java line 53: > 51: opts.addSuffix("-Xlog:gc+region+cds"); > 52: //opts.addSuffix("-Xlog:gc+region=trace"); > 53: opts.addSuffix("-Xlog:cds=debug"); // test detailed metadata info printing Remove the commented line #52? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2536 From jiefu at openjdk.java.net Sat Feb 13 02:50:40 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sat, 13 Feb 2021 02:50:40 GMT Subject: RFR: 8261585: Restore HandleArea used in Deoptimization::uncommon_trap In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 14:42:59 GMT, Hui Shi wrote: > Add a HandleMark in Deoptimization::uncommon_trap before the call to Deoptimization::fetch_unroll_info_helper, so that references held in the HandleArea do not extend object lifetime. Object lifetime will then be consistent with and without an uncommon trap. > > For the test case in this commit, the WeakReference is expected to be cleared after GC, but the test fails with the options "-XX:-Inline -XX:-TieredCompilation -XX:CompileCommand=compileonly,UncommonTrapLeak.foo -XX:CompileThreshold=100 -XX:-BackgroundCompilation". The Reference's referent object is still alive after "foo" finishes because, with an uncommon trap, oops are recorded in the HandleArea and the HandleArea is not popped when uncommon trap processing finishes. > > When Deoptimization::fetch_unroll_info_helper returns, all oops in deoptimized frames are saved in Deoptimization::UnrollBlock or Thread data structures, so the HandleArea can be popped safely. > 1. local and expression oops: the raw address is stored in vframeArrayElement _locals/_expressions as intptr > 2. return value restore: the raw oop is recorded in the frame // (oop *)map->location(rax->as_VMReg()); > 3. exception object: the raw oop is recorded in Thread._exception_oop > > In the deoptimize blob entry, JRT_BLOCK_ENTRY(Deoptimization::fetch_unroll_info) has a HandleMarkCleaner, so the HandleArea is restored after Deoptimization::fetch_unroll_info_helper finishes. So it's also safe to add a HandleMark in Deoptimization::uncommon_trap before fetch_unroll_info_helper. Looks good to me. Thanks for fixing it. I'd like to sponsor it. Thanks. ------------- Marked as reviewed by jiefu (Committer).
PR: https://git.openjdk.java.net/jdk/pull/2526 From hshi at openjdk.java.net Sat Feb 13 02:50:42 2021 From: hshi at openjdk.java.net (Hui Shi) Date: Sat, 13 Feb 2021 02:50:42 GMT Subject: Integrated: 8261585: Restore HandleArea used in Deoptimization::uncommon_trap In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 14:42:59 GMT, Hui Shi wrote: > Add a HandleMark in Deoptimization::uncommon_trap before the call to Deoptimization::fetch_unroll_info_helper, so that references held in the HandleArea do not extend object lifetime. Object lifetime will then be consistent with and without an uncommon trap. > > For the test case in this commit, the WeakReference is expected to be cleared after GC, but the test fails with the options "-XX:-Inline -XX:-TieredCompilation -XX:CompileCommand=compileonly,UncommonTrapLeak.foo -XX:CompileThreshold=100 -XX:-BackgroundCompilation". The Reference's referent object is still alive after "foo" finishes because, with an uncommon trap, oops are recorded in the HandleArea and the HandleArea is not popped when uncommon trap processing finishes. > > When Deoptimization::fetch_unroll_info_helper returns, all oops in deoptimized frames are saved in Deoptimization::UnrollBlock or Thread data structures, so the HandleArea can be popped safely. > 1. local and expression oops: the raw address is stored in vframeArrayElement _locals/_expressions as intptr > 2. return value restore: the raw oop is recorded in the frame // (oop *)map->location(rax->as_VMReg()); > 3. exception object: the raw oop is recorded in Thread._exception_oop > > In the deoptimize blob entry, JRT_BLOCK_ENTRY(Deoptimization::fetch_unroll_info) has a HandleMarkCleaner, so the HandleArea is restored after Deoptimization::fetch_unroll_info_helper finishes. So it's also safe to add a HandleMark in Deoptimization::uncommon_trap before fetch_unroll_info_helper. This pull request has now been integrated.
Changeset: 95d73129 Author: Hui Shi Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/95d73129 Stats: 64 lines in 2 files changed: 64 ins; 0 del; 0 mod 8261585: Restore HandleArea used in Deoptimization::uncommon_trap Reviewed-by: coleenp, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/2526 From stuefe at openjdk.java.net Sat Feb 13 04:50:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Feb 2021 04:50:39 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups In-Reply-To: References: Message-ID: <_OWFASsWUnMpGd21tRffsajmHjwvx2yRpFIsDdSss7U=.7657c2d0-7fd1-411e-b98f-17983ea63002@github.com> On Fri, 12 Feb 2021 22:17:43 GMT, Coleen Phillimore wrote: >> Hi, >> >> may I please have reviews for this RFE? >> >> While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixes and instead put them into this cleanup RFE. >> >> There should be no behavioral changes in this patch. >> >> - de-templatize `AllocationSite` since E was used as a simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) >> >> - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. >> >> - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`. In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too, but this would cause larger ripples so I stopped there.
>> >> - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user >> >> - made `AllocationSite` immutable >> >> - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` >> >> - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` >> >> - There was a subtle incorrectness where `AllocationSite::equals()` would only compare the call stack and disregard the MEMFLAGS member. Theoretically, if two call stacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (e.g. skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. (_Update: seems this assert really fires on s390x, so this is a real problem. I opened [1] to track this and restored the old behavior._). >> >> - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. It is true in almost all cases. >> >> - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. >> >> - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that code does not need a walker object anymore and becomes a lot simpler. In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and vulnerable to subtle errors. The new code is more compact and simpler. Before removing the old implementation, I ran both statistics side by side for a couple of scenarios (e.g. really bad hash code implementations) and the output was identical.
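The `AllocationSite` items in the list above can be pictured with a small, self-contained sketch: a non-template, immutable base class, a child class that carries its counters directly, and the two comparison variants (call stack only, and call stack plus flag). All types here are simplified stand-ins assumed for illustration (`ToyCallStack`, `ToyMemFlag`, `ToyAllocationSite`, `ToyMallocSite`); they are not the NMT classes from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ToyMemFlag { Thread, Compiler, GC };   // stand-in for MEMFLAGS

struct ToyCallStack {                             // stand-in for a captured call stack
    std::vector<std::uintptr_t> frames;
    bool equals(const ToyCallStack& other) const { return frames == other.frames; }
};

// Non-template, immutable base: call stack and flag are fixed at construction.
class ToyAllocationSite {
    const ToyCallStack _stack;
    const ToyMemFlag _flag;
public:
    ToyAllocationSite(const ToyCallStack& stack, ToyMemFlag flag)
        : _stack(stack), _flag(flag) {}
    const ToyCallStack& call_stack() const { return _stack; }
    ToyMemFlag flag() const { return _flag; }
    // Comparison on the call stack only.
    bool equals(const ToyCallStack& stack) const { return _stack.equals(stack); }
    // Stricter comparison that also takes the flag into account.
    bool equals(const ToyCallStack& stack, ToyMemFlag flag) const {
        return _flag == flag && _stack.equals(stack);
    }
};

// Child class holds its counters as plain members instead of receiving a
// data-holder type through a template parameter.
class ToyMallocSite : public ToyAllocationSite {
    std::size_t _count = 0;
    std::size_t _bytes = 0;
public:
    using ToyAllocationSite::ToyAllocationSite;
    void allocate(std::size_t size) { ++_count; _bytes += size; }
    std::size_t count() const { return _count; }
    std::size_t bytes() const { return _bytes; }
};

// If stack capture is imprecise, two logically different sites can share a
// stack; only the flag-aware comparison tells them apart.
inline bool stack_only_compare_lumps_sites() {
    ToyCallStack truncated;
    truncated.frames = {0x1000, 0x2000};
    ToyMallocSite site(truncated, ToyMemFlag::Thread);
    return site.equals(truncated) && !site.equals(truncated, ToyMemFlag::GC);
}

inline std::size_t bytes_after_two_allocations() {
    ToyCallStack stack;
    stack.frames = {0x1000};
    ToyMallocSite site(stack, ToyMemFlag::Compiler);
    site.allocate(16);
    site.allocate(32);
    return site.bytes();
}
```

The sketch keeps the counters mutable in the child while the identifying data in the base stays `const`, which is one plausible reading of "made `AllocationSite` immutable".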
>> >> [1] https://bugs.openjdk.java.net/browse/JDK-8261556 >> >> ---- >> Tests: >> - github GA >> - manual NMT jtreg tests (including the currently disabled runtime/NMT/CheckForProperDetailStackTrace.java) >> - Full nightlies at SAP are scheduled > > src/hotspot/share/services/mallocSiteTable.cpp line 141: > >> 139: while (head != NULL && (*pos_idx) <= MAX_BUCKET_LENGTH) { >> 140: MallocSite* site = head->data(); >> 141: if (site->equals(key, flags)) { > > Does this now assert when it used to just return false if memflags didn't match? Yes, and it seems this actually fired tonight on s390x. Which indicates that the call stack frame skipping is off and we do skip too much. I opened https://bugs.openjdk.java.net/browse/JDK-8261556 to track this, and here I will restore the old behavior for now. That is also cleaner since this RFE should not have brought behavioral changes. ------------- PR: https://git.openjdk.java.net/jdk/pull/2539 From iklam at openjdk.java.net Sat Feb 13 05:04:58 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sat, 13 Feb 2021 05:04:58 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v3] In-Reply-To: References: Message-ID: > This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: > > - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp > > - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up > > - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. 
Refactor the code so they are also printed for dynamic dump Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: review comments by @calvinccheung ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2536/files - new: https://git.openjdk.java.net/jdk/pull/2536/files/9582e40f..54e7185f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2536&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2536&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2536.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2536/head:pull/2536 PR: https://git.openjdk.java.net/jdk/pull/2536 From iklam at openjdk.java.net Sat Feb 13 05:05:00 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sat, 13 Feb 2021 05:05:00 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v2] In-Reply-To: <6lGG0UDjQ30S0swXhcYFVYh7sS3n-ncIqtGy1YV7yGU=.24f05b0b-e765-43ee-9213-b8d63f084664@github.com> References: <7g0L5oYcLEqm2b-mg_7heIQFeOj0atCRN-3Y_gy0UzE=.efc411e4-736b-4f9e-8d47-a814bfacf08c@github.com> <6lGG0UDjQ30S0swXhcYFVYh7sS3n-ncIqtGy1YV7yGU=.24f05b0b-e765-43ee-9213-b8d63f084664@github.com> Message-ID: On Fri, 12 Feb 2021 23:13:36 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixed spaces >> - use member initializer list; clean up log message > > src/hotspot/share/memory/archiveBuilder.cpp line 197: > >> 195: >> 196: assert(_current == NULL, "must be"); >> 197: _current = this; > > These lines used to be at the beginning of the function. Any reasons why they are moved? I think it's better to set `_current` after the instance has been fully initialized. 
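A minimal sketch of the pattern discussed in this reply: a static "current instance" pointer is published only at the end of the constructor, once every member has been set up. `ToyBuilder` and its members are hypothetical and only mirror the shape of the reviewed lines; this is not the `ArchiveBuilder` code.

```cpp
#include <cassert>
#include <cstddef>

class ToyBuilder {
    static ToyBuilder* _current;   // at most one live builder at a time
    std::size_t _buffer_size;
    bool _initialized;
public:
    explicit ToyBuilder(std::size_t buffer_size)
        : _buffer_size(buffer_size), _initialized(false) {
        // ... all other setup work would happen here ...
        _initialized = true;

        // Publish last, so anyone reading current() sees a complete object.
        assert(_current == nullptr && "must be");
        _current = this;
    }
    ~ToyBuilder() {
        assert(_current == this && "must be");
        _current = nullptr;
    }
    ToyBuilder(const ToyBuilder&) = delete;
    ToyBuilder& operator=(const ToyBuilder&) = delete;

    static ToyBuilder* current() { return _current; }
    bool initialized() const { return _initialized; }
    std::size_t buffer_size() const { return _buffer_size; }
};

ToyBuilder* ToyBuilder::_current = nullptr;

// While a builder is alive, current() refers to a fully set-up instance.
inline bool current_is_fully_initialized() {
    ToyBuilder builder(4096);
    return ToyBuilder::current() == &builder &&
           ToyBuilder::current()->initialized();
}

// After the builder goes out of scope, current() is cleared again.
inline bool current_is_cleared_after_scope() {
    { ToyBuilder builder(4096); }
    return ToyBuilder::current() == nullptr;
}
```

Had the assignment been placed at the top of the constructor, code reached during setup could observe `current()` pointing at a half-built object; publishing last avoids that window in this toy model.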
> test/hotspot/jtreg/runtime/cds/appcds/LotsOfClasses.java line 53: > >> 51: opts.addSuffix("-Xlog:gc+region+cds"); >> 52: //opts.addSuffix("-Xlog:gc+region=trace"); >> 53: opts.addSuffix("-Xlog:cds=debug"); // test detailed metadata info printing > > Remove the commented line #52? I removed it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2536 From stuefe at openjdk.java.net Sat Feb 13 05:19:53 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Feb 2021 05:19:53 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups [v2] In-Reply-To: References: Message-ID: Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Restore old comparison behavior in MallocSiteTable ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2539/files - new: https://git.openjdk.java.net/jdk/pull/2539/files/892759ea..2b06cdf6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2539&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2539&range=00-01 Stats: 11 lines in 2 files changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2539.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2539/head:pull/2539 PR: https://git.openjdk.java.net/jdk/pull/2539 From ccheung at openjdk.java.net Sat Feb 13 05:22:41 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Sat, 13 Feb 2021 05:22:41 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v3] In-Reply-To: References: Message-ID: On Sat, 13 Feb 2021 05:04:58 GMT, Ioi Lam wrote: >> This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: >> >> - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp >> >> - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up >> >> - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump.
Refactor the code so they are also printed for dynamic dump > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > review comments by @calvinccheung Marked as reviewed by ccheung (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2536 From stuefe at openjdk.java.net Sat Feb 13 05:22:59 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Feb 2021 05:22:59 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups [v3] In-Reply-To: References: Message-ID: Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Constructor brackets ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2539/files - new: https://git.openjdk.java.net/jdk/pull/2539/files/2b06cdf6..74155d15 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2539&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2539&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2539.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2539/head:pull/2539 PR: https://git.openjdk.java.net/jdk/pull/2539 From stuefe at openjdk.java.net Sat Feb 13 05:23:00 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Feb 2021 05:23:00 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups [v3] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 22:21:35 GMT, Coleen Phillimore wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Constructor brackets > > src/hotspot/share/services/threadStackTracker.hpp line 44: > >> 42: _base(base), >> 43: _size(size) >> 44: {} > > nit: can you put {} on the same line as _size ? Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/2539 From stuefe at openjdk.java.net Sat Feb 13 05:43:56 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Feb 2021 05:43:56 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups [v4] In-Reply-To: References: Message-ID: Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: reduce diff ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2539/files - new: https://git.openjdk.java.net/jdk/pull/2539/files/74155d15..0e126c31 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2539&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2539&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2539.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2539/head:pull/2539 PR: https://git.openjdk.java.net/jdk/pull/2539 From iklam at openjdk.java.net Sat Feb 13 05:45:58 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sat, 13 Feb 2021 05:45:58 GMT Subject: RFR: 8261672: Reduce inclusion of classLoaderData.hpp [v2]
In-Reply-To: References: Message-ID: > classLoaderData.hpp is included by about 700 out of 1000 .o files in HotSpot. Most of these are transitively included through klass.hpp, typeArrayKlass.hpp and instanceKlass.hpp. > > These headers can be refactored by moving inline functions that depend on ClassLoaderData to xxx.inline.hpp. This reduces the .o files that include classLoaderData.hpp to about 260. > > (I also removed a bunch of unnecessary inclusion of classLoader.hpp from a few C files). > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - fixed copyright - Merge branch 'master' into 8261672-reduce-classLoaderData.hpp - 8261672: Reduce inclusion of classLoaderData.hpp - reduce classLoader.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2555/files - new: https://git.openjdk.java.net/jdk/pull/2555/files/96f3b680..ea396957 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2555&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2555&range=00-01 Stats: 4352 lines in 164 files changed: 2494 ins; 690 del; 1168 mod Patch: https://git.openjdk.java.net/jdk/pull/2555.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2555/head:pull/2555 PR: https://git.openjdk.java.net/jdk/pull/2555 From iklam at openjdk.java.net Sat Feb 13 07:16:40 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sat, 13 Feb 2021 07:16:40 GMT Subject: RFR: 8261672: Reduce inclusion of classLoaderData.hpp [v2] In-Reply-To: <0EjXR8-41EJ8zdXKlpkVMn0yMvFmDFUS1dKoNK35zDs=.1a74c451-7e08-455b-bd54-63c7651da5ee@github.com> References: 
<0EjXR8-41EJ8zdXKlpkVMn0yMvFmDFUS1dKoNK35zDs=.1a74c451-7e08-455b-bd54-63c7651da5ee@github.com> Message-ID: On Fri, 12 Feb 2021 21:35:38 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - fixed copyright >> - Merge branch 'master' into 8261672-reduce-classLoaderData.hpp >> - 8261672: Reduce inclusion of classLoaderData.hpp >> - reduce classLoader.hpp > > Looks good. I think adding compiledICHolder.inline.hpp is worth doing because performance matters to its caller in compiledMethod.cpp. Thanks @coleenp and @lfoltan for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2555 From iklam at openjdk.java.net Sat Feb 13 07:16:43 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sat, 13 Feb 2021 07:16:43 GMT Subject: Integrated: 8261672: Reduce inclusion of classLoaderData.hpp In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 19:41:54 GMT, Ioi Lam wrote: > classLoaderData.hpp is included by about 700 out of 1000 .o files in HotSpot. Most of these are transitively included through klass.hpp, typeArrayKlass.hpp and instanceKlass.hpp. > > These headers can be refactored by moving inline functions that depend on ClassLoaderData to xxx.inline.hpp. This reduces the .o files that include classLoaderData.hpp to about 260. > > (I also removed a bunch of unnecessary inclusion of classLoader.hpp from a few C files). > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. This pull request has now been integrated. 
Changeset: 235da6aa Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/235da6aa Stats: 182 lines in 59 files changed: 108 ins; 47 del; 27 mod 8261672: Reduce inclusion of classLoaderData.hpp Reviewed-by: lfoltan, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/2555 From xliu at openjdk.java.net Sun Feb 14 05:44:51 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 14 Feb 2021 05:44:51 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set reimplement this feature. withdraw my intrusive change in outputStream. use stringStream only for the constant OopPtr. after oop->print_on(st), delete all appearances of '\n' - Merge branch 'master' into JDK-8260198 - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set fix merge conflict. - Merge branch 'master' into JDK-8260198 - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. Correct TypeInstPtr::dump2 to make sure it only emits klass name once. 
Remove the comment because Klass::oop_print_on() has emitted the address of oop. Before: 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a" :Constant:exact * After: 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * ------------- Changes: https://git.openjdk.java.net/jdk/pull/2178/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=03 Stats: 51 lines in 4 files changed: 45 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Sun Feb 14 05:53:40 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 14 Feb 2021 05:53:40 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set In-Reply-To: References: <1lfc5MjMIFUx-Q19CkfKJvP4yYoM6B5APuRsvevpuk8=.3ba9c333-74b6-4ba4-bb3c-66c4583701fb@github.com> Message-ID: On Thu, 28 Jan 2021 18:03:43 GMT, Xin Liu wrote: >> The result you are trying to achieve is good, but I'm not sure pushing suppress_cr into outputStream is the right thing. I would like to just not emit the cr's instead - but I also see that isn't simple, because adding an extra bool to print_on would cascade into the entire codebase. > > @neliasso Thanks for reviewing this. > Exactly. The first reason is that I am not familiar with the oops/ codebase. I guess some clients expect to see multiple lines. The second reason is that there are [many places](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/oops/klass.cpp#L783). I am not sure I can clean them up completely. > > That's why I modified outputStream and gave it a 'suppress_cr' mode. May I ask hotspot-dev's advice? /cc hotspot-dev I reimplemented this feature using stringStream.
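A minimal sketch of that idea, assuming plain standard C++ in place of HotSpot's stream classes: print into a temporary string-backed stream, delete every '\n' from the captured text, then emit the result as one line. `toy_print_on` and `print_one_line` are hypothetical helpers for illustration, not the code in the pull request.

```cpp
#include <algorithm>
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in for a printer that emits several lines, in the way that
// oop->print_on(st) does in the "Before" output.
inline void toy_print_on(std::ostream& st) {
    st << "java.lang.String\n"
       << " - klass: 'java/lang/String'\n"
       << " - string: \"a\"\n";
}

// Capture the output in a temporary buffer, then remove all newlines so
// that the caller can append the text to a single output line.
inline std::string print_one_line() {
    std::ostringstream buffer;
    toy_print_on(buffer);
    std::string text = buffer.str();
    text.erase(std::remove(text.begin(), text.end(), '\n'), text.end());
    return text;
}
```

The benefit of this shape is that the multi-line printer stays untouched and only the one caller that needs single-line output post-processes the text.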
After the change, a ConP of a constant OopPtr becomes a one-liner, e.g. 279 ConP === 0 [[ 1105 ]] Oop:java/lang/String java.lang.String {0x000000010100e3d0} - klass: public final synchronized 'java/lang/String' - string: "":Constant:exact * Please note that I keep "Oop:java/lang/String". It is the output of klass()->print_name_on(st); The remaining part "java.lang.String {0x000000010100e3d0} - klass: public final synchronized 'java/lang/String' - string: "":Constant:exact *" is the output of oop->print_on(st). Because this is only printed with `-XX:+Verbose`, I think it's okay for it to be a bit verbose. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From iklam at openjdk.java.net Mon Feb 15 06:40:58 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 15 Feb 2021 06:40:58 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v4] In-Reply-To: References: Message-ID: > This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: > > - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp > > - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up > > - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. Refactor the code so they are also printed for dynamic dump Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge branch 'master' into 8261608-move-common-archive-building-code - review comments by @calvinccheung - fixed spaces - use member initializer list; clean up log message - 8261608: Move common CDS archive building code to archiveBuilder.cpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2536/files - new: https://git.openjdk.java.net/jdk/pull/2536/files/54e7185f..8aaf33f6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2536&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2536&range=02-03 Stats: 4934 lines in 207 files changed: 2805 ins; 891 del; 1238 mod Patch: https://git.openjdk.java.net/jdk/pull/2536.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2536/head:pull/2536 PR: https://git.openjdk.java.net/jdk/pull/2536 From iklam at openjdk.java.net Mon Feb 15 06:40:58 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 15 Feb 2021 06:40:58 GMT Subject: RFR: 8261608: Move common CDS archive building code to archiveBuilder.cpp [v2] In-Reply-To: References: <7g0L5oYcLEqm2b-mg_7heIQFeOj0atCRN-3Y_gy0UzE=.efc411e4-736b-4f9e-8d47-a814bfacf08c@github.com> Message-ID: <1CS7ZpBzcFddpU59jeh6FrXdrkGuFhSruVPv-9FHFJk=.f91cf0a5-1cbb-4882-bb0c-ab8e774440ed@github.com> On Fri, 12 Feb 2021 14:09:24 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: >> >> - fixed spaces >> - use member initializer list; clean up log message > > Looks good! Thanks @coleenp and @calvinccheung for your review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2536 From iklam at openjdk.java.net Mon Feb 15 06:40:59 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 15 Feb 2021 06:40:59 GMT Subject: Integrated: 8261608: Move common CDS archive building code to archiveBuilder.cpp In-Reply-To: References: Message-ID: <1kI6241XXumNyAVT-4yF13ozOKlFaNaOZrdWGtgFFOo=.d4a0bb1e-bc9e-4df5-ad27-8ee1cec0121c@github.com> On Thu, 11 Feb 2021 23:41:34 GMT, Ioi Lam wrote: > This is a follow-up to https://git.openjdk.java.net/jdk/pull/2296: > > - Move common code for writing the CDS archive from metaspaceShared.cpp to archiveBuilder.cpp > > - Data structures related to dumping were haphazardly organized in several classes (e.g., `DumpRegions`). We needed various APIs to access them across classes. These should be consolidated in archiveBuilder.cpp and the API should be cleaned up > > - Detailed stats (`DumpAllocStats::print_stats`) were available only for static dump. Refactor the code so they are also printed for dynamic dump This pull request has now been integrated. 
Changeset: d9744f65 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/d9744f65 Stats: 982 lines in 31 files changed: 346 ins; 481 del; 155 mod 8261608: Move common CDS archive building code to archiveBuilder.cpp Reviewed-by: coleenp, ccheung ------------- PR: https://git.openjdk.java.net/jdk/pull/2536 From shade at openjdk.java.net Mon Feb 15 08:40:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 15 Feb 2021 08:40:40 GMT Subject: RFR: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 08:55:40 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this (tiny) change that removes the last (unconditional) StoreLoad memory barrier for Serial/Parallel GC that has apparently been forgotten to be made conditional on `CardTable::scanned_concurrently()` just removed in [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) ? > > Thanks, > Thomas > > Testing: automatic compilation via github actions, but this is a quite straightforward removal of a single line... Looks good. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2541 From xliu at openjdk.java.net Mon Feb 15 09:26:56 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 15 Feb 2021 09:26:56 GMT Subject: RFR: 8261731: shallow copy the internal buffer of a scalar-replaced java.lang.String object Message-ID: There are 3 nodes involving in the construction of a java.lang.String object. 1. Allocate of itself, aka. alloc 2. AllocateArray of a byte array, which is value:byte[], aka. aa 3. ArrayCopyNode which copys in the contents of value, aka. ac Lemma When a String object `alloc` is scalar replaced, C2 can eliminate `aa` and `ac`. Because `alloc` is scalar replaced, it must be non-escaped. The field value:byte[] of j.l.String cannot be seen by external world, therefore it must not be global escaped. 
Because the buffer is marked as stable, it is safe to assume its contents are whatever ac copies in. Because all public java.lang.String constructors clone the incoming array, the source of `ac` is stable as well. It is possible to rewire `aa` to the source of ac with the correct offset. That is to say, we can replace both `aa` and `ac` with a "shallow copy" of the source of `ac`. It's safe if C2 keeps a reference of the source oop for all safepoints. ------------- Commit messages: - fix regression for x86-32 - add a statistical counter for OptimizeTempArray. - [SIM-JVM-450] support deoptimization v2 - add a unit test for deoptimization - [SIM-JVM-450] support deoptimization part2 - enable OptimizeTempArray by default - Merge branch 'master' into optimize_substring - Revert "8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set" - Revert "add a new bucket afterea_late_inlines" - [SIM-JVM-450] support deoptimization - ... and 25 more: https://git.openjdk.java.net/jdk/compare/4619f372...fd9ca4b8 Changes: https://git.openjdk.java.net/jdk/pull/2570/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2570&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261731 Stats: 861 lines in 16 files changed: 844 ins; 2 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/2570.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2570/head:pull/2570 PR: https://git.openjdk.java.net/jdk/pull/2570 From stefank at openjdk.java.net Mon Feb 15 09:28:41 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 15 Feb 2021 09:28:41 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 23:14:47 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() > >
Nesting code looks wrong. I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch: https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72 Maybe you could build a few more tests based on this? ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From mdoerr at openjdk.java.net Mon Feb 15 12:00:40 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 15 Feb 2021 12:00:40 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 10:56:34 GMT, Aleksey Shipilev wrote: >> Thanks for your reply. Yeah, I got your point regarding C++. But we use load-consume a lot in our self-made assembly code which should be ok. >> I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very performance sensitive? > >> I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very performance sensitive? > > Yes, it is not supposed to be: CAS failure path when GC is relocating the objects. I'd prefer using load-consume with a comment in assembly code and acquire in C++ code. That would be consistent with other code. But that's just my opinion. I'll leave the aarch64 maintainers free to decide. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From sjohanss at openjdk.java.net Mon Feb 15 13:31:56 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 15 Feb 2021 13:31:56 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v4] In-Reply-To: References: Message-ID: > When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large page is available, and if so we disable UseSHM.
> > The problematic part is when HugeTLBFS pages are not available: we then disable this flag and, without doing any sanity check for UseSHM, mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So if we cannot use HugeTLBFS, we probably cannot use SHM either. > > A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and, if it fails, disable UseLargePages, since such allocation attempts will always fail anyway. > > The proposed sanity check consists of two parts, where the first just tries to create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no point in trying to use SHM to get large pages. > > The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. > > This change needs two tests to be updated to handle that large pages can now be disabled even when run with +UseLargePages. One of these tests is also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge branch 'master' into 8261401-shm-sanity - Only warn if UseLargePages was explicitly set. - Thomas review Removed check for IPC_LOCK capability. If a large page mapping fails the errno is already present in the warning printed out. We could look at improving this to better explain when EPERM vs ENOMEM occurs.
- 8261401-check-effective - 8261401-self-review - 8261401-test-fixes - Only UseSHM if sanity check pass ------------- Changes: https://git.openjdk.java.net/jdk/pull/2488/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=03 Stats: 47 lines in 4 files changed: 44 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488 PR: https://git.openjdk.java.net/jdk/pull/2488 From sjohanss at openjdk.java.net Mon Feb 15 13:44:58 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 15 Feb 2021 13:44:58 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5] In-Reply-To: References: Message-ID: > When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. > > The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. > > A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. > > The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. 
> > The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. > > This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Clean up test after merge ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2488/files - new: https://git.openjdk.java.net/jdk/pull/2488/files/681dc92b..a9d8c0f7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=03-04 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488 PR: https://git.openjdk.java.net/jdk/pull/2488 From aph at redhat.com Mon Feb 15 14:06:23 2021 From: aph at redhat.com (Andrew Haley) Date: Mon, 15 Feb 2021 14:06:23 +0000 Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On 15/02/2021 12:00, Martin Doerr wrote: > I'd prefer using load-consume with comment in assembly code and acquire in C++ code. That would be consistent with other code. But that's just my opinion. I'll leave the aarch64 maintainers free to decide. That sounds right to me too. One day we'll get memory_order_consume for HotSpot C++ code, but until then acquire will have to do. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at openjdk.java.net Mon Feb 15 15:20:59 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 15 Feb 2021 15:20:59 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: <2KGNm2sghEHT4velRWjE5yMCU5lBdvYhk2UkPUZktV8=.01c29e94-e5e6-47b1-815b-e327076d8c74@github.com> On Mon, 15 Feb 2021 09:26:03 GMT, Stefan Karlsson wrote: >> Nesting code looks wrong. > > I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch: > https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72 > > Maybe you could build a few more test based on this? > I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread. > Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting? I just realized that the reentrancy comes from the Java call lower in fetchFirstBatch(). The problem can be easily avoided by putting the KeepStackGCProcessedMark in sensible scope that excludes the call. 
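The scoping idea in the reply above can be illustrated with a small standalone sketch. This is not the HotSpot `KeepStackGCProcessedMark` code: `ThreadState`, `ScopedKeepMark`, `reentrant_call` and `fetch_batch` are hypothetical names, and the flag stands in for whatever per-thread state the real mark links into. It only shows how a non-reentrant scoped mark can be limited to the region that needs it, so a call that installs its own mark runs after the outer one has been released.

```cpp
#include <cassert>

// Hypothetical stand-in for per-thread state; not a HotSpot type.
struct ThreadState {
  bool keep_processed = false;  // true while a mark is active on this thread
};

// Non-reentrant scoped mark: sets the flag on construction and clears it
// on destruction. Nesting two marks on the same thread is a bug here.
class ScopedKeepMark {
 public:
  explicit ScopedKeepMark(ThreadState* t) : _t(t) {
    assert(!_t->keep_processed && "mark must not be nested");
    _t->keep_processed = true;
  }
  ~ScopedKeepMark() { _t->keep_processed = false; }
  ScopedKeepMark(const ScopedKeepMark&) = delete;
  ScopedKeepMark& operator=(const ScopedKeepMark&) = delete;

 private:
  ThreadState* _t;
};

// Hypothetical callee that installs its own mark (models the re-entrant call).
inline bool reentrant_call(ThreadState* t) {
  ScopedKeepMark mark(t);
  return t->keep_processed;
}

// The mark covers only the inner block; the re-entrant call is made after
// that block has closed, so the two marks never overlap.
inline bool fetch_batch(ThreadState* t) {
  bool held_during_fill = false;
  {
    ScopedKeepMark mark(t);
    held_during_fill = t->keep_processed;  // frame filling would happen here
  }  // mark released before the call below
  bool callee_ok = reentrant_call(t);
  return held_during_fill && callee_ok && !t->keep_processed;
}
```

Moving `reentrant_call` inside the inner block would trip the nesting assertion in a debug build, which is the situation the narrower scope is meant to avoid.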
------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From rkennke at openjdk.java.net Mon Feb 15 15:20:58 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 15 Feb 2021 15:20:58 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: > I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. 
> > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make KeepStackGCProcessedMark non-reentrant again ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2500/files - new: https://git.openjdk.java.net/jdk/pull/2500/files/6946499c..345f78b4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01-02 Stats: 11 lines in 3 files changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 From akozlov at openjdk.java.net Mon Feb 15 15:21:41 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 15:21:41 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation In-Reply-To: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> References: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> Message-ID: <8xMNV_-0gXYQHgE9764gqfj2KGbfjve5iLLdM7re1C0=.8ba7575b-3159-4863-ad5a-84e1b5fce473@github.com> On Fri, 12 Feb 2021 17:29:29 GMT, Thomas Stuefe wrote: > I would rename the new header safefetch.hpp, just because these functions were never part of the StubRoutines namespace, and the naming is much clearer. Good idea. It should be safefetch.inline.hpp, right? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From stuefe at openjdk.java.net Mon Feb 15 15:38:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 15 Feb 2021 15:38:41 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation In-Reply-To: <8xMNV_-0gXYQHgE9764gqfj2KGbfjve5iLLdM7re1C0=.8ba7575b-3159-4863-ad5a-84e1b5fce473@github.com> References: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> <8xMNV_-0gXYQHgE9764gqfj2KGbfjve5iLLdM7re1C0=.8ba7575b-3159-4863-ad5a-84e1b5fce473@github.com> Message-ID: On Mon, 15 Feb 2021 15:18:57 GMT, Anton Kozlov wrote: > > I would rename the new header safefetch.hpp, just because these functions were never part of the StubRoutines namespace, and the naming is much clearer. > > Good idea. It should be safefetch.inline.hpp, right? I don't think so? To my understanding, xxx.inline.hpp is a companion file to an xxx.hpp which you create if you want to remove some of the rarely used functions from a high traffic header. Here, we just have some independent global utility functions. ------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From akozlov at openjdk.java.net Mon Feb 15 15:50:40 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 15:50:40 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation In-Reply-To: References: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> <8xMNV_-0gXYQHgE9764gqfj2KGbfjve5iLLdM7re1C0=.8ba7575b-3159-4863-ad5a-84e1b5fce473@github.com> Message-ID: On Mon, 15 Feb 2021 15:36:18 GMT, Thomas Stuefe wrote: > To my understanding, xxx.inline.hpp is a companion file to an xxx.hpp which you create if you want to remove some of the rarely used functions from a high traffic header. Oh, you're right. 
I had an impression that there are standalone inline.hpp files, and I could not find the opposite in the Hotspot Style Guide. However, now I see inline.hpp are just the way you've described. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From burban at openjdk.java.net Mon Feb 15 16:24:48 2021 From: burban at openjdk.java.net (Bernhard Urban-Forster) Date: Mon, 15 Feb 2021 16:24:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 12:22:09 GMT, Vladimir Kempik wrote: >> Where did this come from - some snippet/example/tech note code? Maybe other people can help figure it out if we provide more info. > > This is the version of w^x on-demand switch implemented by microsoft guys. This is enabled only for debug builds. > @lewurm could you comment here please Those values can be observed in the debugger, but aren't documented or defined in header files. This mode was useful for the initial bootstrap of the platform (it helped with missing W^X transitions), but shouldn't be required anymore today. I'm fine with removing it altogether. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Mon Feb 15 17:34:14 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 17:34:14 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v14] In-Reply-To: References: Message-ID: <6SmGFRbH6_SgIW-aRpEIrRnfsv-BI2nmDUffhObbyu8=.e4979c82-2bfb-4932-b980-d261d7a55f74@github.com> > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. 
> > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 73 commits: - Merge remote-tracking branch 'upstream/jdk/master' into jdk-macos Additional work for JDK-8253817: Support macOS Aarch64 ABI in Interpreter - JDK-8257882: oops, fixed 7fe50a996b6f436932452d220b351c73153ed945 - Update signal handler part for debugger - Cleanup SA changes - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos - support macos_aarch64 in hsdis - Merge branch 'master' into jdk-macos - Update copyright year for BsdAARCH64ThreadContext.java - Fix inclusing of StubRoutines header - Redo buildsys fix - ... 
and 63 more: https://git.openjdk.java.net/jdk/compare/40ae9937...4094f351 ------------- Changes: https://git.openjdk.java.net/jdk/pull/2200/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=13 Stats: 3066 lines in 84 files changed: 2954 ins; 47 del; 65 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Mon Feb 15 17:45:07 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 17:45:07 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v15] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 75 commits: - Pull/2200 (#5) * bsd_aarch64 cleanup * remove the actual attribute too * Refactor bailing out on nativeWrapper generation * rename c_call_conv_priv function - Merge branch 'master' into jdk-macos - Merge remote-tracking branch 'upstream/jdk/master' into jdk-macos Additional work for JDK-8253817: Support macOS Aarch64 ABI in Interpreter - JDK-8257882: oops, fixed 7fe50a996b6f436932452d220b351c73153ed945 - Update signal handler part for debugger - Cleanup SA changes - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos - support macos_aarch64 in hsdis - Merge branch 'master' into jdk-macos - Update copyright year for BsdAARCH64ThreadContext.java - ... and 65 more: https://git.openjdk.java.net/jdk/compare/849f4c0f...a9452a4c ------------- Changes: https://git.openjdk.java.net/jdk/pull/2200/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=14 Stats: 3032 lines in 83 files changed: 2919 ins; 47 del; 66 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Mon Feb 15 17:48:48 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 17:48:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v15] In-Reply-To: <0-H97G4l5XqFtiEUbNAqrP_j143iny5kkF0tsiAqMvQ=.2396963b-db5d-469e-bc30-511f754a600a@github.com> References: <0-H97G4l5XqFtiEUbNAqrP_j143iny5kkF0tsiAqMvQ=.2396963b-db5d-469e-bc30-511f754a600a@github.com> Message-ID: On Sun, 31 Jan 2021 20:08:01 GMT, Vladimir Kempik wrote: >> I'm not sure it can wait. This change turns already-messy code into something significantly messier, to the extent that it's not really good enough to go into mainline. > > Hello > Does this look like something in the right direction ? 
> > https://github.com/VladimirKempik/jdk/commit/c2820734f4b10148154085a70d380b8c5775fa49 The latest merge with JDK-8261071 should resolve the issue. Please take a look. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Mon Feb 15 18:03:51 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Mon, 15 Feb 2021 18:03:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 21:52:47 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 810: > >> 808: #ifdef __APPLE__ >> 809: // Less-than word types are stored one after another. >> 810: // The code unable to handle this, bailout. > > Perhaps: // The code is unable to handle this so bailout. Hello, we have updated PR, now this bailout is used only by the code which can handle it (native wrapper generator), for the rest it will cause guarantee failed if this bailout is triggered ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Mon Feb 15 18:03:55 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Mon, 15 Feb 2021 18:03:55 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v8] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 18:44:48 GMT, Andrew Haley wrote: >> Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 62 commits: >> >> - Merge branch 'master' into jdk-macos >> - Update copyright year for BsdAARCH64ThreadContext.java >> - Fix inclusing of StubRoutines header >> - Redo buildsys fix >> - Revert harfbuzz changes, disable warnings for it >> - Little adjustement of SlowSignatureHandler >> - Partially bring previous commit >> - Revert "Address feedback for signature generators" >> >> This reverts commit 50b55f6684cd21f8b532fa979b7b6fbb4613266d. >> - Refactor CDS disabling >> - Redo builsys support for aarch64-darwin >> - ... and 52 more: https://git.openjdk.java.net/jdk/compare/8a9004da...b421e0b4 > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 839: > >> 837: // The code unable to handle this, bailout. >> 838: return -1; >> 839: #endif > > This looks like a bug to me. The caller doesn't necessarily check the return value. See CallRuntimeNode::calling_convention. Hello, we have updated the PR. This bailout is now used only by the code which can handle it (the native wrapper generator); everywhere else, triggering this bailout causes a guarantee failure. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Mon Feb 15 18:03:56 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 18:03:56 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:08:41 GMT, Anton Kozlov wrote: > I'm going to do as much refactoring as needed before this patch under JDK-8261071 The recent merge resolves the inconsistency between pass_byte/pass_short and other methods.
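The bailout handling discussed in the replies above can be sketched roughly as follows. This is a simplified model, not the code in sharedRuntime_aarch64.cpp: `ArgType`, `calling_convention_priv`, `calling_convention` and `try_generate_native_wrapper` are hypothetical names, the slot accounting is invented for illustration, and a C++ exception stands in for HotSpot's `guarantee()`. The point is only the split between a core routine that may return -1 and two kinds of callers: one that checks the result, and a general entry point that fails loudly instead of silently passing -1 on.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical argument kinds; Byte and Short are the sub-word types.
enum class ArgType { Byte, Short, Int, Long };

// Core routine (assumed shape): returns the number of stack slots used,
// or -1 to signal a bailout when a sub-word argument would go on the stack.
inline int calling_convention_priv(const std::vector<ArgType>& args,
                                   int reg_count) {
  int stack_slots = 0;
  int used_regs = 0;
  for (ArgType a : args) {
    if (used_regs < reg_count) {  // argument still fits in a register
      used_regs++;
      continue;
    }
    if (a == ArgType::Byte || a == ArgType::Short) {
      return -1;  // packed sub-word stack argument: not handled, bail out
    }
    stack_slots += (a == ArgType::Long) ? 2 : 1;
  }
  return stack_slots;
}

// General entry point: its callers do not check for -1, so a bailout is
// turned into a hard failure here (exception as a stand-in for guarantee()).
inline int calling_convention(const std::vector<ArgType>& args, int reg_count) {
  int slots = calling_convention_priv(args, reg_count);
  if (slots < 0) {
    throw std::logic_error("unsupported calling convention");
  }
  return slots;
}

// A caller that can handle the bailout: it checks the result and reports
// that no wrapper was generated.
inline bool try_generate_native_wrapper(const std::vector<ArgType>& args,
                                        int reg_count) {
  return calling_convention_priv(args, reg_count) >= 0;
}
```

With this split, a caller that forgets to check the return value can only reach the checked entry point, which addresses the concern that the caller doesn't necessarily check the return value.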
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From lucy at openjdk.java.net Mon Feb 15 18:05:59 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Mon, 15 Feb 2021 18:05:59 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v3] In-Reply-To: References: Message-ID: <9fS9BEifmElhKXeBhN6anBBKCwaf3Rk4YeEA9ONBrQ4=.44725bb0-a3b5-456f-a43e-71d45fb69c73@github.com> > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! > Lutz Lutz Schmidt has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
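As a rough illustration of the counter changes listed in the description above, the sketch below reads an `int` counter as `unsigned int` and sums into a 64-bit aggregate. The names `MethodCounter`, `counter_value` and `aggregate` are hypothetical and not taken from the patch, and a 32-bit `int` is assumed.

```cpp
#include <cstdint>

// Hypothetical per-method counter, declared as int as in the description.
struct MethodCounter {
  int count;
};

// Reinterpret the stored value as unsigned, so a counter that has gone past
// INT_MAX (and therefore looks negative as an int) still reads as positive.
inline uint64_t counter_value(const MethodCounter& m) {
  return static_cast<uint64_t>(static_cast<unsigned int>(m.count));
}

// Sum the per-method values into a 64-bit aggregate, which has far more
// headroom than a 32-bit total.
inline uint64_t aggregate(const MethodCounter* counters, int n) {
  uint64_t total = 0;
  for (int i = 0; i < n; i++) {
    total += counter_value(counters[i]);
  }
  return total;
}
```

The unsigned reading doubles the usable range of each individual counter, while the aggregate no longer wraps when a few large counters are added together.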
The pull request contains six additional commits since the last revision: - expand remaining counters to 64-bit, remove 64 suffix - 8261447: requested changes by TobiHartmann - JDK-8261447: MethodInvocationCounters frequently run into overflow - expand remaining counters to 64-bit, remove 64 duffix - 8261447: requested changes by TobiHartmann - JDK-8261447: MethodInvocationCounters frequently run into overflow ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2511/files - new: https://git.openjdk.java.net/jdk/pull/2511/files/bfd60a3c..67fb3f7d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=01-02 Stats: 8620 lines in 332 files changed: 4163 ins; 2472 del; 1985 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From akozlov at openjdk.java.net Mon Feb 15 18:10:09 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 18:10:09 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v16] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). 
In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #6 from VladimirKempik/pull/2200 Fix typo in comments - Fix typo in comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/a9452a4c..419c2b1a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=15 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=14-15 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Mon Feb 15 18:24:08 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Mon, 15 Feb 2021 18:24:08 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v17] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. 
The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #9 from VladimirKempik/pull/2200 Removed unused variables - Removed unused variables ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/419c2b1a..90e244e9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=16 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=15-16 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From lucy at openjdk.java.net Mon Feb 15 18:24:40 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Mon, 15 Feb 2021 18:24:40 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v3] In-Reply-To: References: <7S5jdlFpZ5m2xtFinD92jQEQm6hgbQXjHR5N-3XbXkc=.fe75978e-860a-4bcc-b5b4-3e4b4246d706@github.com> Message-ID: On Fri, 12 Feb 2021 11:58:47 GMT, Lutz Schmidt wrote: >> I introduced the *64 suffixes to not break anything that still uses the old calls. As old uses disappear step by step, I'm more than happy to remove the suffixes. I will have a look into SA and try to make it 64bit counter ready. There may be no new version before the weekend is over. > > This is a request for help. 
Could someone with SA knowledge please check if my assumption is correct? > > In hotspot code, the field Method::_compiled_invocation_count is annotated with a comment that it is used by SA. The field is also exposed via vmStructs.cpp to enable such use. I have scanned SA code in OpenJDK11 and OpenJDK head but found no evidence that this particular field is accessed. Is this finding/assumption correct? > > If so, I could just stop exposing the field, making my life easier. Thanks! Looks like I have completely messed up my pull request. Please disregard for now. I'm trying to find a way how to clean up. Maybe I'll just start over. ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From gziemski at openjdk.java.net Mon Feb 15 18:45:41 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Mon, 15 Feb 2021 18:45:41 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: <6f45qE_D_iGPVwyKMU4y5ifw3gtVVKwVz-OkjLGsJQc=.5e68053a-7203-46ca-9710-3768ac22e019@github.com> References: <6f45qE_D_iGPVwyKMU4y5ifw3gtVVKwVz-OkjLGsJQc=.5e68053a-7203-46ca-9710-3768ac22e019@github.com> Message-ID: On Thu, 11 Feb 2021 20:05:19 GMT, Gerard Ziemski wrote: >> Gentle ping.. > > hi Thomas, I'm interested in reviewing your fix and will work on it as soon as I'm done with https://github.com/openjdk/jdk/pull/2403 > > (tomorrow?) I'm starting the review... 
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From jbhateja at openjdk.java.net Mon Feb 15 18:55:03 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 15 Feb 2021 18:55:03 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v3] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 05:59:01 GMT, Jatin Bhateja wrote: >>> Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. >> >> Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity. > >> > Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. >> >> Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity. 
> > BASELINE: > Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": > 61.037 ns/op > > Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:??perf": > Perf stats: > -------------------------------------------------- > > 19,739.21 msec task-clock # 0.389 CPUs utilized > 646 context-switches # 0.033 K/sec > 12 cpu-migrations # 0.001 K/sec > 150 page-faults # 0.008 K/sec > 74,59,83,59,139 cycles # 3.779 GHz (30.73%) > 1,78,78,79,19,117 instructions # 2.40 insn per cycle (38.48%) > 24,79,81,63,651 branches # 1256.289 M/sec (38.55%) > 32,24,89,924 branch-misses # 1.30% of all branches (38.62%) > 52,56,88,28,472 L1-dcache-loads # 2663.167 M/sec (38.65%) > 39,00,969 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.57%) > 3,74,131 LLC-loads # 0.019 M/sec (30.77%) > 22,315 LLC-load-misses # 5.96% of all LL-cache hits (30.72%) > L1-icache-loads > 17,49,997 L1-icache-load-misses (30.72%) > 52,91,41,70,636 dTLB-loads # 2680.663 M/sec (30.69%) > 3,315 dTLB-load-misses # 0.00% of all dTLB cache hits (30.67%) > 4,674 iTLB-loads # 0.237 K/sec (30.65%) > 33,746 iTLB-load-misses # 721.99% of all iTLB cache hits (30.63%) > L1-dcache-prefetches > L1-dcache-prefetch-misses > > 50.723759146 seconds time elapsed > > 51.447054000 seconds user > 0.189949000 seconds sys > > > WITH OPT: > Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": > 74.356 ns/op > > Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:??perf": > Perf stats: > -------------------------------------------------- > > 19,741.09 msec task-clock # 0.389 CPUs utilized > 641 context-switches # 0.032 K/sec > 17 cpu-migrations # 0.001 K/sec > 164 page-faults # 0.008 K/sec > 74,40,40,48,513 cycles # 3.769 GHz (30.81%) > 1,45,66,22,06,797 instructions # 1.96 insn per cycle (38.56%) > 20,31,28,43,577 branches # 1028.963 M/sec (38.65%) > 14,11,419 branch-misses # 0.01% of all branches (38.69%) > 43,07,86,33,662 L1-dcache-loads # 2182.182 M/sec 
(38.72%) > 37,06,744 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.56%) > 1,34,292 LLC-loads # 0.007 M/sec (30.72%) > 30,627 LLC-load-misses # 22.81% of all LL-cache hits (30.68%) > L1-icache-loads > 14,49,145 L1-icache-load-misses (30.65%) > 43,44,86,27,516 dTLB-loads # 2200.924 M/sec (30.63%) > 218 dTLB-load-misses # 0.00% of all dTLB cache hits (30.63%) > 2,445 iTLB-loads # 0.124 K/sec (30.63%) > 28,624 iTLB-load-misses # 1170.72% of all iTLB cache hits (30.63%) > L1-dcache-prefetches > L1-dcache-prefetch-misses > > 50.716083931 seconds time elapsed > > 51.467300000 seconds user > 0.200390000 seconds sys > > > JMH perf data for ArrayCopyUnalignedSrc.testLong with a copy length of 1200 shows degradation in L1D accesses; it seems the benchmark got displaced from its sweet spot. > > But there is a significant reduction in instruction count, and cycles are almost comparable. We are saving one shift per mask computation. > > OLD Sequence: > 0x00007f7fc1030ead: movabs $0x1,%rax > 0x00007f7fc1030eb7: shlx %r8,%rax,%rax > 0x00007f7fc1030ebc: dec %rax > 0x00007f7fc1030ebf: kmovq %rax,%k2 > NEW Sequence: > 0x00007f775d030d51: movabs $0xffffffffffffffff,%rax > 0x00007f775d030d5b: bzhi %r8,%rax,%rax > 0x00007f775d030d60: kmovq %rax,%k2 Further analysis of the perf degradation revealed that with the new optimized instruction pattern, code alignment got disturbed. This led to an increase in LSD misses and also reduced UOP caching in the DSB. Aligning copy loops at a 32-byte boundary prevents any adverse impact on UOP caching. NOPs used for padding add to the instruction count and thus may overshadow the code size gains due to the new mask generation sequence in copy stubs. 
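For readers comparing the OLD and NEW sequences above, here is a minimal portable sketch of what each one computes. The function names are hypothetical, and this is only a C++ model of the documented SHLX and BZHI semantics for 64-bit operands, not the stub code from the PR.

```cpp
#include <cstdint>

// Model of the OLD sequence: movabs $1; shlx n; dec.
// SHLX uses only the low 6 bits of the count for 64-bit operands.
// (Hypothetical helper name, for illustration only.)
static uint64_t mask_shift_dec(unsigned n) {
  return (uint64_t{1} << (n & 63)) - 1;
}

// Model of the NEW sequence: movabs $-1; bzhi n.
// BZHI takes the index from the low 8 bits of the count and clears
// all bits at positions >= index; if index >= 64 the source is
// returned unchanged.
// (Hypothetical helper name, for illustration only.)
static uint64_t mask_bzhi(unsigned n) {
  const unsigned index = n & 0xFF;
  const uint64_t src = ~uint64_t{0};
  if (index >= 64) return src;
  if (index == 0) return 0;
  return src >> (64 - index);
}
```

Under this model both functions produce the same low-bit mask for lengths 0 through 63. They differ only at 64, where the shift count wraps to 0 in the old sequence while BZHI leaves the all-ones source untouched.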
Baseline: ArrayCopyAligned.testLong Length : 1200 61 ns/op (approx) 1,93,44,43,11,622 cycles 4,59,57,99,78,727 instructions # 2.38 insn per cycle 1,83,68,75,68,255 idq.dsb_uops 2,08,32,43,71,906 lsd.uops 37,12,54,60,211 idq.mite_uops With Opt: ArrayCopyAligned.testLong Length : 1200 74 ns/op (approx) 1,93,51,25,94,766 cycles 3,75,11,57,91,917 instructions # 1.94 insn per cycle 48,67,58,25,566 idq.dsb_uops 19,46,13,236 lsd.uops 2,87,42,95,74,280 idq.mite_uops With Opt + main loop alignment(nop): 61 ns/op (approx) ArrayCopyAligned.testLong Length : 1200 1,93,52,15,90,080 cycles 4,60,89,14,06,528 instructions # 2.38 insn per cycle 1,78,76,10,34,991 idq.dsb_uops 2,09,16,15,84,313 lsd.uops 46,25,31,92,101 idq.mite_uops While computing the mask for partial in-lining of small copy calls ( currently enabled for sub-word types with copy length less than 32/64 bytes), new optimized sequence should always offer lower instruction count and latency path. Baseline: ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op 1,97,76,75,18,052 cycles 8,96,00,37,11,803 instructions # 4.53 insn per cycle 2,71,83,79,035 idq.dsb_uops 7,54,82,43,63,409 lsd.uops 3,92,55,74,395 idq.mite_uops ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op 1,97,79,16,56,787 cycles 8,96,13,15,69,780 instructions # 4.53 insn per cycle 2,69,07,11,691 idq.dsb_uops 7,54,95,63,77,683 lsd.uops 3,90,19,10,747 idq.mite_uops WithOpt: ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op 1,97,66,64,62,541 cycles 8,92,03,95,00,236 instructions # 4.51 insn per cycle 2,72,38,56,205 idq.dsb_uops 7,50,87,50,60,591 lsd.uops 3,89,15,02,954 idq.mite_uops ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op 1,97,54,21,61,110 cycles 8,91,46,64,23,754 instructions # 4.51 insn per cycle 2,78,12,19,544 idq.dsb_uops 7,50,35,88,95,843 lsd.uops 3,90,41,97,276 idq.mite_uops Following are the links to updated JMH perf data: http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt 
http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS_LOOP_ALIGN.txt In general, gains are not significant in the case of copy stubs, but the new sequence offers an optimal latency path for mask computation. ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From jbhateja at openjdk.java.net Mon Feb 15 18:55:02 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Mon, 15 Feb 2021 18:55:02 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v3] In-Reply-To: References: Message-ID: > The BMI2 BZHI instruction can be used to optimize the instruction sequence > used for mask generation at various places in array copy stubs and partial in-lining for small copy operations. Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: 8261553 : Aligning main copy loop to prevent any penalty due to LSD and DSB misses. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2522/files - new: https://git.openjdk.java.net/jdk/pull/2522/files/84c9c2da..7012eed0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2522&range=01-02 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2522.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2522/head:pull/2522 PR: https://git.openjdk.java.net/jdk/pull/2522 From aph at openjdk.java.net Mon Feb 15 19:10:47 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 15 Feb 2021 19:10:47 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v8] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 18:00:50 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 839: >> >>> 837: // The code unable to handle this, bailout. >>> 838: return -1; >>> 839: #endif >> >> This looks like a bug to me. 
The caller doesn't necessarily check the return value. See CallRuntimeNode::calling_convention. > > Hello, we have updated PR, now this bailout is used only by the code which can handle it (native wrapper generator), for the rest it will cause guarantee failed if this bailout is triggered This is when passing a float, yes? In the case where we have more float arguments than n_float_register_parameters_c. I don't understand why you think it's acceptable to bail in this case. Can you explain, please? ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From stefank at openjdk.java.net Mon Feb 15 19:56:39 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 15 Feb 2021 19:56:39 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation In-Reply-To: References: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> <8xMNV_-0gXYQHgE9764gqfj2KGbfjve5iLLdM7re1C0=.8ba7575b-3159-4863-ad5a-84e1b5fce473@github.com> Message-ID: On Mon, 15 Feb 2021 15:47:47 GMT, Anton Kozlov wrote: >>> > I would rename the new header safefetch.hpp, just because these functions were never part of the StubRoutines namespace, and the naming is much clearer. >>> >>> Good idea. It should be safefetch.inline.hpp, right? >> >> I don't think so? To my understanding, xxx.inline.hpp is a companion file to an xxx.hpp which you create if you want to remove some of the rarely used functions from a high traffic header. Here, we just have some independent global utility functions. > >> To my understanding, xxx.inline.hpp is a companion file to an xxx.hpp which you create if you want to remove some of the rarely used functions from a high traffic header. > > Oh, you're right. I had an impression that there are standalone inline.hpp files, and I could not find the opposite in the Hotspot Style Guide. However, now I see inline.hpp are just the way you've described. Thanks! 
If you put non-trivial code in the header, then it should go into an .inline.hpp file (or .cpp file). From the style guide: "Do not put non-trivial function implementations in .hpp files. If the implementation depends on other .hpp files, put it in a .cpp or a .inline.hpp file." It doesn't matter whether a corresponding .hpp file exists or not. However, what's non-trivial becomes a judgment call. I tend to use the rule that if it uses code from another header, then it's non-trivial, and should most likely be moved out of the .hpp file. ------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From gziemski at openjdk.java.net Mon Feb 15 22:43:43 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Mon, 15 Feb 2021 22:43:43 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 05:35:03 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. 
>> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. >> >> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 >> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 >> and additionally to not trip this warning here: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 >> >> ------ >> >> Changes in this patch: >> >> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. >> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. >> - I used that class to cover cases (1)..(3): >> - `chained_handlers` contains all information of chained handlers >> - `expected_handlers` contains a copy of the handlers the hotspot installed >> - `replaced_handlers` contains information about replaced handlers >> >> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. >> >> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
>> >> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. >> >> Output Before: >> 663 Signal Handlers: >> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO >> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> >> Now: >> Signal Handlers: >> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO >> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> >> ----- >> Tests: GA, and the patch has been tested in our nighlies for over a month now. 
I manually executed the runtime/jni/checked tests too. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Use universal zero initializer for do_check_signal_periodically I did not get as far as I hoped today, I'll review more tomorrow. src/hotspot/os/posix/signals_posix.cpp line 115: > 113: assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig); > 114: return sig > 0 && sig < NSIG; > 115: } We have assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig); return sig > 0 && sig < NSIG; here in `SavedSignalHandlers::check_signal_number()` and #if defined(__APPLE__) return sig >= 1 && sig < NSIG; #else in `is_valid_signal()`. Can we combine those, something like: // Returns true if signal number is valid. static bool is_valid_signal(int sig) { assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig); if (sig > 0 && sig < NSIG) { // Use sigaddset to check for signal validity. sigset_t set; sigemptyset(&set); if (sigaddset(&set, sig) == -1 && errno == EINVAL) { return false; } else { return true; } } else { return false; } } so then we can drop `SavedSignalHandlers::check_signal_number()` altogether? src/hotspot/os/posix/signals_posix.cpp line 414: > 412: if (actp == NULL) { > 413: // Retrieve the preinstalled signal handler from jvm > 414: actp = const_cast(chained_handlers.get(sig)); Must `SavedSignalHandlers::get()` have the **const struct** in `const struct sigaction* get(int sig) const` signature? If it was just `struct sigaction* get(int sig) const` then we wouldn't need this awkward cast. ------------- Changes requested by gziemski (Committer). 
PR: https://git.openjdk.java.net/jdk/pull/2251 From duke at openjdk.java.net Tue Feb 16 01:29:44 2021 From: duke at openjdk.java.net (duke) Date: Tue, 16 Feb 2021 01:29:44 GMT Subject: Withdrawn: 8253757: Add LLVM-based backend for hsdis In-Reply-To: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> References: <91erxiMDb4ftvSomuJYHPi9SX-v8Z2VLD2qEwCbz5tk=.b9ed01b5-f0e0-4ed7-9c1a-b06bc0e64640@github.com> Message-ID: On Tue, 29 Sep 2020 04:36:16 GMT, Ludovic Henry wrote: > When bringing up Hotspot onto new platforms, it is not always possible to compile hsdis because gcc is not yet available. For example, for Windows-AArch64 and macOS-AArch64. > > For some such platforms, it is possible to use LLVM as an alternative backend as it also supports a disassembler feature. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/392 From kbarrett at openjdk.java.net Tue Feb 16 06:21:39 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 16 Feb 2021 06:21:39 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v3] In-Reply-To: References: Message-ID: On Sun, 7 Feb 2021 09:47:08 GMT, Albert Mingkun Yang wrote: >> After JDK-8260574, a instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it is removed. >> >> With this patch, `all_tasks_completed` contains only assertion. Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running btw the instance when all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome. 
> > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review I would prefer to see all_tasks_claimed eliminated rather than made more precise at the expense of being more complicated. I think the added precision doesn't provide enough benefit (vs alternatives like checking once in the destructor) to pay the API, usage, and implementation cost. But that's a matter of opinion, about which Albert feels to the contrary. I think this change is an improvement regardless of that question, so won't block on that basis. So looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2383 From vkempik at openjdk.java.net Tue Feb 16 06:26:46 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Tue, 16 Feb 2021 06:26:46 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v8] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 19:07:40 GMT, Andrew Haley wrote: >> Hello, we have updated PR, now this bailout is used only by the code which can handle it (native wrapper generator), for the rest it will cause guarantee failed if this bailout is triggered > > This is when passing a float, yes? In the case where we have more float arguments than n_float_register_parameters_c. > I don't understand why you think it's acceptable to bail in this case. Can you explain, please? It's for everything that uses less than 8 bytes on the stack (ints (4), shorts (2), bytes (1), floats (4)). Currently native wrapper generation does not support such cases at all; it needs refactoring before this can be implemented. So when a method has more arguments than can be placed in registers, we may have issues. So we just bail out to the interpreter when a smaller (<= 4 bytes) type is going to be passed through the stack. 
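To illustrate why sub-8-byte stack arguments need special handling here: Apple's arm64 calling convention packs stack arguments at their natural alignment, whereas the code shared with linux/aarch64 assumes one 8-byte slot per stack argument. The following is a minimal sketch of the two layouts; all helper names are hypothetical and nothing here is taken from the HotSpot sources.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: round off up to a power-of-two alignment.
static std::size_t align_up(std::size_t off, std::size_t align) {
  return (off + align - 1) & ~(align - 1);
}

// Slot layout: every stack argument occupies a full 8-byte slot,
// regardless of its size.
static std::vector<std::size_t> slot_offsets(const std::vector<std::size_t>& sizes) {
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    out.push_back(i * 8);
  }
  return out;
}

// Packed layout: every stack argument occupies only its own size,
// aligned to that size (sizes are assumed to be 1, 2, 4 or 8).
static std::vector<std::size_t> packed_offsets(const std::vector<std::size_t>& sizes) {
  std::vector<std::size_t> out;
  std::size_t off = 0;
  for (std::size_t s : sizes) {
    off = align_up(off, s);
    out.push_back(off);
    off += s;
  }
  return out;
}
```

For stack arguments of sizes 4, 2, 1, 4 and 8 bytes, the slot layout places them at offsets 0, 8, 16, 24, 32, while the packed layout places them at 0, 4, 6, 8, 16. Code that assumes the first layout reads the wrong bytes under the second, which is the situation the bailout avoids.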
There was an attempt to implement handling of such cases, but currently it requires some hacks (like using some vectors for a non-specific task): https://github.com/openjdk/aarch64-port/pull/3 ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From ayang at openjdk.java.net Tue Feb 16 08:24:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 16 Feb 2021 08:24:39 GMT Subject: RFR: 8259668: Make SubTasksDone use-once [v3] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 06:19:13 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > I would prefer to see all_tasks_claimed eliminated rather than made more > precise at the expense of being more complicated. I think the added > precision doesn't provide enough benefit (vs alternatives like checking once > in the destructor) to pay the API, usage, and implementation cost. But > that's matter of opinion, about which Albert feels to the contrary. I think > this change is an improvement regardless of that question, so won't block on > that basis. > > So looks good. Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2383 From ysuenaga at openjdk.java.net Tue Feb 16 08:29:39 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Tue, 16 Feb 2021 08:29:39 GMT Subject: RFR: 8256916: Add JFR event for OutOfMemoryError In-Reply-To: References: <73crkD-SepaAqEyV2wuGyO8tnAHjQmiIOHjN9zovm8M=.3ea49ef3-a2f6-487e-b86a-956832861a46@github.com> <7n0kAmo0Qtlt0sD2XWE52BL_6BI0kMffisYLj8R8O38=.3d324a31-4909-4b6d-bc90-a63f7a6a41ef@github.com> <_zddsYkBbuHo_AfH3Ld3jnBUID7EnOvIhjwK2qm67sw=.3ad67eeb-8a25-4cae-80cf-d34b73d65723@github.com> Message-ID: <202P26t9PXaIZbQfvwBXdNJMcinkidPR2MVjqvzbY6E=.815a8382-3642-4b34-a372-a7cb45659fe7@github.com> On Tue, 19 Jan 2021 01:35:20 GMT, Yasumasa Suenaga wrote: >> As you say: "move throwing events into HotSpot makes big change" - I agree and Erik has already stated that many have tried already to make this kind of change, but attempts have faltered because the impact / disruptions are unclear. Do you fully understand the consequences of how it will change the way users work with exceptions in JFR? If C2 optimizations now start to remove exception sites that were previously reported? >> >> I understand and appreciate the ambition, and I acknowledge the existing mechanisms has drawbacks (known for a long time) but reworking how exceptions are reported is a big project. Such a project should also involve exception throttling - and throttling might pose additional constraints / opportunities that need to be considered. >> >> Modifying this subsystem is a sensitive undertaking and should solve the fundamental and already known problems before it is to be endorsed and for the disruption to end user motivated. > >> As you say: "move throwing events into HotSpot makes big change" - I agree and Erik has already stated that many have tried already to make this kind of change, but attempts have faltered because the impact / disruptions are unclear. 
Do you fully understand the consequences of how it will change the way users work with exceptions in JFR? If C2 optimizations now start to remove exception sites that were previously reported? > > I believe the user who hooks the JavaErrorThrow event wants to know about the occurrence of OOME. > Let's think about the user who wants to restart the system automatically if a fatal error happens. > The user defines OOME as an event that should trigger a restart and monitors it via remote recording. If the JavaErrorThrow event for OOME cannot be hooked, it is meaningless. > > I understand that not all `Throwable`s can be hooked due to C2 optimization, but most of them should be hooked. They should be hooked at least as well as in unified logging, and we can do it like this PR. As I said before, it is a big change, but JFR should hook exceptions as far as possible, like unified logging with the `exceptions` tag. I understand from this discussion that some exceptions cannot be hooked due to JIT compilation, but even so, it is a problem that JFR cannot hook various exceptions which are thrown in HotSpot. Again, I understand that fixing this problem is very difficult, but we should work on it as far as possible; otherwise we cannot analyze the relation between exceptions and various metrics in a flight recording. Of course I agree to split this into smaller fixes. Anyway, I want to record exceptions in JFR. It is a problem not to be able to hook critical errors such as OutOfMemoryError and StackOverflowError. ------------- PR: https://git.openjdk.java.net/jdk/pull/1403 From ayang at openjdk.java.net Tue Feb 16 08:50:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 16 Feb 2021 08:50:38 GMT Subject: Integrated: 8259668: Make SubTasksDone use-once In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 16:26:33 GMT, Albert Mingkun Yang wrote: > After JDK-8260574, an instance of `SubTasksDone` is never reused, so part of its APIs could be revised: `clear()` and the code calling it are removed.
> > With this patch, `all_tasks_completed` contains only assertion. Kim suggested moving this assertion logic to `~SubTasksDone`, but that could defer the assertion violation. For example, in the case of `G1FullGCMarkTask::work`, there is a significant amount of code running btw the instance when all subtasks are claimed (where `all_tasks_completed` is called in this PR) and `~SubTasksDone`. In the interest of having more precise location where bugs may lie, I have kept `all_tasks_completed` in the original place. More comments on this are welcome. This pull request has now been integrated. Changeset: 3cbd16de Author: Albert Mingkun Yang Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/3cbd16de Stats: 63 lines in 4 files changed: 11 ins; 39 del; 13 mod 8259668: Make SubTasksDone use-once Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2383 From shade at openjdk.java.net Tue Feb 16 10:26:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 16 Feb 2021 10:26:06 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v3] In-Reply-To: References: Message-ID: > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. > > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. > > For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. 
It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - A few minor touchups - Add a blurb to x86 code as well - Use implicit "consume" in AArch64, add more notes. - Merge branch 'master' into JDK-8261492-shenandoah-forwardee-memord - Make sure to access fwdptr with acquire semantics in assembler code - 8261492: Shenandoah: reconsider forwardee accesses memory ordering ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2496/files - new: https://git.openjdk.java.net/jdk/pull/2496/files/49626781..0c159cc3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=01-02 Stats: 11804 lines in 386 files changed: 5929 ins; 3700 del; 2175 mod Patch: https://git.openjdk.java.net/jdk/pull/2496.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496 PR: https://git.openjdk.java.net/jdk/pull/2496 From shade at openjdk.java.net Tue Feb 16 10:26:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 16 Feb 2021 10:26:06 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 11:58:18 GMT, Martin Doerr wrote: >>> I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is 
not very performance sensitive? >> >> Yes, it is not supposed to be: CAS failure path when GC is relocating the objects. > > I'd prefer using load-consume with a comment in assembly code and acquire in C++ code. That would be consistent with other code. But that's just my opinion. I'll leave the aarch64 maintainers free to decide. All right, I added more discussion right in the code comments that hopefully makes the whole thing clearer. Re-running tests now with a dirty AArch64 relaxed CAS patch, but we would need to get Andrew's fix for https://bugs.openjdk.java.net/browse/JDK-8261579 to properly estimate the performance impact. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From akozlov at openjdk.java.net Tue Feb 16 13:04:04 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Tue, 16 Feb 2021 13:04:04 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation [v2] In-Reply-To: References: Message-ID: > Hi, > > Please review a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. > > CC @dcubed-ojdk > > Thanks!
Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: stubRoutines.inline.hpp -> safefetch.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2542/files - new: https://git.openjdk.java.net/jdk/pull/2542/files/a00d9064..d2957b98 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2542&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2542&range=00-01 Stats: 20 lines in 10 files changed: 7 ins; 8 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2542.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2542/head:pull/2542 PR: https://git.openjdk.java.net/jdk/pull/2542 From vkempik at openjdk.java.net Tue Feb 16 14:09:54 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Tue, 16 Feb 2021 14:09:54 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v10] In-Reply-To: References: Message-ID: <44L8Jccum5-J3RntkBsRZJ5daAJ-X1tYt_4xspIbP6U=.4dd53895-2fb7-4396-82e1-8387828bcdbf@github.com> On Thu, 4 Feb 2021 22:49:23 GMT, Gerard Ziemski wrote: >> Anton Kozlov has updated the pull request incrementally with six additional commits since the last revision: >> >> - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos >> - Add comments to WX transitions >> >> + minor change of placements >> - Use macro conditionals instead of empty functions >> - Add W^X to tests >> - Do not require known W^X state >> - Revert w^x in gtests > > src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp line 297: > >> 295: stub = SharedRuntime::handle_unsafe_access(thread, next_pc); >> 296: } >> 297: } else if (sig == SIGILL && nativeInstruction_at(pc)->is_stop()) { > > Can we add a comment here describing what this case means? This was added as part of this commit ( to linux_aarch64) - https://github.com/openjdk/jdk/commit/339d52600b285eb3bc57d9ff107567d4424efeb1 @gerard-ziemski do we really want to add anything new here ? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From lucy at openjdk.java.net Tue Feb 16 15:20:17 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 16 Feb 2021 15:20:17 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v4] In-Reply-To: References: Message-ID: <_AP8N2ZJvjN2VbW_JhYes3vCgEGtmSVQYtSqLm6VsGI=.39897514-c44a-49b6-8e81-2f83b3bcde44@github.com> > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! > Lutz Lutz Schmidt has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - 8261447: requested changes by TobiHartmann - JDK-8261447: MethodInvocationCounters frequently run into overflow - expand remaining counters to 64-bit, remove 64 duffix - 8261447: requested changes by TobiHartmann - JDK-8261447: MethodInvocationCounters frequently run into overflow ------------- Changes: https://git.openjdk.java.net/jdk/pull/2511/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=03 Stats: 5969 lines in 92 files changed: 5856 ins; 4 del; 109 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From lucy at openjdk.java.net Tue Feb 16 15:31:02 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 16 Feb 2021 15:31:02 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v5] In-Reply-To: References: Message-ID: > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! > Lutz Lutz Schmidt has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
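A rough way to reason about the overflow horizons discussed in the description above is to divide the counter range by the update rate. The following is only a sketch under stated assumptions: the helper names are hypothetical and not part of the patch, and the update rate is treated as constant. It shows why reading a 32-bit counter as unsigned buys exactly a factor of two over the positive range of a signed int (2^32 versus 2^31 increments, the latter wrapping after roughly two seconds at 1 GHz), and why widening to 64 bits moves the horizon far further out. It does not try to reproduce the 185-day figure, whose underlying assumptions are not stated in the message.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the patch): seconds until a counter with
// 2^bits distinct values wraps, assuming a constant number of increments
// per second.
inline double seconds_until_wrap(int bits, double updates_per_second) {
  assert(bits > 0 && updates_per_second > 0.0);
  return std::ldexp(1.0, bits) / updates_per_second;  // 2^bits / rate
}

// Same horizon expressed in days.
inline double days_until_wrap(int bits, double updates_per_second) {
  return seconds_until_wrap(bits, updates_per_second) / 86400.0;
}
```

For example, seconds_until_wrap(31, 1e9) models the positive range of an int and seconds_until_wrap(32, 1e9) models the same storage read as unsigned int; the second value is twice the first.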
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2511/files - new: https://git.openjdk.java.net/jdk/pull/2511/files/273d55c2..0a99ee4e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=03-04 Stats: 5780 lines in 79 files changed: 0 ins; 5779 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From akozlov at openjdk.java.net Tue Feb 16 16:06:38 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Tue, 16 Feb 2021 16:06:38 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation [v2] In-Reply-To: References: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> <8xMNV_-0gXYQHgE9764gqfj2KGbfjve5iLLdM7re1C0=.8ba7575b-3159-4863-ad5a-84e1b5fce473@github.com> Message-ID: On Mon, 15 Feb 2021 19:53:48 GMT, Stefan Karlsson wrote: > It doesn't matter if there exists a .hpp file or not. However, what's non-trivial becomes a judgment call. I tend to use rule that if it uses code from another header, then it's non-trivial, and should most likely be moved out of the .hpp file. I suppose this file to be on the edge between trivial and not. Later, it will have a W^X transition and will include thread.hpp, I don't want to rename it again. @stefank, what do you think, should it be safefetch.inline.hpp? Or are you fine with safefetch.hpp? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From stefank at openjdk.java.net Tue Feb 16 16:14:41 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 16 Feb 2021 16:14:41 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation [v2] In-Reply-To: References: <2kFQ29OCibSJYdEH5pI5ysRl7KmU4vreLmqGjqJPmHA=.e0ae2601-7063-4eec-b21f-62ab6df01dcf@github.com> <8xMNV_-0gXYQHgE9764gqfj2KGbfjve5iLLdM7re1C0=.8ba7575b-3159-4863-ad5a-84e1b5fce473@github.com> Message-ID: On Tue, 16 Feb 2021 16:04:05 GMT, Anton Kozlov wrote: >> If you put non-trivial code in the header, then it should go into an .inline.hpp file (or .cpp file). From the style-guide: >> Do not put non-trivial function implementations in .hpp files. If the implementation depends on other .hpp files, put it in a .cpp or a .inline.hpp file. >> It doesn't matter if there exists a .hpp file or not. However, what's non-trivial becomes a judgment call. I tend to use rule that if it uses code from another header, then it's non-trivial, and should most likely be moved out of the .hpp file. > >> It doesn't matter if there exists a .hpp file or not. However, what's non-trivial becomes a judgment call. I tend to use rule that if it uses code from another header, then it's non-trivial, and should most likely be moved out of the .hpp file. > > I suppose this file to be on the edge between trivial and not. Later, it will have a W^X transition and will include thread.hpp, I don't want to rename it again. @stefank, what do you think, should it be safefetch.inline.hpp? Or are you fine with safefetch.hpp? I'm fine with leaving it as safefetch.hpp. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From stefank at openjdk.java.net Tue Feb 16 16:14:41 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 16 Feb 2021 16:14:41 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation [v2] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 13:04:04 GMT, Anton Kozlov wrote: >> Hi, >> >> Please review a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. >> >> CC @dcubed-ojdk >> >> Thanks! > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > stubRoutines.inline.hpp -> safefetch.hpp Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From stuefe at openjdk.java.net Tue Feb 16 16:27:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 16 Feb 2021 16:27:41 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation [v2] In-Reply-To: References: Message-ID: <4WH-uG2jzE7FnlxBUsQZ2vo7_FGR3FitcNZmEtjhAAQ=.a1b8f140-3c58-4993-8167-ce8f44b43ef5@github.com> On Tue, 16 Feb 2021 13:04:04 GMT, Anton Kozlov wrote: >> Hi, >> >> Please review a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. >> >> CC @dcubed-ojdk >> >> Thanks! > > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > stubRoutines.inline.hpp -> safefetch.hpp LGTM ------------- Marked as reviewed by stuefe (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2542 From github.com+168222+mgkwill at openjdk.java.net Tue Feb 16 16:32:56 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Tue, 16 Feb 2021 16:32:56 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: > When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using > Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). > > This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). > > In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into pull/1153 - kstefanj update Signed-off-by: Marcus G K Williams - Merge branch 'master' into update_hlp - Merge branch 'master' into update_hlp - Remove extraneous ' from warning Signed-off-by: Marcus G K Williams - Merge branch 'master' into update_hlp - Merge branch 'master' into update_hlp - Merge branch 'master' into update_hlp - Fix os::large_page_size() in last update Signed-off-by: Marcus G K Williams - Ivan W. Requested Changes Removed os::Linux::select_large_page_size and use os::page_size_for_region instead Removed Linux::find_large_page_size and use register_large_page_sizes. Streamlined Linux::setup_large_page_size Signed-off-by: Marcus G K Williams - ... 
and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1153/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=15 Stats: 71 lines in 2 files changed: 32 ins; 10 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/1153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153 PR: https://git.openjdk.java.net/jdk/pull/1153 From redestad at openjdk.java.net Tue Feb 16 16:36:40 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Tue, 16 Feb 2021 16:36:40 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 18:51:58 GMT, Jatin Bhateja wrote: >>> > Hi Claes, This could be a run to run variation, in general we are now having fewer number of instructions (one shift operation saved per mask computation) compared to previous masked generation sequence and thus it will always offer better execution latencies. >>> >>> Run-to-run variation would be easy to rule out by running more forks and more iterations to attain statistically significant results. While the instruction manuals suggest latency should be better for this instruction on all CPUs where it's supported, it would be good if there was some clear proof - such as a significant benchmark win - to motivate the added complexity. 
>> >> BASELINE: >> Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": >> 61.037 ns/op >> >> Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:??perf": >> Perf stats: >> -------------------------------------------------- >> >> 19,739.21 msec task-clock # 0.389 CPUs utilized >> 646 context-switches # 0.033 K/sec >> 12 cpu-migrations # 0.001 K/sec >> 150 page-faults # 0.008 K/sec >> 74,59,83,59,139 cycles # 3.779 GHz (30.73%) >> 1,78,78,79,19,117 instructions # 2.40 insn per cycle (38.48%) >> 24,79,81,63,651 branches # 1256.289 M/sec (38.55%) >> 32,24,89,924 branch-misses # 1.30% of all branches (38.62%) >> 52,56,88,28,472 L1-dcache-loads # 2663.167 M/sec (38.65%) >> 39,00,969 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.57%) >> 3,74,131 LLC-loads # 0.019 M/sec (30.77%) >> 22,315 LLC-load-misses # 5.96% of all LL-cache hits (30.72%) >> L1-icache-loads >> 17,49,997 L1-icache-load-misses (30.72%) >> 52,91,41,70,636 dTLB-loads # 2680.663 M/sec (30.69%) >> 3,315 dTLB-load-misses # 0.00% of all dTLB cache hits (30.67%) >> 4,674 iTLB-loads # 0.237 K/sec (30.65%) >> 33,746 iTLB-load-misses # 721.99% of all iTLB cache hits (30.63%) >> L1-dcache-prefetches >> L1-dcache-prefetch-misses >> >> 50.723759146 seconds time elapsed >> >> 51.447054000 seconds user >> 0.189949000 seconds sys >> >> >> WITH OPT: >> Result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong": >> 74.356 ns/op >> >> Secondary result "org.openjdk.bench.java.lang.ArrayCopyUnalignedSrc.testLong:??perf": >> Perf stats: >> -------------------------------------------------- >> >> 19,741.09 msec task-clock # 0.389 CPUs utilized >> 641 context-switches # 0.032 K/sec >> 17 cpu-migrations # 0.001 K/sec >> 164 page-faults # 0.008 K/sec >> 74,40,40,48,513 cycles # 3.769 GHz (30.81%) >> 1,45,66,22,06,797 instructions # 1.96 insn per cycle (38.56%) >> 20,31,28,43,577 branches # 1028.963 M/sec (38.65%) >> 14,11,419 branch-misses # 0.01% of all branches 
(38.69%) >> 43,07,86,33,662 L1-dcache-loads # 2182.182 M/sec (38.72%) >> 37,06,744 L1-dcache-load-misses # 0.01% of all L1-dcache hits (38.56%) >> 1,34,292 LLC-loads # 0.007 M/sec (30.72%) >> 30,627 LLC-load-misses # 22.81% of all LL-cache hits (30.68%) >> L1-icache-loads >> 14,49,145 L1-icache-load-misses (30.65%) >> 43,44,86,27,516 dTLB-loads # 2200.924 M/sec (30.63%) >> 218 dTLB-load-misses # 0.00% of all dTLB cache hits (30.63%) >> 2,445 iTLB-loads # 0.124 K/sec (30.63%) >> 28,624 iTLB-load-misses # 1170.72% of all iTLB cache hits (30.63%) >> L1-dcache-prefetches >> L1-dcache-prefetch-misses >> >> 50.716083931 seconds time elapsed >> >> 51.467300000 seconds user >> 0.200390000 seconds sys >> >> >> JMH perf data for ArrayCopyUnalignedSrc.testLong with copy length of 1200 shows degradation in LID accesses; it seems the benchmark got displaced from its sweet spot. >> >> But there is a significant reduction in instruction count, and cycles are almost comparable. We are saving one shift per mask computation. >> >> OLD Sequence: >> 0x00007f7fc1030ead: movabs $0x1,%rax >> 0x00007f7fc1030eb7: shlx %r8,%rax,%rax >> 0x00007f7fc1030ebc: dec %rax >> 0x00007f7fc1030ebf: kmovq %rax,%k2 >> NEW Sequence: >> 0x00007f775d030d51: movabs $0xffffffffffffffff,%rax >> 0x00007f775d030d5b: bzhi %r8,%rax,%rax >> 0x00007f775d030d60: kmovq %rax,%k2 > > Further analysis of the perf degradation revealed that with the new optimized instruction pattern, code alignment got disturbed. This led to an increase in LSD misses; it also reduced the UOPs caching in DSB. > Aligning copy loops at a 32 byte boundary prevents any adverse impact on UOP caching. > NOPs used for padding add up to the instruction count and thus may overshadow the code size gains due to the new mask generation sequence in copy stubs.
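The two mask-generation sequences quoted above can be modelled in portable C++. This is only an illustrative sketch: the function names are hypothetical, the code emulates the documented behaviour of SHLX and BZHI in plain C++ rather than using compiler intrinsics, and it is not the code from the pull request. It shows that "(1 << n) - 1" and "BZHI of all-ones at index n" yield the same low-bit mask for n in 0..63, and that they differ at n = 64, where the masked shift count wraps to zero while BZHI leaves the source unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Model of the old sequence (movabs $1; shlx; dec). SHLX masks its count
// to 6 bits for 64-bit operands, which the "& 63" reproduces.
inline uint64_t mask_shift_dec(unsigned n) {
  return (uint64_t{1} << (n & 63)) - 1;
}

// Software model of BZHI: zero all bits of src at positions >= index.
// Only bits 7:0 of the index are used; an index >= 64 leaves src unchanged.
inline uint64_t bzhi_emulated(uint64_t src, unsigned index) {
  index &= 0xff;
  if (index >= 64) return src;
  return src & ((uint64_t{1} << index) - 1);
}

// Model of the new sequence (movabs $-1; bzhi).
inline uint64_t mask_bzhi(unsigned n) {
  return bzhi_emulated(~uint64_t{0}, n);
}

// Usage check: both forms agree for every length below the operand width.
inline void check_masks_agree() {
  for (unsigned n = 0; n < 64; ++n) {
    assert(mask_shift_dec(n) == mask_bzhi(n));
  }
}
```

Whether the n = 64 case can occur depends on how the copy stubs bound the length, which this sketch does not model.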
> > Baseline: > ArrayCopyAligned.testLong Length : 1200 61 ns/op (approx) > 1,93,44,43,11,622 cycles > 4,59,57,99,78,727 instructions # 2.38 insn per cycle > 1,83,68,75,68,255 idq.dsb_uops > 2,08,32,43,71,906 lsd.uops > 37,12,54,60,211 idq.mite_uops > > With Opt: > ArrayCopyAligned.testLong Length : 1200 74 ns/op (approx) > 1,93,51,25,94,766 cycles > 3,75,11,57,91,917 instructions # 1.94 insn per cycle > 48,67,58,25,566 idq.dsb_uops > 19,46,13,236 lsd.uops > 2,87,42,95,74,280 idq.mite_uops > > With Opt + main loop alignment(nop): 61 ns/op (approx) > ArrayCopyAligned.testLong Length : 1200 > 1,93,52,15,90,080 cycles > 4,60,89,14,06,528 instructions # 2.38 insn per cycle > 1,78,76,10,34,991 idq.dsb_uops > 2,09,16,15,84,313 lsd.uops > 46,25,31,92,101 idq.mite_uops > > > While computing the mask for partial in-lining of small copy calls ( currently enabled for sub-word types with copy length less than 32/64 bytes), new optimized sequence should always offer lower instruction count and latency path. 
> > > Baseline: > ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op > 1,97,76,75,18,052 cycles > 8,96,00,37,11,803 instructions # 4.53 insn per cycle > 2,71,83,79,035 idq.dsb_uops > 7,54,82,43,63,409 lsd.uops > 3,92,55,74,395 idq.mite_uops > > ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op > 1,97,79,16,56,787 cycles > 8,96,13,15,69,780 instructions # 4.53 insn per cycle > 2,69,07,11,691 idq.dsb_uops > 7,54,95,63,77,683 lsd.uops > 3,90,19,10,747 idq.mite_uops > > WithOpt: > ArrayCopyAligned.testByte Length : 20 avgt 2 2.635 ns/op > 1,97,66,64,62,541 cycles > 8,92,03,95,00,236 instructions # 4.51 insn per cycle > 2,72,38,56,205 idq.dsb_uops > 7,50,87,50,60,591 lsd.uops > 3,89,15,02,954 idq.mite_uops > > ArrayCopyAligned.testByte Length : 31 avgt 2 2.635 ns/op > 1,97,54,21,61,110 cycles > 8,91,46,64,23,754 instructions # 4.51 insn per cycle > 2,78,12,19,544 idq.dsb_uops > 7,50,35,88,95,843 lsd.uops > 3,90,41,97,276 idq.mite_uops > > > Following are the links to updated JMH perf data: > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_BASELINE.txt > http://cr.openjdk.java.net/~jbhateja/8261553/JMH_PERF_CLX_WITH_OPTS_LOOP_ALIGN.txt > > In general gains are not significant in case of copy stubs, but new sequence offers a optimal latency path for mask computation sequence. Thanks for getting to the bottom of that regression. ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From dcubed at openjdk.java.net Tue Feb 16 17:50:40 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 16 Feb 2021 17:50:40 GMT Subject: RFR: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation [v2] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 13:04:04 GMT, Anton Kozlov wrote: >> Hi, >> >> Please reivew a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. >> >> CC @dcubed-ojdk >> >> Thanks! 
> > Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > stubRoutines.inline.hpp -> safefetch.hpp Still thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2542 From gziemski at openjdk.java.net Tue Feb 16 19:00:47 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Tue, 16 Feb 2021 19:00:47 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: <-3-_jZX9uZJptkKvOOYeagiPnij3xQqqVCBVhgOtK5o=.95794b96-18ac-41ec-a793-16d097e8efd4@github.com> On Tue, 9 Feb 2021 05:35:03 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. >> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. 
>> 
>> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag:
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77
>> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`):
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372
>> and additionally to not trip this warning here:
>> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391
>> 
>> ------
>> 
>> Changes in this patch:
>> 
>> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable.
>> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number.
>> - I used that class to cover cases (1)..(3):
>>   - `chained_handlers` contains all information of chained handlers
>>   - `expected_handlers` contains a copy of the handlers the hotspot installed
>>   - `replaced_handlers` contains information about replaced handlers
>> 
>> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered.
>> 
>> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start.
>> 
>> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError.
>> 
>> Output Before:
>> 663 Signal Handlers:
>> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO
>> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 675 SIGTRAP: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO
>> 
>> Now:
>> Signal Handlers:
>> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>>   replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO
>> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
>> 
>> -----
>> Tests: GA, and the patch has been tested in our nightlies for over a month now. I manually executed the runtime/jni/checked tests too.
> 
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use universal zero initializer for do_check_signal_periodically

src/hotspot/os/posix/signals_posix.cpp line 1222:

> 1220: // Query the current signal handler. Needs to be a separate operation
> 1221: // from installing a new handler since we need to honor AllowUserSignalHandlers.
> 1222: void* oldhand = get_signal_handler(&oldAct);

It's a pre-existing issue, but I dislike the **"old"** in **oldhand** and **oldAct**. It makes it sound like it is a cached or previous value, which is especially confusing when in the comment we refer to it as the **current** one. Any chance we can clean this up in this fix?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From gziemski at openjdk.java.net  Tue Feb 16 19:06:43 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Tue, 16 Feb 2021 19:06:43 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6]
In-Reply-To: 
References: 
Message-ID: <-Em8H1HesE3XG9TSUE5h6h1E6xdJNzD6nkO9qwt-67k=.0ca7d14e-8b61-40a8-9524-3ec41c2f316b@github.com>

On Tue, 9 Feb 2021 05:35:03 GMT, Thomas Stuefe wrote:

>> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified.
>> [...]
>> I manually executed the runtime/jni/checked tests too.
> 
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use universal zero initializer for do_check_signal_periodically

src/hotspot/os/posix/signals_posix.cpp line 153:

> 151: // and compare it periodically against reality (see os::run_periodic_checks()).
> 152: static bool check_signals = true;
> 153: static SavedSignalHandlers expected_handlers;

I personally would prefer `vm_handlers` or `java_handlers` or `our_handlers` here instead of `expected_handlers`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From gziemski at openjdk.java.net  Tue Feb 16 19:10:43 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Tue, 16 Feb 2021 19:10:43 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 9 Feb 2021 05:35:03 GMT, Thomas Stuefe wrote:

>> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified.
>> [...]
>> I manually executed the runtime/jni/checked tests too.
> 
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use universal zero initializer for do_check_signal_periodically

src/hotspot/os/posix/signals_posix.cpp line 817:

> 815: const int expected_flags = get_sanitized_sa_flags(expected_sa);
> 816: return this_handler == expected_handler &&
> 817: this_flags == expected_flags;

Could we use brackets here like:

    return ((this_handler == expected_handler) && (this_flags == expected_flags));

-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From gziemski at openjdk.java.net  Tue Feb 16 19:16:44 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Tue, 16 Feb 2021 19:16:44 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6]
In-Reply-To: 
References: 
Message-ID: 

On Tue, 9 Feb 2021 05:35:03 GMT, Thomas Stuefe wrote:

>> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified.
>> [...]
>> I manually executed the runtime/jni/checked tests too.
> 
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Use universal zero initializer for do_check_signal_periodically

src/hotspot/os/posix/signals_posix.cpp line 845:

> 843:
> 844: // Compare both sigaction structures (intelligently; only the members we care about).
> 845: if (!compare_handler_info(&act, expected_act)) {

The name `compare_handler_info()` makes it sound like a function that tells whether we should compare the values, not whether the values' comparison evaluates to true. How about `are_handlers_equal()`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From enikitin at openjdk.java.net  Tue Feb 16 19:19:59 2021
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Tue, 16 Feb 2021 19:19:59 GMT
Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v3]
In-Reply-To: 
References: 
Message-ID: 

> Another approach to JDK-8058176 and #2440 - never allowing the tests to hit CodeCache limits. The most significant consumer is the MH graph builder (the MHTransformationGen), whose consumption is now controlled. List of changes:
> 
> * Code cache size getters are added to WhiteBox;
> * MH sequences are now built with remaining Code cache size in mind (always leaving 2M clearance);
> * Dependencies on WhiteBox added for all affected tests;
> * The test cases in question are un-problemlisted.
> 
> Testing: the whole vmTestbase/vm/mlvm/ in win-lin-mac x86.
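
As a rough illustration of the "built with remaining Code cache size in mind" item above, the sketch below shows a build loop that consults a fullness check before every step and stops early when the check fires. It is not the code from `MHTransformationGen`: the class `CycleBudget`, the method `build`, and both callback parameters are hypothetical names introduced only for this example, and the fullness predicate is passed in so that the loop can be exercised without touching a real code cache.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, for illustration only; not part of the mlvm test library.
final class CycleBudget {

    /**
     * Runs buildOne up to plannedCycles times, checking codeCacheFull before
     * each step. Returns the number of cycles that were actually built.
     */
    static int build(int plannedCycles, BooleanSupplier codeCacheFull, Runnable buildOne) {
        for (int done = 0; done < plannedCycles; done++) {
            if (codeCacheFull.getAsBoolean()) {
                // Report progress against the planned count, which is known up front.
                System.out.println("Not enough code cache to continue; built "
                        + done + " out of " + plannedCycles + " cycles");
                return done;
            }
            buildOne.run();
        }
        return plannedCycles;
    }

    public static void main(String[] args) {
        int[] built = {0};
        // Pretend the code cache becomes "effectively full" after 10 steps.
        int done = build(25, () -> built[0] >= 10, () -> built[0]++);
        System.out.println("cycles built: " + done);
    }
}
```

Passing the planned cycle count in as a plain number keeps the progress message consistent with what the loop really did, independent of how that number was chosen.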

Evgeny Nikitin has updated the pull request incrementally with two additional commits since the last revision:

 - Fix 'cycles to build' error output
 - Add support for segmented CodeCache

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2523/files
  - new: https://git.openjdk.java.net/jdk/pull/2523/files/71af7185..763d94b8

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2523&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2523&range=01-02

  Stats: 31 lines in 1 file changed: 23 ins; 1 del; 7 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2523.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2523/head:pull/2523

PR: https://git.openjdk.java.net/jdk/pull/2523

From enikitin at openjdk.java.net  Tue Feb 16 19:32:44 2021
From: enikitin at openjdk.java.net (Evgeny Nikitin)
Date: Tue, 16 Feb 2021 19:32:44 GMT
Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2]
In-Reply-To: <2_Gpraz6NaY17HPfRDW-LD-sQrrPQ4dpIVP8vikpdXM=.d425cd8b-aea5-43be-865e-72229db81e6e@github.com>
References: <2_Gpraz6NaY17HPfRDW-LD-sQrrPQ4dpIVP8vikpdXM=.d425cd8b-aea5-43be-865e-72229db81e6e@github.com>
Message-ID: 

On Fri, 12 Feb 2021 20:03:01 GMT, Igor Ignatyev wrote:

>> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Switch to ManagementBeans approach instead of the WhiteBox one
> 
> test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/share/MHTransformationGen.java line 69:
> 
>> 67: private static final boolean USE_THROW_CATCH = false; // Test bugs
>> 68:
>> 69: private static final MemoryPoolMXBean CODE_CACHE_MX_BEAN = ManagementFactory
> 
> does it work w/ both `-XX:+SegmentedCodeCache` and `-XX:-SegmentedCodeCache`?
> If I remember correctly (@TobiHartmann, please correct me if I'm wrong), `CodeCache` pool exists when `SegmentedCodeCache` is disabled, when it's enabled, you will have 3 different pools (one for each "CodeHeap"), and here we would need to use one for `non-nmethod` codeheap.
> 
> -- Igor

Thanks for the info about the segmented code cache. I did some research and found that the opposite is true - both nmethod pools ('profiled' and 'non-profiled') are growing along with the MH graph growth. This is supported by the specification for non-method code heap at:

https://docs.oracle.com/en/java/javase/15/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-1D9B26AD-8E0A-4771-90DA-A81A2C1F5B55

Please check the fixed version.

> test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/share/MHTransformationGen.java line 107:
> 
>> 105: if (isCodeCacheEffectivelyFull()) {
>> 106: Env.traceNormal("Not enought code cache to build up MH sequences anymore. " +
>> 107: " Has only been able to achieve " + (MAX_CYCLES - i) + " out of " + MAX_CYCLES);
> 
> given `nextInt(x)` returns a random number from `[0; x]`, we might have achieved more (or less) `MAX_CYCLES - i`, i.e. that part of the message is incorrect, I'd just remove it.

Fixed by extracting the generated random number first.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2523

From gziemski at openjdk.java.net  Tue Feb 16 19:34:41 2021
From: gziemski at openjdk.java.net (Gerard Ziemski)
Date: Tue, 16 Feb 2021 19:34:41 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6]
In-Reply-To: 
References: 
Message-ID: 

On Mon, 15 Feb 2021 22:40:55 GMT, Gerard Ziemski wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Use universal zero initializer for do_check_signal_periodically
> 
> I did not get as far as I hoped today, I'll review more tomorrow.
General comment: I get weird results when I try to run `java -XX:ErrorHandlerTest=2 -version` With the proposed changes, I get: oracle at dhcp-10-154-126-49 jdk % ./build/xcode/build/jdk/bin/java -XX:ErrorHandlerTest=2 -version # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/vmError.cpp:1785 # # A fatal error has been detected by the Java Runtime Environment: # # [Too many errors, abort] zsh: abort ./build/xcode/build/jdk/bin/java -XX:ErrorHandlerTest=2 -version with several empty lines, which I deleted for compactness, whereas the original code produces: oracle at Oracles-MacBook-Pro-16 jdk_orig % ./build/xcode/build/jdk/bin/java -XX:ErrorHandlerTest=2 -version # To suppress the following error report, specify this argument # after -XX: or in .hotspotrc: SuppressErrorAt=/vmError.cpp:1785 # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/Volumes/Work/review/2251/jdk_orig/src/hotspot/share/utilities/vmError.cpp:1785), pid=69509, tid=7427 # guarantee(how == 0) failed: test guarantee # # JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc.oracle.jdkorig) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 17-internal+0-adhoc.oracle.jdkorig, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64) # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /Volumes/Work/review/2251/jdk_orig/hs_err_pid69509.log # # If you would like to submit a bug report, please visit: # https://bugreport.java.com/bugreport/crash.jsp # zsh: abort ./build/xcode/build/jdk/bin/java -XX:ErrorHandlerTest=2 -version so something is off. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From iignatyev at openjdk.java.net Tue Feb 16 19:52:09 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 16 Feb 2021 19:52:09 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2] In-Reply-To: References: <2_Gpraz6NaY17HPfRDW-LD-sQrrPQ4dpIVP8vikpdXM=.d425cd8b-aea5-43be-865e-72229db81e6e@github.com> Message-ID: On Tue, 16 Feb 2021 19:29:42 GMT, Evgeny Nikitin wrote: >> test/hotspot/jtreg/vmTestbase/vm/mlvm/meth/share/MHTransformationGen.java line 69: >> >>> 67: private static final boolean USE_THROW_CATCH = false; // Test bugs >>> 68: >>> 69: private static final MemoryPoolMXBean CODE_CACHE_MX_BEAN = ManagementFactory >> >> does it work w/ both `-XX:+SegmentedCodeCache` and `-XX:-SegmentedCodeCache`? >> If I remember correctly (@TobiHartmann , please correct me if I'm wrong), `CodeCache` pool exists when `SegmentedCodeCache` is disabled, when it's enabled, you will have 3 different pools (one for each "CodeHeap"), and here we would need to use one for `non-nmethod` codeheap. >> >> -- Igor > > Thanks for the info about the segmented code cache. I did some research and found that the opposite is true - both nmethod pools ('profiled' and 'non-profiled') are growing along with the MH graph growth. This is supported by the specification for non-method code heap at: > > https://docs.oracle.com/en/java/javase/15/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-1D9B26AD-8E0A-4771-90DA-A81A2C1F5B55 > > Please check the the fixed version. o/c they grow, b/c we use them for compiled code *and* if there is no space in non-nmethod heap, we use them for adapters as well, so I guess that the growth that you see is already after non-nmethod heap got exhausted. I'd recommend you simply use the sum of all available code-heaps (this will increase the possibility of false-positive results due to segmentation, but I don't think it matters much here). 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2523

From never at openjdk.java.net Tue Feb 16 20:23:00 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Tue, 16 Feb 2021 20:23:00 GMT
Subject: RFR: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream
Message-ID:

c2v_iterateFrames mixes a StackFrameStream and vframes, and the vframe factory method can silently skip stub frames. This could leave the StackFrameStream out of sync with the vframe walk. This can cause the iteration to fail in strange ways and assert in fastdebug builds.

-------------

Commit messages:
 - Keep StackFrameStream in sync with vframes

Changes: https://git.openjdk.java.net/jdk/pull/2594/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2594&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261846
  Stats: 13 lines in 3 files changed: 9 ins; 0 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2594.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2594/head:pull/2594

PR: https://git.openjdk.java.net/jdk/pull/2594

From stuefe at openjdk.java.net Tue Feb 16 21:23:02 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 16 Feb 2021 21:23:02 GMT
Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base
Message-ID:

If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.

This can be reproduced by starting the VM with

-Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m

but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
- heap is allocated at 0x8000_0000. It is small and ends at 0x8800_0000.
- class space follows at 0x8800_0000
- the narrow klass pointer base points to the start of the class space at 0x8800_0000.

In MacroAssembler::encode_klass_not_null(), there is the following section:

  if (base != NULL) {
    unsigned int base_h = ((unsigned long)base)>>32;
    unsigned int base_l = (unsigned int)((unsigned long)base);
    if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
      lgr_if_needed(dst, current);
      z_aih(dst, -((int)base_h)); // Base has no set bits in lower half.
    } else if ((base_h == 0) && (base_l != 0)) {   (A)
      lgr_if_needed(dst, current);
      z_agfi(dst, -(int)base_l);                   (B)
    } else {
      load_const(Z_R0, base);
      lgr_if_needed(dst, current);
      z_sgr(dst, Z_R0);
    }
    current = dst;
  }

We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32bit two's complement of the base and adding it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x8000_0000:

In the case of the crash, we have:

  base:                             8800_0000
  klass pointer:                    8804_1040
  32bit two's complement of base:   7800_0000
  added to the klass pointer:     1_0004_1040

So the result of the "subtraction" is 1_0004_1040; it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.

This bug has been dormant; it was activated by JDK-8250989, which changed the way class space reservation happens at CDS dump time. It surfaced first as a crash in a CDS-specific jtreg test (JDK-8261552).

================

Fix:

I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.

I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL).
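
A minimal sketch of the arithmetic described above, in plain C++ rather than s390 assembly. The helper names `agfi_model`, `afi_model` and `negated_base` are hypothetical and only model the two instructions under the assumption that AGFI sign-extends its 32-bit immediate before a 64-bit add, while AFI adds within the low 32-bit word only; this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Models AGFI (assumption): the 32-bit immediate is sign-extended and
// added to the full 64-bit register.
constexpr uint64_t agfi_model(uint64_t reg, int32_t imm) {
  return reg + static_cast<uint64_t>(static_cast<int64_t>(imm));
}

// Models a pure 32-bit add such as AFI (assumption): only the low word
// changes, the carry out of bit 31 is dropped, the high word is kept.
constexpr uint64_t afi_model(uint64_t reg, int32_t imm) {
  return (reg & 0xFFFFFFFF00000000ULL) |
         static_cast<uint32_t>(static_cast<uint32_t>(reg) +
                               static_cast<uint32_t>(imm));
}

// Mirrors -(int)base_l, computed in unsigned arithmetic so that the
// sketch itself never overflows a signed integer.
constexpr int32_t negated_base(uint32_t base_l) {
  return static_cast<int32_t>(0u - base_l);
}

// Replays the values from the crash described in the mail.
inline void check_crash_example() {
  const uint64_t klass = 0x88041040ULL;
  const int32_t imm = negated_base(0x88000000u);
  assert(imm == 0x78000000);                         // positive immediate
  assert(agfi_model(klass, imm) == 0x100041040ULL);  // wrong: 1_0004_1040
  assert(afi_model(klass, imm) == 0x41040ULL);       // expected: 4_1040
}
```

With a base below 0x8000_0000 the negated immediate stays negative, so the sign-extending add still subtracts correctly; the problem only shows once bit 31 of the base is set and the negation turns into a positive value.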
I was looking for something like TMH or TML to check whole 32bit words but could not find any. ---- Tests: I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right. I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base). I used this method to test various combinations: - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI) - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0) - narrow klass pointer base = 0 (we dont do anything) (would this override-feature be useful? We could do better testing). Thanks, Thomas ------------- Commit messages: - Remove whitespace - start Changes: https://git.openjdk.java.net/jdk/pull/2595/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261552 Stats: 42 lines in 3 files changed: 37 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2595.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2595/head:pull/2595 PR: https://git.openjdk.java.net/jdk/pull/2595 From neliasso at openjdk.java.net Tue Feb 16 21:44:46 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Tue, 16 Feb 2021 21:44:46 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> References: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> Message-ID: <8ShwNGfn1vuWGuS_kjH2zXFGyW3pObvxBZt3KivpwYU=.55fb12d3-10bf-41a6-88a4-2fa665c892ea@github.com> On Thu, 11 Feb 2021 12:25:53 GMT, Jatin Bhateja wrote: >> BMI2 BHZI instruction can be used to optimize the instruction sequence >> used for mask generation 
at various place in array copy stubs and partial in-lining for small copy operations. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8261553: Adding BMI2 missing check for partial in-lining. Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2522 From lucy at openjdk.java.net Tue Feb 16 22:04:10 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 16 Feb 2021 22:04:10 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v6] In-Reply-To: References: Message-ID: > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! 
> Lutz Lutz Schmidt has updated the pull request incrementally with two additional commits since the last revision: - no incrementq for x86_32 - cleaning up the remaining mess ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2511/files - new: https://git.openjdk.java.net/jdk/pull/2511/files/0a99ee4e..faab64b0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=04-05 Stats: 30 lines in 5 files changed: 0 ins; 15 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From kvn at openjdk.java.net Wed Feb 17 00:48:37 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Wed, 17 Feb 2021 00:48:37 GMT Subject: RFR: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 20:17:30 GMT, Tom Rodriguez wrote: > c2v_iterateFrames mixes a StackFrameSteam and vframes and the vframe factory method can silently skip stub frames. The could leave the StackFrameStream out of sync with the vframe walk. This can cause the iteration fail in strange ways and assert in fastdebug builds. Update copyright year in vframe.hpp. Otherwise good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2594 From dlong at openjdk.java.net Wed Feb 17 02:35:42 2021 From: dlong at openjdk.java.net (Dean Long) Date: Wed, 17 Feb 2021 02:35:42 GMT Subject: RFR: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 00:46:12 GMT, Vladimir Kozlov wrote: >> c2v_iterateFrames mixes a StackFrameSteam and vframes and the vframe factory method can silently skip stub frames. The could leave the StackFrameStream out of sync with the vframe walk. 
This can cause the iteration to fail in strange ways and assert in fastdebug builds.
>
> Update copyright year in vframe.hpp.
> Otherwise good.

Hi Tom. This code could be simplified and made faster using vframeStream as the iterator and vframeStream::asJavaVFrame to get the vframe. If you don't need the locals of every frame, then vframeStream::next is faster than vframe::sender. See JDK-8214329.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2594

From iklam at openjdk.java.net Wed Feb 17 05:32:05 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Wed, 17 Feb 2021 05:32:05 GMT
Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp [v4]
In-Reply-To:
References:
Message-ID:

> vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame.
>
> Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class.
>
> So we should move VM_Operation to its own header: vmOperation.hpp (no "s").
>
> After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%.
>
> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero.

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge branch 'master' into 8261125-move-VM_Operation-to-vmOperation.hpp - Merge branch 'master' into 8261125-move-VM_Operation-to-vmOperation.hpp - interfaceSupport.inline.hpp needs VM_Exit from vmOperations.hpp for JVM_LEAF macro - 8261125: Move VM_Operation to vmOperation.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2398/files - new: https://git.openjdk.java.net/jdk/pull/2398/files/bab79154..5b8f1b4b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2398&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2398&range=02-03 Stats: 8513 lines in 340 files changed: 4350 ins; 2305 del; 1858 mod Patch: https://git.openjdk.java.net/jdk/pull/2398.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2398/head:pull/2398 PR: https://git.openjdk.java.net/jdk/pull/2398 From gziemski at openjdk.java.net Wed Feb 17 05:44:45 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 17 Feb 2021 05:44:45 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 05:35:03 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. 
>> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. >> >> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 >> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 >> and additionally to not trip this warning here: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 >> >> ------ >> >> Changes in this patch: >> >> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. >> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. >> - I used that class to cover cases (1)..(3): >> - `chained_handlers` contains all information of chained handlers >> - `expected_handlers` contains a copy of the handlers the hotspot installed >> - `replaced_handlers` contains information about replaced handlers >> >> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. >> >> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
>> >> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. >> >> Output Before: >> 663 Signal Handlers: >> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO >> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> >> Now: >> Signal Handlers: >> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO >> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> >> ----- >> Tests: GA, and the patch has been tested in our nighlies for over a month now. 
I manually executed the runtime/jni/checked tests too. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Use universal zero initializer for do_check_signal_periodically src/hotspot/os/posix/signals_posix.cpp line 848: > 846: tty->print_cr("Warning: %s handler modified!", os::exception_name(sig, buf, sizeof(buf))); > 847: // If we had a mismatch: > 848: // - print all signal handlers. As part of that printout details will be printed Needs comma: `// - print all signal handlers. As part of that printout, details will be printed` src/hotspot/os/posix/signals_posix.cpp line 1266: > 1264: void* oldhand2 = get_signal_handler(&oldAct); > 1265: assert(oldhand2 == oldhand, "no concurrent signal handler installation"); > 1266: Empty line. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Wed Feb 17 05:44:46 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 17 Feb 2021 05:44:46 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 22:19:04 GMT, Gerard Ziemski wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Use universal zero initializer for do_check_signal_periodically > > src/hotspot/os/posix/signals_posix.cpp line 115: > >> 113: assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig); >> 114: return sig > 0 && sig < NSIG; >> 115: } > > We have > > assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig); > return sig > 0 && sig < NSIG; > > here in `SavedSignalHandlers::check_signal_number()` and > > #if defined(__APPLE__) > return sig >= 1 && sig < NSIG; > #else > > in `is_valid_signal()`. > > Can we combine those, something like: > > // Returns true if signal number is valid. 
> static bool is_valid_signal(int sig) {
>   assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig);
>   if (sig > 0 && sig < NSIG) {
>     // Use sigaddset to check for signal validity.
>     sigset_t set;
>     sigemptyset(&set);
>     if (sigaddset(&set, sig) == -1 && errno == EINVAL) {
>       return false;
>     } else {
>       return true;
>     }
>   } else {
>     return false;
>   }
> }
>
> so then we can drop `SavedSignalHandlers::check_signal_number()` altogether?

I would like to keep this. SavedSignalHandlers::check_signal_number() is just a simple bounds check to prevent memory overwriters, with an added assert for good measure in debug builds. Its proximity to the array it guards makes this clear. SavedSignalHandlers only gets fed with known values in this file, so an assert is sufficient.

is_valid_signal() OTOH does a runtime check for signal validity. It is more expensive but only used in analysis situations. It pulls its signal number from a sigaction structure, which may be corrupt and contain whatever, so the runtime check makes more sense to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From stuefe at openjdk.java.net Wed Feb 17 05:44:46 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 17 Feb 2021 05:44:46 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 17 Feb 2021 05:39:50 GMT, Thomas Stuefe wrote:

>> src/hotspot/os/posix/signals_posix.cpp line 115:
>>
>>> 113: assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig);
>>> 114: return sig > 0 && sig < NSIG;
>>> 115: }
>>
>> We have
>>
>> assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig);
>> return sig > 0 && sig < NSIG;
>>
>> here in `SavedSignalHandlers::check_signal_number()` and
>>
>> #if defined(__APPLE__)
>> return sig >= 1 && sig < NSIG;
>> #else
>>
>> in `is_valid_signal()`.
>>
>> Can we combine those, something like:
>>
>> // Returns true if signal number is valid.
>> static bool is_valid_signal(int sig) {
>>   assert(sig > 0 && sig < NSIG, "invalid signal number %d", sig);
>>   if (sig > 0 && sig < NSIG) {
>>     // Use sigaddset to check for signal validity.
>>     sigset_t set;
>>     sigemptyset(&set);
>>     if (sigaddset(&set, sig) == -1 && errno == EINVAL) {
>>       return false;
>>     } else {
>>       return true;
>>     }
>>   } else {
>>     return false;
>>   }
>> }
>>
>> so then we can drop `SavedSignalHandlers::check_signal_number()` altogether?
>
> I would like to keep this. SavedSignalHandlers::check_signal_number() is just a simple bounds check to prevent memory overwriters, with an added assert for good measure in debug builds. Its proximity to the array it guards makes this clear. SavedSignalHandlers only gets fed with known values in this file, so an assert is sufficient.
>
> is_valid_signal() OTOH does a runtime check for signal validity. It is more expensive but only used in analysis situations. It pulls its signal number from a sigaction structure, which may be corrupt and contain whatever, so the runtime check makes more sense to me.

In fact, SavedSignalHandlers::_sa[NSIG] could be reduced in size to something more sensible, since we only ever feed "normal" signals into it. Same reason as NUM_IMPORTANT_SIGS. But we are only talking about a few kbyte here, so I did not bother.
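
The distinction drawn above, between a cheap bounds check guarding an array and a runtime validity check, can be sketched as follows. This is an illustration only: the names `is_table_index`, `is_accepted_by_libc` and `table_slot` are hypothetical and do not come from signals_posix.cpp, and the sketch assumes a platform where `NSIG` is visible and `sigaddset` reports unknown signal numbers with `EINVAL`.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>

// Cheap bounds check: is `sig` a safe index into an array of NSIG entries?
// This says nothing about whether the C library knows the signal.
static bool is_table_index(int sig) {
  return sig > 0 && sig < NSIG;
}

// Runtime validity check: ask the C library whether it accepts `sig`.
// More expensive, but suitable for numbers read from untrusted data.
static bool is_accepted_by_libc(int sig) {
  sigset_t set;
  sigemptyset(&set);
  errno = 0;
  return !(sigaddset(&set, sig) == -1 && errno == EINVAL);
}

// Hypothetical accessor fed only with trusted values: an assert documents
// the expectation in debug builds, the bounds check protects release builds.
static int table_slot(int sig) {
  assert(is_table_index(sig) && "invalid signal number");
  return is_table_index(sig) ? sig : -1;
}
```

The two checks can disagree: a number may be inside the table bounds and still be rejected by the C library, which is one reason to keep them as separate functions.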

-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From stuefe at openjdk.java.net Wed Feb 17 05:56:45 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 17 Feb 2021 05:56:45 GMT
Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 15 Feb 2021 22:35:30 GMT, Gerard Ziemski wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Use universal zero initializer for do_check_signal_periodically
>
> src/hotspot/os/posix/signals_posix.cpp line 414:
>
>> 412: if (actp == NULL) {
>> 413: // Retrieve the preinstalled signal handler from jvm
>> 414: actp = const_cast<struct sigaction*>(chained_handlers.get(sig));
>
> Must `SavedSignalHandlers::get()` have the **const struct** in `const struct sigaction* get(int sig) const` signature?
>
> If it was just `struct sigaction* get(int sig) const` then we wouldn't need this awkward cast.

I'm a stickler for const since it's very expressive and prevents stupid errors.

`const struct sigaction* get() const` makes perfect sense since at no time are you supposed to modify the underlying sigaction structure. Once it is saved, it should only be used to read from (e.g. to print, or to extract the handler address and call it in `call_chained_handler()`).

But old code was not const correct. The way to start being const correct without having to rewrite everything is to make new code const correct and cauterize at the interface between new and old code with these ugly but at least expressive casts.

Side note, even the sigaction function is const correct: https://pubs.opengroup.org/onlinepubs/007904875/functions/sigaction.html see the `act` input parameter designating the new handler, it's const. Only the output parameter `oact` is nonconst.

I will see if I can make the rest of the code const correct too wrt the sigaction structures. That way we can get rid of the cast and have better code as well.
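
A small sketch of the pattern described above: new code exposes a read-only view, and a single cast marks the boundary to older code that is not const correct. `SavedHandlers`, `legacy_read_flags` and `flags_via_legacy` are hypothetical stand-ins written for this illustration; they are not the classes or functions from the patch.

```cpp
#include <cassert>
#include <signal.h>

// Hypothetical store of saved handlers, indexed by signal number.
class SavedHandlers {
  struct sigaction _sa[NSIG];
  bool _present[NSIG];
 public:
  SavedHandlers() : _sa(), _present() {}

  void set(int sig, const struct sigaction* act) {
    assert(sig > 0 && sig < NSIG);
    _sa[sig] = *act;
    _present[sig] = true;
  }

  // Read-only view: callers may inspect a saved handler but not change it.
  const struct sigaction* get(int sig) const {
    if (sig <= 0 || sig >= NSIG || !_present[sig]) return nullptr;
    return &_sa[sig];
  }
};

// Stand-in for older code that takes a non-const pointer but only reads.
static int legacy_read_flags(struct sigaction* actp) {
  return actp->sa_flags;
}

// The cast sits exactly at the boundary between new and old code.
static int flags_via_legacy(const SavedHandlers& handlers, int sig) {
  const struct sigaction* saved = handlers.get(sig);
  if (saved == nullptr) return -1;
  return legacy_read_flags(const_cast<struct sigaction*>(saved));
}
```

Once `legacy_read_flags` itself takes a `const struct sigaction*`, the cast can simply be deleted, which is the cleanup the mail proposes.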
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From iklam at openjdk.java.net Wed Feb 17 05:56:42 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 17 Feb 2021 05:56:42 GMT Subject: RFR: 8261125: Move VM_Operation to vmOperation.hpp [v4] In-Reply-To: References: <1bctHoGUQv3v65nZBeuvMgXH4ur6CU3xXkzHT6ZqIPo=.aeda32f9-fdfe-4561-b6f0-7201caa0eea1@github.com> <7F1Nq0zPfyWP644ml5icuHT6lkHlAjJG2qFOJr2dGAY=.daffcc16-1ed0-42db-b77b-e59c72dff349@github.com> <54Eg-olaP7adJuc9TR2Rj58smez-FpUWkYcf1Hf1onA=.0137514d-c491-4701-8db4-2fa3fd6ac51d@github.com> Message-ID: <1wBxEwq-PpiIdcoTAZD5wWMhSh4vn0t4CYb82CnXGsU=.0589d8eb-0508-40d4-abde-5c488345347c@github.com> On Thu, 11 Feb 2021 06:54:40 GMT, Ioi Lam wrote: >>> I'm fine with leaving vmOperations.hpp and vmOperation.hpp. It's not a big deal. commonVMOperations.hpp - too much noise! >>> I agree with David. interfaceSupport.inline.hpp imports a lot of things so importing vmOperations.hpp is not a big deal. vmOperations.hpp imports #include "runtime/threadSMR.hpp" otherwise it has all the same imports as interfaceSupport.inline.hpp anyway. >>> All these files are going to increase compilation time too. >>> I stand by my check mark above! >> >> OK, since no one is fond of further splitting the headers, and no one has vetoed the vmOperation.hpp name, I'll keep everything as originally proposed. Will do more testing and integrate. > > BTW, I found a few existing singular/plural pairs of hpp files. Admittedly I created ciSymbols.hpp recently, but the other 3 pairs have been there for quite some time. > > share/ci/ciSymbol.hpp > share/ci/ciSymbols.hpp > > share/interpreter/bytecode.hpp > share/interpreter/bytecodes.hpp > > share/jfr/recorder/service/jfrEvent.hpp > share/jfr/jfrEvents.hpp > > share/jfr/recorder/checkpoint/types/jfrType.hpp > share/jfr/utilities/jfrTypes.hpp Thanks @dcubed-ojdk @coleenp @tstuefe @dholmes-ora for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2398 From iklam at openjdk.java.net Wed Feb 17 05:56:44 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 17 Feb 2021 05:56:44 GMT Subject: Integrated: 8261125: Move VM_Operation to vmOperation.hpp In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 05:38:49 GMT, Ioi Lam wrote: > vmOperations.hpp declares the VM_Operation class, as well as a hodge podge of subclasses such as VM_ForceSafepoint, VM_DeoptimizeFrame. > > Out of the 1000 hotspot .o files, about 680 include vmOperations.hpp (mostly transitively). In most cases, they just need to use the VM_Operation class. > > So we should move VM_Operation to its own header: vmOperation.hpp (no "s"). > > After the refactoring, vmOperations.hpp is included only 64 times. The inclusion count of threadSMR.hpp is also reduced from 687 to 99. HotSpot build time is improved by about 0.4%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. This pull request has now been integrated. 
Changeset: fc1d0321 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/fc1d0321 Stats: 345 lines in 23 files changed: 189 ins; 141 del; 15 mod 8261125: Move VM_Operation to vmOperation.hpp Reviewed-by: coleenp, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/2398 From stuefe at openjdk.java.net Wed Feb 17 06:07:43 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 17 Feb 2021 06:07:43 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: <-3-_jZX9uZJptkKvOOYeagiPnij3xQqqVCBVhgOtK5o=.95794b96-18ac-41ec-a793-16d097e8efd4@github.com> References: <-3-_jZX9uZJptkKvOOYeagiPnij3xQqqVCBVhgOtK5o=.95794b96-18ac-41ec-a793-16d097e8efd4@github.com> Message-ID: On Tue, 16 Feb 2021 18:58:15 GMT, Gerard Ziemski wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> Use universal zero initializer for do_check_signal_periodically > > src/hotspot/os/posix/signals_posix.cpp line 1222: > >> 1220: // Query the current signal handler. Needs to be a separate operation >> 1221: // from installing a new handler since we need to honor AllowUserSignalHandlers. >> 1222: void* oldhand = get_signal_handler(&oldAct); > > It's a pre-existing issue, but I dislike the **"old"** in **oldhand** and **oldAct**. It makes it sound like it is a cached or previous value, which is especially confusing when in the comment we refer to it as the **current** one. Any chance we can clean this up in this fix? I think it stems from the Posix interface calling the parameters `act` and `oact` and ppl are used to that naming: int sigaction(int sig, const struct sigaction *restrict act, struct sigaction *restrict oact); That said, we use a bunch of different names, like oact, oldAct or oldSigAct, which I disliked. I wanted to unify those namings but refrained since I wanted to keep the patch small. Its difficult enough as it is. 
Since this would be an easy change in itself, can we keep it for a different RFE? As for name, lets say currentAct? > src/hotspot/os/posix/signals_posix.cpp line 153: > >> 151: // and compare it periodically against reality (see os::run_periodic_checks()). >> 152: static bool check_signals = true; >> 153: static SavedSignalHandlers expected_handlers; > > I personally would prefer `vm_handlers` or `java_handlers` or `our_handlers` here instead of `expected_handlers`. Okay, vm_handlers it is. > src/hotspot/os/posix/signals_posix.cpp line 817: > >> 815: const int expected_flags = get_sanitized_sa_flags(expected_sa); >> 816: return this_handler == expected_handler && >> 817: this_flags == expected_flags; > > Could we use brackets here like: > > return ((this_handler == expected_handler) && > (this_flags == expected_flags)); I can add the inside brackets, but don't see the need for the outside ones, there is nothing to bracket off against. > src/hotspot/os/posix/signals_posix.cpp line 845: > >> 843: >> 844: // Compare both sigaction structures (intelligently; only the members we care about). >> 845: if (!compare_handler_info(&act, expected_act)) { > > The name `compare_handler_info()` makes it sound like a function that tells whether we should compare the values, not whether the values' comparison evaluates to true. > > How about `are_handlers_equal()`? Ok ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From iklam at openjdk.java.net Wed Feb 17 06:19:59 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 17 Feb 2021 06:19:59 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp Message-ID: metaspace.hpp is included by about 770 out of 1000 HotSpot .o files. Most of these are transitively included via array.hpp and classLoaderData.hpp. - classLoaderData.hpp doesn't actually need metaspace.hpp. 
- array.hpp can be refactored to put a function that depends on metaspace.hpp into array.inline.hpp Doing the above reduces the number of .o files that include metaspace.hpp to 343. Since this is still a significant number, we should split out the rarely used classes (such as `MetaspaceGC` and `MetaspaceUtils`) into a new header file (metaspaceUtils.hpp, which is included only 30 times). Also, these 3 includes can now be removed from metaspace.hpp. #include "memory/memRegion.hpp" #include "memory/metaspaceChunkFreeListSummary.hpp" #include "memory/virtualspace.hpp" Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. (I also fixed an unrelated comment in archiveUtils.cpp when I was scanning for the word "Metaspace" in the source files -- the function `MetaspaceShared::commit_to()` no longer exists). ------------- Commit messages: - step2 - step 1 Changes: https://git.openjdk.java.net/jdk/pull/2599/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2599&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261868 Stats: 361 lines in 49 files changed: 221 ins; 132 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2599.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2599/head:pull/2599 PR: https://git.openjdk.java.net/jdk/pull/2599 From iklam at openjdk.java.net Wed Feb 17 07:31:12 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 17 Feb 2021 07:31:12 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp [v2] In-Reply-To: References: Message-ID: > metaspace.hpp is included by about 770 out of 1000 HotSpot .o files. Most of these are transitively included via array.hpp and classLoaderData.hpp. > > - classLoaderData.hpp doesn't actually need metaspace.hpp. 
> - array.hpp can be refactored to put a function that depends on metaspace.hpp into array.inline.hpp > > Doing the above reduces the number of .o files that include metaspace.hpp to 343. Since this is still a significant number, we should split out the rarely used classes (such as `MetaspaceGC` and `MetaspaceUtils`) into a new header file (metaspaceUtils.hpp, which is included only 30 times). > > Also, these 3 includes can now be removed from metaspace.hpp. > > #include "memory/memRegion.hpp" > #include "memory/metaspaceChunkFreeListSummary.hpp" > #include "memory/virtualspace.hpp" > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > (I also fixed an unrelated comment in archiveUtils.cpp when I was scanning for the word "Metaspace" in the source files -- the function `MetaspaceShared::commit_to()` no longer exists). Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed ppc/s390 builds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2599/files - new: https://git.openjdk.java.net/jdk/pull/2599/files/7a32ad1d..700d1c16 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2599&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2599&range=00-01 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2599.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2599/head:pull/2599 PR: https://git.openjdk.java.net/jdk/pull/2599 From stuefe at openjdk.java.net Wed Feb 17 07:57:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 17 Feb 2021 07:57:40 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 07:31:12 GMT, Ioi Lam wrote: >> metaspace.hpp is included by about 770 out of 1000 HotSpot .o files. 
Most of these are transitively included via array.hpp and classLoaderData.hpp. >> >> - classLoaderData.hpp doesn't actually need metaspace.hpp. >> - array.hpp can be refactored to put a function that depends on metaspace.hpp into array.inline.hpp >> >> Doing the above reduces the number of .o files that include metaspace.hpp to 343. Since this is still a significant number, we should split out the rarely used classes (such as `MetaspaceGC` and `MetaspaceUtils`) into a new header file (metaspaceUtils.hpp, which is included only 30 times). >> >> Also, these 3 includes can now be removed from metaspace.hpp. >> >> #include "memory/memRegion.hpp" >> #include "memory/metaspaceChunkFreeListSummary.hpp" >> #include "memory/virtualspace.hpp" >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. >> >> (I also fixed an unrelated comment in archiveUtils.cpp when I was scanning for the word "Metaspace" in the source files -- the function `MetaspaceShared::commit_to()` no longer exists). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed ppc/s390 builds Hi Ioi, this is very appreciated. metaspace.hpp is still a bit of a mess. Its the last holdover for the old metaspace implementation and I always wanted to clean it out a bit. Splitting this header into three is a right step. A lot of that stuff may still vanish and/or be reformed if I have the time (eg metaspaceChunkFreeListSummary). Assuming this builds and tests fine on all our platform, including the weirder ones, I am fine with this patch. It looks good. Thanks, Thomas ------------- Marked as reviewed by stuefe (Reviewer). 
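The `array.hpp` / `array.inline.hpp` split mentioned in the quoted description follows a common pattern, sketched below in a single translation unit. The three "files" are marked by comments, and every name (`IntBuffer`, `Arena`, `create()`) is hypothetical and chosen only for illustration; this is not the actual HotSpot layout.

```cpp
#include <cassert>

// ---- buffer.hpp (hypothetical): cheap to include, no heavy dependencies ----
class IntBuffer {
  int* _data;
  int  _length;
  IntBuffer(int* data, int length) : _data(data), _length(length) {}
 public:
  int  length() const { return _length; }
  int& at(int i)      { return _data[i]; }
  int* data()         { return _data; }
  // Only declared here. The definition needs the heavy allocator header,
  // so it lives in buffer.inline.hpp.
  static inline IntBuffer create(int length);
};

// ---- arena.hpp (hypothetical): stand-in for the expensive header ----
class Arena {
 public:
  static int* allocate_ints(int n) { return new int[n](); }
  static void release(int* p)      { delete[] p; }
};

// ---- buffer.inline.hpp (hypothetical): includes both headers above ----
inline IntBuffer IntBuffer::create(int length) {
  return IntBuffer(Arena::allocate_ints(length), length);
}
```

Files that only pass `IntBuffer` around include the light header; only the few files that call `create()` pay for the heavy one.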
PR: https://git.openjdk.java.net/jdk/pull/2599 From akozlov at openjdk.java.net Wed Feb 17 08:14:42 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 17 Feb 2021 08:14:42 GMT Subject: Integrated: 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 09:52:44 GMT, Anton Kozlov wrote: > Hi, > > Please review a small non-functional change that extracts inline SafeFetch functions to a separate file. This is preliminary work for JEP-391 integration that will reduce the size of that patch. > > CC @dcubed-ojdk > > Thanks! This pull request has now been integrated. Changeset: b955f85e Author: Anton Kozlov Committer: Vladimir Kempik URL: https://git.openjdk.java.net/jdk/commit/b955f85e Stats: 91 lines in 11 files changed: 58 ins; 26 del; 7 mod 8261075: Create stubRoutines.inline.hpp with SafeFetch implementation Reviewed-by: dcubed, stuefe, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/2542 From ayang at openjdk.java.net Wed Feb 17 08:19:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 17 Feb 2021 08:19:38 GMT Subject: RFR: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 08:55:40 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this (tiny) change that removes the last (unconditional) StoreLoad memory barrier for Serial/Parallel GC that has apparently been forgotten to be made conditional on `CardTable::scanned_concurrently()` just removed in [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) ? > > Thanks, > Thomas > > Testing: automatic compilation via github actions, but this is a quite straightforward removal of a single line... Marked as reviewed by ayang (Author).
------------- PR: https://git.openjdk.java.net/jdk/pull/2541 From jbhateja at openjdk.java.net Wed Feb 17 08:31:39 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 17 Feb 2021 08:31:39 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v2] In-Reply-To: <8ShwNGfn1vuWGuS_kjH2zXFGyW3pObvxBZt3KivpwYU=.55fb12d3-10bf-41a6-88a4-2fa665c892ea@github.com> References: <7AXx69j_wscf8ENt98_apHTd1OKbeO80nNVpU68Z794=.7d776104-0f08-4e92-b229-bc210123fc4e@github.com> <8ShwNGfn1vuWGuS_kjH2zXFGyW3pObvxBZt3KivpwYU=.55fb12d3-10bf-41a6-88a4-2fa665c892ea@github.com> Message-ID: On Tue, 16 Feb 2021 21:40:58 GMT, Nils Eliasson wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> 8261553: Adding BMI2 missing check for partial in-lining. > > Looks good. > Thanks for getting to the bottom of that regression. Thanks, since there is not a significant impact on performance, but having an optimum instruction sequence will still reduce the complexity. Should it be ok to check this in ? We have one reviewer consent. ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From sjohanss at openjdk.java.net Wed Feb 17 08:48:42 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 08:48:42 GMT Subject: RFR: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 08:55:40 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this (tiny) change that removes the last (unconditional) StoreLoad memory barrier for Serial/Parallel GC that has apparently been forgotten to be made conditional on `CardTable::scanned_concurrently()` just removed in [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) ? > > Thanks, > Thomas > > Testing: automatic compilation via github actions, but this is a quite straightforward removal of a single line... Looks good. 
------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2541 From tschatzl at openjdk.java.net Wed Feb 17 08:53:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 17 Feb 2021 08:53:42 GMT Subject: Integrated: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 08:55:40 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this (tiny) change that removes the last (unconditional) StoreLoad memory barrier for Serial/Parallel GC that has apparently been forgotten to be made conditional on `CardTable::scanned_concurrently()` just removed in [JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) ? > > Thanks, > Thomas > > Testing: automatic compilation via github actions, but this is a quite straightforward removal of a single line... This pull request has now been integrated. Changeset: a9308705 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/a9308705 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC Reviewed-by: shade, ayang, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2541 From tschatzl at openjdk.java.net Wed Feb 17 08:53:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 17 Feb 2021 08:53:41 GMT Subject: RFR: 8261309: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: <_XY8jPYCLzNDgC7Ssyj025s3YI-U83r14bfrPMQlHB0=.7853e7b8-3233-49de-ba28-cd3bde8ca7ef@github.com> On Mon, 15 Feb 2021 08:37:24 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> can I have reviews for this (tiny) change that removes the last (unconditional) StoreLoad memory barrier for Serial/Parallel GC that has apparently been forgotten to be made conditional on `CardTable::scanned_concurrently()` just removed in 
[JDK-8260941](https://bugs.openjdk.java.net/browse/JDK-8260941) ? >> >> Thanks, >> Thomas >> >> Testing: automatic compilation via github actions, but this is a quite straightforward removal of a single line... > > Looks good. Thanks @shipilev , @albertnetymk , @kstefanj for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2541 From stuefe at openjdk.java.net Wed Feb 17 09:21:44 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 17 Feb 2021 09:21:44 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 19:31:42 GMT, Gerard Ziemski wrote: > java -XX:ErrorHandlerTest=2 -version I cannot reproduce it. What build is this? The error is weird since "too many errors, abort" should have been preceded by any number of secondary error printouts. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From tschatzl at openjdk.java.net Wed Feb 17 10:21:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 17 Feb 2021 10:21:42 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 13:44:58 GMT, Stefan Johansson wrote: >> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. >> >> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. 
>> >> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. >> >> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. >> >> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. >> >> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Clean up test after merge Changes requested by tschatzl (Reviewer). src/hotspot/os/linux/os_linux.cpp line 3574: > 3572: // 1. shmmax is too small for the request. > 3573: // > check shmmax value: cat /proc/sys/kernel/shmmax > 3574: // > increase shmmax value: echo "0xffffffff" > /proc/sys/kernel/shmmax I'd avoid using a 32 bit constant on potentially 64 bit systems. According to that same man page, the limit on 64 bits is 127 TB which is more than the suggested 4 GB :) The man page talks about `ULONG_MAX - 2^24` as default (for 64 bit systems?). src/hotspot/os/linux/os_linux.cpp line 3570: > 3568: // Try to create a large shared memory segment. 
> 3569: int shmid = shmget(IPC_PRIVATE, page_size, SHM_HUGETLB|IPC_CREAT|SHM_R|SHM_W); > 3570: if (shmid == -1) { I think the flags should contain the `SHM_HUGE_*` flags (considering large pages) similar to hugetlbfs for proper checking to avoid the failure like in [JDK-8261636](https://bugs.openjdk.java.net/browse/JDK-8261636) reported just recently for the same reason. According to the [man pages](https://man7.org/linux/man-pages/man2/shmget.2.html)] such flags exist. src/hotspot/os/linux/os_linux.cpp line 3578: > 3576: // > check available large pages: cat /proc/meminfo > 3577: // > increase amount of large pages: > 3578: // echo new_value > /proc/sys/vm/nr_hugepages This is just my opinion, but I would prefer to refer to some generic documentation about these options (and use the "new" locations in `/sys/kernel/mm/hugepages/hugepages-*/` instead of the "legacy" /proc/sys, e.g. `https://lwn.net/Articles/376606/` - yes this is not official documentation, but the best I could find on the spot) ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From sjohanss at openjdk.java.net Wed Feb 17 10:56:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 10:56:40 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5] In-Reply-To: References: Message-ID: <4Q39iT2bzVf6OkhkkKx9LpCg62wRFdkuND5Xf4RXTog=.fa52ba1f-b152-4e1d-a368-4fdfc7dafd51@github.com> On Mon, 15 Feb 2021 13:44:58 GMT, Stefan Johansson wrote: >> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. >> >> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. 
One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. >> >> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. >> >> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. >> >> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. >> >> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. 
> > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Clean up test after merge Thanks for taking a look Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From sjohanss at openjdk.java.net Wed Feb 17 10:56:43 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 10:56:43 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 10:14:18 GMT, Thomas Schatzl wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up test after merge > > src/hotspot/os/linux/os_linux.cpp line 3570: > >> 3568: // Try to create a large shared memory segment. >> 3569: int shmid = shmget(IPC_PRIVATE, page_size, SHM_HUGETLB|IPC_CREAT|SHM_R|SHM_W); >> 3570: if (shmid == -1) { > > I think the flags should contain the `SHM_HUGE_*` flags (considering large pages) similar to hugetlbfs for proper checking to avoid the failure like in [JDK-8261636](https://bugs.openjdk.java.net/browse/JDK-8261636) reported just recently for the same reason. > > According to the [man pages](https://man7.org/linux/man-pages/man2/shmget.2.html) such flags exist. I would agree if the later usage of `shmget()` made use of those flags, but it doesn't. We might want to add a CR for enabling use of other page sizes than the default large page size for SHM as well. But there are also discussions about trying to remove this SHM support. > src/hotspot/os/linux/os_linux.cpp line 3574: > >> 3572: // 1. shmmax is too small for the request. >> 3573: // > check shmmax value: cat /proc/sys/kernel/shmmax >> 3574: // > increase shmmax value: echo "0xffffffff" > /proc/sys/kernel/shmmax > > I'd avoid using a 32 bit constant on potentially 64 bit systems.
According to that same man page, the limit on 64 bits is 127 TB which is more than the suggested 4 GB :) > > The man page talks about `ULONG_MAX - 2^24` as default (for 64 bit systems?). The comment is a copy of the old one where we actually use `shmget()`, but I agree that it could use an update. Not even sure we have to suggest a specific value, something like: Suggestion: // > increase shmmax value: echo "new value" > /proc/sys/kernel/shmmax Might be good enough. I will also update the original comment. > src/hotspot/os/linux/os_linux.cpp line 3578: > >> 3576: // > check available large pages: cat /proc/meminfo >> 3577: // > increase amount of large pages: >> 3578: // echo new_value > /proc/sys/vm/nr_hugepages > > This is just my opinion, but I would prefer to refer to some generic documentation about these options (and use the "new" locations in `/sys/kernel/mm/hugepages/hugepages-*/` instead of the "legacy" /proc/sys, e.g. `https://lwn.net/Articles/376606/` - yes this is not official documentation, but the best I could find on the spot) I agree, the documentation I usually refer to is: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt Since our SHM-support currently only handles the default large page size I think we should suggest to use: `sysctl -w vm.nr_hugepages=new_value` Suggestion: // sysctl -w vm.nr_hugepages=new_value // > For more information regarding large pages please refer to: // https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From sjohanss at openjdk.java.net Wed Feb 17 11:06:59 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 11:06:59 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v6] In-Reply-To: References: Message-ID: > When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. 
We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. > > The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. > > A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. > > The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. > > The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. > > This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. 
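The two-part check described above could look roughly like the sketch below. This is not the code from the pull request. The function names are hypothetical, and reading the `CapEff:` line of `/proc/self/status` and testing bit 14 (`CAP_IPC_LOCK` in `linux/capability.h`) is an assumption about how "the needed capability" might be checked. The `shmget()` flags are the ones quoted in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <sys/ipc.h>
#include <sys/shm.h>

// CAP_IPC_LOCK is capability number 14 on Linux.
static const int kCapIpcLockBit = 14;

// Find the "CapEff:" line in /proc/<pid>/status style input and parse
// its hexadecimal capability mask. Returns false if the line is missing.
static bool parse_cap_eff(std::istream& in, uint64_t* caps) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, 7, "CapEff:") == 0) {
      *caps = std::strtoull(line.c_str() + 7, nullptr, 16);
      return true;
    }
  }
  return false;
}

static bool has_ipc_lock(uint64_t caps) {
  return ((caps >> kCapIpcLockBit) & 1) != 0;
}

// Part 1: try to create (and immediately remove) one large-page segment.
static bool shm_hugetlb_probe(std::size_t large_page_size) {
  int shmid = shmget(IPC_PRIVATE, large_page_size,
                     SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
  if (shmid == -1) {
    return false;  // no point in using SHM for large pages
  }
  shmctl(shmid, IPC_RMID, nullptr);
  return true;
}

// Part 2 only warns: without the capability, SHM usage is subject to
// the locked-memory limit and later mappings may fail.
static bool shm_large_pages_usable(std::size_t large_page_size) {
  if (!shm_hugetlb_probe(large_page_size)) {
    return false;
  }
  std::ifstream status("/proc/self/status");
  uint64_t caps = 0;
  if (parse_cap_eff(status, &caps) && !has_ipc_lock(caps)) {
    std::fprintf(stderr, "warning: no CAP_IPC_LOCK, SHM large pages are limited\n");
  }
  return true;
}
```

Whether the probe succeeds depends on the machine: it needs explicitly reserved huge pages (for example via `sysctl -w vm.nr_hugepages=new_value`), so only the parsing helpers are deterministic.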
Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Thomas review - Updated comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2488/files - new: https://git.openjdk.java.net/jdk/pull/2488/files/a9d8c0f7..8354e74f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=04-05 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488 PR: https://git.openjdk.java.net/jdk/pull/2488 From tschatzl at openjdk.java.net Wed Feb 17 11:17:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 17 Feb 2021 11:17:41 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v6] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 11:06:59 GMT, Stefan Johansson wrote: >> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. >> >> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. >> >> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. 
>> >> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. >> >> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. >> >> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Thomas review - Updated comments I can't approve the changes you made in the "Suggested changes" boxes, or even comment on them as "looks good" directly, but they look all good. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From tschatzl at openjdk.java.net Wed Feb 17 11:17:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 17 Feb 2021 11:17:42 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 10:32:09 GMT, Stefan Johansson wrote: >> src/hotspot/os/linux/os_linux.cpp line 3570: >> >>> 3568: // Try to create a large shared memory segment. 
>>> 3569: int shmid = shmget(IPC_PRIVATE, page_size, SHM_HUGETLB|IPC_CREAT|SHM_R|SHM_W); >>> 3570: if (shmid == -1) { >> >> I think the flags should contain the `SHM_HUGE_*` flags (considering large pages) similar to hugetlbfs for proper checking to avoid the failure like in [JDK-8261636](https://bugs.openjdk.java.net/browse/JDK-8261636) reported just recently for the same reason. >> >> According to the [man pages](https://man7.org/linux/man-pages/man2/shmget.2.html) such flags exist. > > I would agree if the later usage of `shmget()` made use of those flags, but it doesn't. We might want to add a CR for enabling use of other page sizes than the default large page size for SHM as well. But there are also discussions about trying to remove this SHM support. I am good with creating a CR for that. We can always close it later. ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From redestad at openjdk.java.net Wed Feb 17 11:47:47 2021 From: redestad at openjdk.java.net (Claes Redestad) Date: Wed, 17 Feb 2021 11:47:47 GMT Subject: RFR: 8261553: Efficient mask generation using BMI2 BZHI instruction [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 18:55:02 GMT, Jatin Bhateja wrote: >> BMI2 BZHI instruction can be used to optimize the instruction sequence >> used for mask generation at various places in array copy stubs and partial in-lining for small copy operations. > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > 8261553 : Aligning main copy loop to prevent any penalty due to LSD and DSB misses. Marked as reviewed by redestad (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From sjohanss at openjdk.java.net Wed Feb 17 12:03:55 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 12:03:55 GMT Subject: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v3] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 08:38:24 GMT, Thomas Stuefe wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Only warn if UseLargePages was explicitly set. > > Looks good now. Thank you! Thanks for the reviews @tstuefe and @tschatzl FYI, I just filed three more enhancements: [JDK-8261894: Remove support for UseSHM](https://bugs.openjdk.java.net/browse/JDK-8261894) [JDK-8261896: Add support for multiple page sizes for UseSHM](https://bugs.openjdk.java.net/browse/JDK-8261896) [JDK-8261899: Improve warning for UseSHM failures](https://bugs.openjdk.java.net/browse/JDK-8261899) ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From sjohanss at openjdk.java.net Wed Feb 17 12:06:41 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 12:06:41 GMT Subject: Integrated: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages In-Reply-To: References: Message-ID: <4KGWHrMZKa-uElGRhw5lVaR9fflGUuUDFPM4npnqqmw=.ea2198c8-c5e1-4b00-a978-8b848134c936@github.com> On Tue, 9 Feb 2021 20:50:25 GMT, Stefan Johansson wrote: > When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM. > > The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. 
One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either. > > A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways. > > The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages. > > The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available. > > This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one. This pull request has now been integrated. 
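The two-part sanity check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the changeset: the helper names `can_create_hugetlb_shm_segment` and `status_has_cap_ipc_lock` are hypothetical, and reading the `CapEff:` field and testing bit 14 (CAP_IPC_LOCK) is an assumption about how the `/proc/self/status` check could be done. Only the `shmget()` flags are taken from the quoted review context.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Fallbacks in case the libc headers hide these Linux definitions.
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
#ifndef SHM_R
#define SHM_R 0400
#endif
#ifndef SHM_W
#define SHM_W 0200
#endif

// Part 1 (hypothetical helper): try to create a large-page shared memory
// segment. If this fails, later SHM large-page allocations would fail too.
bool can_create_hugetlb_shm_segment(size_t page_size) {
  int shmid = shmget(IPC_PRIVATE, page_size,
                     SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W);
  if (shmid == -1) {
    return false;
  }
  // The segment was only a probe; mark it for removal right away.
  shmctl(shmid, IPC_RMID, nullptr);
  return true;
}

// Part 2 (hypothetical helper, assumed approach): given the text of
// /proc/self/status, report whether the effective capability set contains
// CAP_IPC_LOCK (capability number 14).
bool status_has_cap_ipc_lock(const char* status_text) {
  const char* field = std::strstr(status_text, "CapEff:");
  if (field == nullptr) {
    return false;
  }
  // The value is a hexadecimal bit mask; strtoull skips the leading tab.
  unsigned long long caps =
      std::strtoull(field + std::strlen("CapEff:"), nullptr, 16);
  const unsigned cap_ipc_lock_bit = 14;
  return ((caps >> cap_ipc_lock_bit) & 1ULL) != 0;
}
```

A caller would read `/proc/self/status` into a buffer, pass it to the second helper, and only warn about a locked-memory limit when the capability is missing. The first helper's result depends on whether huge pages are configured on the machine.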
Changeset: f639df43 Author: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/f639df43 Stats: 49 lines in 3 files changed: 43 ins; 0 del; 6 mod 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages Reviewed-by: stuefe, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2488 From github.com+42899633+eastig at openjdk.java.net Wed Feb 17 12:09:42 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Feb 2021 12:09:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4] In-Reply-To: References: Message-ID: On Sun, 14 Feb 2021 05:44:51 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. >> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > reimplement this feature. withdraw my intrusive change in outputStream. > use stringStream only for the constant OopPtr. after oop->print_on(st), > delete all appearances of '\n' > - Merge branch 'master' into JDK-8260198 > - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > fix merge conflict. 
> - Merge branch 'master' into JDK-8260198 > - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/share/opto/type.cpp line 4049: > 4047: ss.print(" "); > 4048: const_oop()->print_oop(&ss); > 4049: ss.tr_delete('\n'); `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams. I see that the content of `ss` is traversed many times. What about this code: for (const char *str = ss.base(); *str; ) { size_t i = 0; while (str[i] && str[i] != '\n' ) { ++i; } st->print_raw(str, i); str += i; while (*str == '\n') { ++str; } } ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From akozlov at openjdk.java.net Wed Feb 17 12:36:10 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 17 Feb 2021 12:36:10 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v18] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. 
> > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 88 commits: - Merge remote-tracking branch 'upstream/jdk/master' into jdk-macos - Re-do safefetch.hpp - Merge remote-tracking branch 'origin/jdk/8261075-stubroutines-inline' into jdk-macos - stubRoutines.inline.hpp -> safefetch.hpp - Update copyrights - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline - Extract SafeFetch32/N to stubRoutines.inline.hpp - Revert "Extract SafeFetch32/N to stubRoutines.inline.hpp" This reverts commit b873c25f31dd21349d140b790713cc9ccb5f2dc0. - Merge pull request #9 from VladimirKempik/pull/2200 Removed unused variables - ... 
and 78 more: https://git.openjdk.java.net/jdk/compare/b955f85e...ab72613c ------------- Changes: https://git.openjdk.java.net/jdk/pull/2200/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=17 Stats: 2946 lines in 74 files changed: 2861 ins; 27 del; 58 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 17 12:36:11 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 17 Feb 2021 12:36:11 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 21:51:56 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 31: > >> 29: #include "asm/assembler.inline.hpp" >> 30: #include "oops/compressedOops.hpp" >> 31: #include "runtime/vm_version.hpp" > > It's not clear why this include needed to be added. Line 448 calls `VM_Version::features()`. It seems the declaration is included indirectly somehow on the rest of the platforms, through OS specific headers. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From david.holmes at oracle.com Wed Feb 17 12:56:53 2021 From: david.holmes at oracle.com (David Holmes) Date: Wed, 17 Feb 2021 22:56:53 +1000 Subject: RFR: 8260341: CDS dump VM init code does not check exceptions In-Reply-To: References: <_-r6tekItyqxQA5WKZFBj-b0IQRC6R0Nw7QBMZbBPw0=.6c253bd5-697b-4060-9026-3cf015a6dc09@github.com> Message-ID: <6457788b-cc0a-1038-2032-3978c4a067f5@oracle.com> On 11/02/2021 4:13 am, Ioi Lam wrote: > On Wed, 10 Feb 2021 14:33:50 GMT, Harold Seigel wrote: > >> Hi Ioi, >> Do you avoid using CHECK if it's the last line of a function? 
For example, why is THREAD used instead of CHECK at line 1506? >> Thanks, Harold >> >> 1503 void ClassLoader::initialize_module_path(TRAPS) { >> 1504 if (Arguments::is_dumping_archive()) { >> 1505 ClassLoaderExt::setup_module_paths(CHECK); >> 1506 FileMapInfo::allocate_shared_path_table(THREAD); >> 1507 } >> 1508 } > > I thought it was a commonly used coding convention, but I could only find a few cases where the code wasn't written by me :-( It is something we have been gradually fixing. I've made a number of requests to remove CHECK from code that will return anyway. > - https://github.com/openjdk/jdk/blame/b9d4211bc1aa92e257ddfe86c7a2b4e4e60598a0/src/hotspot/share/prims/jvm.cpp#L311 > - https://github.com/openjdk/jdk/blame/f03e839e481f905358ce7d95a5d1f5179e7f46fe/src/hotspot/share/classfile/javaClasses.cpp#L2415 > > I will go back to `CHECK`, since the C++ compiler will elide the last `CHECK` anyway: in both cases, gcc compiles the last call to a direct branch to FileMapInfo::allocate_shared_path_table (i.e., a tail call). Do all our C++ compilers do this? If they do, that is great, but if they only sometimes do, then not so great. > Using `CHECK` makes the code easier to maintain (if you add new code below it). I prefer not to rely on the C++ compiler to remove the exception checking logic and use THREAD in such cases. David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/2494 > From akozlov at openjdk.java.net Wed Feb 17 13:09:14 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 17 Feb 2021 13:09:14 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 16:21:53 GMT, Bernhard Urban-Forster wrote: >> This is the version of w^x on-demand switch implemented by microsoft guys. This is enabled only for debug builds.
>> @lewurm could you comment here please > > Those values can be observed in the debugger, but aren't documented or defined in header files. > > This mode was useful for the initial bootstrap of the platform (it helped with missing W^X transitions), but shouldn't be required anymore today. I'm fine with removing it altogether. OK, I'm going to remove this block. So we'll be able to revert changes in globals.hpp https://github.com/openjdk/jdk/pull/2200/files#r568986339 ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Wed Feb 17 13:11:48 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Wed, 17 Feb 2021 13:11:48 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: <2DZXzH1_KDoevOmbbR0ipW0LWSo16BCilHOP1geU3_0=.201276ca-ddb3-49eb-a488-122b54467c49@github.com> On Tue, 2 Feb 2021 22:42:22 GMT, Daniel D. Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/hotspot/share/logging/logStream.hpp line 35: > >> 33: class LogStream : public outputStream { >> 34: // see test/hotspot/gtest/logging/test_logStream.cpp >> 35: friend class LogStreamTest; > > It's not clear why this change is made for this port. This was done for the previous implementation of W^X, so that the gtests were able to access this test. This is not required anymore, so this hunk was reverted.
------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From stuefe at openjdk.java.net Wed Feb 17 13:26:20 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 17 Feb 2021 13:26:20 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v7] In-Reply-To: References: Message-ID: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. > > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
> > - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. > > Output Before: > 663 Signal Handlers: > 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO > 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > > Now: > Signal Handlers: > SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO > SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > > ----- > Tests: GA, and the patch has been tested in our nighlies for over a month now. I manually executed the runtime/jni/checked tests too. 
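The idea of keeping one saved `sigaction` per signal number and comparing it against what is currently installed can be sketched as below. This is a simplified illustration under stated assumptions, not the class from the patch: the class name `SavedSignalHandlers` and its const-returning `get()` come from the thread, but the fixed-size array storage, the `in_range()` bounds check, and the free function `same_handler()` are hypothetical, and flag masking details are left out.

```cpp
#include <signal.h>
#include <cassert>
#include <cstring>

#ifndef NSIG
#define NSIG 65  // fallback; glibc and macOS normally define NSIG
#endif

// Simplified sketch: a vector of saved handler information indexed by
// signal number. Storage layout here is an assumption for illustration.
class SavedSignalHandlers {
  struct sigaction _sa[NSIG];
  bool _set[NSIG];

  // Simple bounds check guarding the arrays above.
  static bool in_range(int sig) { return sig > 0 && sig < NSIG; }

 public:
  SavedSignalHandlers() {
    std::memset(_sa, 0, sizeof(_sa));
    std::memset(_set, 0, sizeof(_set));
  }

  // Remember a copy of the given handler information for this signal.
  void set(int sig, const struct sigaction* act) {
    if (in_range(sig)) {
      _sa[sig] = *act;
      _set[sig] = true;
    }
  }

  // Returns the saved information, or nullptr if nothing was saved.
  // The result is const: saved state is only ever read afterwards.
  const struct sigaction* get(int sig) const {
    return (in_range(sig) && _set[sig]) ? &_sa[sig] : nullptr;
  }
};

// Hypothetical comparison used for a periodic check: two sigaction
// structures match if flags and handler address are the same.
bool same_handler(const struct sigaction* a, const struct sigaction* b) {
  if (a->sa_flags != b->sa_flags) {
    return false;
  }
  if (a->sa_flags & SA_SIGINFO) {
    return a->sa_sigaction == b->sa_sigaction;
  }
  return a->sa_handler == b->sa_handler;
}
```

With something like this, a check in the style of case (2) would query the active handler via `sigaction(sig, nullptr, &current)` and compare it with the saved entry, instead of deducing the expected handler address from context.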
Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Style fixes - expected_handlers->vm_handlers - Merge - Use universal zero initializer for do_check_signal_periodically - Further fixes - Fix build error on zlinux - David Feedback - Make SavedSignalHandlers use C-heap for its items - Removed display-replaced-handler-logic - Feedback David - ... and 2 more: https://git.openjdk.java.net/jdk/compare/823789fb...5cf58186 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2251/files - new: https://git.openjdk.java.net/jdk/pull/2251/files/06e1b030..5cf58186 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2251&range=05-06 Stats: 37235 lines in 1175 files changed: 21184 ins; 10280 del; 5771 mod Patch: https://git.openjdk.java.net/jdk/pull/2251.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2251/head:pull/2251 PR: https://git.openjdk.java.net/jdk/pull/2251 From jbhateja at openjdk.java.net Wed Feb 17 14:12:39 2021 From: jbhateja at openjdk.java.net (Jatin Bhateja) Date: Wed, 17 Feb 2021 14:12:39 GMT Subject: Integrated: 8261553: Efficient mask generation using BMI2 BZHI instruction In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 08:31:40 GMT, Jatin Bhateja wrote: > BMI2 BHZI instruction can be used to optimize the instruction sequence > used for mask generation at various place in array copy stubs and partial in-lining for small copy operations. This pull request has now been integrated. 
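For readers unfamiliar with BZHI, the following sketch models what the instruction computes and why it fits mask generation. It is a plain C++ model, not the assembler code from the changeset; the function names are hypothetical, and the assumption that the earlier sequence was a shift-and-decrement is mine, not something the thread states.

```cpp
#include <cassert>
#include <cstdint>

// Conventional mask generation: the low n bits set, i.e. (1 << n) - 1.
// The guard is needed because shifting a 64-bit value by 64 is undefined.
inline uint64_t mask_shift_dec(unsigned n) {
  return (n >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << n) - 1);
}

// Software model of 64-bit BZHI: copy src and zero all bits at positions
// >= idx. Only the low 8 bits of idx are used; if that value is 64 or
// more, src is returned unchanged.
inline uint64_t bzhi64_model(uint64_t src, uint64_t idx) {
  unsigned n = static_cast<unsigned>(idx & 0xFF);
  return (n >= 64) ? src : (src & ((UINT64_C(1) << n) - 1));
}
```

Applying the model to an all-ones source yields the same mask as the guarded shift-and-decrement form for every n from 0 to 64, which is why a single BZHI on a register holding -1 can stand in for the longer sequence.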
Changeset: cb84539d Author: Jatin Bhateja URL: https://git.openjdk.java.net/jdk/commit/cb84539d Stats: 39 lines in 6 files changed: 12 ins; 11 del; 16 mod 8261553: Efficient mask generation using BMI2 BZHI instruction Reviewed-by: redestad, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/2522 From enikitin at openjdk.java.net Wed Feb 17 15:37:42 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Wed, 17 Feb 2021 15:37:42 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2] In-Reply-To: References: <2_Gpraz6NaY17HPfRDW-LD-sQrrPQ4dpIVP8vikpdXM=.d425cd8b-aea5-43be-865e-72229db81e6e@github.com> Message-ID: On Tue, 16 Feb 2021 19:49:02 GMT, Igor Ignatyev wrote: >> Thanks for the info about the segmented code cache. I did some research and found that the opposite is true - both nmethod pools ('profiled' and 'non-profiled') are growing along with the MH graph growth. This is supported by the specification for non-method code heap at: >> >> https://docs.oracle.com/en/java/javase/15/vm/java-hotspot-virtual-machine-performance-enhancements.html#GUID-1D9B26AD-8E0A-4771-90DA-A81A2C1F5B55 >> >> Please check the the fixed version. > > o/c they grow, b/c we use them for compiled code *and* if there is no space in non-nmethod heap, we use them for adapters as well, so I guess that the growth that you see is already after non-nmethod heap got exhausted. I'd recommend you simply use the sum of all available code-heaps (this will increase the possibility of false-positive results due to segmentation, but I don't think it matters much here). Well, seems like rebalancing doesn't works that good. Here's a sample failure with plenty of free space in the non-nmethods heap: [8.230s][warning][codecache] CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled. 
[8.230s][warning][codecache] Try increasing the code heap size using -XX:NonProfiledCodeHeapSize= Java HotSpot(TM) 64-Bit Server VM warning: CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled. Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code heap size using -XX:NonProfiledCodeHeapSize= CodeHeap 'non-profiled nmethods': size=8192Kb used=8191Kb max_used=8191Kb free=0Kb << Exhausted CodeHeap 'profiled nmethods': size=8192Kb used=8191Kb max_used=8191Kb free=0Kb << Exhausted CodeHeap 'non-nmethods': size=102400Kb used=18343Kb max_used=18343Kb free=84056Kb << 84Mb of free space # ERROR: Caught exception in Thread[Thread-41,5,MainThreadGroup] ... # ERROR: Caused by: java.lang.VirtualMachineError: Out of space in CodeCache for method handle intrinsic The sum monitoring won't help here either. I've added non-nmethods heap to the monitoring, just to be sure. ------------- PR: https://git.openjdk.java.net/jdk/pull/2523 From iignatyev at openjdk.java.net Wed Feb 17 15:49:39 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Wed, 17 Feb 2021 15:49:39 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2] In-Reply-To: References: <2_Gpraz6NaY17HPfRDW-LD-sQrrPQ4dpIVP8vikpdXM=.d425cd8b-aea5-43be-865e-72229db81e6e@github.com> Message-ID: <2qEkvkaxAPHeFaDoCRmcPaehczQgwZNnZMxO2Z-Vc28=.d4845a88-7d71-4768-b952-5ff9c4ab8311@github.com> On Wed, 17 Feb 2021 15:34:41 GMT, Evgeny Nikitin wrote: >> o/c they grow, b/c we use them for compiled code *and* if there is no space in non-nmethod heap, we use them for adapters as well, so I guess that the growth that you see is already after non-nmethod heap got exhausted. I'd recommend you simply use the sum of all available code-heaps (this will increase the possibility of false-positive results due to segmentation, but I don't think it matters much here). > > Well, seems like rebalancing doesn't works that good. 
Here's a sample failure with plenty of free space in the non-nmethods heap: > > [8.230s][warning][codecache] CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled. > [8.230s][warning][codecache] Try increasing the code heap size using -XX:NonProfiledCodeHeapSize= > Java HotSpot(TM) 64-Bit Server VM warning: CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled. > Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code heap size using -XX:NonProfiledCodeHeapSize= > CodeHeap 'non-profiled nmethods': size=8192Kb used=8191Kb max_used=8191Kb free=0Kb << Exhausted > CodeHeap 'profiled nmethods': size=8192Kb used=8191Kb max_used=8191Kb free=0Kb << Exhausted > CodeHeap 'non-nmethods': size=102400Kb used=18343Kb max_used=18343Kb free=84056Kb << 84Mb of free space > > # ERROR: Caught exception in Thread[Thread-41,5,MainThreadGroup] > ... > # ERROR: Caused by: java.lang.VirtualMachineError: Out of space in CodeCache for method handle intrinsic > The sum monitoring won't help here either. I've added non-nmethods heap to the monitoring, just to be sure. hm... that can mean that there is a product bug (or my recollections about code heaps aren't as good as I thought). @TobiHartmann , @iwanowww, could you please take a look? Evgeny's observations suggest that method handle intrinsics use `non-profiled nmethods` and `profiled nmethods` heaps and not `non-nmethods` heap despite the fact that the last one has plenty of free space. my understanding is/was that we should have used `non-nmethods` heap for MH intrinsic 1st and if it's exhausted start to use the other heaps. 
Thanks, -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/2523 From aph at openjdk.java.net Wed Feb 17 15:55:51 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Feb 2021 15:55:51 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v8] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 06:24:05 GMT, Vladimir Kempik wrote: >> This is when passing a float, yes? In the case where we have more float arguments than n_float_register_parameters_c. >> I don't understand why you think it's acceptable to bail in this case. Can you explain, please? > > It's for everything that uses less than 8 bytes on the stack (ints (4), shorts (2), bytes (1), floats (4)). > Currently native wrapper generation does not support such cases at all; it needs refactoring before this can be implemented. > So when a method has more arguments than can be placed in registers, we may have issues. > > So we just bail out to the interpreter when a smaller (<= 4 bytes) type is going to be passed through the stack. > > There was an attempt to implement handling of such cases, but currently it requires some hacks (like using some vectors for a non-specific task) - https://github.com/openjdk/aarch64-port/pull/3 OK. I checked and the Panama preview doesn't support direct native calls for stack arguments, so we're good for now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From aph at openjdk.java.net Wed Feb 17 15:55:50 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Feb 2021 15:55:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v18] In-Reply-To: References: <0-H97G4l5XqFtiEUbNAqrP_j143iny5kkF0tsiAqMvQ=.2396963b-db5d-469e-bc30-511f754a600a@github.com> Message-ID: <2D2djDl7sFFBYlKWrD1t7aXikT8r5iVAibMR4HI6bfw=.1465f633-0045-4917-b4dd-883f04e5e41e@github.com> On Mon, 15 Feb 2021 17:45:32 GMT, Anton Kozlov wrote: >> I'm not sure it can wait.
This change turns already-messy code into something significantly messier, to the extent that it's not really good enough to go into mainline. > > The latest merge with JDK-8261071 should resolve the issue. Please take a look. Looks much better, thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From gziemski at openjdk.java.net Wed Feb 17 16:00:47 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 17 Feb 2021 16:00:47 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 05:53:42 GMT, Thomas Stuefe wrote: >> src/hotspot/os/posix/signals_posix.cpp line 414: >> >>> 412: if (actp == NULL) { >>> 413: // Retrieve the preinstalled signal handler from jvm >>> 414: actp = const_cast(chained_handlers.get(sig)); >> >> Must `SavedSignalHandlers::get()` have the **const struct** in `const struct sigaction* get(int sig) const` signature? >> >> If it was just `struct sigaction* get(int sig) const` then we wouldn't need this awkward cast. > > I'm a stickler to const since its very expressive and prevents stupid errors. `const struct sigaction* get() const` makes perfect sense since at no time you are supposed to modify the underlying sigaction structure. Once it is saved, it should only be used to read from (eg to print, or to extract the handler address and call it in `call_chained_handler()`. > > But old code was not const correct. The way to start being const correct without having to rewrite everything is to make new code const correct and cauterize at the interface between new and old code with these ugly but at least expressive casts. > > Side note, even the sigaction function is const correct: https://pubs.opengroup.org/onlinepubs/007904875/functions/sigaction.html > see the `act` input parameter designating the new handler, its const. Only the output parameter `oact` is nonconst. 
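The const-correctness argument above can be sketched roughly as follows. This is an illustration only, not the HotSpot code: `SavedHandlersSketch`, `kMaxSignals`, `legacy_handler_of()` and `chained_handler()` are hypothetical names, and only the shape of the const getter and the `const_cast` at the boundary to older, non-const code follows the discussion.

```cpp
#include <signal.h>
#include <cassert>

// Hypothetical bound for this sketch; the code under review sizes its array with NSIG.
static const int kMaxSignals = 65;

typedef void (*handler_t)(int);

// Hypothetical stand-in for the SavedSignalHandlers class under review.
class SavedHandlersSketch {
  struct sigaction _sa[kMaxSignals];
  bool _is_set[kMaxSignals];
 public:
  SavedHandlersSketch() : _sa(), _is_set() {}

  void set(int sig, const struct sigaction* act) {
    assert(sig > 0 && sig < kMaxSignals);
    _sa[sig] = *act;
    _is_set[sig] = true;
  }

  // Const getter: callers may read the saved action but never modify it.
  const struct sigaction* get(int sig) const {
    assert(sig > 0 && sig < kMaxSignals);
    return _is_set[sig] ? &_sa[sig] : nullptr;
  }
};

// Hypothetical "old" code that takes a non-const pointer although it only reads.
static handler_t legacy_handler_of(struct sigaction* act) {
  return act->sa_handler;
}

// New code stays const correct; the cast marks the boundary to the old code.
static handler_t chained_handler(const SavedHandlersSketch& saved, int sig) {
  const struct sigaction* saved_act = saved.get(sig);
  if (saved_act == nullptr) {
    return SIG_DFL;
  }
  return legacy_handler_of(const_cast<struct sigaction*>(saved_act));
}
```

Once the legacy function takes a `const struct sigaction*` as well, the cast can simply be dropped, which is the cleanup suggested in this thread.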
> > I will see if I can make the rest of the code const correct too wrt the sigaction structures. That way we can get rid of the cast and have better code as well. If we can do that, then that would be ideal. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From gziemski at openjdk.java.net Wed Feb 17 16:13:42 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 17 Feb 2021 16:13:42 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 09:19:14 GMT, Thomas Stuefe wrote: > > java -XX:ErrorHandlerTest=2 -version > > I cannot reproduce it (macos, fastdebug). What build is this? The error is weird since "too many errors, abort" should have been preceded by any number of secondary error printouts. > > These kind of tests (let it crash and check if error handling works) are also executed as part of tier1 hotspot (runtime/ErrorHandling) and seem to run fine in GA actions. I'll look into why I see this and report back... ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From gziemski at openjdk.java.net Wed Feb 17 16:13:43 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 17 Feb 2021 16:13:43 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: <1F6D0fZvQIyMqVMcCImTr1CYgww4meSU5BTPHxBQtwc=.2c95996f-22de-453f-8d7f-6f92b25ffc83@github.com> On Wed, 17 Feb 2021 05:42:11 GMT, Thomas Stuefe wrote: >> I would like to keep this. SavedSignalHandlers::check_signal_number() is just a simple bounds check to prevent memory overwriters, with an added assert for goot measure in debug builds. Its proximity to the array it guards makes this clear. SavedSignalHandlers gets only fed with know values in this file, so an assert is sufficient. >> >> is_valid_signal() OTOH does a runtime check for signal validity. It is more expensive but only used in analysis situations. 
It pulls its signal number from a sigaction structure, which may be corrupt and contain whatever, so the runtime check makes more sense to me. > > In fact, SavedSignalHandler::_sa[NSIG] could be reduced in size to something more sensible, since we only ever feed "normal" signals into it. Same reason as NUM_IMPORTANT_SIGS. But we only talk a few kbyte here, so I did not bother. I would have preferred to have a single function to check whether the signal is valid, otherwise we have 2 separate specialized API that almost do the same thing. Any future reader of the code will probably wonder again. For code readability I'd prefer to have just one API, or at least refactor the more complex one in terms of the simpler one, and give them meaningful names to differentiate them, but OK. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From gziemski at openjdk.java.net Wed Feb 17 16:13:43 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 17 Feb 2021 16:13:43 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: <-3-_jZX9uZJptkKvOOYeagiPnij3xQqqVCBVhgOtK5o=.95794b96-18ac-41ec-a793-16d097e8efd4@github.com> Message-ID: On Wed, 17 Feb 2021 06:01:03 GMT, Thomas Stuefe wrote: >> src/hotspot/os/posix/signals_posix.cpp line 1222: >> >>> 1220: // Query the current signal handler. Needs to be a separate operation >>> 1221: // from installing a new handler since we need to honor AllowUserSignalHandlers. >>> 1222: void* oldhand = get_signal_handler(&oldAct); >> >> It's a pre-existing issue, but I dislike the **"old"** in **oldhand** and **oldAct**. It makes it sound like it is a cached or previous value, which is especially confusing when in the comment we refer to it as the **current** one. Any chance we can clean this up in this fix? 
> > I think it stems from the Posix interface calling the parameters `act` and `oact` and ppl are used to that naming: > int sigaction(int sig, const struct sigaction *restrict act, > struct sigaction *restrict oact); > That said, we use a bunch of different names, like oact, oldAct or oldSigAct, which I disliked. I wanted to unify those namings but refrained since I wanted to keep the patch small. Its difficult enough as it is. > > Since this would be an easy change in itself, can we keep it for a different RFE? As for name, lets say currentAct? I'd love if it we could follow up on it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Wed Feb 17 16:40:47 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 17 Feb 2021 16:40:47 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:10:29 GMT, Gerard Ziemski wrote: > > > java -XX:ErrorHandlerTest=2 -version > > > > > > I cannot reproduce it (macos, fastdebug). What build is this? The error is weird since "too many errors, abort" should have been preceded by any number of secondary error printouts. > > These kind of tests (let it crash and check if error handling works) are also executed as part of tier1 hotspot (runtime/ErrorHandling) and seem to run fine in GA actions. > > I'll look into why I see this and report back... Okay, thank you. Note that I'm off the rest of the week, so I won't be able to reply until Monday. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From gziemski at openjdk.java.net Wed Feb 17 16:56:44 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Wed, 17 Feb 2021 16:56:44 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v6] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 16:38:22 GMT, Thomas Stuefe wrote: >>> > java -XX:ErrorHandlerTest=2 -version >>> >>> I cannot reproduce it (macos, fastdebug). What build is this? The error is weird since "too many errors, abort" should have been preceded by any number of secondary error printouts. >>> >>> These kind of tests (let it crash and check if error handling works) are also executed as part of tier1 hotspot (runtime/ErrorHandling) and seem to run fine in GA actions. >> >> I'll look into why I see this and report back... > >> > > java -XX:ErrorHandlerTest=2 -version >> > >> > >> > I cannot reproduce it (macos, fastdebug). What build is this? The error is weird since "too many errors, abort" should have been preceded by any number of secondary error printouts. >> > These kind of tests (let it crash and check if error handling works) are also executed as part of tier1 hotspot (runtime/ErrorHandling) and seem to run fine in GA actions. >> >> I'll look into why I see this and report back... > > Okay, thank you. Note that I'm off the rest of the week, so I won't be able to reply until Monday. It looks like I see the issue only with Xcode build hotspot (using the actual Xcode IDE), the build produced using the command line make/compiler works fine. Digging in deeper... 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2251

From lucy at openjdk.java.net Wed Feb 17 16:58:40 2021
From: lucy at openjdk.java.net (Lutz Schmidt)
Date: Wed, 17 Feb 2021 16:58:40 GMT
Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base
In-Reply-To:
References:
Message-ID:

On Tue, 16 Feb 2021 20:49:49 GMT, Thomas Stuefe wrote:

> If the compressed class pointer base has a non-zero value, MacroAssembler::encode_klass_not_null() may encode a Klass pointer to a wrong narrow pointer.
>
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x8000_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>
> In MacroAssembler::encode_klass_not_null(), there is the following section:
>
>     if (base != NULL) {
>       unsigned int base_h = ((unsigned long)base)>>32;
>       unsigned int base_l = (unsigned int)((unsigned long)base);
>       if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>         lgr_if_needed(dst, current);
>         z_aih(dst, -((int)base_h)); // Base has no set bits in lower half.
>       } else if ((base_h == 0) && (base_l != 0)) {   (A)
>         lgr_if_needed(dst, current);
>         z_agfi(dst, -(int)base_l);   (B)
>       } else {
>         load_const(Z_R0, base);
>         lgr_if_needed(dst, current);
>         z_sgr(dst, Z_R0);
>       }
>       current = dst;
>     }
>
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32 bits. At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32-bit two's complement of the base and adding it with AGFI. AGFI adds a sign-extended 32-bit immediate to a 64-bit register. In this case, it produces the wrong result if the base is >0x8000_0000:
>
> In the case of the crash, we have:
> base: 8800_0000
> klass pointer: 8804_1040
> 32bit two's complement of base: 7800_0000
> added to the klass pointer: 1_0004_1040
>
> So the result of the "subtraction" is 1_0004_1040; it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>
> This bug has been dormant; it was activated by JDK-8250989, which changed the way class space reservation happens at CDS dump time. It surfaced first as a crash in a CDS-specific jtreg test (JDK-8261552).
>
> ================
>
> Fix:
>
> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>
> ----
>
> Tests:
>
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>
> I also built a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>
> I used this method to test various combinations:
> - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI)
> - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0)
> - narrow klass pointer base = 0 (we don't do anything)
>
> (would this override-feature be useful? We could do better testing).
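The arithmetic in the quoted description can be replayed in plain C++. The following is only a model under stated assumptions, not s390 code: `add_imm32_to_64()` mimics what the description says about AGFI (immediate sign-extended, 64-bit add), `add_imm32_to_32()` mimics a pure 32-bit add and assumes the upper half of the register is zero, which is the precondition the fix is described as enforcing. All three function names are made up for this sketch.

```cpp
#include <cstdint>
#include <cassert>

// 32-bit two's complement of the low word of the base, as in -(int)base_l.
static int32_t neg32(uint32_t v) {
  return static_cast<int32_t>(0u - v);
}

// Model of the AGFI behaviour: the 32-bit immediate is sign-extended
// and added to the full 64-bit register.
static uint64_t add_imm32_to_64(uint64_t reg, int32_t imm) {
  return reg + static_cast<uint64_t>(static_cast<int64_t>(imm));
}

// Model of a pure 32-bit add: the carry out of bit 31 is discarded.
// Assumes the upper 32 bits of the register are zero.
static uint64_t add_imm32_to_32(uint64_t reg, int32_t imm) {
  uint32_t low = static_cast<uint32_t>(reg) + static_cast<uint32_t>(imm);
  return low;
}
```

With base 0x8800_0000 and klass pointer 0x8804_1040, `neg32` yields 0x7800_0000, which is positive as a signed 32-bit value. The 64-bit add therefore carries into bit 32 and produces 0x1_0004_1040, while the 32-bit add drops that carry and produces the expected offset 0x4_1040.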
>
> Thanks, Thomas

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3645:

> 3643: }
> 3644: #endif
> 3645:

I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test:

    z_oihf(current, 0);
    z_brc(Assembler::bcondZero, ok);

z_oihf() does modify the contents of register current, but it writes back the same value.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657:

> 3655: } else {
> 3656: load_const(Z_R0, base);
> 3657: lgr_if_needed(dst, current);

What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.

    // Klass oop manipulations if compressed.
    void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
      Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
      address base = CompressedKlassPointers::base();
      int shift = CompressedKlassPointers::shift();
      bool need_zero_extend = false;
      assert(UseCompressedClassPointers, "only for compressed klass ptrs");

      BLOCK_COMMENT("cKlass encoder {");

    #ifdef ASSERT
      Label ok;
      z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
      z_brc(Assembler::bcondAllZero, ok);
      // The plain disassembler does not recognize illtrap. It instead displays
      // a 32-bit value. Issuing two illtraps assures the disassembler finds
      // the proper beginning of the next instruction.
      z_illtrap(0xee);
      z_illtrap(0xee);
      bind(ok);
    #endif

      // Scale down the incoming klass pointer first.
      // We then can be sure we calculate an offset that fits into 32 bit.
      // More generally speaking: all subsequent calculations are purely 32-bit.
      if (shift != 0) {
        assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
        z_srlg(dst, current, shift);
        need_zero_extend = true;
        current = dst;
      }

      if (base != NULL) {
        // Use scaled-down base address parts to match scaled-down klass pointer.
        unsigned int base_h = ((unsigned long)base)>>(32+shift);
        unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);

        // General considerations:
        //  - when calculating (current_h - base_h), all digits must cancel (become 0).
        //    Otherwise, we would end up with a compressed klass pointer which doesn't
        //    fit into 32-bit.
        //  - Only bit#33 of the difference could potentially be non-zero. For that
        //    to happen, (current_l < base_l) must hold. In this case, the subtraction
        //    will create a borrow out of bit#32, nicely killing bit#33.
        //  - With the above, we only need to consider current_l and base_l to
        //    calculate the result.
        //  - Both values are treated as unsigned. The unsigned subtraction is
        //    replaced by adding (unsigned) the 2's complement of the subtrahend.

        if (base_l == 0) {
          //  - By theory, the calculation to be performed here (current_h - base_h) MUST
          //    cancel all high-word bits. Otherwise, we would end up with an offset
          //    (i.e. compressed klass pointer) that does not fit into 32 bit.
          //  - current_l remains unchanged.
          //  - Therefore, we can replace all calculation with just a
          //    zero-extending load 32 to 64 bit.
          //  - Even that can be replaced with a conditional load if dst != current.
          //    (this is a local view. The shift step may have requested zero-extension).
        } else {
          // To begin with, we may need to copy and/or zero-extend the register operand.
          // We have to calculate (current_l - base_l). Because there is no unsigned
          // subtract instruction with immediate operand, we add the 2's complement of base_l.
          if (need_zero_extend) {
            z_llgfr(dst, current);
            need_zero_extend = false;
          } else {
            llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
          }
          current = dst;
          z_alfi(dst, -(int)base_l);
        }
      }

      if (need_zero_extend) {
        // We must zero-extend the calculated result. It may have some leftover bits in
        // the hi-word because we only did optimized calculations.
        z_llgfr(dst, current);
      } else {
        llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
      }

      BLOCK_COMMENT("} cKlass encoder");
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595

From aph at openjdk.java.net Wed Feb 17 17:24:58 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 17 Feb 2021 17:24:58 GMT
Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code
Message-ID:

8261649: AArch64: Optimize LSE atomics in C++ code

-------------

Commit messages:
 - Restore LSE CAS generation.
 - Merge https://github.com/openjdk/jdk into JDK-8261650
 - Trailing membar.
 - Flush everything correctly
 - Committed
 - Intermediate

Changes: https://git.openjdk.java.net/jdk/pull/2611/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8261649
Stats: 282 lines in 5 files changed: 166 ins; 51 del; 65 mod
Patch: https://git.openjdk.java.net/jdk/pull/2611.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611

PR: https://git.openjdk.java.net/jdk/pull/2611

From aph at openjdk.java.net Wed Feb 17 18:13:52 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 17 Feb 2021 18:13:52 GMT
Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code
Message-ID:

Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal.

Barrier-ordered-before, Arm Architecture Reference Manual B2.3 :

| Barrier instructions order prior Memory effects before subsequent
| Memory effects generated by the same Observer. A read or a write RW1
| is Barrier-ordered-before a read or a write RW2 from the same Observer
| if and only if RW1 appears in program order before RW2 and any of the
| following cases apply:
|
| [...]
| | * RW1 appears in program order before an atomic instruction with both | Acquire and Release semantics that appears in program order before RW2. So a prior load or store cannot be reordered with the load of an atomic swap with Acquire and Release semantics. This barrier-ordered-before in combination with sequential consistency gives us everything we need for a full barrier. However, we still need a DMB after the cmpxchg to ensure that subsequent loads and stores cannot be reordered with the store in an atomic instruction. ------------- Commit messages: - Everything Changes: https://git.openjdk.java.net/jdk/pull/2612/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2612&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261649 Stats: 280 lines in 4 files changed: 164 ins; 51 del; 65 mod Patch: https://git.openjdk.java.net/jdk/pull/2612.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2612/head:pull/2612 PR: https://git.openjdk.java.net/jdk/pull/2612 From aph at openjdk.java.net Wed Feb 17 18:17:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 17 Feb 2021 18:17:43 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 18:06:55 GMT, Andrew Haley wrote: > Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > Barrier-ordered-before, Arm Architecture Reference Manual B2.3 : > > | Barrier instructions order prior Memory effects before subsequent > | Memory effects generated by the same Observer. A read or a write RW1 > | is Barrier-ordered-before a read or a write RW2 from the same Observer > | if and only if RW1 appears in program order before RW2 and any of the > | following cases apply: > | > | [...] 
> | > | * RW1 appears in program order before an atomic instruction with both > | Acquire and Release semantics that appears in program order before RW2. > > So a prior load or store cannot be reordered with the load of an atomic swap with Acquire and Release semantics. This barrier-ordered-before in combination with sequential consistency gives us everything we need for a full barrier. However, we still need a DMB after the cmpxchg to ensure that subsequent loads and stores cannot be reordered with the store in an atomic instruction. This patch: Moves memory barriers from the atomic_linux_aarch64 file into the stubs. Rewrites the LSE versions of the stubs to be more efficient. Fixes a race condition in stub generation. Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. ------------- PR: https://git.openjdk.java.net/jdk/pull/2612 From lucy at openjdk.java.net Wed Feb 17 18:22:01 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Wed, 17 Feb 2021 18:22:01 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v7] In-Reply-To: References: Message-ID: > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! 
> Lutz Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: update copyright year ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2511/files - new: https://git.openjdk.java.net/jdk/pull/2511/files/faab64b0..0f220ee3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=05-06 Stats: 8 lines in 6 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From lucy at openjdk.java.net Wed Feb 17 18:27:43 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Wed, 17 Feb 2021 18:27:43 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v7] In-Reply-To: References: <7S5jdlFpZ5m2xtFinD92jQEQm6hgbQXjHR5N-3XbXkc=.fe75978e-860a-4bcc-b5b4-3e4b4246d706@github.com> Message-ID: On Mon, 15 Feb 2021 18:21:53 GMT, Lutz Schmidt wrote: >> This is a request for help. Could someone with SA knowledge please check if my assumption is correct? >> >> In hotspot code, the field Method::_compiled_invocation_count is annotated with a comment that it is used by SA. The field is also exposed via vmStructs.cpp to enable such use. I have scanned SA code in OpenJDK11 and OpenJDK head but found no evidence that this particular field is accessed. Is this finding/assumption correct? >> >> If so, I could just stop exposing the field, making my life easier. Thanks! > > Looks like I have completely messed up my pull request. Please disregard for now. I'm trying to find a way how to clean up. Maybe I'll just start over. OK, my pull request is back in a reviewable state. Here is what changed: 1) Honouring review comments from @TobiHartmann and @veresov Trusting my own code research, I removed _compiled_invocation_count from vmStructs.cpp. 
Builds are ok and all tests we run in-house (including the jtreg suite) did not show any issue. The updated pull request has _compiled_invocation_count widened to 64-bit and all those *64 suffixes are removed. 2) Dealing with counter updates in {v|i}table stubs While waiting for a response from SA experts, I took the time and had a closer look at the last remaining 32-bit counter (_nof_megamorphic_calls). It turned out the required changes to code generation were trivial. So I took the opportunity and made it a 64-bit counter. Call stats look even nicer now! In summary: All global invocation counters are 64-bit now. From those counters that register method-individual calls, only _compiled_invocation_count and _nof_megamorphic_calls were widened to 64-bit. The three remaining method-individual counters (invocation_count, interpreter_invocation_count. and backedge_count) remain untouched. I appreciate your feedback! Here is how stats look like now: Invocations summary for 28214 methods: 41055191904 (100%) total 4818528940 (11.7%) |- interpreted 36236662964 (88.3%) |- compiled 9065026571 (22.1%) |- special methods (interpreted and compiled) 607128840 ( 1.5%) |- synchronized 2107652419 ( 5.1%) |- final 6347934023 (15.5%) |- static 1122857 ( 0.0%) |- native 1188432 ( 0.0%) |- accessor Calls from compiled code: 27011733837 (100%) total non-inlined 14500960686 (53.7%) |- virtual calls 124325246564 ( 857%) | |- inlined 0 ( 0%) | |- optimized 8890453008 ( 61%) | |- monomorphic 5610507678 ( 39%) | |- megamorphic 4529160753 (16.8%) |- interface calls 8905052200 ( 197%) | |- inlined 0 ( 0%) | |- optimized 4529160753 ( 100%) | |- monomorphic 0 ( 0%) | |- megamorphic 7981612398 (29.5%) |- static/special calls 73886243527 ( 926%) | |- inlined ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From adinn at openjdk.java.net Wed Feb 17 17:54:40 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Wed, 17 Feb 2021 17:54:40 GMT Subject: RFR: 8261649: AArch64: Optimize 
LSE atomics in C++ code In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 17:14:46 GMT, Andrew Haley wrote: > 8261649: AArch64: Optimize LSE atomics in C++ code src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5651: > 5649: __ mov(prev, compare_val); > 5650: __ lse_cas(prev, exchange_val, ptr, size, acquire, release, /*not_pair*/true); > 5651: if (acquire && release) { These two flags are only ever passed as true,true or false,false. Does any other combination make sense? If not then should you not be using a single flag? or at least asserting (pro tem) that they are both equal? src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S line 75: > 73: .align 5 > 74: aarch64_atomic_cmpxchg_1_default_impl: > 75: dmb ish Having argued above that this dmb is never needed why is it in this default impl? (also for size 4 and 8) ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From github.com+42899633+eastig at openjdk.java.net Wed Feb 17 19:20:42 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 17 Feb 2021 19:20:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 12:04:56 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> reimplement this feature. withdraw my intrusive change in outputStream. >> use stringStream only for the constant OopPtr. after oop->print_on(st), >> delete all appearances of '\n' >> - Merge branch 'master' into JDK-8260198 >> - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> fix merge conflict. >> - Merge branch 'master' into JDK-8260198 >> - 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> Add a flag _suppress_cr to outputStream. 
outstream objects won't emit any CR if it's set.
>> Correct TypeInstPtr::dump2 to make sure it only emits klass name once.
>> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>>
>> Before:
>> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String
>> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String'
>> - string: "a"
>> :Constant:exact *
>>
>> After:
>> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact *
>
> src/hotspot/share/opto/type.cpp line 4049:
>
>> 4047: ss.print(" ");
>> 4048: const_oop()->print_oop(&ss);
>> 4049: ss.tr_delete('\n');
>
> `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams.
> I see that the content of `ss` is traversed many times.
> What about this code:
>     for (const char *str = ss.base(); *str; ) {
>       size_t i = 0;
>       while (str[i] && str[i] != '\n' ) {
>         ++i;
>       }
>       st->print_raw(str, i);
>       str += i;
>       while (*str == '\n') {
>         ++str;
>       }
>     }

Another option:

    class filterStringStream: public stringStream {
     private:
      char ch;
     public:
      filterStringStream(char ch_to_filter, size_t initial_bufsize = 256) : stringStream(initial_bufsize), ch(ch_to_filter) {}

      virtual void write(const char* c, size_t len) override {
        const char* e = c + len;
        while (c != e) {
          size_t i = 0;
          while ((c+i) != e && c[i] != ch ) {
            ++i;
          }
          stringStream::write(c, i);
          c += i;
          while (c != e && *c == ch) {
            ++c;
          }
        }
      }
    };

Your code will be:

    filterStringStream ss('\n');
    ss.print(" ");
    const_oop()->print_oop(&ss);
    st->print_raw(ss.base(), ss.size());

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From github.com+42899633+eastig at openjdk.java.net Wed Feb 17 19:25:40 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 17 Feb 2021 19:25:40 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 17 Feb 2021 19:16:59 GMT, Evgeny Astigeevich wrote:

>> src/hotspot/share/opto/type.cpp line 4049:
>>
>>> 4047: ss.print(" ");
>>> 4048: const_oop()->print_oop(&ss);
>>> 4049: ss.tr_delete('\n');
>>
>> `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams.
>> I see that the content of `ss` is traversed many times.
>> What about this code:
>>     for (const char *str = ss.base(); *str; ) {
>>       size_t i = 0;
>>       while (str[i] && str[i] != '\n' ) {
>>         ++i;
>>       }
>>       st->print_raw(str, i);
>>       str += i;
>>       while (*str == '\n') {
>>         ++str;
>>       }
>>     }
>
> Another option:
>     class filterStringStream: public stringStream {
>      private:
>       char ch;
>      public:
>       filterStringStream(char ch_to_filter, size_t initial_bufsize = 256) : stringStream(initial_bufsize), ch(ch_to_filter) {}
>
>       virtual void write(const char* c, size_t len) override {
>         const char* e = c + len;
>         while (c != e) {
>           size_t i = 0;
>           while ((c+i) != e && c[i] != ch ) {
>             ++i;
>           }
>           stringStream::write(c, i);
>           c += i;
>           while (c != e && *c == ch) {
>             ++c;
>           }
>         }
>       }
>     };
>
> Your code will be:
>     filterStringStream ss('\n');
>     ss.print(" ");
>     const_oop()->print_oop(&ss);
>     st->print_raw(ss.base(), ss.size());

> `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams.
> I see that the content of `ss` is traversed many times.
> What about this code:
>
> ```
> for (const char *str = ss.base(); *str; ) {
>   size_t i = 0;
>   while (str[i] && str[i] != '\n' ) {
>     ++i;
>   }
>   st->print_raw(str, i);
>   str += i;
>   while (*str == '\n') {
>     ++str;
>   }
> }
> ```

You can put this code in a function like `print_filtering_ch(char, const stringStream&, outputStream*)`

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From never at openjdk.java.net Wed Feb 17 19:37:41 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Wed, 17 Feb 2021 19:37:41 GMT
Subject: RFR: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream
In-Reply-To:
References:
Message-ID:

On Wed, 17 Feb 2021 02:33:02 GMT, Dean Long wrote:

>> Update copyright year in vframe.hpp.
>> Otherwise good.
>
> Hi Tom. This code could be simplified and made faster using vframeStream as the iterator and vframeStream:asJavaVFrame to get the vframe. If you don't need the locals of every frame, then vframeStream::next is faster than vframe::sender. See JDK-8214329.

@dean-long I think it would be possible to rewrite this to use vframeStream though it would still need to build vframes because of the introspection it does on compiled frames. It would be a major rewrite though and in general this code is working just fine so I don't think I want to tackle that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2594

From dnsimon at openjdk.java.net Wed Feb 17 19:40:41 2021
From: dnsimon at openjdk.java.net (Doug Simon)
Date: Wed, 17 Feb 2021 19:40:41 GMT
Subject: RFR: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream
In-Reply-To:
References:
Message-ID:

On Wed, 17 Feb 2021 19:34:53 GMT, Tom Rodriguez wrote:

>> Hi Tom. This code could be simplified and made faster using vframeStream as the iterator and vframeStream:asJavaVFrame to get the vframe. If you don't need the locals of every frame, then vframeStream::next is faster than vframe::sender. See JDK-8214329.
>
> @dean-long I think it would be possible to rewrite this to use vframeStream though it would still need to build vframes because of the introspection it does on compiled frames. It would be a major rewrite though and in general this code is working just fine so I don't think I want to tackle that.

Looks good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2594

From dlong at openjdk.java.net Wed Feb 17 22:13:40 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Wed, 17 Feb 2021 22:13:40 GMT
Subject: RFR: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream
In-Reply-To: References: Message-ID:

On Wed, 17 Feb 2021 19:34:53 GMT, Tom Rodriguez wrote:

>> Hi Tom. This code could be simplified and made faster using vframeStream as the iterator and vframeStream::asJavaVFrame to get the vframe. If you don't need the locals of every frame, then vframeStream::next is faster than vframe::sender. See JDK-8214329.
>
> @dean-long I think it would be possible to rewrite this to use vframeStream though it would still need to build vframes because of the introspection it does on compiled frames. It would be a major rewrite though and in general this code is working just fine so I don't think I want to tackle that.

@tkrodriguez OK, I filed a separate RFE for my suggestion.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2594

From ioi.lam at oracle.com Wed Feb 17 22:16:35 2021
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 17 Feb 2021 14:16:35 -0800
Subject: CHECK at the end of a void function
In-Reply-To: <6457788b-cc0a-1038-2032-3978c4a067f5@oracle.com>
References: <_-r6tekItyqxQA5WKZFBj-b0IQRC6R0Nw7QBMZbBPw0=.6c253bd5-697b-4060-9026-3cf015a6dc09@github.com> <6457788b-cc0a-1038-2032-3978c4a067f5@oracle.com>
Message-ID: <02e97ff9-5518-b980-32c1-16b52bff52f6@oracle.com>

Converting this from a PR discussion
(https://git.openjdk.java.net/jdk/pull/2494) to a regular mail.

What are people's opinions of:

    void bar(TRAPS);

    void foo(TRAPS) {
      bar(CHECK);
    }

vs

    void foo(TRAPS) {
      bar(THREAD);
    }

There's no mention of this in
https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md

Advantage of CHECK:

- More readable -- you don't need to ask yourself:
  does the callee need a THREAD, or is the callee a TRAPS function?

- More maintainable. You don't accidentally miss a check if you add new
  code below

    void foo(TRAPS) {
      bar(THREAD);
      baz();       // adding a new call ....
    }

Note that we MUST use THREAD when returning a value (see
https://bugs.openjdk.java.net/browse/JDK-6889002)

    int x(TRAPS);

    int y(TRAPS) {
       return x(THREAD);
    }

so there's inconsistency. However, the compiler will give an error if
you add code below the THREAD. So we don't have the maintenance issue as
void functions:

    int Y(TRAPS) {
       return X(THREAD);
       baz();
    }

Disadvantage of CHECK:

- It's not guaranteed that the C compiler will elide it. The code gets
  pre-processed to

    inline bool ThreadShadow::has_pending_exception() const {
      return _pending_exception != NULL;
    }

    void foo(Thread* __the_thread__) {
      bar(__the_thread__);
      if (((ThreadShadow*)__the_thread__)->has_pending_exception()) return;
    }

Is it safe to assume that any C compiler that can efficiently compile
HotSpot will always elide the "if" line?

I am a little worried about the maintenance issue. If we really want to
avoid the CHECK, I would prefer to have a new macro like:

    void foo(TRAPS) {
       bar(CHECK_AT_RETURN);
    }

which will be preprocessed to

    void foo(....) {
       bar(__the_thread__); return;
    }

So you can't accidentally add code below it.

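To make the three variants discussed above concrete, here is a minimal, self-contained sketch. The `Thread` struct, the macro bodies, and `baz_calls` are simplified stand-ins written for this example and are not the real HotSpot definitions; `CHECK_AT_RETURN` is the hypothetical macro from the proposal and does not exist in HotSpot.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; the real macros live in
// HotSpot's exceptions.hpp and differ in detail.
struct Thread {
  bool pending = false;
  bool has_pending_exception() const { return pending; }
};

#define THREAD __the_thread__
#define TRAPS  Thread* THREAD
#define CHECK  THREAD); if (THREAD->has_pending_exception()) return; (void)(0
// Hypothetical macro from the proposal; it does not exist in HotSpot.
#define CHECK_AT_RETURN THREAD); return; (void)(0

static int baz_calls = 0;                      // counts how often baz() actually ran
void baz() { ++baz_calls; }

void bar(TRAPS) { THREAD->pending = true; }    // always leaves an exception pending

void foo_check(TRAPS)     { bar(CHECK);  baz(); }    // returns before baz()
void foo_thread(TRAPS)    { bar(THREAD); baz(); }    // baz() runs despite the exception
void foo_at_return(TRAPS) { bar(CHECK_AT_RETURN); }  // unconditional return, no test
```

Calling `foo_check` with a fresh `Thread` leaves `baz_calls` at 0, while `foo_thread` increments it even though an exception is pending. That difference is the maintenance hazard being weighed against the redundant test.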
Thanks
-Ioi

From david.holmes at oracle.com Thu Feb 18 01:55:20 2021
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 18 Feb 2021 11:55:20 +1000
Subject: CHECK at the end of a void function
In-Reply-To: <02e97ff9-5518-b980-32c1-16b52bff52f6@oracle.com>
References: <_-r6tekItyqxQA5WKZFBj-b0IQRC6R0Nw7QBMZbBPw0=.6c253bd5-697b-4060-9026-3cf015a6dc09@github.com> <6457788b-cc0a-1038-2032-3978c4a067f5@oracle.com> <02e97ff9-5518-b980-32c1-16b52bff52f6@oracle.com>
Message-ID: <4b640b6a-d188-d34f-e979-26690974701a@oracle.com>

Hi Ioi,

> CHECK at the end of a void function

This isn't really about void functions, nor check at the end. It is about using CHECK/CHECK_* on a call that is immediately followed by a return from the current function as it degenerates to:

    if (EXCEPTION_OCCURRED)
      return;
    else
      return;

On 18/02/2021 8:16 am, Ioi Lam wrote:
> Converting this from a PR discussion
> (https://git.openjdk.java.net/jdk/pull/2494) to a regular mail.

Thanks for doing that.

> What are people's opinions of:
>
>     void bar(TRAPS);
>
>     void foo(TRAPS) {
>       bar(CHECK);
>     }
>
> vs
>
>     void foo(TRAPS) {
>       bar(THREAD);
>     }
>
> There's no mention of this in
> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
>
> Advantage of CHECK:
>
> - More readable -- you don't need to ask yourself:
>   does the callee need a THREAD, or is the callee a TRAPS function?

But you also don't need to ask yourself that because it doesn't matter when the next action is to return anyway.

> - More maintainable. You don't accidentally miss a check if you add new
>   code below
>
>     void foo(TRAPS) {
>       bar(THREAD);
>       baz();       // adding a new call ....
>     }

True but I would argue that you need to think about the behaviour of bar when adding baz() regardless. This might be wrong:

    bar(CHECK);
    baz(); // <= critical code that must always be executed no matter what!

> Note that we MUST use THREAD when returning a value (see
> https://bugs.openjdk.java.net/browse/JDK-6889002)
>
>     int x(TRAPS);
>
>     int y(TRAPS) {
>        return x(THREAD);
>     }

I think this is an anti-pattern and we should prefer the more explicit:

    RetType ret = x(CHECK_*);
    return ret;

if we want to emphasize use of CHECK.

> so there's inconsistency. However, the compiler will give an error if
> you add code below the THREAD. So we don't have the maintenance issue as
> void functions:
>
>     int Y(TRAPS) {
>        return X(THREAD);
>        baz();
>     }

Don't quite follow that as you wouldn't write anything after a return statement anyway.

> Disadvantage of CHECK:
>
> - It's not guaranteed that the C compiler will elide it. The code gets
>   pre-processed to
>
>     inline bool ThreadShadow::has_pending_exception() const {
>       return _pending_exception != NULL;
>     }
>
>     void foo(Thread* __the_thread__) {
>       bar(__the_thread__);
>       if (((ThreadShadow*)__the_thread__)->has_pending_exception())
> return;
>     }
>
> Is it safe to assume that any C compiler that can efficiently compile
> HotSpot will always elide the "if" line?

I've no idea, but if we can rely on it then I'm okay with always using CHECK. It was the redundant code execution that was my concern.

> I am a little worried about the maintenance issue. If we really want to
> avoid the CHECK, I would prefer to have a new macro like:
>
>     void foo(TRAPS) {
>        bar(CHECK_AT_RETURN);
>     }
>
> which will be preprocessed to
>
>     void foo(....) {
>        bar(__the_thread__); return;
>     }
>
> So you can't accidentally add code below it.

I agree that a new macro might be preferable to just using THREAD. Coming up with a short but meaningful name will be a problem. :) The point is we are not actually checking anything in this case.

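The value-returning case can be sketched the same way. This is a minimal illustration, assuming simplified stand-ins for `Thread`, `TRAPS`, and a `CHECK_`-style macro (the real HotSpot definitions differ in detail); the `fail` parameter and the function names are invented for the example.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only, not the HotSpot definitions.
struct Thread {
  bool pending = false;
  bool has_pending_exception() const { return pending; }
};

#define THREAD __the_thread__
#define TRAPS  Thread* THREAD
#define CHECK_(result) THREAD); if (THREAD->has_pending_exception()) return (result); (void)(0
#define CHECK_0 CHECK_(0)

// Callee that may "throw": on failure it sets the pending flag and returns
// a value the caller should not rely on.
int x(bool fail, TRAPS) {
  if (fail) {
    THREAD->pending = true;
    return -1;
  }
  return 42;
}

// Pass-through style: whatever x() returned is handed to the caller as is.
int y_thread(bool fail, TRAPS) { return x(fail, THREAD); }

// Explicit style: the check runs before the result is used, so a pending
// exception yields the well-defined value 0.
int y_check(bool fail, TRAPS) {
  int ret = x(fail, CHECK_0);
  return ret;
}
```

With this expansion, writing `return x(fail, CHECK_0);` would place the `if` after the `return`, where it can never execute, which is why the explicit two-statement form is needed if a check is wanted.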
Thanks, David > Thanks > ?-Ioi > From dholmes at openjdk.java.net Thu Feb 18 04:58:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Feb 2021 04:58:50 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v7] In-Reply-To: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> References: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> Message-ID: On Wed, 17 Feb 2021 13:26:20 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. >> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. 
>> >> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 >> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 >> and additionally to not trip this warning here: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 >> >> ------ >> >> Changes in this patch: >> >> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. >> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. >> - I used that class to cover cases (1)..(3): >> - `chained_handlers` contains all information of chained handlers >> - `expected_handlers` contains a copy of the handlers the hotspot installed >> - `replaced_handlers` contains information about replaced handlers >> >> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. 
>> >> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. >> >> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. >> >> Output Before: >> 663 Signal Handlers: >> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO >> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, 
sa_flags=SA_RESTART|SA_SIGINFO >> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 675 SIGTRAP: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> >> Now: >> Signal Handlers: >> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO >> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGQUIT: UserHandler in 
libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> >> ----- >> Tests: GA, and the patch has been tested in our nighlies for over a month now. I manually executed the runtime/jni/checked tests too. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Style fixes > - expected_handlers->vm_handlers > - Merge > - Use universal zero initializer for do_check_signal_periodically > - Further fixes > - Fix build error on zlinux > - David Feedback > - Make SavedSignalHandlers use C-heap for its items > - Removed display-replaced-handler-logic > - Feedback David > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/5e40b9f5...5cf58186 Still good to me. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2251 From dholmes at openjdk.java.net Thu Feb 18 05:14:40 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 18 Feb 2021 05:14:40 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v7] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 18:22:01 GMT, Lutz Schmidt wrote: >> Dear community, >> may I please request reviews for this fix, improving the usefulness of method invocation counters. >> - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). >> - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. 
>> - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. >> - before/after sample output is attached to the bug description. >> >> Thank you! >> Lutz > > Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: > > update copyright year src/hotspot/share/oops/method.cpp line 511: > 509: tty->cr(); > 510: > 511: // Internal counting is based on signed int counters. They tend to Is there a good reason to not simply make them unsigned int? ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From stuefe at openjdk.java.net Thu Feb 18 05:29:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Feb 2021 05:29:41 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v7] In-Reply-To: References: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> Message-ID: On Thu, 18 Feb 2021 04:55:33 GMT, David Holmes wrote: > Still good to me. > > Thanks, > David Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Thu Feb 18 05:34:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 18 Feb 2021 05:34:41 GMT Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 15:58:49 GMT, Lutz Schmidt wrote: > I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. It is not just for the assertion, it is for limiting the 32bit add to situations where we know Klass pointers cannot exceed 32bit. That was the main reason. As I wrote, I was not sure about the assertion myself and am happy to drop it. 
> And by the way, you could do the check with one test:
>
> ```
> z_oihf(current, 0);
> z_brc(Assembler::bcondZero, ok);
> ```
>
> z_oihf() does modify the contents of register current, but it writes back the same value.

Thank you. Unfortunately, information about z assembly was hard to come by. The only public information I found had hardly more than the instruction names, the rest was trial and error.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595

From stuefe at openjdk.java.net Thu Feb 18 05:39:39 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Thu, 18 Feb 2021 05:39:39 GMT
Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base
In-Reply-To: References: Message-ID:

On Wed, 17 Feb 2021 16:53:03 GMT, Lutz Schmidt wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x8000_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));       // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {     (A)
>>       lgr_if_needed(dst, current);
>>       z_agfi(dst, -(int)base_l);                     (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32bit two's complement of the base and adding it with AGFI. AGFI adds a sign-extended 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x8000_0000:
>>
>> In the case of the crash, we have:
>> base: 8800_0000
>> klass pointer: 8804_1040
>> 32bit two's complement of base: 7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "subtraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; it was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as a crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
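
The arithmetic from the bug analysis above can be replayed in plain C++. This is only a model of the two additions as the description characterizes them (a sign-extended 32-bit immediate added to a 64-bit value versus a pure 32-bit add on the low word); the helper names are invented for this sketch and are not HotSpot or assembler APIs.

```cpp
#include <cassert>
#include <cstdint>

// Models the AGFI-style add: the 32-bit immediate is sign-extended and added
// to the full 64-bit register value.
uint64_t add_imm32_to_64(uint64_t reg, int32_t imm) {
  return reg + static_cast<uint64_t>(static_cast<int64_t>(imm));
}

// Models the AFI-style add: only the low 32 bits take part, the carry out of
// bit 31 is dropped, and the high word is left unchanged.
uint64_t add_imm32_to_32(uint64_t reg, int32_t imm) {
  uint32_t low = static_cast<uint32_t>(reg) + static_cast<uint32_t>(imm);
  return (reg & 0xFFFFFFFF00000000ULL) | low;
}

// 32-bit two's complement of the base, as used for the immediate operand.
int32_t negated_base(uint32_t base_l) {
  return static_cast<int32_t>(0u - base_l);
}
```

For base `0x8800_0000` the negated immediate is `0x7800_0000`, which is positive as a 32-bit signed value, so sign extension adds no high bits and the 64-bit sum keeps the carry: `0x8804_1040 + 0x7800_0000 = 0x1_0004_1040`. The 32-bit add drops that carry and yields `0x4_1040`.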
>> >> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base). >> >> I used this method to test various combinations: >> - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI) >> - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0) >> - narrow klass pointer base = 0 (we dont do anything) >> >> (would this override-feature be useful? We could do better testing). >> >> Thanks, Thomas > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657: > >> 3655: } else { >> 3656: load_const(Z_R0, base); >> 3657: lgr_if_needed(dst, current); > > What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions. > > // Klass oop manipulations if compressed. > void MacroAssembler::encode_klass_not_null(Register dst, Register src) { > Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible. > address base = CompressedKlassPointers::base(); > int shift = CompressedKlassPointers::shift(); > bool need_zero_extend = false; > assert(UseCompressedClassPointers, "only for compressed klass ptrs"); > > BLOCK_COMMENT("cKlass encoder {"); > > #ifdef ASSERT > Label ok; > z_tmll(current, KlassAlignmentInBytes-1); // Check alignment. > z_brc(Assembler::bcondAllZero, ok); > // The plain disassembler does not recognize illtrap. It instead displays > // a 32-bit value. Issueing two illtraps assures the disassembler finds > // the proper beginning of the next instruction. > z_illtrap(0xee); > z_illtrap(0xee); > bind(ok); > #endif > > // Scale down the incoming klass pointer first. > // We then can be sure we calculate an offset that fits into 32 bit. > // More generally speaking: all subsequent calculations are purely 32-bit. 
> if (shift != 0) { > assert (LogKlassAlignmentInBytes == shift, "decode alg wrong"); > z_srlg(dst, current, shift); > need_zero_extend = true; > current = dst; > } > > if (base != NULL) { > // Use scaled-down base address parts to match scaled-down klass pointer. > unsigned int base_h = ((unsigned long)base)>>(32+shift); > unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift); > > // General considerations: > // - when calculating (current_h - base_h), all digits must cancel (become 0). > // Otherwise, we would end up with a compressed klass pointer which doesn't > // fit into 32-bit. > // - Only bit#33 of the difference could potentially be non-zero. For that > // to happen, (current_l < base_l) must hold. In this case, the subtraction > // will create a borrow out of bit#32, nicely killing bit#33. > // - With the above, we only need to consider current_l and base_l to > // calculate the result. > // - Both values are treated as unsigned. The unsigned subtraction is > // replaced by adding (unsigned) the 2's complement of the subtrahend. > > if (base_l == 0) { > // - By theory, the calculation to be performed here (current_h - base_h) MUST > // cancel all high-word bits. Otherwise, we would end up with an offset > // (i.e. compressed klass pointer) that does not fit into 32 bit. > // - current_l remains unchanged. > // - Therefore, we can replace all calculation with just a > // zero-extending load 32 to 64 bit. > // - Even that can be replaced with a conditional load if dst != current. > // (this is a local view. The shift step may have requested zero-extension). > } else { > // To begin with, we may need to copy and/or zero-extend the register operand. > // We have to calculate (current_l - base_l). Because there is no unsigend > // subtract instruction with immediate operand, we add the 2's complement of base_l. 
>       if (need_zero_extend) {
>         z_llgfr(dst, current);
>         need_zero_extend = false;
>       } else {
>         llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>       }
>       current = dst;
>       z_alfi(dst, -(int)base_l);
>     }
>
>     if (need_zero_extend) {
>       // We must zero-extend the calculated result. It may have some leftover bits in
>       // the hi-word because we only did optimized calculations.
>       z_llgfr(dst, current);
>     } else {
>       llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>     }
>
>     BLOCK_COMMENT("} cKlass encoder");
>   }

Looks nice and elegant. But as said offlist, I dislike the fact that this hard-codes the limitation to 32bit for the narrow klass pointer range. That restriction is artificial and we may just want to drop it.

E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler. However, in that case it could be that we exceed the current 3g limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595

From never at openjdk.java.net Thu Feb 18 06:33:02 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Thu, 18 Feb 2021 06:33:02 GMT
Subject: RFR: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream [v2]
In-Reply-To: References: Message-ID:

> c2v_iterateFrames mixes a StackFrameStream and vframes and the vframe factory method can silently skip stub frames. This could leave the StackFrameStream out of sync with the vframe walk. This can cause the iteration to fail in strange ways and assert in fastdebug builds.

Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision:

  Update copyright year

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2594/files
  - new: https://git.openjdk.java.net/jdk/pull/2594/files/45f83f12..1da3a26f

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2594&range=01
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2594&range=00-01

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/2594.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2594/head:pull/2594

PR: https://git.openjdk.java.net/jdk/pull/2594

From never at openjdk.java.net Thu Feb 18 06:38:41 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Thu, 18 Feb 2021 06:38:41 GMT
Subject: Integrated: 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream
In-Reply-To: References: Message-ID:

On Tue, 16 Feb 2021 20:17:30 GMT, Tom Rodriguez wrote:

> c2v_iterateFrames mixes a StackFrameStream and vframes and the vframe factory method can silently skip stub frames. This could leave the StackFrameStream out of sync with the vframe walk. This can cause the iteration to fail in strange ways and assert in fastdebug builds.

This pull request has now been integrated.

Changeset: 97e1657b Author: Tom Rodriguez URL: https://git.openjdk.java.net/jdk/commit/97e1657b Stats: 14 lines in 3 files changed: 9 ins; 0 del; 5 mod 8261846: [JVMCI] c2v_iterateFrames can get out of sync with the StackFrameStream Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2594 From dongbo at openjdk.java.net Thu Feb 18 07:57:25 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 18 Feb 2021 07:57:25 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6] In-Reply-To: References: Message-ID: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. 
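
The mis-encoding described above can be reproduced with a few lines of arithmetic. This sketch assumes the AArch64 rule for vector right shifts by immediate as I recall it from the Arm architecture reference: the 7-bit `immh:immb` field holds `2*esize - shift`, and the element size is derived from the highest set bit of `immh`. The function and struct names are invented for this illustration and are not part of HotSpot's assembler.

```cpp
#include <cassert>

// Java-side masking of the shift count, per the VectorOperators comment:
// n & (ESIZE*8 - 1), where lane_bits is the lane width in bits.
int masked_shift(int n, int lane_bits) { return n & (lane_bits - 1); }

// Value placed in the 7-bit immh:immb field for a right shift by immediate
// (assumed rule: 2*esize - shift).
int encode_right_shift(int lane_bits, int shift) { return 2 * lane_bits - shift; }

struct Decoded {
  int lane_bits;  // element width the hardware would infer
  int shift;      // shift amount the hardware would apply
};

// Decodes the field the way a disassembler would: the highest set bit of
// immh (the upper four bits) selects the element width.
Decoded decode_right_shift(int immh_immb) {
  int immh = immh_immb >> 3;
  int lane_bits = 8;
  while ((immh >> 1) != 0) {
    immh >>= 1;
    lane_bits <<= 1;
  }
  return Decoded{lane_bits, 2 * lane_bits - immh_immb};
}
```

A byte-lane shift of 8 is masked to 0 on the Java side; encoding a shift of 0 for 8-bit lanes yields the field value 16, which decodes as 16-bit lanes shifted by 16. That matches the `ushr dst.4H, src.4H, 16` reported in the description.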
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - add tests in aarch64-asmtest.py - fix windows operator precedence error and cleanup testcase - Merge branch 'master' into aarch64_vector_api_shift - fix windows build failure - generate add if shift == 0 for accumulation and fix some test code - back out AD modifications and handle zero shift in assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/d75ee99e..d746f209 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=04-05 Stats: 22519 lines in 709 files changed: 13758 ins; 4768 del; 3993 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Thu Feb 18 08:10:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 18 Feb 2021 08:10:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2] In-Reply-To: <0wUxJ4QUIzC-Hg4qSPtf8nFP0ov9J69nA3gjaoEJcWY=.ede23adb-4010-460d-8ac4-d560ace8ffc0@github.com> References: <_ach7OekIqkqmFRW3JqA5h4Q_HQUbRni0vkFzx5q3MA=.536a9faa-98c9-4dd9-9798-dcc794e23cd0@github.com> <0wUxJ4QUIzC-Hg4qSPtf8nFP0ov9J69nA3gjaoEJcWY=.ede23adb-4010-460d-8ac4-d560ace8ffc0@github.com> Message-ID: On Wed, 10 Feb 2021 02:59:24 GMT, Dong Bo wrote: >> src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057: >> >>> 2055: as_FloatRegister($src$$reg), 
as_FloatRegister($src$$reg)); >>> 2056: } else {ifelse($4, B,` >>> 2057: if (sh >= 8) sh = 7; >> >> I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. > > I backed out the modifications of `aarch64_neon.ad` and `aarch64_neon_ad.m4`. > The `shift == 0` case is handled by the assembler now. Verified with the regression tests. > I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow. Hi, I moved the logic to the assembler. The assembler will generate different instructions based on the value of `shift`. If `shift == 0` and no accumulation is needed, it generates a `mov`. If `shift == 0` and accumulation is needed, it generates an `add`. Also added tests in `aarch64-asmtest.py` to verify the assembler modifications. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From lucy at openjdk.java.net Thu Feb 18 09:14:46 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 18 Feb 2021 09:14:46 GMT Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 05:37:18 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657: >> >>> 3655: } else { >>> 3656: load_const(Z_R0, base); >>> 3657: lgr_if_needed(dst, current); >> >> What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions. >> >> // Klass oop manipulations if compressed. >> void MacroAssembler::encode_klass_not_null(Register dst, Register src) { >> Register current = (src != noreg) ?
src : dst; // Klass is in dst if no src provided. (dst == src) also possible. >> address base = CompressedKlassPointers::base(); >> int shift = CompressedKlassPointers::shift(); >> bool need_zero_extend = false; >> assert(UseCompressedClassPointers, "only for compressed klass ptrs"); >> >> BLOCK_COMMENT("cKlass encoder {"); >> >> #ifdef ASSERT >> Label ok; >> z_tmll(current, KlassAlignmentInBytes-1); // Check alignment. >> z_brc(Assembler::bcondAllZero, ok); >> // The plain disassembler does not recognize illtrap. It instead displays >> // a 32-bit value. Issuing two illtraps ensures the disassembler finds >> // the proper beginning of the next instruction. >> z_illtrap(0xee); >> z_illtrap(0xee); >> bind(ok); >> #endif >> >> // Scale down the incoming klass pointer first. >> // We then can be sure we calculate an offset that fits into 32 bit. >> // More generally speaking: all subsequent calculations are purely 32-bit. >> if (shift != 0) { >> assert (LogKlassAlignmentInBytes == shift, "decode alg wrong"); >> z_srlg(dst, current, shift); >> need_zero_extend = true; >> current = dst; >> } >> >> if (base != NULL) { >> // Use scaled-down base address parts to match scaled-down klass pointer. >> unsigned int base_h = ((unsigned long)base)>>(32+shift); >> unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift); >> >> // General considerations: >> // - when calculating (current_h - base_h), all digits must cancel (become 0). >> // Otherwise, we would end up with a compressed klass pointer which doesn't >> // fit into 32-bit. >> // - Only bit#33 of the difference could potentially be non-zero. For that >> // to happen, (current_l < base_l) must hold. In this case, the subtraction >> // will create a borrow out of bit#32, nicely killing bit#33. >> // - With the above, we only need to consider current_l and base_l to >> // calculate the result. >> // - Both values are treated as unsigned.
The unsigned subtraction is >> // replaced by adding (unsigned) the 2's complement of the subtrahend. >> >> if (base_l == 0) { >> // - By theory, the calculation to be performed here (current_h - base_h) MUST >> // cancel all high-word bits. Otherwise, we would end up with an offset >> // (i.e. compressed klass pointer) that does not fit into 32 bit. >> // - current_l remains unchanged. >> // - Therefore, we can replace all calculation with just a >> // zero-extending load 32 to 64 bit. >> // - Even that can be replaced with a conditional load if dst != current. >> // (this is a local view. The shift step may have requested zero-extension). >> } else { >> // To begin with, we may need to copy and/or zero-extend the register operand. >> // We have to calculate (current_l - base_l). Because there is no unsigned >> // subtract instruction with immediate operand, we add the 2's complement of base_l. >> if (need_zero_extend) { >> z_llgfr(dst, current); >> need_zero_extend = false; >> } else { >> llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost. >> } >> current = dst; >> z_alfi(dst, -(int)base_l); >> } >> >> if (need_zero_extend) { >> // We must zero-extend the calculated result. It may have some leftover bits in >> // the hi-word because we only did optimized calculations. >> z_llgfr(dst, current); >> } else { >> llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost. >> } >> >> BLOCK_COMMENT("} cKlass encoder"); >> } > > Looks nice and elegant. > > But as said offlist, I dislike the fact that this hard codes the limitation to 32bit for the narrow klass pointer range. > > That restriction is artificial and we may just want to drop it. E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler.
However, in that case it could be that we exceed the current 3g limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes. I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" of storing an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit. You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you made compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you would then have entities with a size not native to any processor. Just imagine you have to atomically store such a value. In my opinion, wider compressed pointers will have to wait until we have 128-bit pointers. Back to code: In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice.
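The arithmetic behind the 32 GB / 64 GB figures and the shift-and-add scheme can be sketched as follows. This is a minimal illustration, not HotSpot code: `encode_narrow`, `decode_narrow` and `addressable_bytes` are hypothetical names, and the base address used in the check is an arbitrary example value.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not HotSpot code.

// A compressed pointer stores (address - base) >> shift in 32 bits.
constexpr std::uint32_t encode_narrow(std::uint64_t addr, std::uint64_t base, int shift) {
  return static_cast<std::uint32_t>((addr - base) >> shift);
}

// Decoding is the inverse: shift left, then add the base.
constexpr std::uint64_t decode_narrow(std::uint32_t narrow, std::uint64_t base, int shift) {
  return base + (static_cast<std::uint64_t>(narrow) << shift);
}

// Range covered by 2^32 offsets scaled by an alignment of 2^shift bytes.
constexpr std::uint64_t addressable_bytes(int shift) {
  return (std::uint64_t{1} << 32) << shift;
}

// 8-byte alignment (shift 3) covers 2^35 bytes, 16-byte alignment (shift 4) covers 2^36 bytes.
static_assert(addressable_bytes(3) == (std::uint64_t{32} << 30), "32 GiB");
static_assert(addressable_bytes(4) == (std::uint64_t{64} << 30), "64 GiB");
```

Any address that is aligned to 2^shift and lies within `addressable_bytes(shift)` of the base round-trips through `encode_narrow` and `decode_narrow` unchanged; anything beyond that range loses its high bits in the 32-bit cast.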
------------- PR: https://git.openjdk.java.net/jdk/pull/2595 From aph at openjdk.java.net Thu Feb 18 09:18:40 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 09:18:40 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 17:48:06 GMT, Andrew Dinn wrote: >> 8261649: AArch64: Optimize LSE atomics in C++ code > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5651: > >> 5649: __ mov(prev, compare_val); >> 5650: __ lse_cas(prev, exchange_val, ptr, size, acquire, release, /*not_pair*/true); >> 5651: if (acquire && release) { > > These two flags are only ever passed as true,true or false,false. Does any other combination make sense? If not then should you not be using a single flag? or at least asserting (pro tem) that they are both equal? Today HotSpot only really supports mo_conservative and mo_relaxed, but there are many cases where release on its own would make sense; I think Aleksey recently found some. Having said that, it would be clearer here to expose mo_conservative as well. I'll do so. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 09:22:39 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 09:22:39 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: References: Message-ID: <5cSjmWB-Tt-01jPQ5YIJDdZx6RncoQMxTA6DFvSmHzw=.b8681c51-6def-4fd1-a473-e0e5f9e1433f@github.com> On Wed, 17 Feb 2021 17:51:35 GMT, Andrew Dinn wrote: >> 8261649: AArch64: Optimize LSE atomics in C++ code > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S line 75: > >> 73: .align 5 >> 74: aarch64_atomic_cmpxchg_1_default_impl: >> 75: dmb ish > > Having argued above that this dmb is never needed why is it in this default impl? (also for size 4 and 8) The default impl uses ARMv8 LDXR ... STXR but the DMB is not needed if and only if ARMv8.1 LSE instructions are used.
------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 09:25:39 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 09:25:39 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 18:15:02 GMT, Andrew Haley wrote: >> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. >> >> Barrier-ordered-before, Arm Architecture Reference Manual B2.3 : >> >> | Barrier instructions order prior Memory effects before subsequent >> | Memory effects generated by the same Observer. A read or a write RW1 >> | is Barrier-ordered-before a read or a write RW2 from the same Observer >> | if and only if RW1 appears in program order before RW2 and any of the >> | following cases apply: >> | >> | [...] >> | >> | * RW1 appears in program order before an atomic instruction with both >> | Acquire and Release semantics that appears in program order before RW2. >> >> So a prior load or store cannot be reordered with the load of an atomic swap with Acquire and Release semantics. This barrier-ordered-before in combination with sequential consistency gives us everything we need for a full barrier. However, we still need a DMB after the cmpxchg to ensure that subsequent loads and stores cannot be reordered with the store in an atomic instruction. > > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Closing because this is a duplicate. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2612 From aph at openjdk.java.net Thu Feb 18 09:25:39 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 09:25:39 GMT Subject: Withdrawn: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 18:06:55 GMT, Andrew Haley wrote: > Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > Barrier-ordered-before, Arm Architecture Reference Manual B2.3 : > > | Barrier instructions order prior Memory effects before subsequent > | Memory effects generated by the same Observer. A read or a write RW1 > | is Barrier-ordered-before a read or a write RW2 from the same Observer > | if and only if RW1 appears in program order before RW2 and any of the > | following cases apply: > | > | [...] > | > | * RW1 appears in program order before an atomic instruction with both > | Acquire and Release semantics that appears in program order before RW2. > > So a prior load or store cannot be reordered with the load of an atomic swap with Acquire and Release semantics. This barrier-ordered-before in combination with sequential consistency gives us everything we need for a full barrier. However, we still need a DMB after the cmpxchg to ensure that subsequent loads and stores cannot be reordered with the store in an atomic instruction. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2612 From aph at openjdk.java.net Thu Feb 18 09:37:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 09:37:43 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6] In-Reply-To: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> References: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> Message-ID: On Thu, 18 Feb 2021 07:57:25 GMT, Dong Bo wrote: >> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, >> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: >> /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ >> public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); >> >> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, >> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. >> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. >> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); >> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); >> >> The legal right shift amount should be in the range 1 to the element width in bits on aarch64: >> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en >> >> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. >> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
> > Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - add tests in aarch64-asmtest.py > - fix windows operator precedence error and cleanup testcase > - Merge branch 'master' into aarch64_vector_api_shift > - fix windows build failure > - generate add if shift == 0 for accumulation and fix some test code > - back out AD modifications and handle zero shift in assembler src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2709: > 2707: f(encodedShift, 22, 16); f(opc2, 15, 10), rf(Vn, 5), rf(Vd, 0); \ > 2708: } \ > 2709: } Is this correct, according to the definition in the Architecture Reference Manual? It doesn't look like it to me. Assembler methods should generate bit patterns exactly as defined in the Manual. This logic should be in a MacroAssembler method. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From enikitin at openjdk.java.net Thu Feb 18 10:04:04 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Thu, 18 Feb 2021 10:04:04 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v4] In-Reply-To: References: Message-ID: <2a-RCfm845Xe9B92_9mx-qGY9uDbMBBA95WSkaS6X4g=.4493f0d9-f318-4200-8532-27182367cead@github.com> > Another approach to the JDK-8058176 and #2440 - never allowing the tests hit CodeCache limits. The most significant consumer is the MH graph builder (the MHTransformationGen), whose consumption is now controlled. List of changes: > > * Code cache size getters are added to WhiteBox; > * MH sequences are now built with remaining Code cache size in mind (always let 2M clearance); > * Dependencies on WhiteBox added for all affected tests; > * The test cases in question un-problemlisted. > > Testing: the whole vmTestbase/vm/mlvm/ in win-lin-mac x86. 
Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Add non-nmethods pool to the monitoring ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2523/files - new: https://git.openjdk.java.net/jdk/pull/2523/files/763d94b8..6a3c4785 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2523&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2523&range=02-03 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2523.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2523/head:pull/2523 PR: https://git.openjdk.java.net/jdk/pull/2523 From jbachorik at openjdk.java.net Thu Feb 18 10:09:02 2021 From: jbachorik at openjdk.java.net (Jaroslav Bachorik) Date: Thu, 18 Feb 2021 10:09:02 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate Message-ID: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in the worst case 'at last full GC') in the form of a periodically emitted JFR event. ## Introducing new JFR event While there is already a 'GC Heap Summary' JFR event, it does not fit the requirements as it is closely tied to the GC cycle, so e.g. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of the heap summary events not being present in the JFR recording at all. Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information about the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time.
## Implementation The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is the `size_t live() const` method added to the `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning the 'used' value. The implementations are based on my (rather shallow) knowledge of the inner workings of the respective GC engines and I am open to suggestions to make them better/correct. ### Epsilon GC Trivial implementation - just return `used()` instead. ### Serial GC Here we utilize the fact that the mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps internal info about objects being 'dead' but excluded from the compaction effort and we can use these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). ### Parallel GC For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). ### G1 GC Using the `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in the G1 implementation chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. ### Shenandoah In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context.
This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. ### ZGC `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. ------------- Commit messages: - 8258431: Provide a JFR event with live set size estimate Changes: https://git.openjdk.java.net/jdk/pull/2579/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2579&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258431 Stats: 177 lines in 33 files changed: 172 ins; 1 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2579.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2579/head:pull/2579 PR: https://git.openjdk.java.net/jdk/pull/2579 From shade at openjdk.java.net Thu Feb 18 10:29:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 10:29:41 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Thu, 18 Feb 2021 10:23:37 GMT, Aleksey Shipilev wrote: >> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. >> >> ## Introducing new JFR event >> >> While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. 
>> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. >> >> ## Implementation >> >> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. >> >> The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. >> >> ### Epsilon GC >> >> Trivial implementation - just return `used()` instead. >> >> ### Serial GC >> >> Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). >> >> ### Parallel GC >> >> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). >> >> ### G1 GC >> >> Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. 
However, the numbers I am getting for G1 are comparable to other GCs for the same application. >> >> ### Shenandoah >> >> In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. >> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. >> >> ### ZGC >> >> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 627: > >> 625: >> 626: size_t ShenandoahHeap::live() const { >> 627: size_t live = Atomic::load_acquire(&_live); > > I understand you copy-pasted from the same file. We have removed `_acquire` with #2504. Do `Atomic::load` here. ...which also means you want to merge from master to get recent changes? ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From shade at openjdk.java.net Thu Feb 18 10:29:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 10:29:40 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. 
> > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. > > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). 
> > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. > This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. Interesting! Cursory review follows. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4578: > 4576: > 4577: void G1CollectedHeap::set_live(size_t bytes) { > 4578: Atomic::release_store(&_live_size, bytes); I don't think this requires `release_store`, regular `store` would be enough. G1 folks can say for sure. src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 100: > 98: HeapWord* mem_allocate_old_gen(size_t size); > 99: > 100: Excess newline? src/hotspot/share/gc/shared/collectedHeap.hpp line 217: > 215: virtual size_t capacity() const = 0; > 216: virtual size_t used() const = 0; > 217: // a best-effort estimate of the live set size Suggestion: // Returns the estimate of live set size. Because live set changes over time, // this is a best-effort estimate by each of the implementations. These usually // are most precise right after the GC cycle. 
src/hotspot/share/gc/shared/genCollectedHeap.cpp line 1144: > 1142: _old_gen->prepare_for_compaction(&cp); > 1143: _young_gen->prepare_for_compaction(&cp); > 1144: Stray newline? src/hotspot/share/gc/shared/genCollectedHeap.hpp line 183: > 181: size_t live = _live_size; > 182: return live > 0 ? live : used(); > 183: }; I think the implementation belongs to `genCollectedHeap.cpp`. src/hotspot/share/gc/shared/generation.hpp line 140: > 138: virtual size_t used() const = 0; // The number of used bytes in the gen. > 139: virtual size_t free() const = 0; // The number of free bytes in the gen. > 140: virtual size_t live() const = 0; Needs a comment to match the lines above? Say, `// The estimate of live bytes in the gen.` src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 579: > 577: event.set_heapLive(heap->live()); > 578: event.commit(); > 579: } On the first sight, this belongs in `ShenandoahConcurrentMark::finish_mark()`. Placing the event here would fire the event when concurrent GC is cancelled, which is not what you want. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 265: > 263: ShenandoahHeap* const heap = ShenandoahHeap::heap(); > 264: heap->set_concurrent_mark_in_progress(false); > 265: heap->mark_finished(); Let's not rename this method. Introduce a new method, `ShenandoahHeap::update_live`, and call it every time after `mark_complete_marking_context()` is called. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 627: > 625: > 626: size_t ShenandoahHeap::live() const { > 627: size_t live = Atomic::load_acquire(&_live); I understand you copy-pasted from the same file. We have removed `_acquire` with #2504. Do `Atomic::load` here. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 655: > 653: > 654: void ShenandoahHeap::set_live(size_t bytes) { > 655: Atomic::release_store_fence(&_live, bytes); Same, do `Atomic::store` here. 
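The pattern discussed in the comments above (a cached live value, plain atomic accesses, and a fallback to `used()` until a value has been computed) can be sketched like this. It is a minimal illustration under stated assumptions: `ToyHeap` is a hypothetical class, and `std::atomic` with relaxed ordering is used here as a stand-in for HotSpot's `Atomic::load` / `Atomic::store`, which this sketch assumes carry no additional ordering.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for a heap; the names are illustrative, not HotSpot's.
class ToyHeap {
  std::atomic<std::size_t> _live{0};  // cached after marking; 0 means "not computed yet"
  std::size_t _used;                  // bytes currently in use

public:
  explicit ToyHeap(std::size_t used) : _used(used) {}

  std::size_t used() const { return _used; }

  // Publish the latest estimate; no ordering with other memory is required
  // because readers only consume the number itself.
  void set_live(std::size_t bytes) {
    _live.store(bytes, std::memory_order_relaxed);
  }

  // Best-effort estimate: fall back to used() until a value has been cached.
  std::size_t live() const {
    std::size_t live = _live.load(std::memory_order_relaxed);
    return live > 0 ? live : used();
  }
};
```

Before the first `set_live()` call, `live()` reports the same number as `used()`; afterwards it reports the cached estimate, which may lag behind the real live set until the next marking cycle.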
src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 494: > 492: mark_complete_marking_context(); > 493: > 494: class ShenandoahCollectLiveSizeClosure : public ShenandoahHeapRegionClosure { We don't usually use the in-method declarations like these, pull it out of the method. src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 511: > 509: > 510: ShenandoahCollectLiveSizeClosure cl; > 511: heap_region_iterate(&cl); I think you want `parallel_heap_region_iterate` on this path, and do `Atomic::add(&_live, r->get_live_data_bytes())` in the closure. We shall see if this makes sense to make fully concurrently... src/hotspot/share/gc/epsilon/epsilonHeap.hpp line 80: > 78: virtual size_t capacity() const { return _virtual_space.committed_size(); } > 79: virtual size_t used() const { return _space->used(); } > 80: virtual size_t live() const { return used(); } I'd prefer to call `_space->used()` directly here. Minor optimization, I know. ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2579 From lucy at openjdk.java.net Thu Feb 18 11:00:40 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 18 Feb 2021 11:00:40 GMT Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 05:31:33 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3645: >> >>> 3643: } >>> 3644: #endif >>> 3645: >> >> I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test: >> z_oihf(current, 0); >> z_brc(Assembler::bcondZero, ok); >> >> z_oihf() does modify the contents of register current, but it writes back the same value. 
> >> I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. > > It is not just for the assertion, it is for limiting the 32bit add to situations where we know Klass pointers cannot exceed 32bit. That was the main reason. As I wrote, I was not sure about the assertion myself and am happy to drop it. > >> And by the way, you could do the check with one test: >> >> ``` >> z_oihf(current, 0); >> z_brc(Assembler::bcondZero, ok); >> ``` >> >> z_oihf() does modify the contents of register current, but it writes back the same value. > > Thank you. Unfortunately, information about z assembly was hard to come by. The only public information I found had hardly more than the instruction names, the rest was trial and error. I admit. To find System z information, you need to know the "magic keywords" to search for. In this case, it would be "Principles of Operation". The third or so Google hit would lead you to the System z architecture document. With 2000+ pages to read, you would be lost anyway. :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/2595 From lucy at openjdk.java.net Thu Feb 18 11:25:40 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 18 Feb 2021 11:25:40 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v7] In-Reply-To: References: Message-ID: <5KNOS1zweuPypTt0JZTNCzmHaBPj4NhiwCsmNwbDr8c=.a32b1c8b-1eb2-40bc-877a-44cbfebbb0fd@github.com> On Thu, 18 Feb 2021 05:12:08 GMT, David Holmes wrote: >> Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyright year > > src/hotspot/share/oops/method.cpp line 511: > >> 509: tty->cr(); >> 510: >> 511: // Internal counting is based on signed int counters. They tend to > > Is there a good reason to not simply make them unsigned int? Well, depends on what you accept as a good reason. 
:-) I decided to keep the counters as they are to limit the scope of the change. A grep for backedge_counter returns 94 lines, for example. Deep down, these counters are InvocationCounters, declared as uint. On their way up to the surface, they are treated signed or unsigned. Pretty inconsistent, yes. But a huge task to get it all straight, including checking/fixing assembly code. Is that reason enough? ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From adinn at openjdk.java.net Thu Feb 18 12:20:39 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 18 Feb 2021 12:20:39 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: <5cSjmWB-Tt-01jPQ5YIJDdZx6RncoQMxTA6DFvSmHzw=.b8681c51-6def-4fd1-a473-e0e5f9e1433f@github.com> References: <5cSjmWB-Tt-01jPQ5YIJDdZx6RncoQMxTA6DFvSmHzw=.b8681c51-6def-4fd1-a473-e0e5f9e1433f@github.com> Message-ID: On Thu, 18 Feb 2021 09:20:21 GMT, Andrew Haley wrote: >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S line 75: >> >>> 73: .align 5 >>> 74: aarch64_atomic_cmpxchg_1_default_impl: >>> 75: dmb ish >> >> Having argued above that this dmb is never needed why is it in this default impl? (also for size 4 and 8) > > The default impl uses ARMv8 LDXR ... STXR but the DMB is not needed if and only if ARMv8.1 LSE instructions. Doh! And it says that clearly in the comment. Ok. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From github.com+5010047+kelthuzadx at openjdk.java.net Thu Feb 18 12:37:51 2021 From: github.com+5010047+kelthuzadx at openjdk.java.net (Yang Yi) Date: Thu, 18 Feb 2021 12:37:51 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string Message-ID: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> When the last line does not contain a NEWLINE character, fileStream::readln would read truncated line string: $ cat file_content: AA BB CC fileStream::readln result: "AA" "BB" "C" This patch address this problem, it works for Posix and Windows since the last character of these systems is always '\n'. ------------- Commit messages: - fileStream::readln returns incorrect line string Changes: https://git.openjdk.java.net/jdk/pull/2626/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2626&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261949 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2626.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2626/head:pull/2626 PR: https://git.openjdk.java.net/jdk/pull/2626 From coleen.phillimore at oracle.com Thu Feb 18 12:54:30 2021 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 18 Feb 2021 07:54:30 -0500 Subject: CHECK at the end of a void function In-Reply-To: <4b640b6a-d188-d34f-e979-26690974701a@oracle.com> References: <_-r6tekItyqxQA5WKZFBj-b0IQRC6R0Nw7QBMZbBPw0=.6c253bd5-697b-4060-9026-3cf015a6dc09@github.com> <6457788b-cc0a-1038-2032-3978c4a067f5@oracle.com> <02e97ff9-5518-b980-32c1-16b52bff52f6@oracle.com> <4b640b6a-d188-d34f-e979-26690974701a@oracle.com> Message-ID: My preference is to keep THREAD as an argument if you were going to use CHECK for the last statement of a function. You have to do this if you're returning with a function that takes TRAPS. ie:
return my_fun(THREAD); For the most part, I don't think it matters if the compiler can optimize it away. Do our compilers optimize away the extra check for pending exception? I don't know if you answered this. Lastly, please no, I don't want to see yet another macro for this special case. Coleen On 2/17/21 8:55 PM, David Holmes wrote: > Hi Ioi, > > > CHECK at the end of a void function > > This isn't really about void functions, nor check at the end. It is > about using CHECK/CHECK_* on a call that is immediately followed by a > return from the current function as it degenerates to: > > if (EXCEPTION_OCCURRED) > return; > else > return; > > On 18/02/2021 8:16 am, Ioi Lam wrote: >> Converting this from a PR discussion >> (https://git.openjdk.java.net/jdk/pull/2494) to a regular mail. > > Thanks for doing that. > >> What are people's opinion of: >> >> void bar(TRAPS); >> >> void foo(TRAPS) { >> bar(CHECK); >> } >> >> vs >> >> void foo(TRAPS) { >> bar(THREAD); >> } >> >> There's no mention of this in >> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md >> >> Advantage of CHECK: >> >> - More readable -- you don't need to ask yourself: >> does the callee need a THREAD, or is the callee a TRAPS function? > > But you also don't need to ask yourself that because it doesn't matter > when the next action is to return anyway. > >> - More maintainable. You don't accidentally miss a check if you add new >> code below >> >> void foo(TRAPS) { >> bar(THREAD); >> baz(); // adding a new call .... >> } > > True but I would argue that you need to think about the behaviour of > bar when adding baz() regardless. This might be wrong: > > bar(CHECK); > baz(); // <= critical code that must always be executed no matter what! > >> Note that we MUST use THREAD when returning a value (see >> https://bugs.openjdk.java.net/browse/JDK-6889002) >> >> int x(TRAPS); >> >>
int y(TRAPS) { >> return x(THREAD); >> } > > I think this is an anti-pattern and we should prefer the more explicit: > > RetType ret = x(CHECK_*); > return ret; > > if we want to emphasize use of CHECK. > >> so there's inconsistency. However, the compiler will give an error >> if you add code below the THREAD. So we don't have the maintenance >> issue as void functions: >> >> int Y(TRAPS) { >> return X(THREAD); >> baz(); >> } > > Don't quite follow that as you wouldn't write anything after a return > statement anyway. > >> Disadvantage of CHECK: >> >> - It's not guaranteed that the C compiler will elide it. The code >> gets pre-processed to >> >> inlined bool ThreadShadow::has_pending_exception() const { >> return _pending_exception != NULL; >> } >> >> void foo(Thread* __the_thread__) { >> bar(__the_thread__); >> if >> (((ThreadShadow*)__the_thread__)->has_pending_exception()) return; >> } >> >> Is it safe to assume that any C compiler that can efficiently compile >> HotSpot will always elide the "if" line? > > I've no idea, but if we can rely on it then I'm okay with always using > CHECK. It was the redundant code execution that was my concern. > >> I am a little worried about the maintenance issue. If we really want >> to avoid the CHECK, I would prefer to have a new macro like: >> >> void foo(TRAPS) { >> bar(CHECK_AT_RETURN); >> } >> >> which will be preprocessed to >> >> void foo(....) { >> bar(_thread__); return; >> } >> >> So you can't accidentally add code below it. > > I agree that a new macro might be preferable than just using THREAD. > Coming up with a short but meaningful name will be a problem. :) The > point is we are not actually checking anything in this case.
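To make the maintenance argument in this thread concrete, here is a small standalone sketch of how a `CHECK`-style macro behaves when code is later added after the call. The `Thread` struct, the `pending_exception` flag, and the macro bodies are simplified stand-ins written for illustration only. They imitate the shape of HotSpot's macros but are not the definitions from `exceptions.hpp`.

```cpp
// Simplified stand-in for a thread that can carry a pending exception.
struct Thread {
  bool pending_exception = false;
};

#define TRAPS Thread* THREAD
// Closes the call, returns early if an exception is pending, then opens a
// dummy expression so that the caller's trailing ");" still parses.
#define CHECK THREAD); if (THREAD->pending_exception) return; (void)(0

static int baz_calls = 0;

// Hypothetical callee that always leaves an exception pending.
void bar(TRAPS) { THREAD->pending_exception = true; }

// Hypothetical code added after the call at some later point.
void baz() { ++baz_calls; }

// With CHECK, baz() is skipped when bar() leaves an exception pending.
void foo_check(TRAPS) {
  bar(CHECK);
  baz();
}

// With THREAD, baz() runs even though an exception is pending.
void foo_thread(TRAPS) {
  bar(THREAD);
  baz();
}
```

Calling `foo_check` on a fresh `Thread` leaves `baz_calls` unchanged, while `foo_thread` increments it. Which of the two behaviours is correct depends on whether the added code must run regardless of the exception, which is the point raised in the reply above.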
> > Thanks, > David > >> Thanks >> -Ioi >> From david.holmes at oracle.com Thu Feb 18 12:56:15 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 18 Feb 2021 22:56:15 +1000 Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v7] In-Reply-To: <5KNOS1zweuPypTt0JZTNCzmHaBPj4NhiwCsmNwbDr8c=.a32b1c8b-1eb2-40bc-877a-44cbfebbb0fd@github.com> References: <5KNOS1zweuPypTt0JZTNCzmHaBPj4NhiwCsmNwbDr8c=.a32b1c8b-1eb2-40bc-877a-44cbfebbb0fd@github.com> Message-ID: <0d1f637c-0330-3362-99d6-5c310fd3f98d@oracle.com> On 18/02/2021 9:25 pm, Lutz Schmidt wrote: > On Thu, 18 Feb 2021 05:12:08 GMT, David Holmes wrote: > >>> Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: >>> >>> update copyright year >> >> src/hotspot/share/oops/method.cpp line 511: >> >>> 509: tty->cr(); >>> 510: >>> 511: // Internal counting is based on signed int counters. They tend to >> >> Is there a good reason to not simply make them unsigned int? > > Well, depends on what you accept as a good reason. :-) > > I decided to keep the counters as they are to limit the scope of the change. A grep for backedge_counter returns 94 lines, for example. Deep down, these counters are InvocationCounters, declared as uint. On their way up to the surface, they are treated signed or unsigned. Pretty inconsistent, yes. But a huge task to get it all straight, including checking/fixing assembly code. > > Is that reason enough? I guess so :) It sounds terribly messy and confused. Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/2511 > From zgu at openjdk.java.net Thu Feb 18 15:45:42 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Feb 2021 15:45:42 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups [v4] In-Reply-To: References: Message-ID: On Sat, 13 Feb 2021 05:43:56 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this RFE?
>> >> While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixed and instead put them into this cleanup RFE. >> >> There should be no behavioral changes in this patch. >> >> - de-templatize `AllocationSite` since E was used as simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) >> >> - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. >> >> - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`. In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too but this would cause larger ripples so I stopped there. >> >> - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user >> >> - made `AllocationSite` immutable >> >> - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` since they were not needed >> >> - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` >> >> - There was a subtle incorrectness where `AllocationSite::equals()` would only compare callstack and disregard the MEMFLAGS member. Theoretically, if two callstacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (eg skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. (_Update: seems this assert really fires on s390x, so this is a real problem. I opened [1] to track this and restored the old behavior._). 
>> >> - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. Its true in almost all cases. >> >> - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. >> >> - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that coding does not need a walker object anymore and becomes a lot simpler. In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and was vulnerable against subtle errors. New code is more compact and simpler. Before removing the old implementation, I ran both statistics side by side for a couple of scenarios (eg really bad hash code implementations) and the output was identical. >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8261556 >> >> ---- >> Tests: >> - github GA >> - manual NMT jtreg tests (including the currently disabled runtime/NMT/CheckForProperDetailStackTrace.java) >> - Full nightlies at SAP are scheduled > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > reduce diff Looks good to me ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2539 From dcubed at openjdk.java.net Thu Feb 18 16:23:43 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 18 Feb 2021 16:23:43 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string In-Reply-To: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> Message-ID: On Thu, 18 Feb 2021 12:31:29 GMT, Yang Yi wrote: > When the last line does not contain a NEWLINE character, fileStream::readln would read > truncated line string: > > $ cat file_content: > AA > BB > CC > > fileStream::readln result: > "AA" > "BB" > "C" > > This patch address this problem, it works for Posix and Windows since the last character > of these systems is always '\n'. Changes requested by dcubed (Reviewer). src/hotspot/share/utilities/ostream.cpp line 593: > 591: ret = ::fgets(data, count, _file); > 592: // Get rid of annoying \n char only if it presents, it works for Posix > 593: // and Windows since the last character of these systems is always '\n' s/of these/on these/ Please end the comment with a period. 
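For the newline handling under review here, a minimal standalone sketch of stripping one trailing `'\n'` from a buffer filled by `fgets` follows. The helper name `strip_trailing_newline` is hypothetical and not part of `ostream.cpp`. The point it illustrates is that the length should be checked before indexing, because `strlen` returns an unsigned `size_t` and subtracting one from zero wraps around rather than going negative.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: removes one trailing '\n' from a NUL-terminated
// buffer (as filled by fgets) and returns the resulting length.
std::size_t strip_trailing_newline(char* data) {
  std::size_t len = std::strlen(data);
  // Test len first: for an empty string, "len - 1" would wrap around to a
  // huge index, and a "last_char >= 0" test on a size_t is always true.
  if (len > 0 && data[len - 1] == '\n') {
    data[--len] = '\0';
  }
  return len;
}
```

A final line without a newline, such as `"CC"` in the example from the bug report, is left untouched, and an empty buffer is handled without reading out of bounds.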
src/hotspot/share/utilities/ostream.cpp line 595: > 593: // and Windows since the last character of these systems is always '\n' > 594: if (data[::strlen(data)-1] == '\n') { > 595: data[::strlen(data)-1] = '\0'; Perhaps: size_t last_char = ::strlen(data) - 1; if (last_char >= 0 && data[last_char] == '\n') { data[last_char] = '\0'; ------------- PR: https://git.openjdk.java.net/jdk/pull/2626 From adinn at openjdk.java.net Thu Feb 18 16:31:39 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 18 Feb 2021 16:31:39 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: References: Message-ID: <8D9bHVuaaxLpoW5kFZ3N1c7myEwj1-yOmDpKmZ599I4=.c6ea773e-db18-4377-b791-9f1959e487eb@github.com> On Wed, 17 Feb 2021 17:14:46 GMT, Andrew Haley wrote: > Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Ok, this looks good enough to me. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 16:49:00 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 16:49:00 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v2] In-Reply-To: References: Message-ID: > Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. 
> > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Make things slightly less confusing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2611/files - new: https://git.openjdk.java.net/jdk/pull/2611/files/23df4485..68ff88d9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=00-01 Stats: 21 lines in 1 file changed: 8 ins; 0 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/2611.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611 PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 16:49:01 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 16:49:01 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 09:15:12 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5651: >> >>> 5649: __ mov(prev, compare_val); >>> 5650: __ lse_cas(prev, exchange_val, ptr, size, acquire, release, /*not_pair*/true); >>> 5651: if (acquire && release) { >> >> These two flags are only ever passed as true,true or false,false. Does any other combination make sense? If not then should you not be using a single flag? or at least asserting (pro tem) that they are both equal? > > Today HotSpot only really supports mo_conservative and mo_relaxed, but there are many places in HotSpot where release on its own would make sense; I think Aleksey recently found some. 
Having said that, it would be clearer here to expose mo_conservative as well. I'll do so. Clearer now? ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 16:56:05 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 16:56:05 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v3] In-Reply-To: References: Message-ID: > Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Remove mistaken change. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2611/files - new: https://git.openjdk.java.net/jdk/pull/2611/files/68ff88d9..6b338901 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2611.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611 PR: https://git.openjdk.java.net/jdk/pull/2611 From shade at openjdk.java.net Thu Feb 18 16:56:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 16:56:06 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v3] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 16:52:29 GMT, Andrew Haley wrote: >> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. >> >> This patch: >> >> Moves memory barriers from the atomic_linux_aarch64 file into the stubs. >> Rewrites the LSE versions of the stubs to be more efficient. >> Fixes a race condition in stub generation. >> Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Remove mistaken change. Passer-by comments... src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 192: > 190: if (_cpu == CPU_ARM && (_model == 0xd07 || _model2 == 0xd07)) _features |= CPU_STXR_PREFETCH; > 191: > 192: _features |= CPU_STXR_PREFETCH; This looks weird. The line above it adds `CPU_STXR_PREFETCH` conditionally. Which one is correct? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 16:56:07 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 16:56:07 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v3] In-Reply-To: <8D9bHVuaaxLpoW5kFZ3N1c7myEwj1-yOmDpKmZ599I4=.c6ea773e-db18-4377-b791-9f1959e487eb@github.com> References: <8D9bHVuaaxLpoW5kFZ3N1c7myEwj1-yOmDpKmZ599I4=.c6ea773e-db18-4377-b791-9f1959e487eb@github.com> Message-ID: On Thu, 18 Feb 2021 16:28:47 GMT, Andrew Dinn wrote: > Ok, this looks good enough to me. I pulled the vm_version.cpp change, which I committed by mistake. I think it's the right thing to do, but it needs a separate discussion. OK? ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From shade at openjdk.java.net Thu Feb 18 16:56:08 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 16:56:08 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 16:49:00 GMT, Andrew Haley wrote: >> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. >> >> This patch: >> >> Moves memory barriers from the atomic_linux_aarch64 file into the stubs. >> Rewrites the LSE versions of the stubs to be more efficient. >> Fixes a race condition in stub generation. >> Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Make things slightly less confusing src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5655: > 5653: acquire = false, release = false; break; > 5654: default: > 5655: acquire = true, release = true; break; I think hotspot style is to indent the `case`-s. Also commas in `acquire` and `release` assignments look weird. Consider: Register prev = r3; Register ptr = c_rarg0; Register compare_val = c_rarg1; Register exchange_val = c_rarg2; bool acquire, release; switch (order) { case memory_order_relaxed: acquire = false; release = false; break; default: acquire = true; release = true; break; } ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 17:01:45 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 17:01:45 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 16:49:26 GMT, Aleksey Shipilev wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Make things slightly less confusing > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5655: > >> 5653: acquire = false, release = false; break; >> 5654: default: >> 5655: acquire = true, release = true; break; > > I think hotspot style is to indent the `case`-s. Also commas in `acquire` and `release` assignments look weird. Consider: > > Register prev = r3; > Register ptr = c_rarg0; > Register compare_val = c_rarg1; > Register exchange_val = c_rarg2; > > bool acquire, release; > switch (order) { > case memory_order_relaxed: > acquire = false; > release = false; > break; > default: > acquire = true; > release = true; > break; > } Oh noes! I'll have to change my emacs indentation setup. I may be gone some time... 
------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 17:06:53 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 17:06:53 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v4] In-Reply-To: References: Message-ID: > Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Switch statement layout. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2611/files - new: https://git.openjdk.java.net/jdk/pull/2611/files/6b338901..83016c1d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=02-03 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2611.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611 PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 17:06:53 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 17:06:53 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v2] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 16:58:39 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5655: >> >>> 5653: acquire = false, release = false; break; >>> 5654: default: >>> 5655: acquire = true, release = true; 
break; >> >> I think hotspot style is to indent the `case`-s. Also commas in `acquire` and `release` assignments look weird. Consider: >> >> Register prev = r3; >> Register ptr = c_rarg0; >> Register compare_val = c_rarg1; >> Register exchange_val = c_rarg2; >> >> bool acquire, release; >> switch (order) { >> case memory_order_relaxed: >> acquire = false; >> release = false; >> break; >> default: >> acquire = true; >> release = true; >> break; >> } > > Oh noes! I'll have to change my emacs indentation setup. I may be gone some time... Ha! Turns out it's an emacs FAQ: (c-set-offset 'case-label '+) ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From aph at openjdk.java.net Thu Feb 18 17:11:40 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 18 Feb 2021 17:11:40 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v4] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 16:43:29 GMT, Aleksey Shipilev wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch statement layout. > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 192: > >> 190: if (_cpu == CPU_ARM && (_model == 0xd07 || _model2 == 0xd07)) _features |= CPU_STXR_PREFETCH; >> 191: >> 192: _features |= CPU_STXR_PREFETCH; > > This looks weird. The line above it adds `CPU_STXR_PREFETCH` conditionally. Which one is correct? It was a mistake. Gone now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From adinn at openjdk.java.net Thu Feb 18 17:43:40 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Thu, 18 Feb 2021 17:43:40 GMT Subject: RFR: 8261649: AArch64: Optimize LSE atomics in C++ code [v4] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 16:52:29 GMT, Aleksey Shipilev wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch statement layout. > > Passer-by comments... 
The memory_order_conservative/relaxed change is better. still good to go. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From kvn at openjdk.java.net Thu Feb 18 19:17:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 19:17:44 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 02:37:35 GMT, Sandhya Viswanathan wrote: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms Please, add a test which verifies correctness of results when this code is used. If we don't have it already. src/hotspot/cpu/x86/x86.ad line 7506: > 7504: instruct rearrangeB_avx(legVec dst, legVec src, vec shuffle, legVec vtmp1, legVec vtmp2, rRegP scratch) %{ > 7505: predicate(vector_element_basic_type(n) == T_BYTE && > 7506: vector_length(n) == 32 && !VM_Version::supports_avx512_vbmi()); Predicate matches bail-out condition in match_rule_supported_vector(). Does it mean this code never used before? So you are implementing it now. Right?
src/hotspot/cpu/x86/x86.ad line 7550: > 7548: // only byte shuffle instruction available on these platforms > 7549: int vlen_in_bytes = vector_length_in_bytes(this); > 7550: if (UseAVX == 0) { This code will not be executed with vector length 16 because match_rule_supported_vector() bailout with (size_in_bits == 256 && UseAVX < 2). ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2520 From gziemski at openjdk.java.net Thu Feb 18 19:17:53 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 18 Feb 2021 19:17:53 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v7] In-Reply-To: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> References: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> Message-ID: <2uu-RLIOIKtYE08Gw0OsFUqqncjKqUEWc_7X46OxV5w=.8988091b-1d5d-49e9-bf5e-a1287439f324@github.com> On Wed, 17 Feb 2021 13:26:20 GMT, Thomas Stuefe wrote: >> In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. 
>> >> There are three places where we do this: >> >> 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 >> >> 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 >> To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. >> >> 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 >> I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 >> and additionally to not trip this warning here: >> https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 >> >> ------ >> >> Changes in this patch: >> >> - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. >> - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. >> - I used that class to cover cases (1)..(3): >> - `chained_handlers` contains all information of chained handlers >> - `expected_handlers` contains a copy of the handlers the hotspot installed >> - `replaced_handlers` contains information about replaced handlers >> >> - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. >> >> - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
>> >> - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. >> >> Output Before: >> 663 Signal Handlers: >> 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO >> 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO >> >> Now: >> Signal Handlers: >> SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO >> SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO >> >> ----- >> Tests: GA, and the patch has been tested in our nighlies for over a month now. 
I manually executed the runtime/jni/checked tests too. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Style fixes > - expected_handlers->vm_handlers > - Merge > - Use universal zero initializer for do_check_signal_periodically > - Further fixes > - Fix build error on zlinux > - David Feedback > - Make SavedSignalHandlers use C-heap for its items > - Removed display-replaced-handler-logic > - Feedback David > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/36fdd646...5cf58186 Marked as reviewed by gziemski (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From gziemski at openjdk.java.net Thu Feb 18 19:17:53 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 18 Feb 2021 19:17:53 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v7] In-Reply-To: References: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> Message-ID: On Thu, 18 Feb 2021 05:26:58 GMT, Thomas Stuefe wrote: >> Still good to me. >> >> Thanks, >> David > >> Still good to me. >> >> Thanks, >> David > > Thanks, David I'm going to give thumbs up and not hold this up any longer, even though I haven't figured out yet why I see the issue with my Xcode built hotspot. Whatever the issue, which is most likely due to how I build it, it can be handled either on my side, or if it's JDK then we can fix it in a followup. 
cheers ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From sviswanathan at openjdk.java.net Thu Feb 18 21:26:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 18 Feb 2021 21:26:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 19:14:37 GMT, Vladimir Kozlov wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Please, add a test which verifies correctness of results when this code is used. If we don't have it already. @vnkozlov thanks a lot for the review. The test for slice and unslice are already part of test/jdk/jdk/incubator/vector/Byte256VectorTests.java and Short256VectorTests.java. > src/hotspot/cpu/x86/x86.ad line 7550: > >> 7548: // only byte shuffle instruction available on these platforms >> 7549: int vlen_in_bytes = vector_length_in_bytes(this); >> 7550: if (UseAVX == 0) { > > This code will not be executed with vector length 16 because match_rule_supported_vector() bailout with (size_in_bits == 256 && UseAVX < 2). Yes you are right, but this code will execute for vector length 16 when UseAVX ==2. It will also execute for vector length 16 when UseAVX == 3 && !VM_Version::supports_avx512bw. 
> src/hotspot/cpu/x86/x86.ad line 7506: > >> 7504: instruct rearrangeB_avx(legVec dst, legVec src, vec shuffle, legVec vtmp1, legVec vtmp2, rRegP scratch) %{ >> 7505: predicate(vector_element_basic_type(n) == T_BYTE && >> 7506: vector_length(n) == 32 && !VM_Version::supports_avx512_vbmi()); > > Predicate matches bail-out condition in match_rule_supported_vector(). Does it mean this code never used before? > So you are implementing it now. Right? Yes, this rule was not used before and I am implementing it now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Thu Feb 18 23:24:43 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 23:24:43 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 19:14:37 GMT, Vladimir Kozlov wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Please, add a test which verifies correctness of results when this code is used. If we don't have it already. > @vnkozlov thanks a lot for the review. > The test for slice and unslice are already part of test/jdk/jdk/incubator/vector/Byte256VectorTests.java and Short256VectorTests.java. Good. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Thu Feb 18 23:34:45 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 18 Feb 2021 23:34:45 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 21:21:49 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/x86.ad line 7550: >> >>> 7548: // only byte shuffle instruction available on these platforms >>> 7549: int vlen_in_bytes = vector_length_in_bytes(this); >>> 7550: if (UseAVX == 0) { >> >> This code will not be executed with vector length 16 because match_rule_supported_vector() bailout with (size_in_bits == 256 && UseAVX < 2). > > Yes you are right, but this code will execute for vector length 16 when UseAVX ==2. > It will also execute for vector length 16 when UseAVX == 3 && > !VM_Version::supports_avx512bw. Next assert checks <= 16 when code is guarded by (UseAVX == 0). It is not (UseAVX ==2). Also } else { case is for UseAVX > 0 which includes AVX=1 but vpaddb() (avx3) is used there. Seems UseAVX checks wrong here. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 00:35:54 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 00:35:54 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v2] In-Reply-To: References: Message-ID: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 
0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: corrected assert ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2520/files - new: https://git.openjdk.java.net/jdk/pull/2520/files/77324374..55165cb5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520 PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 01:23:04 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 01:23:04 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: Message-ID: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 
142.580 ops/ms Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: corrected assert ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2520/files - new: https://git.openjdk.java.net/jdk/pull/2520/files/55165cb5..fa13679a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520 PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 01:30:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 01:30:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 23:31:28 GMT, Vladimir Kozlov wrote: >> Yes you are right, but this code will execute for vector length 16 when UseAVX ==2. >> It will also execute for vector length 16 when UseAVX == 3 && >> !VM_Version::supports_avx512bw. > > Next assert checks <= 16 when code is guarded by (UseAVX == 0). It is not (UseAVX ==2). > Also } else { case is for UseAVX > 0 which includes AVX=1 but vpaddb() (avx3) is used there. > Seems UseAVX checks wrong here. The assert checks for vlen_in_bytes <= 16 (128 bits) and so is a correct check for UseAVX=0. vpaddb is supported on AVX1/AVX2 as well. vpaddb is supported on AVX1 for up to 128 bit and on AVX2 for up to 256 bit and on AVX3 (512) for up to 512 bit vectors. I have tested this for UseAVX=0, UseAVX=1, UseAVX=2, UseAVX=3 platform. The check is for UseAVX as with any flavor of AVX, we can use fewer instructions to do this operation. This is because AVX allows destination to be separate from both the sources. Please let me know if I am missing something. 
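The width limits described in this reply can be written down as a small stand-alone model. This is only a sketch, not HotSpot code: `max_byte_add_bytes` and `byte_add_ok` are hypothetical names, and the per-level widths are taken from the message above rather than checked against the instruction set reference.

```cpp
#include <cassert>

// Hypothetical helper (not part of HotSpot): widest vector, in bytes, on
// which a byte add is assumed usable for a given UseAVX level, following
// the limits stated in the message.
static int max_byte_add_bytes(int use_avx) {
  switch (use_avx) {
    case 0:
      return 16;  // SSE path; the reviewed assert requires vlen_in_bytes <= 16
    case 1:
      return 16;  // AVX1: up to 128 bit
    case 2:
      return 32;  // AVX2: up to 256 bit
    default:
      return 64;  // AVX3 (512): up to 512 bit
  }
}

// Hypothetical check: is a byte add of the given length covered at this
// UseAVX level?
static bool byte_add_ok(int use_avx, int vlen_in_bytes) {
  return vlen_in_bytes <= max_byte_add_bytes(use_avx);
}
```

Under this model a 256-bit (32-byte) vector is rejected for UseAVX < 2, which matches the bail-out quoted from match_rule_supported_vector().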
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From dholmes at openjdk.java.net Fri Feb 19 01:30:41 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Feb 2021 01:30:41 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string In-Reply-To: References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> Message-ID: On Thu, 18 Feb 2021 16:15:11 GMT, Daniel D. Daugherty wrote: >> When the last line does not contain a NEWLINE character, fileStream::readln would read >> truncated line string: >> >> $ cat file_content: >> AA >> BB >> CC >> >> fileStream::readln result: >> "AA" >> "BB" >> "C" >> >> This patch address this problem, it works for Posix and Windows since the last character >> of these systems is always '\n'. > > src/hotspot/share/utilities/ostream.cpp line 593: > >> 591: ret = ::fgets(data, count, _file); >> 592: // Get rid of annoying \n char only if it presents, it works for Posix >> 593: // and Windows since the last character of these systems is always '\n' > > s/of these/on these/ > Please end the comment with a period. I suggest simply: // Get rid of \n char if it is present > src/hotspot/share/utilities/ostream.cpp line 595: > >> 593: // and Windows since the last character of these systems is always '\n' >> 594: if (data[::strlen(data)-1] == '\n') { >> 595: data[::strlen(data)-1] = '\0'; > > Perhaps: > size_t last_char = ::strlen(data) - 1; > if (last_char >= 0 && data[last_char] == '\n') { > data[last_char] = '\0'; @dcubed-ojdk size_t is unsigned so by definition always >= 0. 
:) I suggest: `size_t len = ::strlen(data);` `if (len > 0 && data[len - 1] == '\n') {` ` data[len - 1] = '\0';` `}` ------------- PR: https://git.openjdk.java.net/jdk/pull/2626 From dholmes at openjdk.java.net Fri Feb 19 01:30:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Feb 2021 01:30:39 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string In-Reply-To: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> Message-ID: On Thu, 18 Feb 2021 12:31:29 GMT, Yang Yi wrote: > When the last line does not contain a NEWLINE character, fileStream::readln would read > truncated line string: > > $ cat file_content: > AA > BB > CC > > fileStream::readln result: > "AA" > "BB" > "C" > > This patch address this problem, it works for Posix and Windows since the last character > of these systems is always '\n'. Changes requested by dholmes (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2626 From kvn at openjdk.java.net Fri Feb 19 01:58:44 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 01:58:44 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> References: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> Message-ID: On Fri, 19 Feb 2021 01:23:04 GMT, Sandhya Viswanathan wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. 
>> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > corrected assert src/hotspot/cpu/x86/x86.ad line 1695: > 1693: if(vlen == 2) { > 1694: return false; // Implementation limitation due to how shuffle is loaded > 1695: } else if (size_in_bits == 256 && UseAVX < 2) { Should this be >= 256? ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 02:07:40 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 02:07:40 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 01:27:57 GMT, Sandhya Viswanathan wrote: >> Next assert checks <= 16 when code is guarded by (UseAVX == 0). It is not (UseAVX ==2). >> Also } else { case is for UseAVX > 0 which includes AVX=1 but vpaddb() (avx3) is used there. >> Seems UseAVX checks wrong here. > > The assert checks for vlen_in_bytes <= 16 (128 bits) and so is a correct check for UseAVX=0. > vpaddb is supported on AVX1/AVX2 as well. > vpaddb is supported on AVX1 for up to 128 bit and > on AVX2 for up to 256 bit and > on AVX3 (512) for up to 512 bit vectors. > I have tested this for UseAVX=0, UseAVX=1, UseAVX=2, UseAVX=3 platform. > > The check is for UseAVX as with any flavor of AVX, we can use fewer instructions to do this operation. > This is because AVX allows destination to be separate from both the sources. > > Please let me know if I am missing something. 
My bad - I missed that size is in bytes in assert. The assert is correct, as you said. And `} else {` part works for AVX1 because of match_rule_supported_vector() bailout 256-bit case. May be add assert(UseAVX > 1 || vlen_in_bytes <= 16, ). I only have one question left - about check >= 256 in match_rule_supported_vector() ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From github.com+5010047+kelthuzadx at openjdk.java.net Fri Feb 19 02:08:52 2021 From: github.com+5010047+kelthuzadx at openjdk.java.net (Yang Yi) Date: Fri, 19 Feb 2021 02:08:52 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string [v2] In-Reply-To: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> Message-ID: <6mopOrvW_0CcGxgYT6JMLuSQi7IUW2eh58EHj0SX_eY=.55868bf5-fd8d-4ab1-b02e-944f626cbb2b@github.com> > When the last line does not contain a NEWLINE character, fileStream::readln would read > truncated line string: > > $ cat file_content: > AA > BB > CC > > fileStream::readln result: > "AA" > "BB" > "C" > > This patch address this problem, it works for Posix and Windows since the last character > of these systems is always '\n'. 
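A minimal sketch of the newline handling suggested in review for this change, written as a free function so it can be tried outside the VM. The name `read_line` and its signature are assumptions made for illustration; this is not the actual `fileStream::readln`.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical stand-alone helper, not the HotSpot implementation.
// Reads one line into data (capacity count) and removes a trailing '\n'
// only if one is present, so a final line without a newline stays intact.
static char* read_line(FILE* file, char* data, int count) {
  char* ret = std::fgets(data, count, file);
  if (ret == nullptr) {
    return nullptr;  // end of file or read error
  }
  // Take the length first: size_t is unsigned, so computing strlen(data) - 1
  // on an empty string would wrap around instead of going negative.
  std::size_t len = std::strlen(data);
  if (len > 0 && data[len - 1] == '\n') {
    data[len - 1] = '\0';
  }
  return ret;
}
```

For a file containing AA, BB and CC with no newline after CC, this yields "AA", "BB" and "CC" rather than truncating the last line.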
Yang Yi has updated the pull request incrementally with one additional commit since the last revision: tweak ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2626/files - new: https://git.openjdk.java.net/jdk/pull/2626/files/f7bd8206..2e816b2e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2626&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2626&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2626.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2626/head:pull/2626 PR: https://git.openjdk.java.net/jdk/pull/2626 From github.com+5010047+kelthuzadx at openjdk.java.net Fri Feb 19 02:08:53 2021 From: github.com+5010047+kelthuzadx at openjdk.java.net (Yang Yi) Date: Fri, 19 Feb 2021 02:08:53 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string [v2] In-Reply-To: References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> Message-ID: On Fri, 19 Feb 2021 01:22:51 GMT, David Holmes wrote: >> src/hotspot/share/utilities/ostream.cpp line 593: >> >>> 591: ret = ::fgets(data, count, _file); >>> 592: // Get rid of annoying \n char only if it presents, it works for Posix >>> 593: // and Windows since the last character of these systems is always '\n' >> >> s/of these/on these/ >> Please end the comment with a period. > > I suggest simply: > > // Get rid of \n char if it is present Changed. >> src/hotspot/share/utilities/ostream.cpp line 595: >> >>> 593: // and Windows since the last character of these systems is always '\n' >>> 594: if (data[::strlen(data)-1] == '\n') { >>> 595: data[::strlen(data)-1] = '\0'; >> >> Perhaps: >> size_t last_char = ::strlen(data) - 1; >> if (last_char >= 0 && data[last_char] == '\n') { >> data[last_char] = '\0'; > > @dcubed-ojdk size_t is unsigned so by definition always >= 0. 
:) I suggest: > `size_t len = ::strlen(data);` > `if (len > 0 && data[len - 1] == '\n') {` > ` data[len - 1] = '\0';` > `}` Make sense. Changed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2626 From sviswanathan at openjdk.java.net Fri Feb 19 02:33:40 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 02:33:40 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> Message-ID: On Fri, 19 Feb 2021 01:56:19 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> corrected assert > > src/hotspot/cpu/x86/x86.ad line 1695: > >> 1693: if(vlen == 2) { >> 1694: return false; // Implementation limitation due to how shuffle is loaded >> 1695: } else if (size_in_bits == 256 && UseAVX < 2) { > > Should this be >= 256? The general >= 256 part is taken care of early on in match_rule_supported_vector as below: if (!vector_size_supported(bt, vlen)) { return false; } The only additional check that is being done here is for float and double 256 bit vectors that are supported on AVX=1 and will pass the vector_size_supported check. This is because the VectorLoadShuffle cannot be performed for 256 bit vectors on AVX1 platform as it needs "integer" 256 bit instructions which are only available on AVX2. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kbarrett at openjdk.java.net Fri Feb 19 02:45:04 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 19 Feb 2021 02:45:04 GMT Subject: RFR: 8261998: Remove unused shared entry support from utilities/hashtable Message-ID: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> Please review this small cleanup in the utilities/hashtable facility. 
The support for "shared" entries is no longer needed or used, so is being deleted. Testing: mach5 tier1-4 (some CDS tests are in tier4) ------------- Commit messages: - remove shared entry support Changes: https://git.openjdk.java.net/jdk/pull/2638/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2638&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261998 Stats: 28 lines in 3 files changed: 0 ins; 23 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2638/head:pull/2638 PR: https://git.openjdk.java.net/jdk/pull/2638 From coleenp at openjdk.java.net Fri Feb 19 02:45:41 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Feb 2021 02:45:41 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups [v4] In-Reply-To: References: Message-ID: <8lRj1rs9KAsWeLATEW1vHTsXiE4-jQvd3SaUOZuyb0M=.f4b2e86c-6b7f-4ff5-890c-296d63168593@github.com> On Sat, 13 Feb 2021 05:43:56 GMT, Thomas Stuefe wrote: >> Hi, >> >> may I please have reviews for this RFE? >> >> While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixed and instead put them into this cleanup RFE. >> >> There should be no behavioral changes in this patch. >> >> - de-templatize `AllocationSite` since E was used as simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) >> >> - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. >> >> - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`. 
In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too but this would cause larger ripples so I stopped there. >> >> - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user >> >> - made `AllocationSite` immutable >> >> - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` since they were not needed >> >> - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` >> >> - There was a subtle incorrectness where `AllocationSite::equals()` would only compare callstack and disregard the MEMFLAGS member. Theoretically, if two callstacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (eg skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. (_Update: seems this assert really fires on s390x, so this is a real problem. I opened [1] to track this and restored the old behavior._). >> >> - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. Its true in almost all cases. >> >> - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. >> >> - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that coding does not need a walker object anymore and becomes a lot simpler. In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and was vulnerable against subtle errors. New code is more compact and simpler. 
Before removing the old implementation, I ran both statistics side by side for a couple of scenarios (eg really bad hash code implementations) and the output was identical. >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8261556 >> >> ---- >> Tests: >> - github GA >> - manual NMT jtreg tests (including the currently disabled runtime/NMT/CheckForProperDetailStackTrace.java) >> - Full nightlies at SAP are scheduled > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > reduce diff Looks good! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2539 From dcubed at openjdk.java.net Fri Feb 19 02:58:41 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 19 Feb 2021 02:58:41 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string [v2] In-Reply-To: <6mopOrvW_0CcGxgYT6JMLuSQi7IUW2eh58EHj0SX_eY=.55868bf5-fd8d-4ab1-b02e-944f626cbb2b@github.com> References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> <6mopOrvW_0CcGxgYT6JMLuSQi7IUW2eh58EHj0SX_eY=.55868bf5-fd8d-4ab1-b02e-944f626cbb2b@github.com> Message-ID: <9dW7QzfmQ8pZQbnqFR4x64EH6YUmqXKWNT0TPDdXcU0=.75d8426c-f121-44f3-b341-fed32f844156@github.com> On Fri, 19 Feb 2021 02:08:52 GMT, Yang Yi wrote: >> When the last line does not contain a NEWLINE character, fileStream::readln would read >> truncated line string: >> >> $ cat file_content: >> AA >> BB >> CC >> >> fileStream::readln result: >> "AA" >> "BB" >> "C" >> >> This patch address this problem, it works for Posix and Windows since the last character >> of these systems is always '\n'. > > Yang Yi has updated the pull request incrementally with one additional commit since the last revision: > > tweak I like David's suggestion better than mine. Thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2626 From coleenp at openjdk.java.net Fri Feb 19 02:58:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 19 Feb 2021 02:58:42 GMT Subject: RFR: 8261998: Remove unused shared entry support from utilities/hashtable In-Reply-To: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> References: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> Message-ID: <71z-SnDx5bGZgGxNyQVcKz79eA_Vsg6XEZ_RGNVP1K4=.dd905031-8f45-43ed-9d66-4753d41f34b4@github.com> On Fri, 19 Feb 2021 02:39:20 GMT, Kim Barrett wrote: > Please review this small cleanup in the utilities/hashtable facility. The > support for "shared" entries is no longer needed or used, so is being deleted. > > Testing: > mach5 tier1-4 (some CDS tests are in tier4) We might want to share other hashtables like this, like the loader constraint table, but I don't think this will be needed. src/hotspot/share/prims/jvmtiTagMapTable.cpp line 255: > 253: } > 254: // get next entry > 255: entry = (JvmtiTagMapEntry*)HashtableEntry::make_ptr(*p); Nice to get rid of this. It would be nice if the hashtables didn't need to declare bucket() and bucket_addr() with all these casts but some template nonsense makes it not compile. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2638 From dongbo at openjdk.java.net Fri Feb 19 03:13:12 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 19 Feb 2021 03:13:12 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. 
Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: handle zero shift in macro assembler ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/d746f209..1aba5629 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=05-06 Stats: 530 lines in 4 files changed: 27 ins; 174 del; 329 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Fri Feb 19 03:17:42 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 19 Feb 2021 03:17:42 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6] In-Reply-To: References: <45gEz_9Vli1Mvby8SbtMsoEx65KL11RQnI-qKy6zfgo=.42933345-0086-466b-ae38-3add53fc1816@github.com> Message-ID: On Thu, 18 Feb 2021 09:35:07 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2694: > >> 2692: assert((1 << ((T>>1)+3)) > shift, "Invalid Shift value"); \ >> 2693: if (shift == 0) { \ >> 2694: bool accumulate = ((opc2 & 0b100) != 0); \ > > Is this correct, according to the definition in the Architecture Reference Manual? It doesn't look like it to me. Assembler methods should generate bit patterns exactly as defined in the Manual. This logic should be in a MacroAssembler method. Hi, I moved the logic into MacroAssembler. The assert is kept to make sure that we would never pass a zero right shift to the assembler.
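For illustration only, here is a scalar C++ model of the zero-shift case discussed in this thread. It assumes byte lanes, and the names `masked_shift_count`, `lane_ushr` and `lane_usra` are hypothetical; it is not HotSpot code and does not model any instruction encoding. It shows that the Vector API masking `n & (ESIZE*8-1)` turns a shift by the element width into a shift by zero, for which a plain right shift is a copy and a shift-and-accumulate is a plain add. That appears to be the reasoning behind the move/add fallback described in the RFR.

```cpp
#include <cstdint>

// Hypothetical helper: models the masking n & (ESIZE*8 - 1) quoted from
// VectorOperators.java. A count equal to the lane width in bits becomes 0.
constexpr unsigned masked_shift_count(unsigned n, unsigned lane_bytes) {
    return n & (lane_bytes * 8u - 1u);
}

// Scalar model of one byte lane of an unsigned right shift (count < 8).
// With a count of 0 the lane is returned unchanged, i.e. a plain copy.
constexpr std::uint8_t lane_ushr(std::uint8_t src, unsigned count) {
    return static_cast<std::uint8_t>(src >> count);
}

// Scalar model of one byte lane of shift-right-and-accumulate (count < 8).
// With a count of 0 this is a plain wrapping add of the two lanes.
constexpr std::uint8_t lane_usra(std::uint8_t acc, std::uint8_t src, unsigned count) {
    return static_cast<std::uint8_t>(acc + (src >> count));
}
```

With byte lanes, `masked_shift_count(8, 1)` is 0, so `lane_ushr(x, 0)` returns `x` and `lane_usra(acc, x, 0)` returns the wrapped sum of `acc` and `x`.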
------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From sviswanathan at openjdk.java.net Fri Feb 19 03:20:59 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 03:20:59 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: <_0LW_Yx-ItDG7f-3kVjOTPa5BbcaN1FMRBbATLMwpmk=.da545e81-59a0-4d80-9f45-0714fe7bfc94@github.com> On Fri, 19 Feb 2021 02:05:02 GMT, Vladimir Kozlov wrote: >> The assert checks for vlen_in_bytes <= 16 (128 bits) and so is a correct check for UseAVX=0. >> vpaddb is supported on AVX1/AVX2 as well. >> vpaddb is supported on AVX1 for up to 128 bit and >> on AVX2 for upto 256 bit and >> on AVX3 (512) for upto 512 bit vectors. >> I have tested this for UseAVX=0, UseAVX=1, UseAVX=2, UseAVX=3 platform. >> >> The check is for UseAVX as with any flavor of AVX, we can use less number of instructions to do this operation. >> This is because AVX allows destination to be separate from both the sources. >> >> Please let me know if I am missing something. > > My bad - I missed that size is in bytes in assert. The assert is correct, as you said. > And `} else {` part works for AVX1 because of match_rule_supported_vector() bailout 256-bit case. > May be add assert(UseAVX > 1 || vlen_in_bytes <= 16, ). 
> > I only have one question left - about check >= 256 in match_rule_supported_vector() Added the following assert on else path: + assert(UseAVX > 1 || vlen_in_bytes <= 16, "required"); ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From sviswanathan at openjdk.java.net Fri Feb 19 03:20:59 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 03:20:59 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ±
142.580 ops/ms Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: add assert on else path ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2520/files - new: https://git.openjdk.java.net/jdk/pull/2520/files/fa13679a..ad3ab2b1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2520&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2520.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2520/head:pull/2520 PR: https://git.openjdk.java.net/jdk/pull/2520 From dholmes at openjdk.java.net Fri Feb 19 03:56:40 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 19 Feb 2021 03:56:40 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string [v2] In-Reply-To: <6mopOrvW_0CcGxgYT6JMLuSQi7IUW2eh58EHj0SX_eY=.55868bf5-fd8d-4ab1-b02e-944f626cbb2b@github.com> References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> <6mopOrvW_0CcGxgYT6JMLuSQi7IUW2eh58EHj0SX_eY=.55868bf5-fd8d-4ab1-b02e-944f626cbb2b@github.com> Message-ID: On Fri, 19 Feb 2021 02:08:52 GMT, Yang Yi wrote: >> When the last line does not contain a NEWLINE character, fileStream::readln would read >> truncated line string: >> >> $ cat file_content: >> AA >> BB >> CC >> >> fileStream::readln result: >> "AA" >> "BB" >> "C" >> >> This patch address this problem, it works for Posix and Windows since the last character >> of these systems is always '\n'. > > Yang Yi has updated the pull request incrementally with one additional commit since the last revision: > > tweak Looks good! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer).
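A minimal standalone sketch of the newline check suggested earlier in this thread, assuming a NUL-terminated, writable buffer. `strip_trailing_newline` is a hypothetical name and this is not the actual `fileStream::readln` code; it only shows the last character being removed when, and only when, it is '\n', so a final line such as "CC" without a newline stays intact.

```cpp
#include <cstring>

// Hypothetical helper: removes one trailing '\n' from a NUL-terminated
// buffer, and leaves the buffer untouched when there is none.
inline void strip_trailing_newline(char* data) {
    std::size_t len = std::strlen(data);
    if (len > 0 && data[len - 1] == '\n') {
        data[len - 1] = '\0';
    }
}
```

The `len > 0` guard matters for an empty buffer, where `data[len - 1]` would otherwise read before the start of the array.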
PR: https://git.openjdk.java.net/jdk/pull/2626 From kvn at openjdk.java.net Fri Feb 19 05:52:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 05:52:39 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 03:20:59 GMT, Sandhya Viswanathan wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > add assert on else path Marked as reviewed by kvn (Reviewer).
------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 05:52:41 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 05:52:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v3] In-Reply-To: References: <4TNVM4VqqsezSVCJndS1RpM9EvIfICm5mv1AbZhthgg=.3943a37b-cce4-415e-989f-b9696ad6c008@github.com> Message-ID: On Fri, 19 Feb 2021 02:30:58 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/x86.ad line 1695: >> >>> 1693: if(vlen == 2) { >>> 1694: return false; // Implementation limitation due to how shuffle is loaded >>> 1695: } else if (size_in_bits == 256 && UseAVX < 2) { >> >> Should this be >= 256? > > The general >= 256 part is taken care of early on in match_rule_supported_vector as below: > if (!vector_size_supported(bt, vlen)) { > return false; > } > The only additional check that is being done here is for float and double 256 bit vectors that are supported on AVX=1 and will pass the vector_size_supported check. > This is because the VectorLoadShuffle cannot be performed for 256 bit vectors on AVX1 platform as it needs "integer" 256 bit instructions which are only available on AVX2. Okay. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kvn at openjdk.java.net Fri Feb 19 05:52:42 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 19 Feb 2021 05:52:42 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: <_0LW_Yx-ItDG7f-3kVjOTPa5BbcaN1FMRBbATLMwpmk=.da545e81-59a0-4d80-9f45-0714fe7bfc94@github.com> References: <_0LW_Yx-ItDG7f-3kVjOTPa5BbcaN1FMRBbATLMwpmk=.da545e81-59a0-4d80-9f45-0714fe7bfc94@github.com> Message-ID: On Fri, 19 Feb 2021 03:17:56 GMT, Sandhya Viswanathan wrote: >> My bad - I missed that size is in bytes in assert. The assert is correct, as you said. >> And `} else {` part works for AVX1 because of match_rule_supported_vector() bailout 256-bit case. 
>> May be add assert(UseAVX > 1 || vlen_in_bytes <= 16, ). >> >> I only have one question left - about check >= 256 in match_rule_supported_vector() > > Added the following assert on else path: > + assert(UseAVX > 1 || vlen_in_bytes <= 16, "required"); Good. ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From ccheung at openjdk.java.net Fri Feb 19 06:10:51 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Fri, 19 Feb 2021 06:10:51 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 07:31:12 GMT, Ioi Lam wrote: >> metaspace.hpp is included by about 770 out of 1000 HotSpot .o files. Most of these are transitively included via array.hpp and classLoaderData.hpp. >> >> - classLoaderData.hpp doesn't actually need metaspace.hpp. >> - array.hpp can be refactored to put a function that depends on metaspace.hpp into array.inline.hpp >> >> Doing the above reduces the number of .o files that include metaspace.hpp to 343. Since this is still a significant number, we should split out the rarely used classes (such as `MetaspaceGC` and `MetaspaceUtils`) into a new header file (metaspaceUtils.hpp, which is included only 30 times). >> >> Also, these 3 includes can now be removed from metaspace.hpp. >> >> #include "memory/memRegion.hpp" >> #include "memory/metaspaceChunkFreeListSummary.hpp" >> #include "memory/virtualspace.hpp" >> >> Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. >> >> (I also fixed an unrelated comment in archiveUtils.cpp when I was scanning for the word "Metaspace" in the source files -- the function `MetaspaceShared::commit_to()` no longer exists). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed ppc/s390 builds One more file you may consider updating is share/memory/classLoaderMetaspace.cpp. 
It now depends on metaspaceUtils.hpp but it includes it transitively via metaspaceTracer.hpp. The include of metaspace.hpp is not needed because classLoaderMetaspace.hpp includes it. Other changes seem good. Thanks, Calvin ------------- Marked as reviewed by ccheung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2599 From stuefe at openjdk.java.net Fri Feb 19 06:23:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Feb 2021 06:23:39 GMT Subject: RFR: JDK-8261644: NMT: Simplifications and cleanups [v4] In-Reply-To: <8lRj1rs9KAsWeLATEW1vHTsXiE4-jQvd3SaUOZuyb0M=.f4b2e86c-6b7f-4ff5-890c-296d63168593@github.com> References: <8lRj1rs9KAsWeLATEW1vHTsXiE4-jQvd3SaUOZuyb0M=.f4b2e86c-6b7f-4ff5-890c-296d63168593@github.com> Message-ID: On Fri, 19 Feb 2021 02:43:14 GMT, Coleen Phillimore wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> reduce diff > > Looks good! Thanks Coleen and Zhengyu! ------------- PR: https://git.openjdk.java.net/jdk/pull/2539 From stuefe at openjdk.java.net Fri Feb 19 06:23:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Feb 2021 06:23:40 GMT Subject: Integrated: JDK-8261644: NMT: Simplifications and cleanups In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 07:16:45 GMT, Thomas Stuefe wrote: > Hi, > > may I please have reviews for this RFE? > > While working on NMT I found a number of possible cleanups and simplifications. I avoided mixing these cleanups with fixed and instead put them into this cleanup RFE. > > There should be no behavioral changes in this patch. > > - de-templatize `AllocationSite` since E was used as simple data holder for child classes; the same effect can be had with traditional inheritance with less and clearer code (also IDEs get less confused) > > - `AllocationSite` child classes `SimpleThreadStackSite`, `VirtualMemoryAllocationSite`, `MallocSite` were simplified. 
> > - As for `SimpleThreadStackSite`, we can get rid of the separate data holder class `SimpleThreadStack` entirely by merging its members directly into `SimpleThreadStackSite`. In theory we could do the same for the data holder classes `MemoryCounter` and `VirtualMemory` for `MallocSite` and `VirtualMemoryAllocationSite` too but this would cause larger ripples so I stopped there. > > - removed the SimpleThreadStackSite(address base, size_t size) constructor (the one not taking a call stack) by slightly rewriting its sole user > > - made `AllocationSite` immutable > > - removed unused default constructors from `MallocSite` and `MallocSiteHashTableEntry` since they were not needed > > - removed unused methods `set_callsite()`, `hash()`, `equals()` from `MallocSiteHashTableEntry` > > - There was a subtle incorrectness where `AllocationSite::equals()` would only compare callstack and disregard the MEMFLAGS member. Theoretically, if two callstacks end with the same lowest frame, they should always reference the same single allocation, so that's okay. But if the call stack capturing was not precise enough (eg skipping too many low frames) we may accidentally lump several allocation sites together which could have different MEMFLAGS. I added an assert to check that. (_Update: seems this assert really fires on s390x, so this is a real problem. I opened [1] to track this and restored the old behavior._). > > - `NativeCallStack`: Removed the `fillStack` argument from the first constructor to avoid having to evaluate it in this hot constructor. Its true in almost all cases. > > - Also removed the `toSkip` default value. Instead, I added an explicit default constructor. > > - Moved the malloc site table tuning statistics printing from memtracker.cpp down into a new function `MallocSiteTable::print_tuning_statistics()`. When implemented inside `MallocSiteTable`, that coding does not need a walker object anymore and becomes a lot simpler. 
In particular, we don't have to rely on implicit knowledge about walking order, which made the code complex and was vulnerable against subtle errors. New code is more compact and simpler. Before removing the old implementation, I ran both statistics side by side for a couple of scenarios (eg really bad hash code implementations) and the output was identical. > > [1] https://bugs.openjdk.java.net/browse/JDK-8261556 > > ---- > Tests: > - github GA > - manual NMT jtreg tests (including the currently disabled runtime/NMT/CheckForProperDetailStackTrace.java) > - Full nightlies at SAP are scheduled This pull request has now been integrated. Changeset: 5caf686c Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/5caf686c Stats: 312 lines in 10 files changed: 89 ins; 178 del; 45 mod 8261644: NMT: Simplifications and cleanups Reviewed-by: coleenp, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2539 From stuefe at openjdk.java.net Fri Feb 19 06:43:46 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Feb 2021 06:43:46 GMT Subject: RFR: JDK-8260485: Simplify and unify handler vectors in Posix signal code [v7] In-Reply-To: <2uu-RLIOIKtYE08Gw0OsFUqqncjKqUEWc_7X46OxV5w=.8988091b-1d5d-49e9-bf5e-a1287439f324@github.com> References: <8vAf2pIuZoDkFY-d3VC9rzzss_Bmult2_cbEJo6Aw6c=.fd724d30-bae0-4aaf-8676-769982ed45d6@github.com> <2uu-RLIOIKtYE08Gw0OsFUqqncjKqUEWc_7X46OxV5w=.8988091b-1d5d-49e9-bf5e-a1287439f324@github.com> Message-ID: On Thu, 18 Feb 2021 19:14:38 GMT, Gerard Ziemski wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: >> >> - Style fixes >> - expected_handlers->vm_handlers >> - Merge >> - Use universal zero initializer for do_check_signal_periodically >> - Further fixes >> - Fix build error on zlinux >> - David Feedback >> - Make SavedSignalHandlers use C-heap for its items >> - Removed display-replaced-handler-logic >> - Feedback David >> - ... and 2 more: https://git.openjdk.java.net/jdk/compare/a185aa91...5cf58186 > > Marked as reviewed by gziemski (Committer). Thanks Gerard and David! About const correctness, I looked into this. Looked mostly straightforward, apart from the one fact that inside call_chained_handler, we modify the handed over sigaction structure. Which is really yucky. Especially if you consider that this function is called also for the libjsig case, where the sigaction structure is handed in from the libjsig, so we essentially modify memory in libjsig. We do this for two reasons, one is to just have a holder for a temporary sigset, which would be trivial to fix with a temporary local variable. And then, to switch off chaining for SA_NODEFER (mimicking one shot semantics). And while looking into that, I saw that one shot semantics can never have worked reliably. I guess its like David said in other places, its a best effort thing. The whole investigation turned out too deep for this RFE, so I leave that for another day (https://bugs.openjdk.java.net/browse/JDK-8262006). @gerard-ziemski : I also created https://bugs.openjdk.java.net/browse/JDK-8262007 to track the renaming you wanted. Left it unassigned, feel free to grab that one. 
Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From stuefe at openjdk.java.net Fri Feb 19 06:43:47 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 19 Feb 2021 06:43:47 GMT Subject: Integrated: JDK-8260485: Simplify and unify handler vectors in Posix signal code In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 09:18:19 GMT, Thomas Stuefe wrote: > In signal handling code, we have code sections which save signal handler state into vectors of sigaction structures, or of integers (if only flags are saved). All these code sections can be unified, disentangled and the using code simplified. > > There are three places where we do this: > > 1) When installing hotspot signal handlers, should we find a handler in place and signal chaining is enabled, we save the original handler inside a sigaction array and a corresponding sigset: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L85 > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L338 > > 2) if diagnostics are enabled with -Xcheck:jni, we periodically check if our hotspot signal handlers had been replaced (`static void check_signal_handler(int sig)`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L766 > To do that, we store information about the handlers we installed and we expect to be intact; in this case we only store the sigaction flags (`int sigflags[NSIG];`) and deduce the handler address from context. > > 3) There is a complicated dance between VMError and the posix signal handler code: If a fatal error happens, we enter error reporting and install the secondary handler (`VMError::install_secondary_signal_handler()`). 
Before doing that, we store the handler we replace in yet another array, in this case one array for the handler address, one for the flag: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/vmError_posix.cpp#L77 > I believe the purpose of this is to - when printing signal handlers as part of error reporting - print the original signal handler instead of the secondary crash handler (see `PosixSignals::print_signal_handler()`): > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1372 > and additionally to not trip this warning here: > https://github.com/openjdk/jdk/blob/e0d748d56f817b4fb5158aae86aee12d90b36356/src/hotspot/os/posix/signals_posix.cpp#L1391 > > ------ > > Changes in this patch: > > - I added some convenience macros to check if a handler matches a given function (HANDLER_IS), check if a handler is set to ignore or default or both (HANDLER_IS_IGN, HANDLER_IS_DFL, HANDLER_IS_IGN_OR_DFL). Makes code more readable. > - I added convenience class `SavedSignalHandlers` to keep a vector of handler information by signal number. > - I used that class to cover cases (1)..(3): > - `chained_handlers` contains all information of chained handlers > - `expected_handlers` contains a copy of the handlers the hotspot installed > - `replaced_handlers` contains information about replaced handlers > > - about (1): I store the chained signal handler information in `chained_handlers` when installing a hotspot handler, UseSignalChaining is 1, and a non-default handler was encountered. > > - about (2): I simplified the signal checking mechanism quite a bit: it compares the handler (address and flags) it finds present with expectations. Before this patch, the expected handler address was deduced in a hard-wired way, now, we just compare the active sigaction structure with the one we installed on VM start. 
> > - about (3): when installing any handler (hotspot as well as user defined via java), I store the handler it replaced in `replaced_handlers`. I use that to print which handler had been replaced in `PosixSignals::print_signal_handler`. I simplified `PosixSignals::print_signal_handler` such that it does not retain any knowledge about hotspot signal handlers. Now, it just prints out the currently established handlers. In addition to that, it prints out chaining information and which handlers had been replaced. I removed the associated coding from VMError. > > Output Before: > 663 Signal Handlers: > 664 SIGSEGV: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 665 SIGBUS: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 666 SIGFPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 667 SIGPIPE: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 668 SIGXFSZ: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 669 SIGILL: javaSignalHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 670 SIGUSR2: SR_handler in libjvm.so, sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO > 671 SIGHUP: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 672 SIGINT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 673 SIGTERM: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 674 SIGQUIT: UserHandler in libjvm.so, sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > 675 SIGTRAP: javaSignalHandler in libjvm.so, 
sa_mask[0]=11100100010111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO > > Now: > Signal Handlers: > SIGSEGV: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGSEGV: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGBUS: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGBUS: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGFPE: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGFPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGPIPE: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGXFSZ: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGILL: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > replaced: SIGILL: javaSignalHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGUSR2: SR_handler in libjvm.so, mask=00000000000000000000000000000000, flags=SA_RESTART|SA_SIGINFO > SIGHUP: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGINT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTERM: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGQUIT: UserHandler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > SIGTRAP: crash_handler in libjvm.so, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO > > ----- > Tests: GA, and the patch has been tested in our nightlies for over a month now. I manually executed the runtime/jni/checked tests too.
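A rough sketch of the idea behind the description above: one saved `sigaction` record per signal number, plus a comparison of handler address and flags against an expected record. The names `SavedHandlers` and `same_handler` are hypothetical, and the sketch stores records inline, whereas the commit list for the patch mentions C-heap allocation for the items. It assumes a POSIX system whose `<signal.h>` defines `NSIG`, and it deliberately does not query the OS, so any platform-specific flag handling on read-back is out of scope.

```cpp
#include <signal.h>
#include <string.h>

// Hypothetical stand-in for the "vector of handler information by signal
// number" idea: at most one saved sigaction record per signal.
class SavedHandlers {
    struct sigaction _sa[NSIG];
    bool _is_set[NSIG];
public:
    SavedHandlers() {
        memset(_sa, 0, sizeof(_sa));
        memset(_is_set, 0, sizeof(_is_set));
    }
    // Remember a copy of the given record for signal number sig.
    void set(int sig, const struct sigaction* act) {
        if (sig > 0 && sig < NSIG) {
            _sa[sig] = *act;
            _is_set[sig] = true;
        }
    }
    // Returns the saved record, or nullptr if none was saved for sig.
    const struct sigaction* get(int sig) const {
        return (sig > 0 && sig < NSIG && _is_set[sig]) ? &_sa[sig] : nullptr;
    }
};

// Compares two records by flags and handler address only. Which handler
// field is meaningful depends on whether SA_SIGINFO is set.
inline bool same_handler(const struct sigaction& a, const struct sigaction& b) {
    if (a.sa_flags != b.sa_flags) {
        return false;
    }
    if (a.sa_flags & SA_SIGINFO) {
        return a.sa_sigaction == b.sa_sigaction;
    }
    return a.sa_handler == b.sa_handler;
}
```

In this model, three instances of such a table would correspond to the `chained_handlers`, `expected_handlers` and `replaced_handlers` roles named in the description.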
This pull request has now been integrated. Changeset: 7e2c909e Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/7e2c909e Stats: 356 lines in 4 files changed: 132 ins; 148 del; 76 mod 8260485: Simplify and unify handler vectors in Posix signal code Reviewed-by: dholmes, gziemski ------------- PR: https://git.openjdk.java.net/jdk/pull/2251 From ayang at openjdk.java.net Fri Feb 19 08:36:40 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 19 Feb 2021 08:36:40 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so e.g. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information about the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness.
The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. > > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can use these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context.
> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. Additionally, some test(s) on this new feature would be nice. Maybe you can add something in `HeapSummaryEventAllGcs`? PS: I was looking into how to get periodic heap usage info just a few days ago, and settled for `MemProfiling` as a workaround. Thank you for the patch. src/hotspot/share/jfr/periodic/jfrPeriodic.cpp line 649: > 647: TRACE_REQUEST_FUNC(HeapUsageSummary) { > 648: EventHeapUsageSummary event; > 649: if (event.should_commit()) { I believe the `should_commit` check is not needed; the period check is handled by the caller. src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 79: > 77: size_t _young_live; > 78: size_t _eden_live; > 79: size_t _old_live; It's only the sum that's ever exposed, right? I wonder if it makes sense to merge them into one var to only track the sum. ------------- Changes requested by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/2579 From iklam at openjdk.java.net Fri Feb 19 08:49:47 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 19 Feb 2021 08:49:47 GMT Subject: RFR: 8261998: Remove unused shared entry support from utilities/hashtable In-Reply-To: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> References: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> Message-ID: On Fri, 19 Feb 2021 02:39:20 GMT, Kim Barrett wrote: > Please review this small cleanup in the utilities/hashtable facility. The > support for "shared" entries is no longer needed or used, so is being deleted.
> > Testing: > mach5 tier1-4 (some CDS tests are in tier4) Marked as reviewed by iklam (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2638 From iklam at openjdk.java.net Fri Feb 19 08:49:47 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 19 Feb 2021 08:49:47 GMT Subject: RFR: 8261998: Remove unused shared entry support from utilities/hashtable In-Reply-To: <71z-SnDx5bGZgGxNyQVcKz79eA_Vsg6XEZ_RGNVP1K4=.dd905031-8f45-43ed-9d66-4753d41f34b4@github.com> References: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> <71z-SnDx5bGZgGxNyQVcKz79eA_Vsg6XEZ_RGNVP1K4=.dd905031-8f45-43ed-9d66-4753d41f34b4@github.com> Message-ID: On Fri, 19 Feb 2021 02:56:14 GMT, Coleen Phillimore wrote: > We might want to share other hashtables like this, like the loader constraint table, but I don't think this will be needed. CDS used to use Hashtable to store stuff into the archive. It doesn't do that anymore, and has switched to CompactHashtable. So the "shared entry" support in Hashtable can be safely deleted. ------------- PR: https://git.openjdk.java.net/jdk/pull/2638 From neliasso at openjdk.java.net Fri Feb 19 09:31:41 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 19 Feb 2021 09:31:41 GMT Subject: RFR: 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors [v4] In-Reply-To: References: Message-ID: <4j6fwKgy-WwLXNITZYeqzBVreGvoYli08IP4OxpnmUI=.7d07c386-7778-43e3-8245-5c13b411a63d@github.com> On Fri, 19 Feb 2021 03:20:59 GMT, Sandhya Viswanathan wrote: >> The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 >> >> The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. 
>> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms >> PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > add assert on else path Looks good. ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2520 From aph at openjdk.java.net Fri Feb 19 11:25:50 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 19 Feb 2021 11:25:50 GMT Subject: Integrated: 8261649: AArch64: Optimize LSE atomics in C++ code In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 17:14:46 GMT, Andrew Haley wrote: > Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. This pull request has now been integrated.
Changeset: 1b0c36b0 Author: Andrew Haley URL: https://git.openjdk.java.net/jdk/commit/1b0c36b0 Stats: 285 lines in 4 files changed: 170 ins; 45 del; 70 mod 8261649: AArch64: Optimize LSE atomics in C++ code Reviewed-by: adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 From fdavid at openjdk.java.net Fri Feb 19 14:26:52 2021 From: fdavid at openjdk.java.net (Florian David) Date: Fri, 19 Feb 2021 14:26:52 GMT Subject: RFR: 8258414: OldObjectSample events too expensive Message-ID: The purpose of this change is to reduce the size of JFR recordings when the OldObjectSample event is enabled. ## Problem JFR recording size blows up when the `OldObjectSample` event is enabled. Memory allocation events are known to be very high traffic and produce a lot of data from the sheer number of events alone; if stack traces are added, the associated metadata can be huge as well. Sampled objects are stored in a priority queue and their associated stack traces are stored in `JFRStackTraceRepository`. When sample candidates are removed from the priority queue, their stack traces remain in the repository and are later written at chunk rotation even though the sample has been removed. ## Implementation This PR adds a `JFRStackTraceRepository` dedicated to storing stack traces for the `OldObjectSample` event. At chunk rotation, every sample stack trace is looked up in this repository and is serialized. Other stack traces are simply removed.
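The filtering described under Implementation can be sketched roughly as follows. This is a simplified model, not the JFR code: `SampleStackTraceRepository`, `add`, `rotate` and the use of strings for stack traces are all hypothetical, and keeping the still-referenced traces in the repository after rotation is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a repository that holds only the stack traces
// recorded for sampled objects, keyed by stack trace id.
class SampleStackTraceRepository {
 public:
  void add(uint64_t id, std::string frames) { _traces[id] = std::move(frames); }

  // Called at chunk rotation: returns the traces still referenced by a live
  // sample (ordered by id so the output is deterministic) and removes all
  // other traces so they are never written.
  std::vector<std::string> rotate(const std::unordered_set<uint64_t>& live_ids) {
    std::vector<uint64_t> kept;
    for (auto it = _traces.begin(); it != _traces.end();) {
      if (live_ids.count(it->first) != 0) {
        kept.push_back(it->first);
        ++it;
      } else {
        it = _traces.erase(it);  // the sample is gone, so drop its trace
      }
    }
    std::sort(kept.begin(), kept.end());
    std::vector<std::string> written;
    for (uint64_t id : kept) written.push_back(_traces[id]);
    return written;
  }

  std::size_t size() const { return _traces.size(); }

 private:
  std::unordered_map<uint64_t, std::string> _traces;
};
```

With traces 1, 2 and 3 added and only samples 1 and 3 still in the priority queue, `rotate({1, 3})` yields two entries and trace 2 never reaches the recording, which is where the size reduction comes from.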
## Benchmarks On an AWS c5.metal instance (96 cores, 192 GiB), running SPECjvm2008 with the default profile.jfc configuration and the OldObjectSample event enabled gives - a recording size of 2.78Mb with the PR fix - a recording size of 20.73Mb without the PR fix ------------- Commit messages: - Add ObjectSamplerStackTraceRepository - Add objecetSampleCheckpoint.cpp StackTraceChunkWriter - Instanciate leak profiler StackTraceRepository - Add leak profiler StackTraceRepository - Un-statify jfrStackTraceRepository Changes: https://git.openjdk.java.net/jdk/pull/2644/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2644&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258414 Stats: 189 lines in 16 files changed: 163 ins; 1 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/2644.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2644/head:pull/2644 PR: https://git.openjdk.java.net/jdk/pull/2644 From fdavid at openjdk.java.net Fri Feb 19 14:43:47 2021 From: fdavid at openjdk.java.net (Florian David) Date: Fri, 19 Feb 2021 14:43:47 GMT Subject: Withdrawn: 8258414: OldObjectSample events too expensive In-Reply-To: References: Message-ID: <0VNi96iEwob33LcxVFEzJAMcaSRUFDTN0vlJdLuJiM8=.da238296-e855-4892-9e34-1126b2066e95@github.com> On Fri, 19 Feb 2021 14:21:06 GMT, Florian David wrote: > The purpose of this change is to reduce the size of JFR recordings when the OldObjectSample event is enabled. > > ## Problem > > JFR recording size blows up when the `OldObjectSample` event is enabled. Memory allocation events are known to be very high traffic and produce a lot of data from the sheer number of events alone; if stack traces are added, the associated metadata can be huge as well. Sampled objects are stored in a priority queue and their associated stack traces are stored in `JFRStackTraceRepository`.
When sample candidates are removed from the priority queue, their stack traces remain in the repository and are later written at chunk rotation even though the sample has been removed. > > ## Implementation > > This PR adds a `JFRStackTraceRepository` dedicated to storing stack traces for the `OldObjectSample` event. At chunk rotation, every sample stack trace is looked up in this repository and is serialized. Other stack traces are simply removed. > > ## Benchmarks > On an AWS c5.metal instance (96 cores, 192 GiB), running SPECjvm2008 with the default profile.jfc configuration and the OldObjectSample event enabled gives > - a recording size of 2.78Mb with the PR fix > - a recording size of 20.73Mb without the PR fix This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/2644 From aph at openjdk.java.net Fri Feb 19 14:46:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 19 Feb 2021 14:46:41 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7] In-Reply-To: References: Message-ID: <1zpEXArPUG-5mAnHqc7E-YBJ_whPIr0KWpSrgqV2mnQ=.d72ed309-938a-4d6a-9bbc-0a8f065c8411@github.com> On Fri, 19 Feb 2021 03:13:12 GMT, Dong Bo wrote: >> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, >> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: >> /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ >> public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); >> >> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, >> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
>> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. >> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); >> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); >> >> The legal right shift amount should be in the range 1 to the element width in bits on aarch64: >> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en >> >> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. >> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. > > Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > handle zero shift in macro assembler src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 554: > 552: > 553: WRAP(usra) WRAP(ssra) > 554: #undef WRAP Are ssra and usra tested by anything? I don't see them accessed in the test case. src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 531: > 529: > 530: // NEON shift instructions > 531: #define WRAP(INSN) \ This comment should be // AdvSIMD shift by immediate. // These are "user friendly" variants which allow a shift count of 0. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From fdavid at openjdk.java.net Fri Feb 19 14:49:59 2021 From: fdavid at openjdk.java.net (Florian David) Date: Fri, 19 Feb 2021 14:49:59 GMT Subject: RFR: 8258414: OldObjectSample events too expensive Message-ID: <1_LsNBt-Yy5NlHbfwtRSRNvGa2AbTuhMGYuiw3Hy8gU=.3b79e283-87fe-451e-8e60-25b59c5e837a@github.com> The purpose of this change is to reduce the size of JFR recordings when the OldObjectSample event is enabled.
## Problem JFR recording size blows up when the OldObjectSample event is enabled. Memory allocation events are known to be very high traffic and produce a lot of data from the sheer number of events alone; if stack traces are added, the associated metadata can be huge as well. Sampled objects are stored in a priority queue and their associated stack traces are stored in JFRStackTraceRepository. When sample candidates are removed from the priority queue, their stack traces remain in the repository and are later written at chunk rotation even though the sample has been removed. ## Implementation This PR adds a JFRStackTraceRepository dedicated to storing stack traces for the OldObjectSample event. At chunk rotation, every sample stack trace is looked up in this repository and is serialized. Other stack traces are simply removed. ## Benchmarks On an AWS c5.metal instance (96 cores, 192 GiB), running SPECjvm2008 with the default profile.jfc configuration and the OldObjectSample event enabled gives - a recording size of 2.78Mb with the PR fix - a recording size of 20.73Mb without the PR fix ------------- Commit messages: - Make JFRRecorder serialize Leak Profiler stack traces - Serialize ObjectSampler's stack traces to Chunk - Instanciate leak profiler StackTraceRepository - Add leak profiler StackTraceRepository - Un-statify jfrStackTraceRepository Changes: https://git.openjdk.java.net/jdk/pull/2645/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2645&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258414 Stats: 185 lines in 16 files changed: 159 ins; 1 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/2645.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2645/head:pull/2645 PR: https://git.openjdk.java.net/jdk/pull/2645 From sviswanathan at openjdk.java.net Fri Feb 19 18:13:41 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 19 Feb 2021 18:13:41 GMT Subject: Integrated: 8261542: X86 slice and unslice intrinsics
for 256-bit byte/short vectors In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 02:37:35 GMT, Sandhya Viswanathan wrote: > The slice and unslice intrinsics for 256-bit byte/short vectors can be implemented for x86 platforms supporting AVX2 using a sequence of instructions. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8261542 > > The PerfSliceOrigin.java jmh test attached to the JBS shows the following performance on AVX2 platform. > > Before: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 18.887 ± 1.128 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 9.374 ± 0.370 ops/ms > > After: > Benchmark (size) Mode Cnt Score Error Units > PerfSliceOrigin.vectorSliceOrigin 1024 thrpt 5 13861.420 ± 19.071 ops/ms > PerfSliceOrigin.vectorSliceUnsliceOrigin 1024 thrpt 5 7895.199 ± 142.580 ops/ms This pull request has now been integrated. Changeset: c53acc2a Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/c53acc2a Stats: 120 lines in 7 files changed: 100 ins; 5 del; 15 mod 8261542: X86 slice and unslice intrinsics for 256-bit byte/short vectors Reviewed-by: kvn, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/2520 From kbarrett at openjdk.java.net Sat Feb 20 02:57:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 20 Feb 2021 02:57:40 GMT Subject: RFR: 8261998: Remove unused shared entry support from utilities/hashtable In-Reply-To: <71z-SnDx5bGZgGxNyQVcKz79eA_Vsg6XEZ_RGNVP1K4=.dd905031-8f45-43ed-9d66-4753d41f34b4@github.com> References: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> <71z-SnDx5bGZgGxNyQVcKz79eA_Vsg6XEZ_RGNVP1K4=.dd905031-8f45-43ed-9d66-4753d41f34b4@github.com> Message-ID: On Fri, 19 Feb 2021 02:56:14 GMT, Coleen Phillimore wrote: >> Please review this small cleanup in the utilities/hashtable facility.
The >> support for "shared" entries is no longer needed or used, so is being deleted. >> >> Testing: >> mach5 tier1-4 (some CDS tests are in tier4) > > We might want to share other hashtables like this, like the loader constraint table, but I don't think this will be needed. Thanks @coleenp and @iklam for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2638 From kbarrett at openjdk.java.net Sat Feb 20 03:05:59 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 20 Feb 2021 03:05:59 GMT Subject: Integrated: 8261998: Remove unused shared entry support from utilities/hashtable In-Reply-To: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> References: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> Message-ID: On Fri, 19 Feb 2021 02:39:20 GMT, Kim Barrett wrote: > Please review this small cleanup in the utilities/hashtable facility. The > support for "shared" entries is no longer needed or used, so is being deleted. > > Testing: > mach5 tier1-4 (some CDS tests are in tier4) This pull request has now been integrated. Changeset: 5a25cea5 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/5a25cea5 Stats: 28 lines in 3 files changed: 0 ins; 23 del; 5 mod 8261998: Remove unused shared entry support from utilities/hashtable Reviewed-by: coleenp, iklam ------------- PR: https://git.openjdk.java.net/jdk/pull/2638 From kbarrett at openjdk.java.net Sat Feb 20 03:05:58 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 20 Feb 2021 03:05:58 GMT Subject: RFR: 8261998: Remove unused shared entry support from utilities/hashtable [v2] In-Reply-To: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> References: <49wKLf3cdp0YDVDbUsbRIyV6576RzmT4fqFlIC3BDLU=.301faf2d-ad9b-4c84-8c15-dec7df375bb9@github.com> Message-ID: > Please review this small cleanup in the utilities/hashtable facility. 
The > support for "shared" entries is no longer needed or used, so is being deleted. > > Testing: > mach5 tier1-4 (some CDS tests are in tier4) Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into ht_shared - remove shared entry support ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2638/files - new: https://git.openjdk.java.net/jdk/pull/2638/files/4ebdd71a..bf811b54 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2638&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2638&range=00-01 Stats: 3323 lines in 88 files changed: 2061 ins; 627 del; 635 mod Patch: https://git.openjdk.java.net/jdk/pull/2638.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2638/head:pull/2638 PR: https://git.openjdk.java.net/jdk/pull/2638 From dongbo at openjdk.java.net Sat Feb 20 06:26:13 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sat, 20 Feb 2021 06:26:13 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v8] In-Reply-To: References: Message-ID: <5Y24E2lvmpeh6Ke9LT-S74vUT2bf1-wE8AfRPyunycs=.8740560a-d8b1-4f25-a0c6-ad0117aa6aff@github.com> > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. 
for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request incrementally with two additional commits since the last revision: - fix trailing whitespace - split ssra/usra tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/1aba5629..ba8dc5ac Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=06-07 Stats: 469 lines in 3 files changed: 352 ins; 112 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Sat Feb 20 06:29:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Sat, 20 Feb 2021 06:29:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7] In-Reply-To: <1zpEXArPUG-5mAnHqc7E-YBJ_whPIr0KWpSrgqV2mnQ=.d72ed309-938a-4d6a-9bbc-0a8f065c8411@github.com> References: 
<1zpEXArPUG-5mAnHqc7E-YBJ_whPIr0KWpSrgqV2mnQ=.d72ed309-938a-4d6a-9bbc-0a8f065c8411@github.com> Message-ID: On Fri, 19 Feb 2021 14:42:15 GMT, Andrew Haley wrote: >> Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 554: > >> 552: >> 553: WRAP(usra) WRAP(ssra) >> 554: #undef WRAP > > Are ssra and usra tested by anything? I don't seem them accessed in the test case. Updated. The `ssra/usra` are accessed by tests in `TestVectorShiftImmAndAccumulate.java`. Manually injected error by changing `addv` to `subv` if shifting right and accumulating with 0, the tests failed as expected. The `vba.add(vbb.lanewise(SHIFT, Imm))` pattern in `TestVectorShiftImmAndAccumulate.java` are actually the same with the original code in `TestVectorShiftImm.java`. As of now, I have no idea why `ssra/usra` are not accessed by the previous test code. The `vba.add(vbb.lanewise(SHIFT, Imm))` pattern should match `ssra/usra` anyway. I think we need a separate investigation. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From xliu at openjdk.java.net Sat Feb 20 08:30:00 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 20 Feb 2021 08:30:00 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v5] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
> > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set move tr_delete in StringUtils. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2178/files - new: https://git.openjdk.java.net/jdk/pull/2178/files/cfd51fb3..edbd13bd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=03-04 Stats: 119 lines in 7 files changed: 89 ins; 19 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Sat Feb 20 08:35:38 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 20 Feb 2021 08:35:38 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v4] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 19:23:14 GMT, Evgeny Astigeevich wrote: >> Another option: >> class filterStringStream: public stringStream { >> private: >> char ch; >> public: >> filterStringStream(char ch_to_filter, size_t initial_bufsize = 256) : stringStream(initial_bufsize), ch(ch_to_filter) {} >> >> virtual void write(const char* c, size_t len) override { >> const char* e = c + len; >> while (c != e) { >> size_t i = 0; >> while ((c+i) != e && c[i] != ch ) { >> ++i; >> } >> stringStream::write(c, i); >> c += i; >> while (c != e && *c == ch) { >> ++c; >> } >> } >> } >> }; >> >> Your code will be: >> filterStringStream ss('\n'); >> ss.print(" ");
>> const_oop->print_oop(&ss); >> st->print_raw(ss.base(), ss.size()); > >> `tr_delete` is expensive. Also deleting something in a stream does not fit into a concept of streams. >> I see that the content of `ss` is traversed many times. >> What about this code: >> >> ``` >> for (const char *str = ss.base(); *str; ) { >> size_t i = 0; >> while (str[i] && str[i] != '\n' ) { >> ++i; >> } >> st->print_raw(str, i); >> str += i; >> while (*str == '\n') { >> ++str; >> } >> } >> ``` > > You can put this code in a function like `print_filtering_ch(char, const stringStream&, outputStream*)` Hi @eastig, thank you for reviewing this code. You are right, I shouldn't modify the contents of a stringStream. I treated it as a buffer instead of a stream. I took your advice and moved the `tr_delete` logic to StringUtils, which is a toolkit class. Could you take a look again? ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Sun Feb 21 02:07:59 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sun, 21 Feb 2021 02:07:59 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set fix build failures on Windows.
StringUtils::tr_delete returns size_of. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2178/files - new: https://git.openjdk.java.net/jdk/pull/2178/files/edbd13bd..077f9b60 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=04-05 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From prr at openjdk.java.net Sun Feb 21 16:42:50 2021 From: prr at openjdk.java.net (Phil Race) Date: Sun, 21 Feb 2021 16:42:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v18] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 12:36:10 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. 
>> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 88 commits: > > - Merge remote-tracking branch 'upstream/jdk/master' into jdk-macos > - Re-do safefetch.hpp > - Merge remote-tracking branch 'origin/jdk/8261075-stubroutines-inline' into jdk-macos > - stubRoutines.inline.hpp -> safefetch.hpp > - Update copyrights > - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline > - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline > - Extract SafeFetch32/N to stubRoutines.inline.hpp > - Revert "Extract SafeFetch32/N to stubRoutines.inline.hpp" > > This reverts commit b873c25f31dd21349d140b790713cc9ccb5f2dc0. > - Merge pull request #9 from VladimirKempik/pull/2200 > > Removed unused variables > - ... and 78 more: https://git.openjdk.java.net/jdk/compare/b955f85e...ab72613c Looks like the compiler warning changess are now the only desktop changes. That is fine by me. ------------- Marked as reviewed by prr (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2200 From github.com+5010047+kelthuzadx at openjdk.java.net Mon Feb 22 02:14:39 2021 From: github.com+5010047+kelthuzadx at openjdk.java.net (Yang Yi) Date: Mon, 22 Feb 2021 02:14:39 GMT Subject: RFR: 8261949: fileStream::readln returns incorrect line string [v2] In-Reply-To: <9dW7QzfmQ8pZQbnqFR4x64EH6YUmqXKWNT0TPDdXcU0=.75d8426c-f121-44f3-b341-fed32f844156@github.com> References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> <6mopOrvW_0CcGxgYT6JMLuSQi7IUW2eh58EHj0SX_eY=.55868bf5-fd8d-4ab1-b02e-944f626cbb2b@github.com> <9dW7QzfmQ8pZQbnqFR4x64EH6YUmqXKWNT0TPDdXcU0=.75d8426c-f121-44f3-b341-fed32f844156@github.com> Message-ID: <-NU2c-YsKiu8X3wFitqDxLAc_JmD3n3_c4Ejs130Umo=.a6245529-b3f2-49e3-96b4-c4f057420934@github.com> On Fri, 19 Feb 2021 02:55:56 GMT, Daniel D. Daugherty wrote: >> Yang Yi has updated the pull request incrementally with one additional commit since the last revision: >> >> tweak > > I like David's suggestion better than mine. > Thumbs up. Thanks @dcubed-ojdk @dholmes-ora for reviews! Would you be able to sponsor this patch? I'm not a committer so that I can not push this commit directly. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2626

From github.com+5010047+kelthuzadx at openjdk.java.net  Mon Feb 22 02:24:39 2021
From: github.com+5010047+kelthuzadx at openjdk.java.net (Yang Yi)
Date: Mon, 22 Feb 2021 02:24:39 GMT
Subject: Integrated: 8261949: fileStream::readln returns incorrect line string
In-Reply-To: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com>
References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com>
Message-ID:

On Thu, 18 Feb 2021 12:31:29 GMT, Yang Yi wrote:

> When the last line does not contain a NEWLINE character, fileStream::readln would read
> a truncated line string:
>
> $ cat file_content:
> AA
> BB
> CC
>
> fileStream::readln result:
> "AA"
> "BB"
> "C"
>
> This patch addresses this problem; it works for POSIX and Windows since the last character
> of a line on these systems is always '\n'.

This pull request has now been integrated.

Changeset: 2b555015
Author:    Yang Yi
Committer: Jie Fu
URL:       https://git.openjdk.java.net/jdk/commit/2b555015
Stats:     5 lines in 1 file changed: 3 ins; 0 del; 2 mod

8261949: fileStream::readln returns incorrect line string

Reviewed-by: dcubed, dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/2626

From github.com+5010047+kelthuzadx at openjdk.java.net  Mon Feb 22 02:27:40 2021
From: github.com+5010047+kelthuzadx at openjdk.java.net (Yang Yi)
Date: Mon, 22 Feb 2021 02:27:40 GMT
Subject: RFR: 8261949: fileStream::readln returns incorrect line string [v2]
In-Reply-To: <-NU2c-YsKiu8X3wFitqDxLAc_JmD3n3_c4Ejs130Umo=.a6245529-b3f2-49e3-96b4-c4f057420934@github.com>
References: <36ELXWgiTy9NBIxJnRUIR2YKB2Ka-RYBl8hXOWGbpXs=.b0d8c8be-9f5e-457b-8f47-ba9cd341c2dd@github.com> <6mopOrvW_0CcGxgYT6JMLuSQi7IUW2eh58EHj0SX_eY=.55868bf5-fd8d-4ab1-b02e-944f626cbb2b@github.com> <9dW7QzfmQ8pZQbnqFR4x64EH6YUmqXKWNT0TPDdXcU0=.75d8426c-f121-44f3-b341-fed32f844156@github.com>
<-NU2c-YsKiu8X3wFitqDxLAc_JmD3n3_c4Ejs130Umo=.a6245529-b3f2-49e3-96b4-c4f057420934@github.com> Message-ID: On Mon, 22 Feb 2021 02:12:02 GMT, Yang Yi wrote: >> I like David's suggestion better than mine. >> Thumbs up. > > Thanks @dcubed-ojdk @dholmes-ora for reviews! > > Would you be able to sponsor this patch? I'm not a committer so that I can not push this commit directly. @DamonFool I'm grateful for your sponsorship. ------------- PR: https://git.openjdk.java.net/jdk/pull/2626 From stuefe at openjdk.java.net Mon Feb 22 06:30:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 22 Feb 2021 06:30:39 GMT Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 09:12:17 GMT, Lutz Schmidt wrote: >> Looks nice and elegant. >> >> But as said offlist, I dislike the fact that this hard codes the limitation to 32bit for the narrow klass pointer range. >> >> That restriction is artificial and we may just want to drop it. E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler. However, in that case it could be that we exceed the current 3g limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes. > > I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" to store an offset instead of the full address. 
Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit.
>
> You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you made compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you would then have entities with a size not native to any processor. Just imagine you have to atomically store such a value.
>
> In my opinion, wider compressed pointers will have to wait until we have 128-bit pointers.
>
> Back to code:
> In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bits, need_zero_extend may remain false. Your choice.

You misunderstand me. My point was not to make narrow pointers larger than 32 bits, but to use the full encodable range. The encodable range is 32G at the moment, but we artificially limit it to 3G (CompressedClassSpaceSize is capped at that value).

I thought your proposal was based on the assumption that the highest *uncompressed* offset into class space cannot be larger than 4G. But looking at your proposal again, I see you moved the shift up before the add, so it should probably work.
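
The range figures in this exchange (4G unshifted, 32G with 8-byte alignment, 64G with 16-byte alignment) follow directly from the width of the narrow value and the shift. The following is a minimal sketch of that arithmetic, assuming a 32-bit narrow value and a shift equal to log2 of the alignment. The names `encodable_range_bytes`, `encode` and `decode` are made up for illustration and are not HotSpot functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: number of bytes reachable by a
// narrow pointer of `narrow_bits` bits decoded as base + (narrow << shift).
constexpr std::uint64_t encodable_range_bytes(unsigned narrow_bits, unsigned shift) {
  return (std::uint64_t{1} << narrow_bits) << shift;
}

constexpr std::uint64_t G = std::uint64_t{1} << 30;

static_assert(encodable_range_bytes(32, 0) == 4 * G,  "no shift: 4G");
static_assert(encodable_range_bytes(32, 3) == 32 * G, "8-byte alignment: 32G");
static_assert(encodable_range_bytes(32, 4) == 64 * G, "16-byte alignment: 64G");

// Illustrative encode: subtract the base, then shift right.
inline std::uint32_t encode(std::uint64_t base, std::uint64_t addr, unsigned shift) {
  assert(addr >= base);
  assert(addr - base < encodable_range_bytes(32, shift));
  return static_cast<std::uint32_t>((addr - base) >> shift);
}

// Illustrative decode: shift left, then add the base.
inline std::uint64_t decode(std::uint64_t base, std::uint32_t narrow, unsigned shift) {
  return base + (static_cast<std::uint64_t>(narrow) << shift);
}
```

With a shift of 3, an address such as 0x8804_1040 relative to a base of 0x8800_0000 round-trips through `encode` and `decode` unchanged, and any offset below 32G still fits into the 32-bit narrow value.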
------------- PR: https://git.openjdk.java.net/jdk/pull/2595 From lucy at openjdk.java.net Mon Feb 22 07:14:41 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Mon, 22 Feb 2021 07:14:41 GMT Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 06:28:14 GMT, Thomas Stuefe wrote: >> I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" to store an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit. >> >> You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you would make compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you then would have entities with a size not native to any processor. Just imagine you have to atomically store such a value. >> >> I my opinion, wider compressed pointers will have to wait until we have 128-bit pointers. >> >> Back to code: >> In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice. > > You misunderstand me. My point was not to make narrow pointers larger than 32bit, but use the full encodable range. The encodable range is 32g atm. But we artificially limit the range to 3G (CompressedClassSpaceSize is capped at that value). 
> > I thought your proposal was based upon the assumption that the highest *uncompressed* offset into class space can be not larger than 4G. But looking at your proposal again, I see you moved the shift up before the add, so it should probably work. So it was mutual misunderstanding. Good to have that resolved. ------------- PR: https://git.openjdk.java.net/jdk/pull/2595 From stefank at openjdk.java.net Mon Feb 22 08:28:40 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 22 Feb 2021 08:28:40 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. 
>> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark non-reentrant again Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2500 From kalinshi at tencent.com Mon Feb 22 08:29:45 2021 From: kalinshi at tencent.com (=?gb2312?B?a2FsaW5zaGkoyqm72yk=?=) Date: Mon, 22 Feb 2021 08:29:45 +0000 Subject: JvmtiExport::can_walk_any_space() usage in hotspot Message-ID: <365ef5a026a84e81917462643ea4bc97@tencent.com> Hi hotspot experts, Would you help on my question about JvmtiExport::can_walk_any_space() check? Question is why JvmtiExport::can_walk_any_space() check is needed in CDS when mapping region? JvmtiExport::can_walk_any_space() method is only used in FileMapInfo::map_region for modifing region read-only mapping attribute. JvmtiExport::can_walk_any_space() is set true when jvmtiCapabilities.can_tag_objects is enabled. JVMTI capability can_tag_objects enables java heap iteration/object reference tracing, and JvmtiEnv::Set/GetTag doesn't modify read-only regions in shared archive (I might wrong). comments in latest code seems outdated, JvmtiExport::can_walk_any_space() doesn't disable sharing now. " JvmtiExport::set_can_walk_any_space( avail.can_tag_objects); // disable sharing in onload phase " Back to initial code, class sharing is disabled when condition JvmtiExport::can_modify_any_class() || JvmtiExport::can_walk_any_space() is true. This matches above comment in JvmtiManageCapabilities::update. 
" if (JvmtiExport::can_modify_any_class() || JvmtiExport::can_walk_any_space()) { fail_continue("Tool agent requires sharing to be disabled."); return false; } " JvmtiExport::can_modify_any_class condition disables class data sharing when class file load hook (requires modify code and read only contents) is needed in initial code. Both checks are removed and used to determine region read/write attribute with following commits. These commits are mainly supporting class file load hook with CDS. 1. enable shared class when these tow checks on, modify/map all regions in shared archive as RW. 8054386: Allow Java debugging when CDS is enabled Map archive RW when debugging is enabled 8087153: EXCEPTION_ACCESS_VIOLATION when CDS RO section vanished on win32 2. Support class file load hook with CDS 8141341: CDS should be disabled if JvmtiExport::should_post_class_file_load_hook() is true Disable loading shared class if JvmtiExport::should_post_class_file_load_hook is true. 8078644: CDS needs to support JVMTI CFLH Support posting CLFH for shared classes. 3. Fix jvmtiCapabilities::can_generate_all_class_hook_events inconsistent state when shared 8161605: The '!UseSharedSpaces' check is not need in JvmtiManageCapabilities::recompute_always_capabilities 4. 
Fix class file load hook error for early class hook event when shared
8212200: assert when shared java.lang.Object is redefined by JVMTI agent

Regards
Hui

From pliden at openjdk.java.net  Mon Feb 22 08:47:39 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 22 Feb 2021 08:47:39 GMT
Subject: RFR: 8258431: Provide a JFR event with live set size estimate
In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com>
References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com>
Message-ID:

On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote:

> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in the worst case 'at last full GC') in the form of a periodically emitted JFR event.
>
> ## Introducing new JFR event
>
> While there is already a 'GC Heap Summary' JFR event, it does not fit the requirements as it is closely tied to the GC cycle, so e.g. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of the heap summary events not being present in the JFR recording at all.
> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information about the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time.
>
> ## Implementation
>
> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is the `size_t live() const` method added to the `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet, the implementation will default to returning the 'used' value.
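
The common part described in the paragraph above (a cached liveness value with a fallback to `used()`) could look roughly like the following. This is only a sketch under stated assumptions: `HeapSketch`, `record_live_after_gc`, `set_used` and the `kUnset` sentinel are hypothetical names chosen for illustration, and the mail does not show how the actual `CollectedHeap::live()` is implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a heap class; not the HotSpot CollectedHeap.
class HeapSketch {
 public:
  explicit HeapSketch(std::size_t used_bytes) : _used(used_bytes) {}

  std::size_t used() const { return _used; }
  void set_used(std::size_t used_bytes) { _used = used_bytes; }

  // Called at the end of a GC cycle (assumption) to cache the live set size.
  void record_live_after_gc(std::size_t live_bytes) {
    assert(live_bytes != kUnset);
    _live.store(live_bytes, std::memory_order_relaxed);
  }

  // Returns the cached estimate, or used() if no GC cycle has completed yet.
  std::size_t live() const {
    const std::size_t cached = _live.load(std::memory_order_relaxed);
    return cached == kUnset ? used() : cached;
  }

 private:
  static constexpr std::size_t kUnset = SIZE_MAX;  // "not calculated yet"
  std::size_t _used;
  std::atomic<std::size_t> _live{kUnset};
};
```

Reading the cached value is cheap and needs no safepoint, which matches the stated goal of being able to emit the event at any time; the price is that the estimate can be as old as the last completed GC cycle.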
>
> The implementations are based on my (rather shallow) knowledge of the inner workings of the respective GC engines and I am open to suggestions to make them better/correct.
>
> ### Epsilon GC
>
> Trivial implementation - just return `used()` instead.
>
> ### Serial GC
>
> Here we utilize the fact that the mark-copy phase is naturally compacting, so the number of bytes after copy is 'live', and that the mark-sweep implementation keeps internal information about objects being 'dead' but excluded from the compaction effort. We can use these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects).
>
> ### Parallel GC
>
> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK).
>
> ### G1 GC
>
> Using the `G1ConcurrentMark::remark()` method, the live set size is computed as the sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in the G1 implementation chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application.
>
> ### Shenandoah
>
> In Shenandoah, the regions keep the liveness info. However, the VM op that is used for iterating regions is a safepointing one, so it would be great to run it in an already safepointed context.
> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where, at the end of the marking process, the liveness info is summarized and stored in the `ShenandoahHeap::_live` volatile field, which is later read by the event emitting code.
>
> ### ZGC
>
> `ZStatHeap` already holds the liveness info, so this implementation just makes it accessible via the `ZCollectedHeap::live()` method.

src/hotspot/share/gc/z/zStat.hpp line 549:

> 547:   static size_t used_at_mark_start();
> 548:   static size_t used_at_relocate_end();
> 549:   static size_t live();

Please call this `live_at_mark_end()` to match the names of the neighboring functions.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

From stuefe at openjdk.java.net  Mon Feb 22 09:32:01 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Mon, 22 Feb 2021 09:32:01 GMT
Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]
In-Reply-To:
References:
Message-ID:

> If the compressed class pointer base has a non-zero value, it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x8000_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>
> In MacroAssembler::encode_klass_not_null(), there is the following section:
>
>   if (base != NULL) {
>     unsigned int base_h = ((unsigned long)base)>>32;
>     unsigned int base_l = (unsigned int)((unsigned long)base);
>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>       lgr_if_needed(dst, current);
>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>     } else if ((base_h == 0) && (base_l != 0)) {     (A)
>       lgr_if_needed(dst, current);
>       z_agfi(dst, -(int)base_l);      (B)
>     } else {
>       load_const(Z_R0, base);
>       lgr_if_needed(dst, current);
>       z_sgr(dst, Z_R0);
>     }
>     current = dst;
>   }
>
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32 bits.
At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32-bit two's complement of the base and adding it with AGFI. AGFI adds a sign-extended 32-bit immediate to a 64-bit register. In this case, it produces the wrong result if the base is >0x8000_0000, because the negated base is then a positive 32-bit value and the addition does not wrap around at 32 bits:
>
> In the case of the crash, we have:
> base: 8800_0000
> klass pointer: 8804_1040
> 32-bit two's complement of base: 7800_0000
> added to the klass pointer: 1_0004_1040
>
> So the result of the "subtraction" is 1_0004_1040; it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>
> This bug had been dormant; it was activated by JDK-8250989, which changed the way class space reservation happens at CDS dump time. It surfaced first as a crash in a CDS-specific jtreg test (JDK-8261552).
>
> ================
>
> Fix:
>
> I changed the AGFI instruction to a pure 32-bit add (AFI). That works as long as the Klass pointer also fits into 32 bits. So I narrowed the condition at (A) to only fire if it can be ensured that both the narrow base and the Klass* pointers fit into 32 bits.
>
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32-bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32-bit words but could not find any.
>
> ----
>
> Tests:
>
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>
> I also built a VM with the ability to override both the class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
> > I used this method to test various combinations: > - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI) > - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0) > - narrow klass pointer base = 0 (we dont do anything) > > (would this override-feature be useful? We could do better testing). > > Thanks, Thomas Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Lucys proposal - Revert first attempt ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2595/files - new: https://git.openjdk.java.net/jdk/pull/2595/files/07e83dfb..e096f09c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=00-01 Stats: 84 lines in 3 files changed: 28 ins; 28 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/2595.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2595/head:pull/2595 PR: https://git.openjdk.java.net/jdk/pull/2595 From rkennke at openjdk.java.net Mon Feb 22 09:35:46 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 09:35:46 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 08:26:19 GMT, Stefan Karlsson wrote: > Looks good. Thanks, Stefan! @fisk also good? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From eosterlund at openjdk.java.net Mon Feb 22 09:42:40 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 22 Feb 2021 09:42:40 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark non-reentrant again Also good! ------------- Marked as reviewed by eosterlund (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2500 From rkennke at openjdk.java.net Mon Feb 22 10:13:48 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 10:13:48 GMT Subject: Integrated: 8261448: Preserve GC stack watermark across safepoints in StackWalk In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke wrote: > I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [x] tier1 (+Shenandoah/ZGC) > - [x] tier2 (+Shenandoah/ZGC) This pull request has now been integrated. 
Changeset: c20fb5db Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/c20fb5db Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8261448: Preserve GC stack watermark across safepoints in StackWalk Reviewed-by: eosterlund, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From mdoerr at openjdk.java.net Mon Feb 22 10:38:41 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Mon, 22 Feb 2021 10:38:41 GMT Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2] In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 09:32:01 GMT, Thomas Stuefe wrote: >> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer. >> >> This can be reproduced by starting the VM with >> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m >> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination: >> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000. >> - class space follows at 0x8800_0000 >> - the narrow klass pointer base points to the start of the class space at 0x8800_0000. >> >> In MacroAssembler::encode_klass_not_null(), there is the following section: >> >> if (base != NULL) { >> unsigned int base_h = ((unsigned long)base)>>32; >> unsigned int base_l = (unsigned int)((unsigned long)base); >> if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) { >> lgr_if_needed(dst, current); >> z_aih(dst, -((int)base_h)); // Base has no set bits in lower half. 
>> } else if ((base_h == 0) && (base_l != 0)) { (A) >> lgr_if_needed(dst, current); >> z_agfi(dst, -(int)base_l); (B) >> } else { >> load_const(Z_R0, base); >> lgr_if_needed(dst, current); >> z_sgr(dst, Z_R0); >> } >> current = dst; >> } >> >> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000: >> >> In the case of the crash, we have: >> base: 8800_0000 >> klass pointer: 8804_1040 >> 32bit two's complement of base: 7800_0000 >> added to the klass pointer: 1_0004_1040 >> >> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs. >> >> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552). >> >> ================ >> >> Fix: >> >> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit. >> >> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any. >> >> ---- >> >> Tests: >> >> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right. 
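
The arithmetic quoted above can be checked with a small model of the two instructions. This is a sketch, not s390 code: `agfi`, `afi` and `neg32` are illustrative C++ functions that assume AGFI sign-extends its 32-bit immediate before a 64-bit add, and that AFI adds within the low 32 bits only.

```cpp
#include <cassert>
#include <cstdint>

// Model of AGFI as used at (B): the 32-bit immediate is sign-extended to
// 64 bits and added to the full 64-bit register.
inline std::uint64_t agfi(std::uint64_t reg, std::int32_t imm) {
  return reg + static_cast<std::uint64_t>(static_cast<std::int64_t>(imm));
}

// Model of a pure 32-bit add (AFI): only the low 32 bits take part,
// so the result wraps around at 2^32.
inline std::uint32_t afi(std::uint32_t reg, std::int32_t imm) {
  return reg + static_cast<std::uint32_t>(imm);
}

// 32-bit two's complement of the base, computed in unsigned arithmetic to
// avoid signed overflow. The final cast assumes a two's complement target.
inline std::int32_t neg32(std::uint32_t base_l) {
  return static_cast<std::int32_t>(0u - base_l);
}

// Base above 0x8000_0000: the negated base is a positive immediate, so the
// 64-bit add carries into bit 32 instead of wrapping.
inline bool agfi_subtracts_correctly(std::uint32_t base_l, std::uint64_t klass) {
  assert(klass >= base_l);
  return agfi(klass, neg32(base_l)) == klass - base_l;
}
```

For the values from the crash, `neg32(0x88000000)` is 0x7800_0000, `agfi(0x88041040, 0x78000000)` yields 0x1_0004_1040, and `afi` on the same operands yields 0x4_1040. For a base such as 0x4000_0000 the negated base is negative, sign extension supplies the missing high bits, and AGFI gives the expected offset.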
>> >> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base). >> >> I used this method to test various combinations: >> - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI) >> - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0) >> - narrow klass pointer base = 0 (we dont do anything) >> >> (would this override-feature be useful? We could do better testing). >> >> Thanks, Thomas > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Lucys proposal > - Revert first attempt Thanks for fixing. Looks correct, but I have one minor finding. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3635: > 3633: if (base != NULL) { > 3634: // Use scaled-down base address parts to match scaled-down klass pointer. > 3635: unsigned int base_h = ((unsigned long)base)>>(32+shift); base_h is unused, but referred to in the comments ------------- Changes requested by mdoerr (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2595 From lucy at openjdk.java.net Mon Feb 22 10:51:40 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Mon, 22 Feb 2021 10:51:40 GMT Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2] In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 09:32:01 GMT, Thomas Stuefe wrote: >> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer. >> >> This can be reproduced by starting the VM with >> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m >> but CDS is not involved. 
>> It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x8000_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {       (A)
>>       lgr_if_needed(dst, current);
>>       z_agfi(dst, -(int)base_l);                       (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32bit two's complement of the base and adding it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x8000_0000:
>>
>> In the case of the crash, we have:
>> base: 8800_0000
>> klass pointer: 8804_1040
>> 32bit two's complement of base: 7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "subtraction" is 1_0004_1040; it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; it was activated by JDK-8250989, which changed the way class space reservation happens at CDS dump time. It surfaced first as a crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit.
>> So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also built a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0 (we don't do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Lucys proposal
>  - Revert first attempt

The changes look good to me now. Including the additional optimisation is optional.
Thanks for debugging, finding and fixing!

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3649:

> 3647:     // - Both values are treated as unsigned. The unsigned subtraction is
> 3648:     //   replaced by adding (unsigned) the 2's complement of the subtrahend.
> 3649:

There is a further tiny optimisation you may want to include in the final version:

    // If we happen to see (base_h == 0), we are sure there
    // is no borrow from bit#33. No zero-extension is needed.
    if (base_h == 0) { need_zero_extend = false; }

-------------

Marked as reviewed by lucy (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2595

From lucy at openjdk.java.net Mon Feb 22 10:54:42 2021
From: lucy at openjdk.java.net (Lutz Schmidt)
Date: Mon, 22 Feb 2021 10:54:42 GMT
Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 22 Feb 2021 10:31:45 GMT, Martin Doerr wrote:

>> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Lucys proposal
>>  - Revert first attempt
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3635:
>
>> 3633:   if (base != NULL) {
>> 3634:     // Use scaled-down base address parts to match scaled-down klass pointer.
>> 3635:     unsigned int base_h = ((unsigned long)base)>>(32+shift);
>
> base_h is unused, but referred to in the comments

There was a comment crossing... With my latest suggestion, base_h is now used.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595

From mdoerr at openjdk.java.net Mon Feb 22 14:32:09 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Mon, 22 Feb 2021 14:32:09 GMT
Subject: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 22 Feb 2021 09:32:01 GMT, Thomas Stuefe wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x8000_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {       (A)
>>       lgr_if_needed(dst, current);
>>       z_agfi(dst, -(int)base_l);                       (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32bit two's complement of the base and adding it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x8000_0000:
>>
>> In the case of the crash, we have:
>> base: 8800_0000
>> klass pointer: 8804_1040
>> 32bit two's complement of base: 7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "subtraction" is 1_0004_1040; it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; it was activated by JDK-8250989, which changed the way class space reservation happens at CDS dump time. It surfaced first as a crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also built a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0 (we don't do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Lucys proposal
>  - Revert first attempt

As discussed offline, I think we need to support encoding of Class Pointer 0x100000000 (and above) with e.g. base = 0x0C0000000 and shift = 0. need_zero_extend is false in this example, which possibly leaves a 1 in the higher 32 bits. The lower 32 bits are correct in your current version, but some code may rely on zero extension to 64 bits.

-------------

Changes requested by mdoerr (Reviewer).
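
For readers following the arithmetic in this thread, here is a minimal C++ sketch of the two cases discussed: the crash case from the bug description (base 0x8800_0000) and the zero-extension case from the review (base 0xC000_0000, Klass pointer 0x1_0000_0000, shift 0). The helpers `neg32`, `agfi`, `afi` and `zero_extend_low_word` are made-up names for this illustration. They only model the register arithmetic as described in the thread and are not taken from the HotSpot sources.

```cpp
#include <cstdint>

// Plain C++ stand-ins for the s390 instructions named in this thread.
// Illustration only: these model register arithmetic, they are not HotSpot code.

// 32-bit two's complement of the low base word, used as the immediate.
constexpr int32_t neg32(uint32_t base_l) {
  return static_cast<int32_t>(0u - base_l);
}

// AGFI: add a sign-extended 32-bit immediate to the full 64-bit register.
constexpr uint64_t agfi(uint64_t reg, int32_t imm) {
  return reg + static_cast<uint64_t>(static_cast<int64_t>(imm));
}

// AFI: 32-bit add on the low word; the high word of the register is left as is.
constexpr uint64_t afi(uint64_t reg, int32_t imm) {
  return (reg & 0xFFFFFFFF00000000ULL) |
         static_cast<uint32_t>(static_cast<uint32_t>(reg) + static_cast<uint32_t>(imm));
}

// Hypothetical helper: clear the high word after the 32-bit add.
constexpr uint64_t zero_extend_low_word(uint64_t reg) {
  return reg & 0x00000000FFFFFFFFULL;
}

// Case from the bug description: base 0x8800_0000, Klass pointer 0x8804_1040.
static_assert(neg32(0x88000000u) == 0x78000000, "the immediate comes out positive");
static_assert(agfi(0x88041040ULL, neg32(0x88000000u)) == 0x100041040ULL,
              "AGFI adds instead of subtracting");
static_assert(afi(0x88041040ULL, neg32(0x88000000u)) == 0x41040ULL,
              "AFI yields the expected offset");

// Case from the review: base 0xC000_0000, Klass pointer 0x1_0000_0000, shift 0.
static_assert(afi(0x100000000ULL, neg32(0xC0000000u)) == 0x140000000ULL,
              "low word is right, but a 1 stays in the high word");
static_assert(zero_extend_low_word(afi(0x100000000ULL, neg32(0xC0000000u))) == 0x40000000ULL,
              "zero extension gives the expected offset");
```

The `static_assert` lines double as the worked example: the file compiles only if the numbers quoted in the thread hold under this model.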
PR: https://git.openjdk.java.net/jdk/pull/2595 From tschatzl at openjdk.java.net Mon Feb 22 17:23:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 17:23:43 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. 
> > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. > This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. 
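
As a rough illustration of the cached-value fallback described in the quoted text above, here is a small C++ sketch. `HeapModel` and all of its members are hypothetical names chosen for this example; the sketch uses `std::atomic` rather than HotSpot's own atomic helpers and assumes that a cached value of 0 means "no estimate yet".

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Toy stand-in for a heap that caches a live-set estimate.
// All names are illustrative assumptions, not the CollectedHeap API.
class HeapModel {
  std::size_t _capacity;
  std::size_t _used;
  std::atomic<std::size_t> _live_size{0};  // 0 means "no estimate yet"

 public:
  HeapModel(std::size_t capacity, std::size_t used)
      : _capacity(capacity), _used(used) {
    assert(used <= capacity);
  }

  std::size_t capacity() const { return _capacity; }
  std::size_t used() const { return _used; }

  // Would be called at the end of a GC cycle that computed liveness.
  void set_live(std::size_t bytes) {
    assert(bytes <= _capacity);
    _live_size.store(bytes, std::memory_order_relaxed);
  }

  // Best-effort estimate: the cached value if present, otherwise used().
  std::size_t live() const {
    const std::size_t size = _live_size.load(std::memory_order_relaxed);
    return size > 0 ? size : used();
  }
};
```

With `HeapModel h(1024, 512)`, `h.live()` reports 512 until `h.set_live(300)` is called, after which it reports 300.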
The change also misses liveness update after G1 Full GC: it should at least reset the internal liveness counter to 0 so that `used()` is used. I think there is the same issue for Parallel Full GC. Serial seems to be handled.

src/hotspot/share/gc/shared/collectedHeap.hpp line 217:

> 215:   virtual size_t capacity() const = 0;
> 216:   virtual size_t used() const = 0;
> 217:   // a best-effort estimate of the live set size

I would prefer @shipilev's comment. Also I would like to suggest to call this method `live_estimate()` to set the expectations right.

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1114:

> 1112:
> 1113:   _g1h->set_live(live_size * HeapWordSize);
> 1114:

This code is located in the wrong place. It will return only the live words for the areas that have been marked, not eden or objects allocated in old gen after the marking started. Further it iterates over all regions, which can be large compared to actually active regions.

A better place is in `G1UpdateRemSetTrackingBeforeRebuild::do_heap_region()` after the last method call - at that point, `HeapRegion::live_bytes()` contains the per-region number of live data for all regions.

`G1UpdateRemSetTrackingBeforeRebuild` is instantiated and then called by multiple threads. It's probably best that that `HeapClosure` locally sums up the live byte estimates and then in the caller `G1UpdateRemSetTrackingBeforeRebuildTask::work()` sums up the per thread results like is done for `G1UpdateRemSetTrackingBeforeRebuildTask::_total_selected_for_rebuild`, which is then set in the caller of the `G1UpdateRemSetTrackingBeforeRebuildTask`.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1850:

> 1848: size_t G1CollectedHeap::live() const {
> 1849:   size_t size = Atomic::load(&_live_size);
> 1850:   return size > 0 ? size : used();

note that `used()` is susceptible to fluttering due to memory ordering problems: since its result consists of multiple reads, you can get readings from very different situations.
It is recommended to use `used_unlocked()` instead, which does not take allocation regions and archive regions into account, but at least it is not susceptible to jumping around when re-reading it in quick succession.

src/hotspot/share/gc/parallel/parallelScavengeHeap.inline.hpp line 49:

> 47:   _young_live = young_gen()->used_in_bytes();
> 48:   _eden_live = young_gen()->eden_space()->used_in_bytes();
> 49:   _old_live = old_gen()->used_in_bytes();

`_young_live` already seems to contain `_eden_live` looking at the implementation of `PSYoungGen::used_in_bytes()`:

I.e.
`size_t PSYoungGen::used_in_bytes() const {
  return eden_space()->used_in_bytes()
       + from_space()->used_in_bytes();      // to_space() is only used during scavenge
}
`
but maybe I'm wrong here.

src/hotspot/share/gc/shared/genCollectedHeap.cpp line 683:

> 681:   }
> 682:   // update the live size after last GC
> 683:   _live_size = _young_gen->live() + _old_gen->live();

I would prefer if that code were placed into `gc_epilogue`.

src/hotspot/share/gc/shared/space.inline.hpp line 189:

> 187:       oop obj = oop(cur_obj);
> 188:       size_t obj_size = obj->size();
> 189:       live_offset += obj_size;

It seems more natural to me to put this counting into the `DeadSpacer` as this is what this change does. Also, the actual dead space "used" can be calculated from the difference between the `_allowed_deadspace_words` and the maximum (calculated in the constructor of `DeadSpacer`) afaict at the end of evacuation. So there is no need to incur per-object costs during evacuation at all.

-------------

Changes requested by tschatzl (Reviewer).
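
The per-worker summing suggested above for G1 could look roughly like the following C++ sketch. `Region`, `SumLiveClosure` and `SumLiveTask` are hypothetical stand-ins, not the G1 classes named in the review. The strided assignment of regions to workers is an assumption made to keep the example short; G1 has its own way of handing regions to workers. `std::atomic` is used in place of HotSpot's atomic helpers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region record; only the per-region live byte count matters here.
struct Region { std::size_t live_bytes; };

// Closure with a worker-local counter, so visiting a region does no shared write.
class SumLiveClosure {
  std::size_t _local_live = 0;

 public:
  void do_region(const Region& r) { _local_live += r.live_bytes; }
  std::size_t local_live() const { return _local_live; }
};

// Task whose work() would be run once per worker, possibly concurrently.
class SumLiveTask {
  const std::vector<Region>& _regions;
  const std::size_t _num_workers;
  std::atomic<std::size_t> _total_live{0};

 public:
  SumLiveTask(const std::vector<Region>& regions, std::size_t num_workers)
      : _regions(regions), _num_workers(num_workers) {
    assert(num_workers > 0);
  }

  void work(std::size_t worker_id) {
    assert(worker_id < _num_workers);
    SumLiveClosure cl;
    // Assumption: strided distribution of regions across workers.
    for (std::size_t i = worker_id; i < _regions.size(); i += _num_workers) {
      cl.do_region(_regions[i]);
    }
    // One shared update per worker instead of one per region.
    _total_live.fetch_add(cl.local_live(), std::memory_order_relaxed);
  }

  std::size_t total_live() const {
    return _total_live.load(std::memory_order_relaxed);
  }
};
```

The caller of the task would read `total_live()` once all workers have finished and then publish it as the heap-wide estimate.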
PR: https://git.openjdk.java.net/jdk/pull/2579 From tschatzl at openjdk.java.net Mon Feb 22 17:23:44 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 17:23:44 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Thu, 18 Feb 2021 10:15:37 GMT, Aleksey Shipilev wrote: >> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. >> >> ## Introducing new JFR event >> >> While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. >> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. >> >> ## Implementation >> >> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. >> >> The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. 
>> >> ### Epsilon GC >> >> Trivial implementation - just return `used()` instead. >> >> ### Serial GC >> >> Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). >> >> ### Parallel GC >> >> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). >> >> ### G1 GC >> >> Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. >> >> ### Shenandoah >> >> In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. >> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. >> >> ### ZGC >> >> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. 
> > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4578: > >> 4576: >> 4577: void G1CollectedHeap::set_live(size_t bytes) { >> 4578: Atomic::release_store(&_live_size, bytes); > > I don't think this requires `release_store`, regular `store` would be enough. G1 folks can say for sure. Not required. > src/hotspot/share/gc/shared/genCollectedHeap.hpp line 183: > >> 181: size_t live = _live_size; >> 182: return live > 0 ? live : used(); >> 183: }; > > I think the implementation belongs to `genCollectedHeap.cpp`. +1. Does not seem to be performance sensitive. ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From tschatzl at openjdk.java.net Mon Feb 22 17:23:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 17:23:45 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Fri, 19 Feb 2021 08:22:56 GMT, Albert Mingkun Yang wrote: >> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. >> >> ## Introducing new JFR event >> >> While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. >> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. 
>> >> ## Implementation >> >> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. >> >> The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. >> >> ### Epsilon GC >> >> Trivial implementation - just return `used()` instead. >> >> ### Serial GC >> >> Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). >> >> ### Parallel GC >> >> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). >> >> ### G1 GC >> >> Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. >> >> ### Shenandoah >> >> In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. 
>> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. >> >> ### ZGC >> >> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. > > src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 79: > >> 77: size_t _young_live; >> 78: size_t _eden_live; >> 79: size_t _old_live; > > It's only the sum that's ever exposed, right? I wonder if it makes sense to merge them into one var to only track the sum. I agree because they seem to be always read and written at the same time. ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From daniel.daugherty at oracle.com Mon Feb 22 18:17:23 2021 From: daniel.daugherty at oracle.com (daniel.daugherty at oracle.com) Date: Mon, 22 Feb 2021 13:17:23 -0500 Subject: JvmtiExport::can_walk_any_space() usage in hotspot In-Reply-To: <365ef5a026a84e81917462643ea4bc97@tencent.com> References: <365ef5a026a84e81917462643ea4bc97@tencent.com> Message-ID: <15ea918b-2d8a-b021-f5d0-ed1af1f4c84c@oracle.com> Adding serviceability-dev at ... to this email thread since JVM/TI is maintained by the Serviceability Team... Dan On 2/22/21 3:29 AM, kalinshi(??) wrote: > Hi hotspot experts, > > Would you help on my question about JvmtiExport::can_walk_any_space() check? > Question is why JvmtiExport::can_walk_any_space() check is needed in CDS when mapping region? > > JvmtiExport::can_walk_any_space() method is only used in FileMapInfo::map_region for modifing region read-only mapping attribute. > JvmtiExport::can_walk_any_space() is set true when jvmtiCapabilities.can_tag_objects is enabled. 
> The JVMTI capability can_tag_objects enables Java heap iteration/object reference tracing, and JvmtiEnv::Set/GetTag doesn't modify read-only regions in the shared archive (I might be wrong).
>
> The comment in the latest code seems outdated; JvmtiExport::can_walk_any_space() doesn't disable sharing now.
> "
> JvmtiExport::set_can_walk_any_space(
>     avail.can_tag_objects); // disable sharing in onload phase
> "
>
> Back in the initial code, class sharing is disabled when the condition JvmtiExport::can_modify_any_class() || JvmtiExport::can_walk_any_space() is true.
> This matches the above comment in JvmtiManageCapabilities::update.
> "
> if (JvmtiExport::can_modify_any_class() || JvmtiExport::can_walk_any_space()) {
>     fail_continue("Tool agent requires sharing to be disabled.");
>     return false;
> }
> "
>
> The JvmtiExport::can_modify_any_class condition disables class data sharing when the class file load hook (requires modifying code and read-only contents) is needed in the initial code.
> Both checks were removed and are now used to determine the region read/write attribute by the following commits. These commits mainly support the class file load hook with CDS.
>
> 1. Enable shared classes when these two checks are on; modify/map all regions in the shared archive as RW.
> 8054386: Allow Java debugging when CDS is enabled Map archive RW when debugging is enabled
> 8087153: EXCEPTION_ACCESS_VIOLATION when CDS RO section vanished on win32
>
> 2. Support class file load hook with CDS
> 8141341: CDS should be disabled if JvmtiExport::should_post_class_file_load_hook() is true Disable loading shared class if JvmtiExport::should_post_class_file_load_hook is true.
> 8078644: CDS needs to support JVMTI CFLH Support posting CFLH for shared classes.
>
> 3. Fix jvmtiCapabilities::can_generate_all_class_hook_events inconsistent state when shared
> 8161605: The '!UseSharedSpaces' check is not need in JvmtiManageCapabilities::recompute_always_capabilities
>
> 4.
Fix class file load hook error for early class hook event when shared > 8212200: assert when shared java.lang.Object is redefined by JVMTI agent > > Regards > Hui From egahlin at openjdk.java.net Mon Feb 22 19:40:39 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Mon, 22 Feb 2021 19:40:39 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. 
> > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. > This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. 
src/hotspot/share/jfr/metadata/metadata.xml line 205: > 203: > 204: > 205: I think it would be good to mention in the description that it is an estimate, i.e. "Estimate of live bytes ....". ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From enikitin at openjdk.java.net Mon Feb 22 20:36:44 2021 From: enikitin at openjdk.java.net (Evgeny Nikitin) Date: Mon, 22 Feb 2021 20:36:44 GMT Subject: RFR: 8058176: [mlvm] tests should not allow code cache exhaustion [v2] In-Reply-To: <2qEkvkaxAPHeFaDoCRmcPaehczQgwZNnZMxO2Z-Vc28=.d4845a88-7d71-4768-b952-5ff9c4ab8311@github.com> References: <2_Gpraz6NaY17HPfRDW-LD-sQrrPQ4dpIVP8vikpdXM=.d425cd8b-aea5-43be-865e-72229db81e6e@github.com> <2qEkvkaxAPHeFaDoCRmcPaehczQgwZNnZMxO2Z-Vc28=.d4845a88-7d71-4768-b952-5ff9c4ab8311@github.com> Message-ID: On Wed, 17 Feb 2021 15:46:44 GMT, Igor Ignatyev wrote: >> Well, seems like rebalancing doesn't works that good. Here's a sample failure with plenty of free space in the non-nmethods heap: >> >> [8.230s][warning][codecache] CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled. >> [8.230s][warning][codecache] Try increasing the code heap size using -XX:NonProfiledCodeHeapSize= >> Java HotSpot(TM) 64-Bit Server VM warning: CodeHeap 'non-profiled nmethods' is full. Compiler has been disabled. >> Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code heap size using -XX:NonProfiledCodeHeapSize= >> CodeHeap 'non-profiled nmethods': size=8192Kb used=8191Kb max_used=8191Kb free=0Kb << Exhausted >> CodeHeap 'profiled nmethods': size=8192Kb used=8191Kb max_used=8191Kb free=0Kb << Exhausted >> CodeHeap 'non-nmethods': size=102400Kb used=18343Kb max_used=18343Kb free=84056Kb << 84Mb of free space >> >> # ERROR: Caught exception in Thread[Thread-41,5,MainThreadGroup] >> ... >> # ERROR: Caused by: java.lang.VirtualMachineError: Out of space in CodeCache for method handle intrinsic >> The sum monitoring won't help here either. 
I've added non-nmethods heap to the monitoring, just to be sure.
>
> hm... that can mean that there is a product bug (or my recollections about code heaps aren't as good as I thought).
>
> @TobiHartmann , @iwanowww, could you please take a look? Evgeny's observations suggest that method handle intrinsics use `non-profiled nmethods` and `profiled nmethods` heaps and not `non-nmethods` heap despite the fact that the last one has plenty of free space. my understanding is/was that we should have used `non-nmethods` heap for MH intrinsic 1st and if it's exhausted start to use the other heaps.
>
> Thanks,
> -- Igor

I inspected a sample built-up cache with the 'Compiler.CodeHeap_Analytics' diagnostic command. The vast majority of the 'non-profiled nmethods' heap consists of zillions of `invokeBasic`, `linkToStatic` and similar entries, with different signatures. The dump shows something like this:

    nMethod (active) invokeBasic(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
    nMethod (active) invokeBasic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;DFJD)Ljava/lang/Object;
    nMethod (active) invokeBasic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;DFJDLjava/lang/Object;)Ljava/lang/Object;
    nMethod (active) invokeBasic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;DFJDLjava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
    ...

with their signatures marching to the right screen border and beyond. Given that their arguments are mish-mashed in all possible combinations, there are really many of them (I've been able to build up caches of up to 300MB without a pair of signatures repeating). They are nmethods, and should be in the nmethods cache, shouldn't they?
-------------

PR: https://git.openjdk.java.net/jdk/pull/2523

From stuefe at openjdk.java.net Tue Feb 23 05:00:57 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Tue, 23 Feb 2021 05:00:57 GMT
Subject: RFR: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java
Message-ID: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com>

Since JDK-8261302, the test runtime/NMT/CheckForProperDetailStackTrace.java fails with

    java.lang.RuntimeException: 'NativeCallStack::NativeCallStack' found in stdout

--

`NativeCallStack` contains a hash code. Before JDK-8261302, that hash code was calculated lazily in a non-inline hashcode getter. With JDK-8261302, the hash code calculation was moved into the `NativeCallStack` constructor and the getter was made inline.

The `NativeCallStack` constructor fills itself via `os::get_native_stack()`. Before JDK-8261302, that call was the last call in the constructor and hence was sometimes optimized into a tail call. Whether or not it's a tail call matters, since it affects the number of stack frames the stack walker has to skip. Therefore, the constructor contains code to predict tail-call-ness:

    #if (defined(_NMT_NOINLINE_) || defined(_WINDOWS) || !defined(_LP64) || defined(PPC64))
      // Not a tail call.
      toSkip++;
    #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64))
      // Mac OS X slowdebug builds have this odd behavior where NativeCallStack::NativeCallStack
      // appears as two frames, so we need to skip an extra frame.
      toSkip++;
    #endif // Special-case for BSD.
    #endif // Not a tail call.

This prediction is now off, since the hash code calculation now happens at the end of the constructor. This causes the test error: on some platforms (e.g. Linux x64) we now think we have a tail call when we don't, which means we do not skip enough frames, and the NMT output contains call frames like "NativeCallStack::NativeCallStack()", which trips the test.
-----------

Fix:

This fix moves the hash code calculation completely out of NativeCallStack. There is no reason why NativeCallStack should have a hash code; the hash mainly exists as a convenience for placing the stack in a hash map. The patch moves the hash code calculation up into MallocSiteTableEntry. This has the advantage that you only pay for a hash code when you need it - in theory, one may use NativeCallStack in places other than NMT, where the hash is unnecessary.

I considered other options:

- Modify `os::get_native_stack()` to also calculate a hash in addition to capturing the stack, and return it in a caller-provided variable. That would have left this call as the tail call. However, it seemed less clean - we have two implementations of this function, as well as other, non-capturing, NativeCallStack constructors, which would have to be modified. It also would have made `os::get_native_stack()` less general-purpose.
- Leave it as it is and just always skip frames: this seemed attractive, but I did not want to touch the tail-call prediction code and play whack-a-mole with platform-specific test errors.
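
The general shape of "the table entry owns the hash, the stack does not" might look roughly like the sketch below. It is only an illustration under assumptions: `CallStack`, `hash_frames()` and `SiteEntry` are hypothetical stand-ins, not the actual `NativeCallStack` or `MallocSiteTableEntry` code from the patch, and the hash function is arbitrary.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for NativeCallStack: captured frames only, no hash.
struct CallStack {
    static constexpr int kDepth = 4;
    std::uintptr_t frames[kDepth];

    bool equals(const CallStack& other) const {
        return std::memcmp(frames, other.frames, sizeof(frames)) == 0;
    }
};

// The hash lives outside the stack class; only users that need it pay for it.
// (Arbitrary illustrative hash, not the one HotSpot uses.)
inline unsigned int hash_frames(const CallStack& s) {
    std::uintptr_t h = 0;
    for (int i = 0; i < CallStack::kDepth; i++) {
        h = h * 31 + s.frames[i];
    }
    return static_cast<unsigned int>(h);
}

// Hypothetical stand-in for a hash table entry: computes the hash once, on insertion.
struct SiteEntry {
    CallStack    stack;
    unsigned int hash;

    explicit SiteEntry(const CallStack& s) : stack(s), hash(hash_frames(s)) {}

    // Compare the cheap hash first, then the frames.
    bool matches(const CallStack& s) const {
        return hash == hash_frames(s) && stack.equals(s);
    }
};
```

With this split, constructing a `CallStack` does no work after capturing frames, so whatever call captures them can stay the last call in the constructor.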
--------------- Tests: GA, manual test, nightlies at SAP ------------- Commit messages: - Initial Changes: https://git.openjdk.java.net/jdk/pull/2672/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2672&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261520 Stats: 39 lines in 6 files changed: 13 ins; 17 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/2672.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2672/head:pull/2672 PR: https://git.openjdk.java.net/jdk/pull/2672 From stuefe at openjdk.java.net Tue Feb 23 05:31:47 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 23 Feb 2021 05:31:47 GMT Subject: RFR: JDK-8262074: Investigate defaults for MetaspaceSize Message-ID: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> I was looking at whether the default values for MetaspaceSize (the initial threshold to start off a metaspace-motivated GC) still make sense after JEP-387. The default is dependent on compiler tier and bitness. It is also spread across all platforms. In addition to that, it also may get modified after Metaspace::ergo_initialize() in client-compiler-emulation-mode: https://github.com/openjdk/jdk/blob/2b00367e1154feb2c05b84a11d62fb5750e46acf/src/hotspot/share/compiler/compilerDefinitions.cpp#L194-L196 which is unexpected and causes confusion (eg JDK-8261907, JDK-8261907). The reasons for this seem to originate from PermGen times: https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-February/045536.html ---- Today, MetaspaceSize defaults to: - no compiler (eg Zero): **4M** (32bit) **5.19M** (64bit) - C1-only build: **12M** - C1+C2 build (standard): **16M** (32bit) **20,75M** (64bit) I was surprised to see that they do not depend on any compiler *runtime* switches. It only depends on build time decisions. --- How much do we use? I analyzed a simple java app to see the difference VM settings make on initial metaspace consumption. 
Committed space, used in brackets:

                          (used)         committed
    CDS on:

    64bit:                (181.58 KB)    384 KB   (a)
    64bit tier1 only:     (170.04 KB)    384 KB
    64bit Xint:           (16.62 KB)     256 KB

    32bit:                (178 KB)       256 KB
    32bit tier1 only:     (144 KB)       256 KB
    32bit Xint:           (11 KB) (b)    128 KB

    CDS off:

    64bit:                (5.06 MB)      5.62 MB
    64bit tier1 only:     (5.00 MB)      5.56 MB
    64bit Xint:           (4.84 MB)      5.44 MB

    32bit:                (3.69 MB)      3.75 MB
    32bit tier1 only:     (3.65 MB)      3.75 MB
    32bit Xint:           (3.52 MB)      3.62 MB

    Class space on/off:

    CDS off, 64bit, +CompressedClassPointers:   5.44M
    CDS off, 64bit, -CompressedClassPointers:   5.38M

_Notes:
(a) Since JEP-387, with CDS=on, we pay very little committed footprint upfront (384K). For comparison, JDK 15 commits 5.75M here.
(b) The seemingly high difference between Xint and C1+C2 - 11K vs 178K - is misleading: all initial classes get compiled, but since most of their metadata lives in CDS, not in Metaspace, all we allocate at the start are MethodCounters. Hence, with -Xint, we allocate almost nothing. That changes as soon as we start loading application classes._

Conclusions:
- CDS=off increases metaspace footprint by a flat amount, in my case ~5MB, which makes sense.
- Running with (any) compiler does not have much influence once we start using Metaspace for real. The difference between C1-only and C1+C2 is negligible; the difference between Xint and C1+C2 amounts to about 2% of initial metaspace consumption.
- Running with or without compressed Klass pointers does not make much difference. With class space, we pay for certain overhead twice, but at this early stage this is not noticeable.
- The ratio between 64bit and 32bit is more like a factor of 1.4-1.5, not the 1.3 we currently assume.

-----

Proposal:

1) I propose to make MetaspaceSize independent of the compiler. For one, if the intention was to have a lower threshold with compilers deactivated, that has never worked. E.g. on 64bit we always had a threshold of 20.75MB regardless of Xint/TieredStopAtLevel.
Even if it worked, the compiler does not make that much difference in metaspace footprint.

2) I propose to slightly lower MetaspaceSize - on 32bit from 16M to 14M, on 64bit from 20.75M to 20M. This takes into account the slightly lower metaspace footprint since JEP 387 (less waste) and the 64bit-to-32bit scale factor, which I found to be higher than 1.3.

This is all very cautious. For the standard VM, very little changes, so this is mainly a cleanup patch. We could probably tune MetaspaceSize down to much lower levels, and/or size it differently depending on UseSharedSpaces. However, at the moment I don't have time to hunt regressions due to too-early GCs.

-----

Tests: GA, nightlies at SAP

-------------

Commit messages:
 - Start

Changes: https://git.openjdk.java.net/jdk/pull/2675/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2675&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8262074
Stats: 28 lines in 13 files changed: 0 ins; 27 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/2675.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2675/head:pull/2675

PR: https://git.openjdk.java.net/jdk/pull/2675

From iklam at openjdk.java.net Tue Feb 23 05:45:40 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Tue, 23 Feb 2021 05:45:40 GMT
Subject: RFR: JDK-8262074: Investigate defaults for MetaspaceSize
In-Reply-To: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com>
References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com>
Message-ID:

On Mon, 22 Feb 2021 16:08:20 GMT, Thomas Stuefe wrote:

> I was looking at whether the default values for MetaspaceSize (the initial threshold to start off a metaspace-motivated GC) still make sense after JEP-387.
>
> The default is dependent on compiler tier and bitness. It is also spread across all platforms.
> > In addition to that, it also may get modified after Metaspace::ergo_initialize() in client-compiler-emulation-mode: > > https://github.com/openjdk/jdk/blob/2b00367e1154feb2c05b84a11d62fb5750e46acf/src/hotspot/share/compiler/compilerDefinitions.cpp#L194-L196 > > which is unexpected and causes confusion (eg JDK-8261907, JDK-8261907). > > The reasons for this seem to originate from PermGen times: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-February/045536.html > > ---- > > Today, MetaspaceSize defaults to: > > - no compiler (eg Zero): **4M** (32bit) **5.19M** (64bit) > - C1-only build: **12M** > - C1+C2 build (standard): **16M** (32bit) **20,75M** (64bit) > > I was surprised to see that they do not depend on any compiler *runtime* switches. It only depends on build time decisions. > > --- > > How much do we use? I analyzed a simple java app to see the difference VM settings make on initial metaspace consumption. Committed space, used in brackets: > > > (Note: (used) committed > CDS on: > > 64bit: (181,58 KB) 384 KB (a) > 64bit tier1 only: (170,04 KB) 384 KB > 64bit Xint: (16,62 KB) 256 KB > > 32bit (178 KB) 256 KB > 32bit tier1 only: (144 KB) 256 KB > 32bit Xint: (11 KB) (b) 128 KB > > CDS off: > > 64bit: (5,06 MB) 5.62 MB > 64bit tier1 only: (5,00 MB) 5,56 MB > 64bit Xint: (4,84 MB) 5.44 MB > > 32bit (3,69 MB) 3.75 MB > 32bit tier1 only: (3,65 MB) 3.75 MB > 32bit Xint: (3,52 MB) 3.62 MB > > Class space on/off > > CDS off, 64bit, +CompressedClassPointers: 5.44M > CDS off, 64bit, -CompressedClassPointers: 5.38M > > > _Notes: > (a) Since JEP-387, with CDS=on, we pay very little committed footprint upfront (384K). For comparison, JDK 15 commits here 5.75M. > (b) The seemingly high difference between Xint and C1+C2 - 11K vs 178K - is misleading: All initial classes get compiled, but since most of their metadata live in CDS, not in Metaspace, all we allocate at the start are MethodCounters. Hence, with -Xint, we almost allocate nothing. 
That changes as soon as we start loading application classes._ > > Conclusions: > - CDS=off increases metaspace footprint by a flat amount, in my case ~5MB, which makes sense. > - Running with (any) compiler has not much influence once we start using Metaspace for real. The difference between C1-only and C1+C2 is neglectible, the difference between Xint and C1+C2 amounts to about 2% wrt to initial metaspace consumption. > - Running with or without compressed Klass pointers makes not much difference. With class space, we pay for certain overhead twice, but at this early stage this is not noticeable. > - The difference between 64bit and 32bit is more like 1.4-1.5, not the 1.3 factor we currently assume > > ----- > > Proposal: > > 1) I propose to make MetaspaceSize independent from compiler. For one, if the intention was to have a lower threshold with compilers deactivated, that has never worked. E.g. on 64bit we always had a threshold of 20.75MB regardless of Xint/TieredStopAtLevel. Even if it worked, the compiler does not make that much difference in metaspace footprint. > > 2) I propose to slightly lower MetaspaceSize - on 32bit from 16M to 14M, on 64bit from 20.75M to 20M. This takes the slightly lower metaspace footprint since JEP 387 into account (less waste) and the scale I found to be higher than 1.3. > > This is all very cautious. For the standard VM, very little changes, so this is mainly a cleanup patch. We could probably tune MetaspaceSize down to much lower levels. And/or make size it differently depending on UseSharedSpaces. However, atm I don't have time to hunt regressions due to too early GCs. > > ----- > > Tests: GA, nightlies at SAP Looks good to me. I would suggest changing the RFE title to something like "Consolidate the default value of MetaspaceSize". ------------- Marked as reviewed by iklam (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2675 From stuefe at openjdk.java.net Tue Feb 23 06:07:44 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 23 Feb 2021 06:07:44 GMT Subject: RFR: JDK-8262074: Consolidate the default value of MetaspaceSize In-Reply-To: References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> Message-ID: <6BaRQSMUKPRXvzJBFniXLwomnPGQxToVd2uAe04UjHs=.2fe74157-3bf0-4865-8696-e6c7996f7381@github.com> On Tue, 23 Feb 2021 05:42:41 GMT, Ioi Lam wrote: > Looks good to me. Thanks! > I would suggest changing the RFE title to something like "Consolidate the default value of MetaspaceSize". Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/2675 From stefan.karlsson at oracle.com Tue Feb 23 07:32:36 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 23 Feb 2021 08:32:36 +0100 Subject: RFR: JDK-8262074: Investigate defaults for MetaspaceSize In-Reply-To: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> Message-ID: <4347dede-f083-4147-465d-9a6a1cd67c83@oracle.com> Hi Thomas, On 2021-02-23 06:31, Thomas Stuefe wrote: > I was looking at whether the default values for MetaspaceSize (the initial threshold to start off a metaspace-motivated GC) still make sense after JEP-387. > > The default is dependent on compiler tier and bitness. It is also spread across all platforms. > > In addition to that, it also may get modified after Metaspace::ergo_initialize() in client-compiler-emulation-mode: > > https://github.com/openjdk/jdk/blob/2b00367e1154feb2c05b84a11d62fb5750e46acf/src/hotspot/share/compiler/compilerDefinitions.cpp#L194-L196 > > which is unexpected and causes confusion (eg JDK-8261907, JDK-8261907). 
> > The reasons for this seem to originate from PermGen times: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-February/045536.html > > ---- > > Today, MetaspaceSize defaults to: > > - no compiler (eg Zero): **4M** (32bit) **5.19M** (64bit) > - C1-only build: **12M** > - C1+C2 build (standard): **16M** (32bit) **20,75M** (64bit) > > I was surprised to see that they do not depend on any compiler *runtime* switches. It only depends on build time decisions. > > --- > > How much do we use? I analyzed a simple java app to see the difference VM settings make on initial metaspace consumption. Committed space, used in brackets: > > > (Note: (used) committed > CDS on: > > 64bit: (181,58 KB) 384 KB (a) > 64bit tier1 only: (170,04 KB) 384 KB > 64bit Xint: (16,62 KB) 256 KB > > 32bit (178 KB) 256 KB > 32bit tier1 only: (144 KB) 256 KB > 32bit Xint: (11 KB) (b) 128 KB > > CDS off: > > 64bit: (5,06 MB) 5.62 MB > 64bit tier1 only: (5,00 MB) 5,56 MB > 64bit Xint: (4,84 MB) 5.44 MB > > 32bit (3,69 MB) 3.75 MB > 32bit tier1 only: (3,65 MB) 3.75 MB > 32bit Xint: (3,52 MB) 3.62 MB > > Class space on/off > > CDS off, 64bit, +CompressedClassPointers: 5.44M > CDS off, 64bit, -CompressedClassPointers: 5.38M > > > _Notes: > (a) Since JEP-387, with CDS=on, we pay very little committed footprint upfront (384K). For comparison, JDK 15 commits here 5.75M. > (b) The seemingly high difference between Xint and C1+C2 - 11K vs 178K - is misleading: All initial classes get compiled, but since most of their metadata live in CDS, not in Metaspace, all we allocate at the start are MethodCounters. Hence, with -Xint, we almost allocate nothing. That changes as soon as we start loading application classes._ > > Conclusions: > - CDS=off increases metaspace footprint by a flat amount, in my case ~5MB, which makes sense. > - Running with (any) compiler has not much influence once we start using Metaspace for real. 
The difference between C1-only and C1+C2 is neglectible, the difference between Xint and C1+C2 amounts to about 2% wrt to initial metaspace consumption. > - Running with or without compressed Klass pointers makes not much difference. With class space, we pay for certain overhead twice, but at this early stage this is not noticeable. > - The difference between 64bit and 32bit is more like 1.4-1.5, not the 1.3 factor we currently assume > > ----- > > Proposal: > > 1) I propose to make MetaspaceSize independent from compiler. For one, if the intention was to have a lower threshold with compilers deactivated, that has never worked. E.g. on 64bit we always had a threshold of 20.75MB regardless of Xint/TieredStopAtLevel. Even if it worked, the compiler does not make that much difference in metaspace footprint. I think it makes sense to get rid of this difference between build configurations. > > 2) I propose to slightly lower MetaspaceSize - on 32bit from 16M to 14M, on 64bit from 20.75M to 20M. This takes the slightly lower metaspace footprint since JEP 387 into account (less waste) and the scale I found to be higher than 1.3. > > This is all very cautious. For the standard VM, very little changes, so this is mainly a cleanup patch. We could probably tune MetaspaceSize down to much lower levels. And/or make size it differently depending on UseSharedSpaces. However, atm I don't have time to hunt regressions due to too early GCs. What would the motivation be for wanting to lower the size of MetaspaceSize? I've gotten feedback from others reporting that we trigger GCs too early, because of a (for them) too low MetaspaceSize. 
Thanks, StefanK > > ----- > > Tests: GA, nightlies at SAP > > ------------- > > Commit messages: > - Start > > Changes: https://git.openjdk.java.net/jdk/pull/2675/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2675&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8262074 > Stats: 28 lines in 13 files changed: 0 ins; 27 del; 1 mod > Patch: https://git.openjdk.java.net/jdk/pull/2675.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/2675/head:pull/2675 > > PR: https://git.openjdk.java.net/jdk/pull/2675 From stuefe at openjdk.java.net Tue Feb 23 07:47:59 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 23 Feb 2021 07:47:59 GMT Subject: RFR: JDK-8262074: Consolidate the default value of MetaspaceSize [v2] In-Reply-To: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> Message-ID: > I was looking at whether the default values for MetaspaceSize (the initial threshold to start off a metaspace-motivated GC) still make sense after JEP-387. > > The default is dependent on compiler tier and bitness. It is also spread across all platforms. > > In addition to that, it also may get modified after Metaspace::ergo_initialize() in client-compiler-emulation-mode: > > https://github.com/openjdk/jdk/blob/2b00367e1154feb2c05b84a11d62fb5750e46acf/src/hotspot/share/compiler/compilerDefinitions.cpp#L194-L196 > > which is unexpected and causes confusion (eg JDK-8261907, JDK-8261907). 
> > The reasons for this seem to originate from PermGen times: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-February/045536.html > > ---- > > Today, MetaspaceSize defaults to: > > - no compiler (eg Zero): **4M** (32bit) **5.19M** (64bit) > - C1-only build: **12M** > - C1+C2 build (standard): **16M** (32bit) **20,75M** (64bit) > > I was surprised to see that they do not depend on any compiler *runtime* switches. It only depends on build time decisions. > > --- > > How much do we use? I analyzed a simple java app to see the difference VM settings make on initial metaspace consumption. Committed space, used in brackets: > > > (Note: (used) committed > CDS on: > > 64bit: (181,58 KB) 384 KB (a) > 64bit tier1 only: (170,04 KB) 384 KB > 64bit Xint: (16,62 KB) 256 KB > > 32bit (178 KB) 256 KB > 32bit tier1 only: (144 KB) 256 KB > 32bit Xint: (11 KB) (b) 128 KB > > CDS off: > > 64bit: (5,06 MB) 5.62 MB > 64bit tier1 only: (5,00 MB) 5,56 MB > 64bit Xint: (4,84 MB) 5.44 MB > > 32bit (3,69 MB) 3.75 MB > 32bit tier1 only: (3,65 MB) 3.75 MB > 32bit Xint: (3,52 MB) 3.62 MB > > Class space on/off > > CDS off, 64bit, +CompressedClassPointers: 5.44M > CDS off, 64bit, -CompressedClassPointers: 5.38M > > > _Notes: > (a) Since JEP-387, with CDS=on, we pay very little committed footprint upfront (384K). For comparison, JDK 15 commits here 5.75M. > (b) The seemingly high difference between Xint and C1+C2 - 11K vs 178K - is misleading: All initial classes get compiled, but since most of their metadata live in CDS, not in Metaspace, all we allocate at the start are MethodCounters. Hence, with -Xint, we almost allocate nothing. That changes as soon as we start loading application classes._ > > Conclusions: > - CDS=off increases metaspace footprint by a flat amount, in my case ~5MB, which makes sense. > - Running with (any) compiler has not much influence once we start using Metaspace for real. 
The difference between C1-only and C1+C2 is neglectible, the difference between Xint and C1+C2 amounts to about 2% wrt to initial metaspace consumption. > - Running with or without compressed Klass pointers makes not much difference. With class space, we pay for certain overhead twice, but at this early stage this is not noticeable. > - The difference between 64bit and 32bit is more like 1.4-1.5, not the 1.3 factor we currently assume > > ----- > > Proposal: > > 1) I propose to make MetaspaceSize independent from compiler. For one, if the intention was to have a lower threshold with compilers deactivated, that has never worked. E.g. on 64bit we always had a threshold of 20.75MB regardless of Xint/TieredStopAtLevel. Even if it worked, the compiler does not make that much difference in metaspace footprint. > > 2) I propose to slightly lower MetaspaceSize - on 32bit from 16M to 14M, on 64bit from 20.75M to 20M. This takes the slightly lower metaspace footprint since JEP 387 into account (less waste) and the scale I found to be higher than 1.3. > > This is all very cautious. For the standard VM, very little changes, so this is mainly a cleanup patch. We could probably tune MetaspaceSize down to much lower levels. And/or make size it differently depending on UseSharedSpaces. However, atm I don't have time to hunt regressions due to too early GCs. 
> > ----- > > Tests: GA, nightlies at SAP Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Increase MetaspaceSize default ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2675/files - new: https://git.openjdk.java.net/jdk/pull/2675/files/6b3c81ab..152ec617 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2675&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2675&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2675.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2675/head:pull/2675 PR: https://git.openjdk.java.net/jdk/pull/2675 From stuefe at openjdk.java.net Tue Feb 23 07:59:37 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 23 Feb 2021 07:59:37 GMT Subject: RFR: JDK-8262074: Consolidate the default value of MetaspaceSize [v2] In-Reply-To: <6BaRQSMUKPRXvzJBFniXLwomnPGQxToVd2uAe04UjHs=.2fe74157-3bf0-4865-8696-e6c7996f7381@github.com> References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> <6BaRQSMUKPRXvzJBFniXLwomnPGQxToVd2uAe04UjHs=.2fe74157-3bf0-4865-8696-e6c7996f7381@github.com> Message-ID: On Tue, 23 Feb 2021 06:04:47 GMT, Thomas Stuefe wrote: >> Looks good to me. I would suggest changing the RFE title to something like "Consolidate the default value of MetaspaceSize". > >> Looks good to me. > Thanks! >> I would suggest changing the RFE title to something like "Consolidate the default value of MetaspaceSize". > Done. > Hi Thomas, > (skip) > > ----- > > Proposal: > > 1) I propose to make MetaspaceSize independent from compiler. For one, if the intention was to have a lower threshold with compilers deactivated, that has never worked. E.g. on 64bit we always had a threshold of 20.75MB regardless of Xint/TieredStopAtLevel. Even if it worked, the compiler does not make that much difference in metaspace footprint. 
>
> I think it makes sense to get rid of this difference between build
> configurations.
>
> > 2) I propose to slightly lower MetaspaceSize - on 32bit from 16M to 14M, on 64bit from 20.75M to 20M. This takes the slightly lower metaspace footprint since JEP 387 into account (less waste) and the scale I found to be higher than 1.3.
> > This is all very cautious. For the standard VM, very little changes, so this is mainly a cleanup patch. We could probably tune MetaspaceSize down to much lower levels. And/or make size it differently depending on UseSharedSpaces. However, atm I don't have time to hunt regressions due to too early GCs.
>
> What would the motivation be for wanting to lower the size of
> MetaspaceSize? I've gotten feedback from others reporting that we
> trigger GCs too early, because of a (for them) too low MetaspaceSize.

Initial metaspace usage is probably much lower than it was when these sizes were thought up: for one, due to CDS, which is usually on; and also due to the reduced waste in metaspace. See the measurements. This may matter for low-consumption JVMs which do class unloading - since you would never hit the threshold, you may never clean up, unless the GC runs for other reasons.

But I admit that I do not have a good basis upon which to decide which size is right. One could say "you must be able to load as many classes as you did when this limit was thought up", which would mean we could lower the limit much more aggressively than I did. Or we could say "don't rock the boat" and leave the limits as they are. I tried to find a middle way.

However, it also makes sense to keep the limits as they are. Then this patch is truly just a cleanup. I have now set the 32bit limit to 16M, which is what we had before; the 64bit threshold is set to 21M, which is 0.25M more than before, just to have a nice round number.
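
As a rough cross-check of the numbers discussed in this thread, the 64bit-to-32bit ratios can be recomputed from the CDS-off measurements posted earlier. The sketch below is only arithmetic on those posted values (decimal commas written as points); the struct and function names are made up for illustration.

```cpp
#include <cstdio>

// One measurement pair from the CDS-off table, in MB.
struct Sample {
    const char* config;
    double mb64;
    double mb32;
};

// Ratio of the 64bit figure to the 32bit figure.
inline double ratio_64_to_32(const Sample& s) { return s.mb64 / s.mb32; }

// "(used)" column, CDS off.
static const Sample kUsed[] = {
    {"C1+C2",      5.06, 3.69},
    {"tier1 only", 5.00, 3.65},
    {"Xint",       4.84, 3.52},
};

// "committed" column, CDS off.
static const Sample kCommitted[] = {
    {"C1+C2",      5.62, 3.75},
    {"tier1 only", 5.56, 3.75},
    {"Xint",       5.44, 3.62},
};

// Old defaults (C1+C2 build) and the values chosen in this revision, in MB.
static const double kOldDefault64 = 20.75, kOldDefault32 = 16.0;
static const double kNewDefault64 = 21.0,  kNewDefault32 = 16.0;

inline void print_ratios() {
    for (const Sample& s : kUsed)      std::printf("used      %-10s %.2f\n", s.config, ratio_64_to_32(s));
    for (const Sample& s : kCommitted) std::printf("committed %-10s %.2f\n", s.config, ratio_64_to_32(s));
    std::printf("old defaults %.2f, new defaults %.2f\n",
                kOldDefault64 / kOldDefault32, kNewDefault64 / kNewDefault32);
}
```

The used-space ratios come out a little under 1.4 and the committed-space ratios around 1.5, while the old defaults correspond to a ratio just under 1.3; this is one way to read the "1.4-1.5 vs 1.3" remark, not necessarily how the figure was originally derived.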
>
> Thanks,
> StefanK

Thanks, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/2675

From thomas.stuefe at gmail.com Tue Feb 23 10:03:17 2021
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Tue, 23 Feb 2021 11:03:17 +0100
Subject: Disallow C-Heap allocations from within dynamic C++ initialization?
Message-ID:

Hi,

I am currently investigating how NMT could be made "late-initializable", so that it no longer relies on the awkward combination of the NMT environment variable and command line argument. For details see: https://bugs.openjdk.java.net/browse/JDK-8256844

There are some difficult problems when letting NMT late-initialize, all stemming from the fact that allocations can happen before argument parsing happens and NMT is initialized. These problems are solvable, and I have several approaches, but they make NMT more complicated, which I dislike.

The easiest approach would be to simply disallow early C-Heap allocations. So let's say I do this:

- move (a part of) NMT initialization very close to the start of Thread::create_vm()
- disallow and rewrite all code which does C-Heap allocation earlier (during dynamic C++ initialization)

Would that be an acceptable and maintainable stance? It would mean that we disallow global C++ objects which do C-Heap allocation somewhere in their constructors.

I did a quick test, and I think that this is doable and would not affect too much code, most of which can be rewritten to use explicit initialization. Personally I think explicit initialization is cleaner than relying on dynamic C++ initialization anyway, since it makes the order of initialization more predictable.

Wrt NMT, this would also have other advantages, e.g. we could allocate certain NMT structures only if NMT is on, which are now unconditionally allocated (e.g. MallocSiteTable).

What are your opinions?
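
To make the proposal concrete, the rewrite pattern in question (a global object that no longer allocates in its constructor, plus an explicit initialization call at a known point) could look roughly like the sketch below. Everything here is hypothetical: `SiteTable`, `g_table` and `vm_startup()` are illustrative names, not existing HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical table that needs C-heap memory for its buckets.
class SiteTable {
    int*        _buckets;
    std::size_t _size;

public:
    // constexpr constructor: the global below is constant-initialized,
    // so nothing is allocated during dynamic C++ initialization.
    constexpr SiteTable() : _buckets(nullptr), _size(0) {}

    // Explicit initialization; the only place that touches the C heap.
    bool initialize(std::size_t n) {
        assert(_buckets == nullptr && "initialize() called twice");
        _buckets = static_cast<int*>(std::calloc(n, sizeof(int)));
        if (_buckets == nullptr) return false;
        _size = n;
        return true;
    }

    bool is_initialized() const { return _buckets != nullptr; }
    std::size_t size() const { return _size; }
};

// Global object, but free of early allocations.
static SiteTable g_table;

// Hypothetical startup hook, called at a well-defined point after argument
// parsing; the table is only allocated when tracking is switched on.
inline bool vm_startup(bool tracking_enabled) {
    if (!tracking_enabled) return true;
    return g_table.initialize(512);
}
```

This also shows the side benefit mentioned above: when tracking is off, the table never allocates at all.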
Thanks a lot, Thomas From dongbo at openjdk.java.net Tue Feb 23 10:21:02 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 23 Feb 2021 10:21:02 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. Dong Bo has updated the pull request incrementally with two additional commits since the last revision: - whitespace - Rebase tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg case, assembly print verified. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/ba8dc5ac..e2dc7b83 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=07-08 Stats: 903 lines in 2 files changed: 320 ins; 387 del; 196 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Tue Feb 23 10:31:45 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Tue, 23 Feb 2021 10:31:45 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9] In-Reply-To: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 9 Feb 2021 07:50:45 GMT, Ningsheng Jian wrote: >> Dong Bo has updated the pull request incrementally with two additional commits since the last revision: >> >> - whitespace >> - Rebase tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg case, assembly print verified. > > Thanks for the fix. Hi, @theRealAph I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`. Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From fweimer at redhat.com Tue Feb 23 15:20:30 2021 From: fweimer at redhat.com (Florian Weimer) Date: Tue, 23 Feb 2021 16:20:30 +0100 Subject: Disallow C-Heap allocations from within dynamic C++ initialization? 
In-Reply-To: ("Thomas Stüfe"'s message of "Tue, 23 Feb 2021 11:03:17 +0100") References: Message-ID: <87h7m2x135.fsf@oldenburg.str.redhat.com> * Thomas Stüfe: > The easiest approach would be to simply disallow early C-Heap allocations. > So lets say I do this: > > - move (a part of) NMT initialization very close to the start of > Thread::create_vm() > - disallow and rewrite all code which does C-Heap allocation earlier > (during dynamic C++ initialization) > > Would that be an acceptable and maintainable stance? Since that would mean > that we disallow global C++ objects which do C-Heap allocation in their > constructors somewhere. What is a C-Heap allocation in this context? With glibc, initializing a C++ object which has a non-trivial destructor can call malloc. Thanks, Florian From thomas.stuefe at gmail.com Tue Feb 23 15:32:51 2021 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 23 Feb 2021 16:32:51 +0100 Subject: Disallow C-Heap allocations from within dynamic C++ initialization? In-Reply-To: <87h7m2x135.fsf@oldenburg.str.redhat.com> References: <87h7m2x135.fsf@oldenburg.str.redhat.com> Message-ID: Hi Florian, On Tue, Feb 23, 2021 at 4:23 PM Florian Weimer wrote: > * Thomas Stüfe: > > > The easiest approach would be to simply disallow early C-Heap > allocations. > > So lets say I do this: > > > > - move (a part of) NMT initialization very close to the start of > > Thread::create_vm() > > - disallow and rewrite all code which does C-Heap allocation earlier > > (during dynamic C++ initialization) > > > > Would that be an acceptable and maintainable stance? Since that would > mean > > that we disallow global C++ objects which do C-Heap allocation in their > > constructors somewhere. > > What is a C-Heap allocation in this context? > > Any call going through os::malloc(). > With glibc, initializing a C++ object which has a non-trivial destructor > can call malloc.
> > Thanks, > Florian > Thanks, Thomas From github.com+42899633+eastig at openjdk.java.net Tue Feb 23 16:01:40 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 23 Feb 2021 16:01:40 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: References: Message-ID: On Sun, 21 Feb 2021 02:07:59 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. >> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > fix build failures on Windows. StringUtils::tr_delete returns size_of. Changes requested by eastig at github.com (no known OpenJDK username). src/hotspot/share/opto/type.cpp line 4056: > 4054: StringUtils::tr_delete(buf, "\n"); > 4055: st->print_raw(buf); > 4056: os::free(buf); There is no need to use `os::strdup` because `as_string` creates a copy. I've looked in stringUtils.cpp and found `replace_no_expand`. The code can be rewritten: char *buf = ss.as_string(); StringUtils::replace_no_expand(buf, "\n", ""); st->print_raw(buf); With this code, `tr_delete` is redundant. 
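For illustration, deleting a single character from a buffer in place can be sketched as below. This is a standalone helper with a hypothetical name, not the HotSpot `StringUtils` API, whose exact signatures are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not HotSpot API): deletes every occurrence of `ch`
// from the NUL-terminated string `buf` in place and returns the new length.
// The buffer never grows, so no reallocation or copy is needed.
std::size_t remove_char_in_place(char* buf, char ch) {
  assert(buf != nullptr);
  char* out = buf;
  for (const char* in = buf; *in != '\0'; ++in) {
    if (*in != ch) {
      *out++ = *in;  // keep this character, compacting towards the front
    }
  }
  *out = '\0';
  return static_cast<std::size_t>(out - buf);
}
```

Called as `remove_char_in_place(buf, '\n')` on a writable copy of the stream contents, this collapses a multi-line dump into a single line, which is the effect the review comment is after.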
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From aph at openjdk.java.net Tue Feb 23 17:16:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 23 Feb 2021 17:16:41 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9] In-Reply-To: References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 23 Feb 2021 10:29:05 GMT, Dong Bo wrote: > Hi, @theRealAph > > I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`. > Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? I'm not seeing `sra` used anywhere. The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resulting code is intertwined and complex, it's very hard to debug.
It would be far better to do something like this:

void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) {
    vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i);
    vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i);
    vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i);
    vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i);
    vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i);
    vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i);
}

------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From xliu at openjdk.java.net Tue Feb 23 19:32:42 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 23 Feb 2021 19:32:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: References: Message-ID: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> On Tue, 23 Feb 2021 15:59:03 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> fix build failures on Windows. StringUtils::tr_delete returns size_of. > > src/hotspot/share/opto/type.cpp line 4056: > >> 4054: StringUtils::tr_delete(buf, "\n"); >> 4055: st->print_raw(buf); >> 4056: os::free(buf); > > There is no need to use `os::strdup` because `as_string` creates a copy. > I've looked in stringUtils.cpp and found `replace_no_expand`. The code can be rewritten: > char *buf = ss.as_string(); > StringUtils::replace_no_expand(buf, "\n", ""); > st->print_raw(buf); > With this code, `tr_delete` is redundant. Oh, thanks for the heads-up. I'm happy to remove the os::strdup and os::free pair. It seems that replace_no_expand is cumbersome for doing what tr_delete does.
let me see how it works. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From github.com+42899633+eastig at openjdk.java.net Tue Feb 23 21:11:39 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 23 Feb 2021 21:11:39 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> References: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> Message-ID: <_bbShtIislmHUrEyims35ijjL_jtvIdI3BgwSR-ZdD0=.d88d8516-d838-4c1b-9432-578836a32a40@github.com> On Tue, 23 Feb 2021 19:30:00 GMT, Xin Liu wrote: > oh, thanks for the head-up. I'm happy to remove os::strdup and os::free pair. > it seems that replace_no_expand is cumbersome to do what tr_delete does. let me see how it works. I don't see why it is cumbersome. IMHO, it is logically consistent: replace substring with an empty string without expanding the buffer. The main value is the amount of written code. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Tue Feb 23 23:12:56 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 23 Feb 2021 23:12:56 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v7] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
> > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set use the existing api StringUtils::replace_no_expand to archive the same replace. don't need to invoke os::strdup because stringStream::as_string() has duplicated the internal buffer. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2178/files - new: https://git.openjdk.java.net/jdk/pull/2178/files/077f9b60..6df63fe1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=05-06 Stats: 83 lines in 5 files changed: 0 ins; 75 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Tue Feb 23 23:18:41 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 23 Feb 2021 23:18:41 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v6] In-Reply-To: <_bbShtIislmHUrEyims35ijjL_jtvIdI3BgwSR-ZdD0=.d88d8516-d838-4c1b-9432-578836a32a40@github.com> References: <-6o8O3AATyO7_tSm4zM4ztzGvjcQgCvFbk0NE7J9yJQ=.a452f241-9103-4408-b853-4befdb110358@github.com> <_bbShtIislmHUrEyims35ijjL_jtvIdI3BgwSR-ZdD0=.d88d8516-d838-4c1b-9432-578836a32a40@github.com> Message-ID: On Tue, 23 Feb 2021 21:09:15 GMT, Evgeny Astigeevich wrote: >> oh, thanks for the head-up. I'm happy to remove os::strdup and os::free pair. 
>> it seems that replace_no_expand is cumbersome to do what tr_delete does. let me see how it works. > >> oh, thanks for the head-up. I'm happy to remove os::strdup and os::free pair. >> it seems that replace_no_expand is cumbersome to do what tr_delete does. let me see how it works. > > I don't see why it is cumbersome. IMHO, it is logically consistent: replace substring with an empty string without expanding the buffer. The main value is the amount of written code. Oh, by "cumbersome" I just meant that sweeping chars felt easier than replacing substrings in my case. But I have verified that `replace_no_expand(buf, "\n", "")` has the same effect. I took your advice: less code means less chance of making a mistake. Updated this PR. I also verified that it has the same results for `-XX:+Verbose -XX:+PrintIdeal`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From thartmann at openjdk.java.net Wed Feb 24 06:56:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 24 Feb 2021 06:56:41 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v7] In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 23:12:56 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > use the existing api StringUtils::replace_no_expand to archive the same replace. > don't need to invoke os::strdup because stringStream::as_string() has duplicated > the internal buffer. Changes requested by thartmann (Reviewer). src/hotspot/share/opto/type.cpp line 4052: > 4050: > 4051: { > 4052: ResourceMark rm; Shouldn't the `ResourceMark` go to before `stringStream ss` which is a `ResourceObj` as well? Also, please add a small comment explaining that this code suppresses the new line emitted by `print_oop`. src/hotspot/share/opto/type.cpp line 4040: > 4038: // Dump oop Type > 4039: #ifndef PRODUCT > 4040: void TypeInstPtr::dump2( Dict &d, uint depth, outputStream* st ) const { While you are at it, please also remove the excess whitespace after `(` and before `)`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From dongbo at openjdk.java.net Wed Feb 24 07:27:03 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 24 Feb 2021 07:27:03 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. 
*/ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
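The masking rule quoted above can be sketched as plain arithmetic. The helper names below are hypothetical, and the functions only model the shift count; they do not model the assembler or the actual fix.

```cpp
#include <cassert>

// Mirrors the Vector API rule a >>> (n & (ESIZE*8 - 1)): the shift count is
// reduced modulo the element width in bits. `element_bytes` is ESIZE.
int effective_shift(int n, int element_bytes) {
  assert(element_bytes == 1 || element_bytes == 2 ||
         element_bytes == 4 || element_bytes == 8);
  return n & (element_bytes * 8 - 1);
}

// The AArch64 vector right shifts by immediate accept amounts from 1 up to
// the element width in bits, so a masked count of zero needs separate handling.
bool needs_zero_shift_special_case(int n, int element_bytes) {
  return effective_shift(n, element_bytes) == 0;
}
```

For byte lanes, `effective_shift(8, 1)` is 0, which is exactly the case described in the PR text. For long lanes, counts of 64, 128 and 192 also reduce to 0, while 99 and 157 reduce to 35 and 29.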
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: update tests as suggestions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/e2dc7b83..9290f27e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=08-09 Stats: 465 lines in 1 file changed: 159 ins; 187 del; 119 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Wed Feb 24 07:33:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Wed, 24 Feb 2021 07:33:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10] In-Reply-To: References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: On Tue, 23 Feb 2021 17:13:35 GMT, Andrew Haley wrote: > > Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? > > I'm not seeing `sra` used anywhere. > > The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug. 
> > It would be far better to do something like this: > > ``` > void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { > vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); > vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); > } > ``` Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 &> assembly_vlen64.txt $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 &> assembly_vlen128.txt $ cat assembly_vlen*.txt | grep "ssra" 02c0 ssra V18, V17, #37 # vector (2D) 02c8 ssra V19, V17, #0 # vector (2D) 02d0 ssra V20, V17, #35 # vector (2D) 0308 ssra V18, V17, #29 # vector (2D) 0644 ssra V18, V17, #37 # vector (2D) 064c ssra V19, V17, #0 # vector (2D) 0654 ssra V20, V17, #35 # vector (2D) 0674 ssra V18, V17, #29 # vector (2D) 0798 ssra V18, V17, #37 # vector (2D) 07a0 ssra V19, V17, #0 # vector (2D) 07a8 ssra V20, V17, #35 # vector (2D) 07e0 ssra V18, V17, #29 # vector (2D) 0x0000ffff83f7e500: ssra v18.2d, v17.2d, #37 ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7e510: ssra v20.2d, v17.2d, #35 ;*iand {reexecute=0 rethrow=0 return_oop=0} 
0x0000ffff83f7e548: ssra v18.2d, v17.2d, #29 ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7e884: ssra v18.2d, v17.2d, #37 0x0000ffff83f7e894: ssra v20.2d, v17.2d, #35 0x0000ffff83f7e8b4: ssra v18.2d, v17.2d, #29 0x0000ffff83f7e9d8: ssra v18.2d, v17.2d, #37 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7e9e8: ssra v20.2d, v17.2d, #35 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f7ea20: ssra v18.2d, v17.2d, #29 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} 284 ssra V18, V17, #9 # vector (4S) 28c ssra V19, V17, #0 # vector (4S) 294 ssra V20, V17, #15 # vector (4S) 0x0000ffff83f822c4: ssra v18.4s, v17.4s, #9 ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff83f822d4: ssra v20.4s, v17.4s, #15 284 ssra V18, V17, #1 # vector (8H) 28c ssra V19, V17, #8 # vector (8H) ... Also injected error to `sshr+add` by: --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp @@ -545,7 +545,7 @@ public: #define WRAP(INSN) \ void INSN(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, int shift) { \ if (shift == 0) { \ - Assembler::addv(Vd, T, Vd, Vn); \ + Assembler::subv(Vd, T, Vd, Vn); \ } else { \ Assembler::INSN(Vd, T, Vn, shift); \ } \ The `shift+add` tests failed as expected: $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 WARNING: Using incubator modules: jdk.incubator.vector warning: using incubating module(s): jdk.incubator.vector 1 warning Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: type SHORT index 19, operation ASHR_AND_ACCUMULATE, vector length 64. type SHORT index 21, operation ASHR_AND_ACCUMULATE, vector length 64. type SHORT index 23, operation ASHR_AND_ACCUMULATE, vector length 64. 
type SHORT index 25, operation LSHR_AND_ACCUMULATE, vector length 64. type SHORT index 27, operation LSHR_AND_ACCUMULATE, vector length 64. type SHORT index 29, operation LSHR_AND_ACCUMULATE, vector length 64. type INTEGER index 19, operation ASHR_AND_ACCUMULATE, vector length 64. type INTEGER index 21, operation ASHR_AND_ACCUMULATE, vector length 64. type INTEGER index 23, operation ASHR_AND_ACCUMULATE, vector length 64. type INTEGER index 25, operation LSHR_AND_ACCUMULATE, vector length 64. type INTEGER index 27, operation LSHR_AND_ACCUMULATE, vector length 64. type INTEGER index 29, operation LSHR_AND_ACCUMULATE, vector length 64. ... $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 WARNING: Using incubator modules: jdk.incubator.vector warning: using incubating module(s): jdk.incubator.vector 1 warning Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: type LONG index 49, operation ASHR_AND_ACCUMULATE, vector length 128. type LONG index 51, operation ASHR_AND_ACCUMULATE, vector length 128. type LONG index 53, operation ASHR_AND_ACCUMULATE, vector length 128. type LONG index 55, operation LSHR_AND_ACCUMULATE, vector length 128. type LONG index 57, operation LSHR_AND_ACCUMULATE, vector length 128. type LONG index 59, operation LSHR_AND_ACCUMULATE, vector length 128. type SHORT index 49, operation ASHR_AND_ACCUMULATE, vector length 128. type SHORT index 51, operation ASHR_AND_ACCUMULATE, vector length 128. type SHORT index 53, operation ASHR_AND_ACCUMULATE, vector length 128. ... Anyway, I extracted operations you suggested into `shift_op_*` methods. Performed the error-injected experiments with the new tests on Kunpeng916 and re-checked the assembly output, results looks good. 
The test command I used to run the newest tests are: $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=64 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen64.txt $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=128 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen128.txt $ cat assembly_vlen64.txt | grep ssra; cat assembly_vlen128.txt | grep ssra ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From stuefe at openjdk.java.net Wed Feb 24 08:25:50 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 08:25:50 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 08:12:52 GMT, Thomas Stuefe wrote: >> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: >> >> - Merge branch 'master' into pull/1153 >> - kstefanj update >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Remove extraneous ' from warning >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Fix os::large_page_size() in last update >> >> Signed-off-by: Marcus G K Williams >> - Ivan W. 
Requested Changes >> >> Removed os::Linux::select_large_page_size and >> use os::page_size_for_region instead >> >> Removed Linux::find_large_page_size and use >> register_large_page_sizes. Streamlined >> Linux::setup_large_page_size >> >> Signed-off-by: Marcus G K Williams >> - ... and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 > > src/hotspot/os/linux/os_linux.cpp line 3670: > >> 3668: // If we can't open /sys/kernel/mm/hugepages >> 3669: // Add _default_large_page_size to _page_sizes >> 3670: _page_sizes.add(_default_large_page_size); > > missing return here. But see my general remarks. I would modify this function to not change outside state at all, just to return the found page sizes in a os::PageSizes object. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Wed Feb 24 08:25:48 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 08:25:48 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 16:32:56 GMT, Marcus G K Williams wrote: >> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using >> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). >> >> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). >> >> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. 
> > Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - Merge branch 'master' into pull/1153 > - kstefanj update > > Signed-off-by: Marcus G K Williams > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Remove extraneous ' from warning > > Signed-off-by: Marcus G K Williams > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Fix os::large_page_size() in last update > > Signed-off-by: Marcus G K Williams > - Ivan W. Requested Changes > > Removed os::Linux::select_large_page_size and > use os::page_size_for_region instead > > Removed Linux::find_large_page_size and use > register_large_page_sizes. Streamlined > Linux::setup_large_page_size > > Signed-off-by: Marcus G K Williams > - ... and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 Hi Markus, Many apologies for letting this cook too long, the last months have been hectic. I looked closer at the code today, at least the initialization parts, and have some suggestions and remarks. Will look at the runtime side later. A lot of my remarks will be referring to pre-existing code without me pointing it out each time, just know that I am aware that a lot of that stuff has nothing to do with your work. I propose some simplifications and streamlining with initialization. Main point would be to clearly separate getting information from the OS from post-processing (consistency checks and decisions), in addition to a bit clearer naming. --- We have `find_default_large_page_size()` and `register_large_page_sizes()`. 
The names could be a bit clearer, and I do not think they should be known outside of this file, so I would propose to redefine them to be local convenience functions which just scan the proc fs and do not change outside state, just return values, like this: - `static size_t scan_default_large_page_size();` - `static os::PageSizes scan_multiple_page_support();` (naming is borrowed from vm/hugetlbpage.txt) --- Today, in `find_default_large_page_size()`, if we have no default huge page configured, we return a hard-coded default: https://github.com/openjdk/jdk/blob/f2e44ac726bad2e7db1ec9f5e77703a99ccfb683/src/hotspot/os/linux/os_linux.cpp#L3627-L3636 I am not sure this makes sense. The kernel documentation states that if this entry does not exist, we cannot use huge pages. I would consider removing this and just returning 0 in that case. The point is that these low-level convenience functions should read OS information and not make up stuff. Making up stuff should be done, if at all, in the caller. --- When consistency checking and post-processing what we got from the OS, note that there are slight (preexisting) inconsistencies in how we handle things: - we gracefully handle the non-existence of /sys/kernel/mm/hugepages - or if it exists, the fact that the default page size may be missing from it - by transparently adding the default huge page size to os::_page_sizes. - But if the user specifies LargePageSizeInBytes, overriding the default large page size, we now require that page size to be present in os::_page_sizes. So in that case, /sys/kernel/mm/hugepages has to be present. I mean, either we trust /sys/kernel/mm/hugepages, or we don't. We happily make up page sizes in find_default_large_page_size(), but here we check rather strictly. It makes sense to check the user input for validity, but then, could we not always just require /sys/kernel/mm/hugepages to be present and consistent with /proc/meminfo?
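As a sketch of the "scan and return, do not modify outside state" idea, the two helpers below only parse text and return values. They are hypothetical stand-ins, not the proposed HotSpot functions: they take strings instead of reading /proc/meminfo or /sys/kernel/mm/hugepages, and they use std::set where HotSpot would use os::PageSizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Hypothetical pure helper: parses one /proc/meminfo line of the form
// "Hugepagesize:       2048 kB" and returns the size in bytes.
// Returns 0 if the line is not a Hugepagesize entry, without inventing a default.
std::size_t parse_default_hugepage_line(const std::string& line) {
  std::size_t kb = 0;
  if (std::sscanf(line.c_str(), "Hugepagesize: %zu kB", &kb) == 1) {
    return kb * 1024;
  }
  return 0;
}

// Hypothetical pure helper: converts directory entry names of the form
// "hugepages-<N>kB" into page sizes in bytes. No global state is touched;
// deciding what to do with the result is left to the caller.
std::set<std::size_t> parse_hugepage_entries(const std::vector<std::string>& names) {
  std::set<std::size_t> sizes;
  for (const std::string& name : names) {
    std::size_t kb = 0;
    if (std::sscanf(name.c_str(), "hugepages-%zukB", &kb) == 1 && kb > 0) {
      assert(kb <= static_cast<std::size_t>(-1) / 1024);  // guard against overflow
      sizes.insert(kb * 1024);  // the kernel reports kB, the caller wants bytes
    }
  }
  return sizes;
}
```

With helpers shaped like this, the caller can compare the two results and decide in one place how to handle a missing or inconsistent entry.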
---

I am not sure of the usefulness of `os::Linux::setup_large_page_size()`. It's just a thin wrapper. I would remove it and merge it directly into `os::large_page_init()`, which would be easier to understand.

So, `os::large_page_init()` could look like this:

    void os::large_page_init() {

      // 1) Handle the case where we do not want to use huge pages and hence
      //    there is no need to scan the OS for related info
      if (!UseLargePages && !UseTransparentHugePages && !UseHugeTLBFS && !UseSHM) {
        // Not using large pages.
        return;
      }

      if (!FLAG_IS_DEFAULT(UseLargePages) && !UseLargePages) {
        // The user explicitly turned off large pages.
        // Ignore the rest of the large pages flags.
        UseTransparentHugePages = false;
        UseHugeTLBFS = false;
        UseSHM = false;
        return;
      }

      // 2) Scan OS info
      size_t default_large_page_size = scan_default_large_page_size();
      if (default_large_page_size == 0) {
        // We are done, no large pages configured.
        UseTransparentHugePages = false;
        UseHugeTLBFS = false;
        UseSHM = false;
        return;
      }
      os::PageSizes all_pages = scan_multiple_page_support();

      // 3) Consistency check and post-processing

      // It is unclear if /sys/kernel/mm/hugepages/ and /proc/meminfo could disagree. Manually
      // re-add the default page size to the list of page sizes to be sure.
      all_pages.add(default_large_page_size);

      // Handle LargePageSizeInBytes
      if (!FLAG_IS_DEFAULT(LargePageSizeInBytes) && LargePageSizeInBytes != _default_large_page_size) {
        ... blabla
        default_large_page_size = LargePageSizeInBytes;
        log_info(os)("Overriding default huge page size..");
        ...
      }

      // Now determine the type of large pages to use:
      os::Linux::setup_large_page_type();

      set_coredump_filter(LARGEPAGES_BIT);

      // Any final logging:
      logloglog
    }

What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it.

Still a small nit is that we let the user override the OS info with LargePageSizeInBytes.
I would rather have a variable containing unmodified OS info, and a separate variable for whatever we make up. But that's just a small issue.

src/hotspot/os/linux/os_linux.cpp line 3679:

> 3677: sscanf(entry->d_name, "hugepages-%zukB", &page_size) == 1) {
> 3678: // The kernel is using kB, hotspot uses bytes
> 3679: if (page_size * K > (size_t)Linux::page_size()) {

I do not think excluding the base page size is needed here. The directory only contains entries for huge pages. If for any weird reason this is the same as the base page size (which I have never seen) I would include it, since it's a huge page too. But I do not think this can happen.

src/hotspot/os/linux/os_linux.cpp line 3670:

> 3668: // If we can't open /sys/kernel/mm/hugepages
> 3669: // Add _default_large_page_size to _page_sizes
> 3670: _page_sizes.add(_default_large_page_size);

Missing return here.

src/hotspot/os/linux/os_linux.cpp line 3692:

> 3690: ls.print("Available page sizes: ");
> 3691: _page_sizes.print_on(&ls);
> 3692: }

Does this work and show something? I know UL is initialization time sensitive (which is annoying btw).

-------------

Changes requested by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1153

From sjohanss at openjdk.java.net Wed Feb 24 08:40:45 2021
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Wed, 24 Feb 2021 08:40:45 GMT
Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16]
In-Reply-To:
References:
Message-ID:

On Wed, 24 Feb 2021 08:14:48 GMT, Thomas Stuefe wrote:

>> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 25 commits: >> >> - Merge branch 'master' into pull/1153 >> - kstefanj update >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Remove extraneous ' from warning >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Fix os::large_page_size() in last update >> >> Signed-off-by: Marcus G K Williams >> - Ivan W. Requested Changes >> >> Removed os::Linux::select_large_page_size and >> use os::page_size_for_region instead >> >> Removed Linux::find_large_page_size and use >> register_large_page_sizes. Streamlined >> Linux::setup_large_page_size >> >> Signed-off-by: Marcus G K Williams >> - ... and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 > > src/hotspot/os/linux/os_linux.cpp line 3692: > >> 3690: ls.print("Available page sizes: "); >> 3691: _page_sizes.print_on(&ls); >> 3692: } > > Does this work and show something? I know UL is initialization time sensitive (which is annoying btw). This comes from I comment I made and UL is initialized here. Not sure this is exactly where this should end up since it will only be printed when large pages are enabled. I think it might make sense to move somewhere else or make it a completely separate change. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From sjohanss at openjdk.java.net Wed Feb 24 09:00:43 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 09:00:43 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 08:23:13 GMT, Thomas Stuefe wrote: > What do you think? 
I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it. > I think what you propose Thomas looks good. One additional thing to keep in mind and think about here is how we should do the "sanity checking" when allowing multiple large page sizes. I think the best thing would be to sanity check all and if none succeeds disable `UseLargePages`. > Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. But thats just a small issue. I think we need to rethink exactly what `LargePageSizeInBytes` means when allowing multiple large page sizes. I've poked around in this area quite a bit lately and I'm not sure this flag is needed when we scan for available page sizes. But to allow it to go away we would have to change the APIs a bit to start passing down the page size we want to use for a certain mapping rather than using `os::large_page_size()` to get the page size. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From xliu at openjdk.java.net Wed Feb 24 09:50:04 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 24 Feb 2021 09:50:04 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
> > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set add comments and hoist ResourceMark ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2178/files - new: https://git.openjdk.java.net/jdk/pull/2178/files/6df63fe1..aeff9ecc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=06-07 Stats: 10 lines in 1 file changed: 1 ins; 1 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Wed Feb 24 09:50:06 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 24 Feb 2021 09:50:06 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v7] In-Reply-To: References: Message-ID: <8TpdqWzAgwVycyBRzDWioTkDyoQ56DjsSmlaSf6Aqu0=.dd797e69-fed2-42ba-b2e6-22de2e83312d@github.com> On Wed, 24 Feb 2021 06:50:51 GMT, Tobias Hartmann wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> use the existing api StringUtils::replace_no_expand to archive the same replace. >> don't need to invoke os::strdup because stringStream::as_string() has duplicated >> the internal buffer. 
> > src/hotspot/share/opto/type.cpp line 4052: > >> 4050: >> 4051: { >> 4052: ResourceMark rm; > > Shouldn't the `ResourceMark` go to before `stringStream ss` which is a `ResourceObj` as well? Also, please add a small comment explaining that this code suppresses the new line emitted by `print_oop`. hi, @TobiHartmann Thank you for reviewing this PR. stringStream allocates its dynamic buffer using `NEW_C_HEAP_ARRAY`. IMHO, it's okay without a ResourceMark. Unlike `stringStream::grow`, stringStream::as_string(false) does use `NEW_RESOURCE_ARRAY`, which allocates an array on current thread's resource_area. That's why I put ResourceMark in a syntax scope. Actually, the current code still works even without that ResourceMark. It's because `Type::dump_on()` has declared a rm. Let me hoist ResourceMark as you said. That makes code straight-forward locally and I shouldn't assume its context. > src/hotspot/share/opto/type.cpp line 4040: > >> 4038: // Dump oop Type >> 4039: #ifndef PRODUCT >> 4040: void TypeInstPtr::dump2( Dict &d, uint depth, outputStream* st ) const { > > While you are at it, please also remove the excess whitespace after `(` and before `)`. done. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From thartmann at openjdk.java.net Wed Feb 24 10:08:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 24 Feb 2021 10:08:41 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: References: Message-ID: <3aukTZwD74tFD4GptC7mbqAnVCghVEK0PJJiz88opXI=.59055500-e950-4744-84c5-a4fd3695a858@github.com> On Wed, 24 Feb 2021 09:50:04 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
>> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > add comments and hoist ResourceMark That looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2178 From github.com+42899633+eastig at openjdk.java.net Wed Feb 24 12:13:43 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 24 Feb 2021 12:13:43 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: References: Message-ID: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> On Wed, 24 Feb 2021 09:50:04 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
>>
>> Before:
>> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String
>> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String'
>> - string: "a"
>> :Constant:exact *
>>
>> After:
>> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact *
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set
>
> add comments and hoist ResourceMark

Changes requested by eastig at github.com (no known OpenJDK username).

src/hotspot/share/opto/type.cpp line 4053:

> 4051: const_oop()->print_oop(&ss);
> 4052: // suppress new-lines('\n') in ss emitted by const_oop->print_oop()
> 4053: // so each node is one-liner for -XX:+Verbose && -XX:+PrintIdeal

What about rewriting the comment in a clearer way:

    // 'const_oop->print_oop()' emits new-lines('\n') into ss.
    // For -XX:+Verbose && -XX:+PrintIdeal, new-lines('\n') must be removed from
    // the ss created string to have a node per line.

test/hotspot/gtest/utilities/test_ostream.cpp line 66:

> 64:
> 65: static size_t count_char(const stringStream* ss, char ch) {
> 66: return count_char(ss->as_string(), ss->size(), ch);

Am I correct that `std::count` is not allowed?
No need to use `as_string`:
`return count_char(ss->base(), ss->size(), ch);`
Or, as `stringStream` is always zero-terminated:
`return count_char(ss->base(), ch);`

test/hotspot/gtest/utilities/test_ostream.cpp line 72:

> 70: ResourceMark rm;
> 71: size_t whitespaces = count_char(ss, ' ');
> 72: char* s2 = ss->as_string(false);

No need of `false` because `false` is the default value of `as_string`.
If you want to be explicit here, I recommend:
`char* s2 = ss->as_string(/* c_heap= */ false);`

test/hotspot/gtest/utilities/test_ostream.cpp line 63:

> 61: }
> 62: return cnt;
> 63: }

As the function is only used for zero-terminated strings, maybe it makes sense to use this property:

    static size_t count_char(const char* s, char ch) {
      size_t cnt = 0;
      while (*s != '\0') {
        if (*s++ == ch) {
          ++cnt;
        }
      }
      return cnt;
    }

test/hotspot/gtest/utilities/test_ostream.cpp line 69:

> 67: }
> 68:
> 69: static void test_stringStream_tr_delete(stringStream* ss) {

I think this is a unit test for `StringUtils::replace_no_expand`. It checks that the function can be used to remove substrings. There is no dependency on `stringStream`. Any string can be used. Could you please move the test to `test_stringUtils.cpp`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2178

From zgu at openjdk.java.net Wed Feb 24 14:40:39 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Wed, 24 Feb 2021 14:40:39 GMT
Subject: RFR: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java
In-Reply-To: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com>
References: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com>
Message-ID:

On Mon, 22 Feb 2021 08:48:53 GMT, Thomas Stuefe wrote:

> Since JDK-8261302, the test runtime/NMT/CheckForProperDetailStackTrace.java fails with
> java.lang.RuntimeException: 'NativeCallStack::NativeCallStack' found in stdout
>
> --
>
> `NativeCallStack` contains a hash code. Before JDK-8261302, that hash code was calculated lazily in a non-inline hashcode getter. With JDK-8261302, the hash code calculation was moved into the `NativeCallStack` constructor and the getter was made inline.
>
> The `NativeCallStack` constructor fills itself via `os::get_native_stack()`.
Before JDK-8261302, that call has been the last call in the constructor and hence had been sometimes optimized into a tail call. Whether or not its a tail call matters since it affects the number of stack frames the stack walker has to skip. Therefore, the constructor contains coding to predict tail-call-ness: > > #if (defined(_NMT_NOINLINE_) || defined(_WINDOWS) || !defined(_LP64) || defined(PPC64)) > // Not a tail call. > toSkip++; > #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64)) > // Mac OS X slowdebug builds have this odd behavior where NativeCallStack::NativeCallStack > // appears as two frames, so we need to skip an extra frame. > toSkip++; > #endif // Special-case for BSD. > #endif // Not a tail call. > > This prediction was now off since the hash code calculation happened at the end of the callstack. This causes the test error, since on some platforms (eg Linux x64) we now think we have a tail call when we don't, which means we do not skip enough frames, and the NMT output contains call frames like "NativeCallStack::NativeCallStack()", which trips the test. > > ----------- > > Fix: > > This fix moves the hash code calculation completely out of NativeCallStack. There is no reason why NativeCallStack should have a hash code. It mainly exists as a convenience to place it in a hash map. The patch moves the hash code calculation up into MallocSiteTableEntry. > > This has the advantage of only having to pay for a hash code when you need it - in theory, one may use NativeCallStack in places other than NMT, where it is unnecessary. > > I considered other options: > - modify `os::get_native_stack()` to also calculate a hash in addition to capturing the stack, and return it in a caller provided variable. That would have left this call to be the tail call. However, it seemed less clean - we have two implementations of this function, as well as other, non-capturing, NativeCallStack constructors, which would have to be modified. 
It also would have made `os::get_native_stack()` less general purpose.
> - Leave it as it is and just always skip frames: Seemed attractive, but I did not want to touch the tailcode-prediction-code and play whack-the-mole with platform specific test errors.
>
> ---------------
>
> Tests: GA, manual test, nightlies at SAP

Marked as reviewed by zgu (Reviewer).

src/hotspot/share/services/mallocSiteTable.hpp line 60:

> 58: private:
> 59: MallocSite _malloc_site;
> 60: const unsigned _hash;

Prefer "unsigned int" to be consistent with other places. Otherwise, looks good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2672

From sjohanss at openjdk.java.net Wed Feb 24 15:01:42 2021
From: sjohanss at openjdk.java.net (Stefan Johansson)
Date: Wed, 24 Feb 2021 15:01:42 GMT
Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs*
In-Reply-To:
References:
Message-ID:

On Wed, 11 Nov 2020 01:48:46 GMT, Marcus G K Williams wrote:

> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using
> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k).
>
> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size().
>
> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved.

@mgkwill, I've been doing some measurements trying to see what kind of improvements to expect from backing the code-cache with 2m pages and the heap with 1g pages.
Can you share what benchmarks you've used when analyzing the performance of this change and also what kind of setups you've used (heap-size, code-cache size, etc). ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Wed Feb 24 15:11:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 15:11:41 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: <5Hmhp7S8616Kfbdsu5ObzFNy2uUFgJPCp0kvHr-U310=.3cabbe74-fe65-436b-973d-d6f3e64cd743@github.com> On Wed, 24 Feb 2021 08:57:29 GMT, Stefan Johansson wrote: > > What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it. > > I think what you propose Thomas looks good. One additional thing to keep in mind and think about here is how we should do the "sanity checking" when allowing multiple large page sizes. I think the best thing would be to sanity check all and if none succeeds disable `UseLargePages`. Oh, sure. I made this not explicit but implied this under "post processing and deciding". Presumably in the context of setup_large_page_type(). > > > Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. But thats just a small issue. > > I think we need to rethink exactly what `LargePageSizeInBytes` means when allowing multiple large page sizes. I've poked around in this area quite a bit lately and I'm not sure this flag is needed when we scan for available page sizes. But to allow it to go away we would have to change the APIs a bit to start passing down the page size we want to use for a certain mapping rather than using `os::large_page_size()` to get the page size. 
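For illustration only, choosing a page size per mapping (instead of consulting a single global `os::large_page_size()`) could be sketched roughly as follows. This is not the HotSpot API: `select_page_size` and its parameters are hypothetical names, the set of sizes stands in for `_page_sizes`, and treating a page size equal to the region size as usable is an assumption of this sketch.

```cpp
#include <cstddef>
#include <set>

// Returns the largest page size in `sizes` that does not exceed
// `region_bytes`. Falls back to `base_page_size` when no large page
// size fits the region.
static size_t select_page_size(const std::set<size_t>& sizes,
                               size_t region_bytes,
                               size_t base_page_size) {
  size_t best = base_page_size;
  for (size_t s : sizes) {
    if (s <= region_bytes && s > best) {
      best = s;
    }
  }
  return best;
}
```

With 2M and 1G pages available, a 240M code cache would get 2M pages and a 4G heap would get 1G pages, while a 1M region falls back to the base page size.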
If we could do without this flag this would be fine for me too. But how would you let the user specify that the VM is to use a different default page size than is set on system level? ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Wed Feb 24 15:18:08 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 15:18:08 GMT Subject: RFR: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java [v2] In-Reply-To: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> References: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> Message-ID: > Since JDK-8261302, the test runtime/NMT/CheckForProperDetailStackTrace.java fails with > java.lang.RuntimeException: 'NativeCallStack::NativeCallStack' found in stdout > > -- > > `NativeCallStack` contains a hash code. Before JDK-8261302, that hash code was calculated lazily in a non-inline hashcode getter. With JDK-8261302, the hash code calculation was moved into the `NativeCallStack` constructor and the getter was made inline. > > The `NativeCallStack` constructor fills itself via `os::get_native_stack()`. Before JDK-8261302, that call has been the last call in the constructor and hence had been sometimes optimized into a tail call. Whether or not its a tail call matters since it affects the number of stack frames the stack walker has to skip. Therefore, the constructor contains coding to predict tail-call-ness: > > #if (defined(_NMT_NOINLINE_) || defined(_WINDOWS) || !defined(_LP64) || defined(PPC64)) > // Not a tail call. > toSkip++; > #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64)) > // Mac OS X slowdebug builds have this odd behavior where NativeCallStack::NativeCallStack > // appears as two frames, so we need to skip an extra frame. > toSkip++; > #endif // Special-case for BSD. > #endif // Not a tail call. 
> > This prediction was now off since the hash code calculation happened at the end of the callstack. This causes the test error, since on some platforms (eg Linux x64) we now think we have a tail call when we don't, which means we do not skip enough frames, and the NMT output contains call frames like "NativeCallStack::NativeCallStack()", which trips the test. > > ----------- > > Fix: > > This fix moves the hash code calculation completely out of NativeCallStack. There is no reason why NativeCallStack should have a hash code. It mainly exists as a convenience to place it in a hash map. The patch moves the hash code calculation up into MallocSiteTableEntry. > > This has the advantage of only having to pay for a hash code when you need it - in theory, one may use NativeCallStack in places other than NMT, where it is unnecessary. > > I considered other options: > - modify `os::get_native_stack()` to also calculate a hash in addition to capturing the stack, and return it in a caller provided variable. That would have left this call to be the tail call. However, it seemed less clean - we have two implementations of this function, as well as other, non-capturing, NativeCallStack constructors, which would have to be modified. It also would have made `os::get_native_stack()` less general purpose. > - Leave it as it is and just always skip frames: Seemed attractive, but I did not want to touch the tailcode-prediction-code and play whack-the-mole with platform specific test errors. 
> > --------------- > > Tests: GA, manual test, nightlies at SAP Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: unsigned->unsigned int ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2672/files - new: https://git.openjdk.java.net/jdk/pull/2672/files/78115d72..a8fc1c99 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2672&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2672&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2672.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2672/head:pull/2672 PR: https://git.openjdk.java.net/jdk/pull/2672 From stuefe at openjdk.java.net Wed Feb 24 15:18:09 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 15:18:09 GMT Subject: RFR: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java [v2] In-Reply-To: References: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> Message-ID: On Wed, 24 Feb 2021 14:38:11 GMT, Zhengyu Gu wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> unsigned->unsigned int > > src/hotspot/share/services/mallocSiteTable.hpp line 60: > >> 58: private: >> 59: MallocSite _malloc_site; >> 60: const unsigned _hash; > > Prefer "unsigned int" to be consist with other places. Otherwise, Looks good to me. Thank you Zhengyu! I reverted to unsigned int. Using just unsigned is a bad habit of mine, sorry. 
Cheers, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2672 From sjohanss at openjdk.java.net Wed Feb 24 16:02:43 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 16:02:43 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: <5Hmhp7S8616Kfbdsu5ObzFNy2uUFgJPCp0kvHr-U310=.3cabbe74-fe65-436b-973d-d6f3e64cd743@github.com> References: <5Hmhp7S8616Kfbdsu5ObzFNy2uUFgJPCp0kvHr-U310=.3cabbe74-fe65-436b-973d-d6f3e64cd743@github.com> Message-ID: On Wed, 24 Feb 2021 15:09:15 GMT, Thomas Stuefe wrote: > > > What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it. > > > > > > I think what you propose Thomas looks good. One additional thing to keep in mind and think about here is how we should do the "sanity checking" when allowing multiple large page sizes. I think the best thing would be to sanity check all and if none succeeds disable `UseLargePages`. > > Oh, sure. I made this not explicit but implied this under "post processing and deciding". Presumably in the context of setup_large_page_type(). > Sure, got that, just wanted to highlight that we need to figure out how to handle the sanity check for multiple sizes. Should a size that fail the sanity check be removed from the `_page_sizes` member. Maybe `_page_sizes` should include all page sizes, and then we have an additional member for "useable large page sizes". As I said, not sure how to best handle this. > > > Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. But thats just a small issue. > > > > > > I think we need to rethink exactly what `LargePageSizeInBytes` means when allowing multiple large page sizes. 
I've poked around in this area quite a bit lately and I'm not sure this flag is needed when we scan for available page sizes. But to allow it to go away we would have to change the APIs a bit to start passing down the page size we want to use for a certain mapping rather than using `os::large_page_size()` to get the page size. > > If we could do without this flag this would be fine for me too. But how would you let the user specify that the VM is to use a different default page size than is set on system level? I agree, it's not obvious how to make this work in a good way. But using the `os::page_size_for_region*` functions in the upper layers to request a page size could be one solution. But we probably need to have a way to change the "default" value for some cases. Another thing to think about/discuss is what should be done if a reservation-request within the VM for 4G with 1G pages fail, should we fall straight back to 4k page, should we try 2M page or possible fail hard to show something is probably wrong with the config. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From hseigel at openjdk.java.net Wed Feb 24 19:37:58 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Wed, 24 Feb 2021 19:37:58 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. Message-ID: Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). Thanks to Coleen and David for their helpful suggestions. 
Thanks, Harold ------------- Commit messages: - 8262227: Change SystemDictionary::find() to return an InstanceKlass*. Changes: https://git.openjdk.java.net/jdk/pull/2712/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2712&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262227 Stats: 90 lines in 13 files changed: 6 ins; 19 del; 65 mod Patch: https://git.openjdk.java.net/jdk/pull/2712.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2712/head:pull/2712 PR: https://git.openjdk.java.net/jdk/pull/2712 From stuefe at openjdk.java.net Wed Feb 24 19:56:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 19:56:04 GMT Subject: RFR: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java [v3] In-Reply-To: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> References: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> Message-ID: > Since JDK-8261302, the test runtime/NMT/CheckForProperDetailStackTrace.java fails with > java.lang.RuntimeException: 'NativeCallStack::NativeCallStack' found in stdout > > -- > > `NativeCallStack` contains a hash code. Before JDK-8261302, that hash code was calculated lazily in a non-inline hashcode getter. With JDK-8261302, the hash code calculation was moved into the `NativeCallStack` constructor and the getter was made inline. > > The `NativeCallStack` constructor fills itself via `os::get_native_stack()`. Before JDK-8261302, that call has been the last call in the constructor and hence had been sometimes optimized into a tail call. Whether or not its a tail call matters since it affects the number of stack frames the stack walker has to skip. Therefore, the constructor contains coding to predict tail-call-ness: > > #if (defined(_NMT_NOINLINE_) || defined(_WINDOWS) || !defined(_LP64) || defined(PPC64)) > // Not a tail call. 
> toSkip++; > #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64)) > // Mac OS X slowdebug builds have this odd behavior where NativeCallStack::NativeCallStack > // appears as two frames, so we need to skip an extra frame. > toSkip++; > #endif // Special-case for BSD. > #endif // Not a tail call. > > This prediction was now off since the hash code calculation happened at the end of the callstack. This causes the test error, since on some platforms (eg Linux x64) we now think we have a tail call when we don't, which means we do not skip enough frames, and the NMT output contains call frames like "NativeCallStack::NativeCallStack()", which trips the test. > > ----------- > > Fix: > > This fix moves the hash code calculation completely out of NativeCallStack. There is no reason why NativeCallStack should have a hash code. It mainly exists as a convenience to place it in a hash map. The patch moves the hash code calculation up into MallocSiteTableEntry. > > This has the advantage of only having to pay for a hash code when you need it - in theory, one may use NativeCallStack in places other than NMT, where it is unnecessary. > > I considered other options: > - modify `os::get_native_stack()` to also calculate a hash in addition to capturing the stack, and return it in a caller provided variable. That would have left this call to be the tail call. However, it seemed less clean - we have two implementations of this function, as well as other, non-capturing, NativeCallStack constructors, which would have to be modified. It also would have made `os::get_native_stack()` less general purpose. > - Leave it as it is and just always skip frames: Seemed attractive, but I did not want to touch the tailcode-prediction-code and play whack-the-mole with platform specific test errors. > > --------------- > > Tests: GA, manual test, nightlies at SAP Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge - unsigned->unsigned int - Initial ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2672/files - new: https://git.openjdk.java.net/jdk/pull/2672/files/a8fc1c99..499eff5e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2672&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2672&range=01-02 Stats: 8464 lines in 266 files changed: 5219 ins; 1453 del; 1792 mod Patch: https://git.openjdk.java.net/jdk/pull/2672.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2672/head:pull/2672 PR: https://git.openjdk.java.net/jdk/pull/2672 From iklam at openjdk.java.net Wed Feb 24 20:17:46 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 24 Feb 2021 20:17:46 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 19:32:29 GMT, Harold Seigel wrote: > Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. > > It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. > > The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). > > Thanks to Coleen and David for their helpful suggestions. > > Thanks, Harold Changes requested by iklam (Reviewer). 
src/hotspot/share/ci/ciEnv.cpp line 450: > 448: kls = SystemDictionary::find_constrained_instance_or_array_klass(sym, loader, > 449: CHECK_AND_CLEAR_(fail_type)); > 450: } else { I think SystemDictionary::find_constrained_instance_or_array_klass can also be changed to accept a `Thread*` instead `TRAPS`, since now it can no longer throw, and the thread is used only for Mutexes. src/hotspot/share/runtime/thread.cpp line 3017: > 3015: InstanceKlass* ik = SystemDictionary::find_instance_klass(vmSymbols::java_lang_VersionProps(), > 3016: Handle(), Handle()); > 3017: JDK_Version::set_java_version(get_java_version(ik)); The various get_java_xxx() functions all seem to do the same thing. I am wondering if they can be combined into a single utility function, so that you can do something like: JDK_Version::set_runtime_name(get_version_info(ik, vmSymbols::java_version_name(), java_version, sizeof(java_version))); ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From mdoerr at openjdk.java.net Wed Feb 24 22:38:40 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Wed, 24 Feb 2021 22:38:40 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v7] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 18:22:01 GMT, Lutz Schmidt wrote: >> Dear community, >> may I please request reviews for this fix, improving the usefulness of method invocation counters. >> - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (185 days in case of a 1 GHz counter update frequency). >> - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. >> - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. >> - before/after sample output is attached to the bug description. >> >> Thank you! 
>> Lutz
>
> Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision:
>
>   update copyright year

This version looks ok. I understand that you don't want to clean up the whole signed / unsigned mess. That's fine with me. I'd only like to see one confusing comment removed or replaced.
You may also want to check your 64 bit overflow time in the description: I guess 185 days matches a 1 THz counter update frequency. With 1 GHz it should be above 500 years.

src/hotspot/share/runtime/java.cpp line 100:

> 98: int compare_methods(Method** a, Method** b) {
> 99:   // invocation_count() may have overflowed already. Interpret it's result as
> 100:   // unsigned int to shift the limit of meaningless results by a factor of 2.

Code is fine, but this comment doesn't make sense to me. The result is the same with your version. But it has the advantage that it avoids signed integer overflow (undefined behavior).

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2511

From dholmes at openjdk.java.net Wed Feb 24 23:12:40 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 24 Feb 2021 23:12:40 GMT
Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*.
In-Reply-To:
References:
Message-ID: <5ybu-yKWRLwcbLAPdLw9yaI8CLQqWT8tT9WP33ZKTGw=.2564ac42-8c31-45bf-8855-27cc3153edd5@github.com>

On Wed, 24 Feb 2021 19:32:29 GMT, Harold Seigel wrote:

> Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate.
>
> It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter.
> > The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). > > Thanks to Coleen and David for their helpful suggestions. > > Thanks, Harold Nice cleanup Harold! I'm more pleased with the TRAPS removal than the actual type change :) Ref my comment in the bug report I'd overlooked that just because the return type of find changed, it didn't mean you have to change the type of the variable it was assigned to! Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2712 From dholmes at openjdk.java.net Wed Feb 24 23:12:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 24 Feb 2021 23:12:42 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: References: Message-ID: <0MnhqJfTfBUyz0NXic5cqxFofg0Gv6nz5ZI3bacBRIs=.4531f2de-4d5a-484d-a800-c4b77c4a40f9@github.com> On Wed, 24 Feb 2021 20:14:02 GMT, Ioi Lam wrote: >> Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. >> >> It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. >> >> The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). >> >> Thanks to Coleen and David for their helpful suggestions. >> >> Thanks, Harold > > src/hotspot/share/runtime/thread.cpp line 3017: > >> 3015: InstanceKlass* ik = SystemDictionary::find_instance_klass(vmSymbols::java_lang_VersionProps(), >> 3016: Handle(), Handle()); >> 3017: JDK_Version::set_java_version(get_java_version(ik)); > > The various get_java_xxx() functions all seem to do the same thing. 
I am wondering if they can be combined into a single utility function, so that you can do something like: > > JDK_Version::set_runtime_name(get_version_info(ik, vmSymbols::java_version_name(), > java_version, sizeof(java_version))); I think the get_java_* set of functions could be streamlined so that you pass in the symbol for the field you need. Also perhaps VersionProps could be a well-known class so we don't have to look it up. But this seems a future RFE. Even the current changes seem a little out-of-scope for this change. ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From coleenp at openjdk.java.net Wed Feb 24 23:35:39 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 24 Feb 2021 23:35:39 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: References: Message-ID: <1dCMxrf3l5CGsqyOJqwdOf87lsW-Wg9EBiF5wREn6ZI=.e428d942-91c1-40fd-85ca-c6d4264e8aae@github.com> On Wed, 24 Feb 2021 19:32:29 GMT, Harold Seigel wrote: > Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. > > It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. > > The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). > > Thanks to Coleen and David for their helpful suggestions. > > Thanks, Harold This is more of a cleanup than I thought it'd be. Very nice! ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2712 From iklam at openjdk.java.net Thu Feb 25 00:31:09 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 25 Feb 2021 00:31:09 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp [v3] In-Reply-To: References: Message-ID: > metaspace.hpp is included by about 770 out of 1000 HotSpot .o files. Most of these are transitively included via array.hpp and classLoaderData.hpp. > > - classLoaderData.hpp doesn't actually need metaspace.hpp. > - array.hpp can be refactored to put a function that depends on metaspace.hpp into array.inline.hpp > > Doing the above reduces the number of .o files that include metaspace.hpp to 343. Since this is still a significant number, we should split out the rarely used classes (such as `MetaspaceGC` and `MetaspaceUtils`) into a new header file (metaspaceUtils.hpp, which is included only 30 times). > > Also, these 3 includes can now be removed from metaspace.hpp. > > #include "memory/memRegion.hpp" > #include "memory/metaspaceChunkFreeListSummary.hpp" > #include "memory/virtualspace.hpp" > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > (I also fixed an unrelated comment in archiveUtils.cpp when I was scanning for the word "Metaspace" in the source files -- the function `MetaspaceShared::commit_to()` no longer exists). Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - fixed copyright - Merge branch 'master' of https://github.com/openjdk/jdk into 8261868-reduce-inclusion-of-metaspace.hpp - @calvinccheung review comments - fixed ppc/s390 builds - step2 - step 1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2599/files - new: https://git.openjdk.java.net/jdk/pull/2599/files/700d1c16..b5b5a935 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2599&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2599&range=01-02 Stats: 15191 lines in 386 files changed: 10589 ins; 2402 del; 2200 mod Patch: https://git.openjdk.java.net/jdk/pull/2599.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2599/head:pull/2599 PR: https://git.openjdk.java.net/jdk/pull/2599 From dongbo at openjdk.java.net Thu Feb 25 01:47:40 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Thu, 25 Feb 2021 01:47:40 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v10] In-Reply-To: References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> Message-ID: <9rtmEMrsPaA73FDA-KB7H0S0CRdBePGwnI5FcDY-OLI=.425249e2-b590-4a16-b9b8-8d7b5ecd2800@github.com> On Wed, 24 Feb 2021 07:31:14 GMT, Dong Bo wrote: >>> Hi, @theRealAph >>> >>> I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`. >>> Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? >> >> I'm not seeing ```sra``` used anywhere. >> >> The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug. 
>> >> It would be far better to do something like this: >> >> ``` >> void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { >> vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); >> vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); >> } >> ``` > > > Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 &> assembly_vlen64.txt > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 &> assembly_vlen128.txt > $ cat assembly_vlen*.txt | grep "ssra" > 02c0 ssra V18, V17, #37 # vector (2D) > 02c8 ssra V19, V17, #0 # vector (2D) > 02d0 ssra V20, V17, #35 # vector (2D) > 0308 ssra V18, V17, #29 # vector (2D) > 0644 ssra V18, V17, #37 # vector (2D) > 064c ssra V19, V17, #0 # vector (2D) > 0654 ssra V20, V17, #35 # vector (2D) > 0674 ssra V18, V17, #29 # vector (2D) > 0798 ssra V18, V17, #37 # vector (2D) > 07a0 ssra V19, V17, #0 # vector (2D) > 07a8 ssra V20, V17, #35 # vector (2D) > 07e0 ssra V18, V17, #29 # vector (2D) > 0x0000ffff83f7e500: ssra v18.2d, v17.2d, #37 ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e510: ssra v20.2d, 
v17.2d, #35 ;*iand {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e548: ssra v18.2d, v17.2d, #29 ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e884: ssra v18.2d, v17.2d, #37 > 0x0000ffff83f7e894: ssra v20.2d, v17.2d, #35 > 0x0000ffff83f7e8b4: ssra v18.2d, v17.2d, #29 > 0x0000ffff83f7e9d8: ssra v18.2d, v17.2d, #37 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7e9e8: ssra v20.2d, v17.2d, #35 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f7ea20: ssra v18.2d, v17.2d, #29 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} > 284 ssra V18, V17, #9 # vector (4S) > 28c ssra V19, V17, #0 # vector (4S) > 294 ssra V20, V17, #15 # vector (4S) > 0x0000ffff83f822c4: ssra v18.4s, v17.4s, #9 ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0} > 0x0000ffff83f822d4: ssra v20.4s, v17.4s, #15 > 284 ssra V18, V17, #1 # vector (8H) > 28c ssra V19, V17, #8 # vector (8H) > ... > > Also injected error to `sshr+add` by: > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp > @@ -545,7 +545,7 @@ public: > #define WRAP(INSN) \ > void INSN(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, int shift) { \ > if (shift == 0) { \ > - Assembler::addv(Vd, T, Vd, Vn); \ > + Assembler::subv(Vd, T, Vd, Vn); \ > } else { \ > Assembler::INSN(Vd, T, Vn, shift); \ > } \ > The `shift+add` tests failed as expected: > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 > WARNING: Using incubator modules: jdk.incubator.vector > warning: using incubating module(s): jdk.incubator.vector > 1 warning > Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: > type SHORT index 19, operation ASHR_AND_ACCUMULATE, vector length 64. 
> type SHORT index 21, operation ASHR_AND_ACCUMULATE, vector length 64. > type SHORT index 23, operation ASHR_AND_ACCUMULATE, vector length 64. > type SHORT index 25, operation LSHR_AND_ACCUMULATE, vector length 64. > type SHORT index 27, operation LSHR_AND_ACCUMULATE, vector length 64. > type SHORT index 29, operation LSHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 19, operation ASHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 21, operation ASHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 23, operation ASHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 25, operation LSHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 27, operation LSHR_AND_ACCUMULATE, vector length 64. > type INTEGER index 29, operation LSHR_AND_ACCUMULATE, vector length 64. > ... > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 > WARNING: Using incubator modules: jdk.incubator.vector > warning: using incubating module(s): jdk.incubator.vector > 1 warning > Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: > type LONG index 49, operation ASHR_AND_ACCUMULATE, vector length 128. > type LONG index 51, operation ASHR_AND_ACCUMULATE, vector length 128. > type LONG index 53, operation ASHR_AND_ACCUMULATE, vector length 128. > type LONG index 55, operation LSHR_AND_ACCUMULATE, vector length 128. > type LONG index 57, operation LSHR_AND_ACCUMULATE, vector length 128. > type LONG index 59, operation LSHR_AND_ACCUMULATE, vector length 128. > type SHORT index 49, operation ASHR_AND_ACCUMULATE, vector length 128. > type SHORT index 51, operation ASHR_AND_ACCUMULATE, vector length 128. > type SHORT index 53, operation ASHR_AND_ACCUMULATE, vector length 128. > ... > > Anyway, I extracted operations you suggested into `shift_op_*` methods. 
> Performed the error-injected experiments with the new tests on Kunpeng916 and re-checked the assembly output, results looks good. > > The test command I used to run the newest tests are: > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=64 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen64.txt > $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=128 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen128.txt > $ cat assembly_vlen64.txt | grep ssra; cat assembly_vlen128.txt | grep ssra > _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ > > On 24/02/2021 07:33, Dong Bo wrote: > > > Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: > > I don't doubt it, but the test code is so very complex that it can > fall foul of heuristics given slightly changed circumstances. That's > why good test cases are as simple as possible, and allow no room for > variations because they do only one thing. Precise targeting should > be the goal of HotSpot back-end test cases. > Understood, thanks. :-) Does the newest version address the considerations? 
I extracted the `shift`/`shift+add` operations into separate methods, mostly as suggested in previous comments, something like: static int shift_op_long_ASHR_and_ADD(LongVector vba, LongVector vbb, long arrLongs[][], int end, int ind) { vba.add(vbb.lanewise(VectorOperators.ASHR, 37)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 64)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 99)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 128)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 157)).intoArray(arrLongs[end++], ind); vba.add(vbb.lanewise(VectorOperators.ASHR, 192)).intoArray(arrLongs[end++], ind); return end; } ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From iklam at openjdk.java.net Thu Feb 25 04:36:40 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 25 Feb 2021 04:36:40 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 07:54:28 GMT, Thomas Stuefe wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed ppc/s390 builds > > Hi Ioi, > > this is very appreciated. > > metaspace.hpp is still a bit of a mess. Its the last holdover for the old metaspace implementation and I always wanted to clean it out a bit. Splitting this header into three is a right step. > > A lot of that stuff may still vanish and/or be reformed if I have the time (eg metaspaceChunkFreeListSummary). > > Assuming this builds and tests fine on all our platform, including the weirder ones, I am fine with this patch. It looks good. > > Thanks, Thomas Thanks @tstuefe and @calvinccheung for the review. I re-tested with the latest repo. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2599 From iklam at openjdk.java.net Thu Feb 25 04:36:40 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 25 Feb 2021 04:36:40 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp [v2] In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 06:08:09 GMT, Calvin Cheung wrote: > One more file you may consider updating is share/memory/classLoaderMetaspace.cpp. > It now depends on metaspaceUtils.hpp but it includes it transitively via metaspaceTracer.hpp. > The include of metaspace.hpp is not needed because classLoaderMetaspace.hpp includes it. > > Other changes seem good. I added metaspaceUtils.hpp to classLoaderMetaspace.cpp. I kept metaspace.hpp in there -- our convention is to always explicitly include a header file if we use its contents, even if this header file might be transitively included by other headers. ------------- PR: https://git.openjdk.java.net/jdk/pull/2599 From duke at openjdk.java.net Thu Feb 25 04:36:41 2021 From: duke at openjdk.java.net (duke) Date: Thu, 25 Feb 2021 04:36:41 GMT Subject: Withdrawn: 8261868: Reduce inclusion of metaspace.hpp In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 06:12:37 GMT, Ioi Lam wrote: > metaspace.hpp is included by about 770 out of 1000 HotSpot .o files. Most of these are transitively included via array.hpp and classLoaderData.hpp. > > - classLoaderData.hpp doesn't actually need metaspace.hpp. > - array.hpp can be refactored to put a function that depends on metaspace.hpp into array.inline.hpp > > Doing the above reduces the number of .o files that include metaspace.hpp to 343. Since this is still a significant number, we should split out the rarely used classes (such as `MetaspaceGC` and `MetaspaceUtils`) into a new header file (metaspaceUtils.hpp, which is included only 30 times). > > Also, these 3 includes can now be removed from metaspace.hpp. 
> > #include "memory/memRegion.hpp" > #include "memory/metaspaceChunkFreeListSummary.hpp" > #include "memory/virtualspace.hpp" > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. > > (I also fixed an unrelated comment in archiveUtils.cpp when I was scanning for the word "Metaspace" in the source files -- the function `MetaspaceShared::commit_to()` no longer exists). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/2599 From iklam at openjdk.java.net Thu Feb 25 06:12:40 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 25 Feb 2021 06:12:40 GMT Subject: RFR: 8261868: Reduce inclusion of metaspace.hpp [v2] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 04:30:42 GMT, Ioi Lam wrote: >> Hi Ioi, >> >> this is very appreciated. >> >> metaspace.hpp is still a bit of a mess. Its the last holdover for the old metaspace implementation and I always wanted to clean it out a bit. Splitting this header into three is a right step. >> >> A lot of that stuff may still vanish and/or be reformed if I have the time (eg metaspaceChunkFreeListSummary). >> >> Assuming this builds and tests fine on all our platform, including the weirder ones, I am fine with this patch. It looks good. >> >> Thanks, Thomas > > Thanks @tstuefe and @calvinccheung for the review. I re-tested with the latest repo. Note: the commit into the openjdk/jdk repo was successful. https://github.com/openjdk/jdk/commit/0f8be6e433b5d30e028558a4bea0659838d6b700 The previous message by the openjdk bot seems to be an error by the bot. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/2599

From xliu at openjdk.java.net Thu Feb 25 08:43:41 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Thu, 25 Feb 2021 08:43:41 GMT
Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8]
In-Reply-To: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com>
References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com>
Message-ID: <-2ATme5P-5tNlwomRglS6fjxrvD5tK-bKqMx9N11eZg=.e5e0d183-6872-4972-bb0d-7542b3c0b057@github.com>

On Wed, 24 Feb 2021 11:10:46 GMT, Evgeny Astigeevich wrote:

>> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set
>>
>>   add comments and hoist ResourceMark
>
> test/hotspot/gtest/utilities/test_ostream.cpp line 66:
>
>> 64:
>> 65: static size_t count_char(const stringStream* ss, char ch) {
>> 66:   return count_char(ss->as_string(), ss->size(), ch);
>
> Am I correct `std::count` is not allowed?
> No need to use `as_string`: `return count_char(ss->base(), ss->size(), ch);`
> Or as `stringStream` is always zero-terminated: `return count_char(ss->base(), ch);`

I don't think STL is allowed.

Makes sense. ss->as_string() is not necessary.
I don't like the idea that we assume ss is always zero-terminated like a C string. There is a member variable _written in class stringStream. Technically speaking, the implementation could avoid writing '\0' at the end. That's why I would like to use the len argument.

For me, `count_char(ss->base(), ss->size(), ch)` is more reliable because it depends on interfaces instead of implementation. An interface is supposed to be more stable than an implementation.
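
The length-based variant argued for above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the pull request: the helper is a standalone function over a raw buffer, and the `stringStream` accessors (`base()`, `size()`) mentioned in the review are not used here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standalone helper (not the gtest code under review).
// Counts occurrences of ch in the first len bytes of s. It never reads
// past s + len, so it does not depend on a terminating '\0'.
static std::size_t count_char(const char* s, std::size_t len, char ch) {
  assert(s != nullptr || len == 0);  // a null buffer is only valid when empty
  std::size_t count = 0;
  for (std::size_t i = 0; i < len; i++) {
    if (s[i] == ch) {
      count++;
    }
  }
  return count;
}
```

Because the loop is bounded by `len`, the helper also works on buffers that contain embedded '\0' bytes or that are not terminated at all, which is the property the interface-based argument relies on.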
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Thu Feb 25 08:48:40 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 08:48:40 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> Message-ID: On Wed, 24 Feb 2021 12:09:23 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> add comments and hoist ResourceMark > > test/hotspot/gtest/utilities/test_ostream.cpp line 69: > >> 67: } >> 68: >> 69: static void test_stringStream_tr_delete(stringStream* ss) { > > I think this is a unit test for `StringUtils::replace_no_expand`. It checks that the function can be used to remove substrings. There is no dependency on `stringStream`. Any string can be used. > Could you please move the test to `test_stringUtils.cpp`? yes, let me move it to test_stringUtils.cpp. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Thu Feb 25 08:51:39 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 08:51:39 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> Message-ID: On Thu, 25 Feb 2021 08:46:08 GMT, Xin Liu wrote: >> test/hotspot/gtest/utilities/test_ostream.cpp line 69: >> >>> 67: } >>> 68: >>> 69: static void test_stringStream_tr_delete(stringStream* ss) { >> >> I think this is a unit test for `StringUtils::replace_no_expand`. It checks that the function can be used to remove substrings. 
There is no dependency on `stringStream`. Any string can be used. >> Could you please move the test to `test_stringUtils.cpp`? > > yes, let me move it to test_stringUtils.cpp. I would like to keep stringStream because I think it's good idea to test the similar scenario. it's also handy to do memory management. ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Thu Feb 25 08:56:14 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 08:56:14 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set update comments based on the review feedbacks. move the unittest to test_stringUtil.cpp. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2178/files - new: https://git.openjdk.java.net/jdk/pull/2178/files/aeff9ecc..2f7ccdb0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2178&range=07-08 Stats: 76 lines in 3 files changed: 41 ins; 30 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2178.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2178/head:pull/2178 PR: https://git.openjdk.java.net/jdk/pull/2178 From lucy at openjdk.java.net Thu Feb 25 09:01:10 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 25 Feb 2021 09:01:10 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v8] In-Reply-To: References: Message-ID: > Dear community, > may I please request reviews for this fix, improving the usefulness of method invocation counters. > - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (> 500 years in case of a 1 GHz counter update frequency). > - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. > - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. > - before/after sample output is attached to the bug description. > > Thank you! 
> Lutz Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: comment changes requested by TheRealMDoerr ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2511/files - new: https://git.openjdk.java.net/jdk/pull/2511/files/0f220ee3..e8af119b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2511&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2511.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2511/head:pull/2511 PR: https://git.openjdk.java.net/jdk/pull/2511 From lucy at openjdk.java.net Thu Feb 25 09:01:11 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Thu, 25 Feb 2021 09:01:11 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v7] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 22:28:15 GMT, Martin Doerr wrote: >> Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyright year > > src/hotspot/share/runtime/java.cpp line 100: > >> 98: int compare_methods(Method** a, Method** b) { >> 99: // invocation_count() may have overflowed already. Interpret it's result as >> 100: // unsigned int to shift the limit of meaningless results by a factor of 2. > > Code is fine, but this comment doesn't make sense to me. The result is the same with your version. But it has the advantage that it avoids signed integer overflow (undefined behavior). I agree. The comments could be misleading. They are gone. 
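To illustrate the point about reading a signed counter as unsigned, here is a small sketch. It is not the `compare_methods` code from `java.cpp`: `FakeMethod`, `compare_counts`, and the ascending sort order are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a method carrying a signed 32-bit invocation counter.
struct FakeMethod {
  int32_t invocation_count;
};

// Compares two counters after reinterpreting them as unsigned values.
// A counter that has wrapped into the negative range is then treated as a
// large count rather than a small one, and no signed subtraction (which
// could overflow, i.e. undefined behavior) is needed.
static int compare_counts(const FakeMethod* a, const FakeMethod* b) {
  const uint32_t ca = static_cast<uint32_t>(a->invocation_count);
  const uint32_t cb = static_cast<uint32_t>(b->invocation_count);
  if (ca > cb) return 1;
  if (ca < cb) return -1;
  return 0;
}
```

With this reading, a counter that has just wrapped from INT32_MAX to INT32_MIN still ranks above one sitting at INT32_MAX, which is the "factor of two" of extra headroom mentioned in the pull request description.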
------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From mdoerr at openjdk.java.net Thu Feb 25 09:53:41 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 25 Feb 2021 09:53:41 GMT Subject: RFR: 8261447: MethodInvocationCounters frequently run into overflow [v8] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 09:01:10 GMT, Lutz Schmidt wrote: >> Dear community, >> may I please request reviews for this fix, improving the usefulness of method invocation counters. >> - aggregation counters are retyped as uint64_t, shifting the overflow probability way out (> 500 years in case of a 1 GHz counter update frequency). >> - counters for individual methods are interpreted as (unsigned int), in contrast to their declaration as int. This gives us a factor of two before the counters overflow. >> - as a special case, "compiled_invocation_counter" is retyped as long, because it has a higher update frequency than other counters. >> - before/after sample output is attached to the bug description. >> >> Thank you! >> Lutz > > Lutz Schmidt has updated the pull request incrementally with one additional commit since the last revision: > > comment changes requested by TheRealMDoerr Marked as reviewed by mdoerr (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2511 From github.com+42899633+eastig at openjdk.java.net Thu Feb 25 11:04:07 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Thu, 25 Feb 2021 11:04:07 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 08:56:14 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop. 
>> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > update comments based on the review feedbacks. > move the unittest to test_stringUtil.cpp. Marked as reviewed by eastig at github.com (no known OpenJDK username). ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From coleenp at openjdk.java.net Thu Feb 25 13:13:39 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 25 Feb 2021 13:13:39 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: <0MnhqJfTfBUyz0NXic5cqxFofg0Gv6nz5ZI3bacBRIs=.4531f2de-4d5a-484d-a800-c4b77c4a40f9@github.com> References: <0MnhqJfTfBUyz0NXic5cqxFofg0Gv6nz5ZI3bacBRIs=.4531f2de-4d5a-484d-a800-c4b77c4a40f9@github.com> Message-ID: On Wed, 24 Feb 2021 23:06:18 GMT, David Holmes wrote: >> src/hotspot/share/runtime/thread.cpp line 3017: >> >>> 3015: InstanceKlass* ik = SystemDictionary::find_instance_klass(vmSymbols::java_lang_VersionProps(), >>> 3016: Handle(), Handle()); >>> 3017: JDK_Version::set_java_version(get_java_version(ik)); >> >> The various get_java_xxx() functions all seem to do the same thing. I am wondering if they can be combined into a single utility function, so that you can do something like: >> >> JDK_Version::set_runtime_name(get_version_info(ik, vmSymbols::java_version_name(), >> java_version, sizeof(java_version))); > > I think the get_java_* set of functions could be streamlined so that you pass in the symbol for the field you need. 
Also perhaps VersionProps could be a well-known class so we don't have to look it up. But this seems a future RFE. Even the current changes seem a little out-of-scope for this change. yes, this might be a useful change in a follow up RFE. What you have is good. You were going to change the lines anyway. ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From gziemski at openjdk.java.net Thu Feb 25 17:34:58 2021 From: gziemski at openjdk.java.net (Gerard Ziemski) Date: Thu, 25 Feb 2021 17:34:58 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v18] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 12:36:10 GMT, Anton Kozlov wrote: >> Please review the implementation of JEP 391: macOS/AArch64 Port. >> >> It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. >> >> Major changes are in: >> * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) >> * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) >> * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. >> * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) > > Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 88 commits: > > - Merge remote-tracking branch 'upstream/jdk/master' into jdk-macos > - Re-do safefetch.hpp > - Merge remote-tracking branch 'origin/jdk/8261075-stubroutines-inline' into jdk-macos > - stubRoutines.inline.hpp -> safefetch.hpp > - Update copyrights > - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline > - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline > - Extract SafeFetch32/N to stubRoutines.inline.hpp > - Revert "Extract SafeFetch32/N to stubRoutines.inline.hpp" > > This reverts commit b873c25f31dd21349d140b790713cc9ccb5f2dc0. > - Merge pull request #9 from VladimirKempik/pull/2200 > > Removed unused variables > - ... and 78 more: https://git.openjdk.java.net/jdk/compare/b955f85e...ab72613c Marked as reviewed by gziemski (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From xliu at openjdk.java.net Thu Feb 25 17:50:54 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 17:50:54 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 10:40:07 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> update comments based on the review feedbacks. >> move the unittest to test_stringUtil.cpp. > > Marked as reviewed by eastig at github.com (no known OpenJDK username). @eastig Thank you for reviewing it. @TobiHartmann Could you take a look at it again? I made a little change after you approve it. If everything looks fine, could you sponsor it? Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From hseigel at openjdk.java.net Thu Feb 25 18:55:40 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 25 Feb 2021 18:55:40 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 20:04:36 GMT, Ioi Lam wrote: >> Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. >> >> It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. >> >> The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). >> >> Thanks to Coleen and David for their helpful suggestions. >> >> Thanks, Harold > > src/hotspot/share/ci/ciEnv.cpp line 450: > >> 448: kls = SystemDictionary::find_constrained_instance_or_array_klass(sym, loader, >> 449: CHECK_AND_CLEAR_(fail_type)); >> 450: } else { > > I think SystemDictionary::find_constrained_instance_or_array_klass can also be changed to accept a `Thread*` instead `TRAPS`, since now it can no longer throw, and the thread is used only for Mutexes. Hi Ioi, I'd like to do this in a separate RFE. Is that okay? ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From hseigel at openjdk.java.net Thu Feb 25 18:55:41 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 25 Feb 2021 18:55:41 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. 
In-Reply-To: References: <0MnhqJfTfBUyz0NXic5cqxFofg0Gv6nz5ZI3bacBRIs=.4531f2de-4d5a-484d-a800-c4b77c4a40f9@github.com> Message-ID: On Thu, 25 Feb 2021 13:11:11 GMT, Coleen Phillimore wrote: >> I think the get_java_* set of functions could be streamlined so that you pass in the symbol for the field you need. Also perhaps VersionProps could be a well-known class so we don't have to look it up. But this seems a future RFE. Even the current changes seem a little out-of-scope for this change. > > yes, this might be a useful change in a follow up RFE. What you have is good. You were going to change the lines anyway. Hi Ioi, I'd like to do this cleanup in a separate RFE. Is that okay? ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From mseledtsov at openjdk.java.net Thu Feb 25 19:19:52 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 19:19:52 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan Message-ID: 8256417: Exclude TestJFRWithJMX test from running with PodMan ------------- Commit messages: - Remove TestJFRWithJMX.java from the problem list - Exclude TestJFRWithJMX.java from podman runs - Add container engine method to at-requires Changes: https://git.openjdk.java.net/jdk/pull/2726/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2726&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256417 Stats: 20 lines in 4 files changed: 16 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2726.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2726/head:pull/2726 PR: https://git.openjdk.java.net/jdk/pull/2726 From mseledtsov at openjdk.java.net Thu Feb 25 19:22:41 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 19:22:41 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: Message-ID: 
<1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> On Thu, 25 Feb 2021 19:14:48 GMT, Mikhailo Seledtsov wrote: > 8256417: Exclude TestJFRWithJMX test from running with PodMan Please review this small change that excludes the TestJFRWithJMX test from execution on Podman. In summary, this test requires capabilities that are not available under "rootless" podman. What was done: - added an at-requires method to return the name of the container engine - using the above named at requires to exclude this test as described above ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From mseledtsov at openjdk.java.net Thu Feb 25 19:22:41 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 19:22:41 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> References: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> Message-ID: On Thu, 25 Feb 2021 19:19:33 GMT, Mikhailo Seledtsov wrote: >> 8256417: Exclude TestJFRWithJMX test from running with PodMan > > Please review this small change that excludes the TestJFRWithJMX test from execution on Podman. > In summary, this test requires capabilities that are not available under "rootless" podman. > What was done: > - added an at-requires method to return the name of the container engine > - using the above named at requires to exclude this test as described above @hseigel @iignatev Could you please take a look? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From iignatyev at openjdk.java.net Thu Feb 25 19:29:39 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 25 Feb 2021 19:29:39 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> Message-ID: On Thu, 25 Feb 2021 19:20:00 GMT, Mikhailo Seledtsov wrote: >> Please review this small change that excludes the TestJFRWithJMX test from execution on Podman. >> In summary, this test requires capabilities that are not available under "rootless" podman. >> What was done: >> - added an at-requires method to return the name of the container engine >> - using the above named at requires to exclude this test as described above > > @hseigel @iignatev Could you please take a look? Hi Misha, so this test can't be run w/ `podman` only if the test-user doesn't have root-level privileges? if so, I think it's better to check if we really don't have such priviliges and throw `SkippedException` (e.g. the same way we do in `SATestUtils` via `Platform.isRoot`). that way, the test will still be runnable in some configurations. -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From mseledtsov at openjdk.java.net Thu Feb 25 20:06:38 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 20:06:38 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> Message-ID: On Thu, 25 Feb 2021 19:26:48 GMT, Igor Ignatyev wrote: >> @hseigel @iignatev Could you please take a look? > > Hi Misha, > > so this test can't be run w/ `podman` only if the test-user doesn't have root-level privileges? 
if so, I think it's better to check if we really don't have such priviliges and throw `SkippedException` (e.g. the same way we do in `SATestUtils` via `Platform.isRoot`). that way, the test will still be runnable in some configurations. > > -- Igor Igor, thanks for the suggestion. I will update the code to use SkippedException and Platform.isRoot ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From xliu at openjdk.java.net Thu Feb 25 20:19:41 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 25 Feb 2021 20:19:41 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v8] In-Reply-To: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> References: <9QwSVEA-wKGBPSwUX0MSDAXGLLjCaSmLjtEyKshvdGI=.e9c410e8-5495-4ceb-affc-4590ebcd84d4@github.com> Message-ID: <7u7ZuQQ8gEBsAFXs8ZR6JNQWvHmyzzkGPfN0gez9ZmI=.79e2b654-c552-4fbf-9f2a-e35ae70a2e6b@github.com> On Wed, 24 Feb 2021 11:08:58 GMT, Evgeny Astigeevich wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set >> >> add comments and hoist ResourceMark > > src/hotspot/share/opto/type.cpp line 4053: > >> 4051: const_oop()->print_oop(&ss); >> 4052: // suppress new-lines('\n') in ss emitted by const_oop->print_oop() >> 4053: // so each node is one-liner for -XX:+Verbose && -XX:+PrintIdeal > > What about rewriting the comment in clearer way: > // 'const_oop->print_oop()' emits new-lines('\n') into ss. > // For -XX:+Verbose && -XX:+PrintIdeal, new-lines('\n') must be removed from > // the ss created string to have a node per line. update it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From mseledtsov at openjdk.java.net Thu Feb 25 20:52:41 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 20:52:41 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> Message-ID: On Thu, 25 Feb 2021 20:04:20 GMT, Mikhailo Seledtsov wrote: >> Hi Misha, >> >> so this test can't be run w/ `podman` only if the test-user doesn't have root-level privileges? if so, I think it's better to check if we really don't have such priviliges and throw `SkippedException` (e.g. the same way we do in `SATestUtils` via `Platform.isRoot`). that way, the test will still be runnable in some configurations. >> >> -- Igor > > Igor, thanks for the suggestion. I will update the code to use SkippedException and Platform.isRoot Igor, while following your suggestion of using jtreg.SkippedException, I am thinking about keeping the new VMProps.containerEngine() in the VMProps.java. It can come in handy in the future. I hope you do not mind that. ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From iklam at openjdk.java.net Thu Feb 25 21:15:40 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 25 Feb 2021 21:15:40 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: References: Message-ID: <96SvIuW8akHlL0zV8iWIogMEUY_DrHsq20VUaDpEBbw=.4d83ed9a-03fe-4b3e-9fd3-533193863c5e@github.com> On Wed, 24 Feb 2021 19:32:29 GMT, Harold Seigel wrote: > Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. 
> > It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. > > The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). > > Thanks to Coleen and David for their helpful suggestions. > > Thanks, Harold Marked as reviewed by iklam (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From iklam at openjdk.java.net Thu Feb 25 21:15:41 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 25 Feb 2021 21:15:41 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: References: <0MnhqJfTfBUyz0NXic5cqxFofg0Gv6nz5ZI3bacBRIs=.4531f2de-4d5a-484d-a800-c4b77c4a40f9@github.com> Message-ID: On Thu, 25 Feb 2021 18:53:14 GMT, Harold Seigel wrote: >> yes, this might be a useful change in a follow up RFE. What you have is good. You were going to change the lines anyway. > > Hi Ioi, I'd like to do this cleanup in a separate RFE. Is that okay? Hi Harold, I think it's OK to do the cleanups in different RFEs. ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From hseigel at openjdk.java.net Thu Feb 25 21:18:42 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 25 Feb 2021 21:18:42 GMT Subject: RFR: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: <96SvIuW8akHlL0zV8iWIogMEUY_DrHsq20VUaDpEBbw=.4d83ed9a-03fe-4b3e-9fd3-533193863c5e@github.com> References: <96SvIuW8akHlL0zV8iWIogMEUY_DrHsq20VUaDpEBbw=.4d83ed9a-03fe-4b3e-9fd3-533193863c5e@github.com> Message-ID: On Thu, 25 Feb 2021 21:12:28 GMT, Ioi Lam wrote: >> Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. 
>> >> It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. >> >> The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). >> >> Thanks to Coleen and David for their helpful suggestions. >> >> Thanks, Harold > > Marked as reviewed by iklam (Reviewer). Thanks Ioi, Coleen, and David for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From hseigel at openjdk.java.net Thu Feb 25 21:18:44 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Thu, 25 Feb 2021 21:18:44 GMT Subject: Integrated: 8262227: Change SystemDictionary::find() to return an InstanceKlass*. In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 19:32:29 GMT, Harold Seigel wrote: > Please review this fix for JDK-8262227. This fix changes SystemDictionary::find() to return an InstanceKlass* to reduce InstanceKlass casts, it renames find() to find_instance_klass(), removes its unneeded TRAP parameter, and changes its callers as appropriate. > > It also changed the get_java_...() methods, in thread.cpp, to take an InstanceKlass* parameter and removed their now unneeded TRAPS parameter. > > The fix was tested with mach5 tiers 1 and 2 on Linux, Windows, and Mac OS, and tiers 3-5 on Linux x64 (still in progress). > > Thanks to Coleen and David for their helpful suggestions. > > Thanks, Harold This pull request has now been integrated. Changeset: 29c603f9 Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/29c603f9 Stats: 90 lines in 13 files changed: 6 ins; 19 del; 65 mod 8262227: Change SystemDictionary::find() to return an InstanceKlass*. 
Reviewed-by: iklam, dholmes, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/2712 From mseledtsov at openjdk.java.net Thu Feb 25 21:18:55 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 21:18:55 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan [v2] In-Reply-To: References: Message-ID: > 8256417: Exclude TestJFRWithJMX test from running with PodMan Mikhailo Seledtsov has updated the pull request incrementally with one additional commit since the last revision: Review feedback: using SkippedException and checking for root mode ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2726/files - new: https://git.openjdk.java.net/jdk/pull/2726/files/43d6c16f..3e5364ad Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2726&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2726&range=00-01 Stats: 15 lines in 2 files changed: 12 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2726.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2726/head:pull/2726 PR: https://git.openjdk.java.net/jdk/pull/2726 From mseledtsov at openjdk.java.net Thu Feb 25 21:18:56 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 21:18:56 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> Message-ID: On Thu, 25 Feb 2021 19:26:48 GMT, Igor Ignatyev wrote: >> @hseigel @iignatev Could you please take a look? > > Hi Misha, > > so this test can't be run w/ `podman` only if the test-user doesn't have root-level privileges? if so, I think it's better to check if we really don't have such priviliges and throw `SkippedException` (e.g. the same way we do in `SATestUtils` via `Platform.isRoot`). that way, the test will still be runnable in some configurations. 
> > -- Igor I have updated the changes. @iignatev please review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From iignatyev at openjdk.java.net Thu Feb 25 22:38:38 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 25 Feb 2021 22:38:38 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> Message-ID: On Thu, 25 Feb 2021 20:50:03 GMT, Mikhailo Seledtsov wrote: > Igor, while following your suggestion of using jtreg.SkippedException, I am thinking about keeping the new VMProps.containerEngine() in the VMProps.java. It can come in handy in the future. I hope you do not mind that. Generally speaking, I am no fan of adding things we don't use. Instead, I'd suggest creating Platform::containerEngine (or better, Container::engine) and using it in the test. If/when the need for `@requires container.engine` appears, we can add one with almost zero effort. PS: you need to update the copyright year in all the updated files. -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From coleenp at openjdk.java.net Thu Feb 25 23:40:45 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 25 Feb 2021 23:40:45 GMT Subject: RFR: JDK-8262074: Consolidate the default value of MetaspaceSize [v2] In-Reply-To: References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> Message-ID: On Tue, 23 Feb 2021 07:47:59 GMT, Thomas Stuefe wrote: >> I was looking at whether the default values for MetaspaceSize (the initial threshold to start off a metaspace-motivated GC) still make sense after JEP-387. >> >> The default is dependent on compiler tier and bitness. It is also spread across all platforms.
>> >> In addition to that, it also may get modified after Metaspace::ergo_initialize() in client-compiler-emulation-mode: >> >> https://github.com/openjdk/jdk/blob/2b00367e1154feb2c05b84a11d62fb5750e46acf/src/hotspot/share/compiler/compilerDefinitions.cpp#L194-L196 >> >> which is unexpected and causes confusion (eg JDK-8261907, JDK-8261907). >> >> The reasons for this seem to originate from PermGen times: >> https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-February/045536.html >> >> ---- >> >> Today, MetaspaceSize defaults to: >> >> - no compiler (eg Zero): **4M** (32bit) **5.19M** (64bit) >> - C1-only build: **12M** >> - C1+C2 build (standard): **16M** (32bit) **20,75M** (64bit) >> >> I was surprised to see that they do not depend on any compiler *runtime* switches. It only depends on build time decisions. >> >> --- >> >> How much do we use? I analyzed a simple java app to see the difference VM settings make on initial metaspace consumption. Committed space, used in brackets: >> >> >> (Note: (used) committed >> CDS on: >> >> 64bit: (181,58 KB) 384 KB (a) >> 64bit tier1 only: (170,04 KB) 384 KB >> 64bit Xint: (16,62 KB) 256 KB >> >> 32bit (178 KB) 256 KB >> 32bit tier1 only: (144 KB) 256 KB >> 32bit Xint: (11 KB) (b) 128 KB >> >> CDS off: >> >> 64bit: (5,06 MB) 5.62 MB >> 64bit tier1 only: (5,00 MB) 5,56 MB >> 64bit Xint: (4,84 MB) 5.44 MB >> >> 32bit (3,69 MB) 3.75 MB >> 32bit tier1 only: (3,65 MB) 3.75 MB >> 32bit Xint: (3,52 MB) 3.62 MB >> >> Class space on/off >> >> CDS off, 64bit, +CompressedClassPointers: 5.44M >> CDS off, 64bit, -CompressedClassPointers: 5.38M >> >> >> _Notes: >> (a) Since JEP-387, with CDS=on, we pay very little committed footprint upfront (384K). For comparison, JDK 15 commits here 5.75M. >> (b) The seemingly high difference between Xint and C1+C2 - 11K vs 178K - is misleading: All initial classes get compiled, but since most of their metadata live in CDS, not in Metaspace, all we allocate at the start are MethodCounters. 
Hence, with -Xint, we allocate almost nothing. That changes as soon as we start loading application classes._ >> >> Conclusions: >> - CDS=off increases metaspace footprint by a flat amount, in my case ~5MB, which makes sense. >> - Running with (any) compiler has not much influence once we start using Metaspace for real. The difference between C1-only and C1+C2 is negligible, the difference between Xint and C1+C2 amounts to about 2% with respect to initial metaspace consumption. >> - Running with or without compressed Klass pointers makes little difference. With class space, we pay for certain overhead twice, but at this early stage this is not noticeable. >> - The difference between 64bit and 32bit is more like 1.4-1.5, not the 1.3 factor we currently assume >> >> ----- >> >> Proposal: >> >> 1) I propose to make MetaspaceSize independent from compiler. For one, if the intention was to have a lower threshold with compilers deactivated, that has never worked. E.g. on 64bit we always had a threshold of 20.75MB regardless of Xint/TieredStopAtLevel. Even if it worked, the compiler does not make that much difference in metaspace footprint. >> >> 2) I propose to slightly lower MetaspaceSize - on 32bit from 16M to 14M, on 64bit from 20.75M to 20M. This takes the slightly lower metaspace footprint since JEP 387 into account (less waste) and the scale I found to be higher than 1.3. >> >> This is all very cautious. For the standard VM, very little changes, so this is mainly a cleanup patch. We could probably tune MetaspaceSize down to much lower levels. And/or size it differently depending on UseSharedSpaces. However, atm I don't have time to hunt regressions due to too early GCs. >> >> ----- >> >> Tests: GA, nightlies at SAP > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Increase MetaspaceSize default This looks good. I appreciate the more conservative approach and doing this as a cleanup.
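For readers who want to check the "more like 1.4-1.5" figure, here is a small sketch that recomputes the 64bit/32bit ratios from the CDS-off numbers quoted above (decimal commas read as decimal points). The class and field names are made up for illustration and are not part of the patch.

```java
import java.util.Locale;

class MetaspaceRatio {
    // CDS-off figures quoted in the thread, in MB: {64bit, 32bit}.
    // Rows: C1+C2 (default), tier1 only, Xint.
    static final String[] MODE = { "C1+C2", "tier1 only", "Xint" };
    static final double[][] USED = { {5.06, 3.69}, {5.00, 3.65}, {4.84, 3.52} };
    static final double[][] COMMITTED = { {5.62, 3.75}, {5.56, 3.75}, {5.44, 3.62} };

    // Ratio of the 64bit value to the 32bit value.
    static double ratio(double[] pair) {
        return pair[0] / pair[1];
    }

    public static void main(String[] args) {
        for (int i = 0; i < MODE.length; i++) {
            System.out.printf(Locale.ROOT, "%-10s used x%.2f  committed x%.2f%n",
                    MODE[i], ratio(USED[i]), ratio(COMMITTED[i]));
        }
    }
}
```

Every ratio comes out above the assumed factor of 1.3, with the committed numbers near the upper end of the quoted range.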
------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2675 From coleenp at openjdk.java.net Thu Feb 25 23:53:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 25 Feb 2021 23:53:42 GMT Subject: RFR: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java [v3] In-Reply-To: References: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> Message-ID: <0tUORQcqrtsTukh_IrT_E_ARsvkR4ijDNHOJ0zB5rwI=.7f622b0e-4a5d-4fb4-8637-79edd2a6f88b@github.com> On Wed, 24 Feb 2021 19:56:04 GMT, Thomas Stuefe wrote: >> Since JDK-8261302, the test runtime/NMT/CheckForProperDetailStackTrace.java fails with >> java.lang.RuntimeException: 'NativeCallStack::NativeCallStack' found in stdout >> >> -- >> >> `NativeCallStack` contains a hash code. Before JDK-8261302, that hash code was calculated lazily in a non-inline hashcode getter. With JDK-8261302, the hash code calculation was moved into the `NativeCallStack` constructor and the getter was made inline. >> >> The `NativeCallStack` constructor fills itself via `os::get_native_stack()`. Before JDK-8261302, that call has been the last call in the constructor and hence had been sometimes optimized into a tail call. Whether or not its a tail call matters since it affects the number of stack frames the stack walker has to skip. Therefore, the constructor contains coding to predict tail-call-ness: >> >> #if (defined(_NMT_NOINLINE_) || defined(_WINDOWS) || !defined(_LP64) || defined(PPC64)) >> // Not a tail call. >> toSkip++; >> #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64)) >> // Mac OS X slowdebug builds have this odd behavior where NativeCallStack::NativeCallStack >> // appears as two frames, so we need to skip an extra frame. >> toSkip++; >> #endif // Special-case for BSD. >> #endif // Not a tail call. 
>> >> This prediction was now off since the hash code calculation happened at the end of the callstack. This causes the test error, since on some platforms (eg Linux x64) we now think we have a tail call when we don't, which means we do not skip enough frames, and the NMT output contains call frames like "NativeCallStack::NativeCallStack()", which trips the test. >> >> ----------- >> >> Fix: >> >> This fix moves the hash code calculation completely out of NativeCallStack. There is no reason why NativeCallStack should have a hash code. It mainly exists as a convenience to place it in a hash map. The patch moves the hash code calculation up into MallocSiteTableEntry. >> >> This has the advantage of only having to pay for a hash code when you need it - in theory, one may use NativeCallStack in places other than NMT, where it is unnecessary. >> >> I considered other options: >> - modify `os::get_native_stack()` to also calculate a hash in addition to capturing the stack, and return it in a caller provided variable. That would have left this call to be the tail call. However, it seemed less clean - we have two implementations of this function, as well as other, non-capturing, NativeCallStack constructors, which would have to be modified. It also would have made `os::get_native_stack()` less general purpose. >> - Leave it as it is and just always skip frames: Seemed attractive, but I did not want to touch the tailcode-prediction-code and play whack-the-mole with platform specific test errors. >> >> --------------- >> >> Tests: GA, manual test, nightlies at SAP > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge > - unsigned->unsigned int > - Initial Yes this makes sense. 
I use NativeCallStack sometimes for debugging things and don't need the hash code, so this is good. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2672 From mseledtsov at openjdk.java.net Thu Feb 25 23:54:40 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 25 Feb 2021 23:54:40 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: <1cRCjHRAuldHB4vmaCqyqsLOXHh2MNpALPiMZzOE2v8=.e1ba5c36-0bb2-4859-9210-89666e777d63@github.com> Message-ID: On Thu, 25 Feb 2021 22:35:43 GMT, Igor Ignatyev wrote: >> Igor, while following your suggestion of using jtreg.SkippedException, I am thinking about keeping the new VMProps.containerEngine() in the VMProps.java. It can come in handy in the future. I hope you do not mind that. > >> Igor, while following your suggestion of using jtreg.SkippedException, I am thinking about keeping the new VMProps.containerEngine() in the VMProps.java. It can come in handy in the future. I hope you do not mind that. > > generally speaking, I am no fan of adding things we don't use. instead, I'd suggest creating Platform::containerEngine (or better Container::engine) and using it in the test. and if/when the need for `@requires container.engine` appears, we can add one w/ almost zero effort. > > PS you need to update copyright year in all the updated files. > > -- Igor OK, I will just remove the code from VMProps.
------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From mseledtsov at openjdk.java.net Fri Feb 26 00:01:54 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 26 Feb 2021 00:01:54 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan [v3] In-Reply-To: References: Message-ID: > 8256417: Exclude TestJFRWithJMX test from running with PodMan Mikhailo Seledtsov has updated the pull request incrementally with three additional commits since the last revision: - Updated copyright years - Reverted changes to VMProps.java since they are not in use - Reverted changes to VMProps.java since they are not in use ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2726/files - new: https://git.openjdk.java.net/jdk/pull/2726/files/3e5364ad..5c2e3cbd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2726&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2726&range=01-02 Stats: 18 lines in 3 files changed: 0 ins; 15 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2726.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2726/head:pull/2726 PR: https://git.openjdk.java.net/jdk/pull/2726 From coleenp at openjdk.java.net Fri Feb 26 00:03:52 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 26 Feb 2021 00:03:52 GMT Subject: RFR: 8262402: Make CATCH macro assert not fatal Message-ID: Hopefully, this is a trivial change. Tested with tier1 on linux, macosx, and windows, product and debug. 
------------- Commit messages: - 8262402: Make CATCH macro assert not fatal Changes: https://git.openjdk.java.net/jdk/pull/2736/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2736&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262402 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2736.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2736/head:pull/2736 PR: https://git.openjdk.java.net/jdk/pull/2736 From minqi at openjdk.java.net Fri Feb 26 00:09:05 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Fri, 26 Feb 2021 00:09:05 GMT Subject: RFR: 8259070: Add jcmd option to dump CDS Message-ID: Hi, Please review. This adds a jcmd option for dumping a CDS archive during application runtime. Before this change, the user had to dump a shared archive in two steps: first run the application with `java -XX:DumpLoadedClassList= .... ` to collect shareable class names and save them in file `` , then run `java -Xshare:dump -XX:SharedClassListFile= -XX:SharedArchiveFile= ...` With this change, the user can use jcmd to dump CDS without going through the above steps. The user can also choose a moment during the app runtime to dump an archive. The bug is associated with the CSR: https://bugs.openjdk.java.net/browse/JDK-8259798 which has been approved. Newly added jcmd option: `jcmd VM.cds static_dump ` or `jcmd VM.cds dynamic_dump ` Dumping a dynamic archive requires starting the app with the newly added flag `-XX:+RecordDynamicDumpInfo`; with this flag, some information related to the dynamic dump, such as loader constraints, will be recorded. Note that the dumping process changes some object memory locations, so a dynamic archive can only be dumped once for a running app. For a static dump, the user can dump multiple times against the same process. The file name is optional; if it is not supplied, the file name will take the format `java_pid_static.jsa` or `java_pid_dynamic.jsa` for static and dynamic respectively. The `` is the application process ID.
Tests: tier1,tier2,tier3,tier4 Thanks Yumin ------------- Commit messages: - 8259070: Add jcmd option to dump CDS Changes: https://git.openjdk.java.net/jdk/pull/2737/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2737&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259070 Stats: 502 lines in 13 files changed: 496 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2737.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2737/head:pull/2737 PR: https://git.openjdk.java.net/jdk/pull/2737 From coleenp at openjdk.java.net Fri Feb 26 00:15:38 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 26 Feb 2021 00:15:38 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest In-Reply-To: References: Message-ID: <6i4IdPZcRF4sU3hoqTwMh3huOl1LaZmfoCjbva9bFkA=.1ae47162-be70-45ab-8739-28a90da3a389@github.com> On Fri, 5 Feb 2021 20:35:23 GMT, Mikhailo Seledtsov wrote: > This is a preliminary review. I would like to get the initial feedback before I proceed with conversion of the remaining tests. > > Here is what I did so far: > - created a UnitTestThread and a main test runner, based on gtests with similar needs > - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) > to the newly created gtest while wrapping it into a TestReservedSpace class. I did not change the code of the test. > - removed invocations from whitebox.cpp > > Testing: > - ran GTestWrapper on usual platforms - All PASS > - ensured that ReservedSpaceConcurrent is in the logs and passed > > After gathering the feedback my plan is: > Plan: > - move the remaining internal Memory/VirtualSpace tests into a gTest > - I am thinking about using separate files for each test > - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code Looks good! Thank you for making this a gtest. 
------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2436 From dholmes at openjdk.java.net Fri Feb 26 00:27:38 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 26 Feb 2021 00:27:38 GMT Subject: RFR: 8262402: Make CATCH macro assert not fatal In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 23:58:16 GMT, Coleen Phillimore wrote: > Hopefully, this is a trivial change. Tested with tier1 on linux, macosx, and windows, product and debug. Looks good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2736 From iignatyev at openjdk.java.net Fri Feb 26 00:45:42 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 26 Feb 2021 00:45:42 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 20:35:23 GMT, Mikhailo Seledtsov wrote: > This is a preliminary review. I would like to get the initial feedback before I proceed with conversion of the remaining tests. > > Here is what I did so far: > - created a UnitTestThread and a main test runner, based on gtests with similar needs > - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) > to the newly created gtest while wrapping it into a TestReservedSpace class. I did not change the code of the test. > - removed invocations from whitebox.cpp > > Testing: > - ran GTestWrapper on usual platforms - All PASS > - ensured that ReservedSpaceConcurrent is in the logs and passed > > After gathering the feedback my plan is: > Plan: > - move the remaining internal Memory/VirtualSpace tests into a gTest > - I am thinking about using separate files for each test > - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code Changes requested by iignatyev (Reviewer). 
test/hotspot/gtest/concurrentTestRunner.inline.hpp line 34: > 32: class TestRunnable { > 33: public: > 34: virtual void runUnitTest() { I assume you meant for this class to be abstract, if so `runUnitTest` should be a pure virtual function. test/hotspot/gtest/concurrentTestRunner.inline.hpp line 1: > 1: /* for c++, we don't use camelCase in filenames, but rather use small_snake_case test/hotspot/gtest/concurrentTestRunner.inline.hpp line 97: > 95: long testDurationMillis; > 96: int nrOfThreads; > 97: TestRunnable* unitTestRunnable; these also can be made `const` test/hotspot/gtest/concurrentTestRunner.inline.hpp line 60: > 58: private: > 59: long testDuration; > 60: TestRunnable* runnable; all these can be made `const` and initialized in the initializer list. test/hotspot/gtest/concurrentTestRunner.inline.hpp line 49: > 47: } > 48: > 49: virtual ~UnitTestThread() {} why do you need virtual d-ctor here? test/hotspot/gtest/concurrentTestRunner.inline.hpp line 100: > 98: }; > 99: > 100: #endif // include guard we tend to use the expression used in the corresponding `#if` / `#ifdef` as a comment in `#endif` ------------- PR: https://git.openjdk.java.net/jdk/pull/2436 From iignatyev at openjdk.java.net Fri Feb 26 00:51:45 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 26 Feb 2021 00:51:45 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 20:35:23 GMT, Mikhailo Seledtsov wrote: > This is a preliminary review. I would like to get the initial feedback before I proceed with conversion of the remaining tests. > > Here is what I did so far: > - created a UnitTestThread and a main test runner, based on gtests with similar needs > - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) > to the newly created gtest while wrapping it into a TestReservedSpace class. 
I did not change the code of the test. > - removed invocations from whitebox.cpp > > Testing: > - ran GTestWrapper on usual platforms - All PASS > - ensured that ReservedSpaceConcurrent is in the logs and passed > > After gathering the feedback my plan is: > Plan: > - move the remaining internal Memory/VirtualSpace tests into a gTest > - I am thinking about using separate files for each test > - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code Changes requested by iignatyev (Reviewer). test/hotspot/gtest/runtime/test_os_windows.cpp line 704: > 702: > 703: TEST_VM(os_windows, reserve_memory_special_concurrent) { > 704: ConcurrentTestRunner testRunner(new ReserveMemorySpecialRunnable(), 30, 15000); a memory leak test/hotspot/gtest/runtime/test_os_linux.cpp line 417: > 415: > 416: TEST_VM(os_linux, reserve_memory_special_concurrent) { > 417: ConcurrentTestRunner testRunner(new ReserveMemorySpecialRunnable(), 30, 15000); yet another memory leak test/hotspot/gtest/memory/test_virtualspace.cpp line 676: > 674: > 675: TEST_VM(VirtualSpace, virtual_space_concurrent) { > 676: ConcurrentTestRunner testRunner(new VirtualSpaceRunnable(), 30, 15000); one more memory leak test/hotspot/gtest/memory/test_virtualspace.cpp line 664: > 662: > 663: TEST_VM(VirtualSpace, reserve_space_concurrent) { > 664: ConcurrentTestRunner testRunner(new ReservedSpaceRunnable(), 30, 15000); and a memory leak again ------------- PR: https://git.openjdk.java.net/jdk/pull/2436 From kalinshi at tencent.com Fri Feb 26 03:15:39 2021 From: kalinshi at tencent.com (=?gb2312?B?a2FsaW5zaGkoyqm72yk=?=) Date: Fri, 26 Feb 2021 03:15:39 +0000 Subject: =?gb2312?B?u9i4tDogSnZtdGlFeHBvcnQ6OmNhbl93YWxrX2FueV9zcGFjZSgpIHVzYWdl?= =?gb2312?Q?_in_hotspot(Internet_mail)?= In-Reply-To: <15ea918b-2d8a-b021-f5d0-ed1af1f4c84c@oracle.com> References: <365ef5a026a84e81917462643ea4bc97@tencent.com> <15ea918b-2d8a-b021-f5d0-ed1af1f4c84c@oracle.com> Message-ID: Thanks Dan! ping! 
Regards Hui -----Original Message----- From: daniel.daugherty at oracle.com Sent: February 23, 2021 2:17 To: kalinshi(??) ; hotspot-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; serviceability-dev at openjdk.java.net Subject: Re: JvmtiExport::can_walk_any_space() usage in hotspot(Internet mail) Adding serviceability-dev at ... to this email thread since JVM/TI is maintained by the Serviceability Team... Dan On 2/22/21 3:29 AM, kalinshi(??) wrote: > Hi hotspot experts, > > Would you help with my question about the JvmtiExport::can_walk_any_space() check? > The question is why the JvmtiExport::can_walk_any_space() check is needed in CDS when mapping a region. > > The JvmtiExport::can_walk_any_space() method is only used in FileMapInfo::map_region for modifying the region's read-only mapping attribute. > JvmtiExport::can_walk_any_space() is set to true when jvmtiCapabilities.can_tag_objects is enabled. > The JVMTI capability can_tag_objects enables java heap iteration/object reference tracing, and JvmtiEnv::Set/GetTag doesn't modify read-only regions in the shared archive (I might be wrong). > > The comment in the latest code seems outdated, JvmtiExport::can_walk_any_space() doesn't disable sharing now. > " > JvmtiExport::set_can_walk_any_space( > avail.can_tag_objects); // disable sharing in onload phase > " > > Back in the initial code, class sharing is disabled when the condition JvmtiExport::can_modify_any_class() || JvmtiExport::can_walk_any_space() is true. > This matches the above comment in JvmtiManageCapabilities::update. > " > if (JvmtiExport::can_modify_any_class() || JvmtiExport::can_walk_any_space()) { > fail_continue("Tool agent requires sharing to be disabled."); > return false; > } > " > > In the initial code, the JvmtiExport::can_modify_any_class condition disables class data sharing when the class file load hook (which requires modifying code and read-only contents) is needed. > Both checks were later removed from there and are now used to determine the region read/write attribute, with the following commits.
These commits are mainly supporting class file load hook with CDS. > > 1. enable shared class when these tow checks on, modify/map all regions in shared archive as RW. > 8054386: Allow Java debugging when CDS is enabled Map archive RW when debugging is enabled > 8087153: EXCEPTION_ACCESS_VIOLATION when CDS RO section vanished > on win32 > > 2. Support class file load hook with CDS > 8141341: CDS should be disabled if JvmtiExport::should_post_class_file_load_hook() is true Disable loading shared class if JvmtiExport::should_post_class_file_load_hook is true. > 8078644: CDS needs to support JVMTI CFLH Support posting CLFH for shared classes. > > 3. Fix jvmtiCapabilities::can_generate_all_class_hook_events inconsistent state when shared > 8161605: The '!UseSharedSpaces' check is not need in > JvmtiManageCapabilities::recompute_always_capabilities > > 4. Fix class file load hook error for early class hook event when shared > 8212200: assert when shared java.lang.Object is redefined by JVMTI > agent > > Regards > Hui From mseledtsov at openjdk.java.net Fri Feb 26 03:50:40 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 26 Feb 2021 03:50:40 GMT Subject: RFR: 8213269: convert test/hotspot/jtreg/runtime/memory/RunUnitTestsConcurrently to gtest In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 00:37:35 GMT, Igor Ignatyev wrote: >> This is a preliminary review. I would like to get the initial feedback before I proceed with conversion of the remaining tests. >> >> Here is what I did so far: >> - created a UnitTestThread and a main test runner, based on gtests with similar needs >> - moved the original code from HotSpot internals (so called hotspot internal tests: src/hotspot/share/memory/virtualspace.cpp) >> to the newly created gtest while wrapping it into a TestReservedSpace class. I did not change the code of the test. 
>> - removed invocations from whitebox.cpp >> >> Testing: >> - ran GTestWrapper on usual platforms - All PASS >> - ensured that ReservedSpaceConcurrent is in the logs and passed >> >> After gathering the feedback my plan is: >> Plan: >> - move the remaining internal Memory/VirtualSpace tests into a gTest >> - I am thinking about using separate files for each test >> - create a common file for UnitTestThread and MultiThreadTestRunner to reuse the code > > test/hotspot/gtest/concurrentTestRunner.inline.hpp line 1: > >> 1: /* > > for c++, we don't use camelCase in filenames, but rather use small_snake_case OK. I saw gtestMain.cpp, gtestLauncher.cpp and a few others, and just followed that. I also see a number of test_camelCase.cpp: test_primitiveConversions.cpp, test_logSelectionList.cpp and so on. In fact, it seems the most prevalent pattern for gtests is test_camelCase.cpp. Anyway, no problem, I can rename this file to concurrent_test_runner.inline.hpp > test/hotspot/gtest/runtime/test_os_windows.cpp line 704: > >> 702: >> 703: TEST_VM(os_windows, reserve_memory_special_concurrent) { >> 704: ConcurrentTestRunner testRunner(new ReserveMemorySpecialRunnable(), 30, 15000); > > a memory leak I forgot I am not in Java anymore :) ------------- PR: https://git.openjdk.java.net/jdk/pull/2436 From dongbo at openjdk.java.net Fri Feb 26 06:10:01 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 26 Feb 2021 06:10:01 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v11] In-Reply-To: References: Message-ID: > In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero, > see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`: > /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. 
*/ > public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT); > > The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64, > assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead. > According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long. > ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i); > vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i); > > The legal right shift amount should be in the range 1 to the element width in bits on aarch64: > https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en > > This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate. > Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests. 
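The masking rule quoted from VectorOperators.java, `a>>>(n&(ESIZE*8-1))`, can be illustrated with a scalar model. The sketch below is only an illustration with made-up names (it does not use the Vector API); it shows that a shift count equal to the element width becomes an effective shift of zero, which is the case the assembler previously encoded incorrectly.

```java
class ShiftMask {
    // Effective shift count for a lane of the given width in bits,
    // following the quoted contract n & (ESIZE*8 - 1).
    static int effectiveShift(int n, int elementBits) {
        return n & (elementBits - 1);
    }

    // Scalar model of LSHR on a single byte lane.
    static byte lshrByteLane(byte a, int n) {
        return (byte) ((a & 0xFF) >>> effectiveShift(n, Byte.SIZE));
    }

    public static void main(String[] args) {
        int[] widths = { Byte.SIZE, Short.SIZE, Integer.SIZE, Long.SIZE };
        for (int bits : widths) {
            System.out.println(bits + "-bit lanes: shift by " + bits
                    + " -> effective shift " + effectiveShift(bits, bits));
        }
        // A shift by the full element width leaves the lane unchanged.
        System.out.println(lshrByteLane((byte) -100, 8));
    }
}
```

In this model a byte lane shifted by 8 is returned unchanged, so a back end has to emit something other than a right-shift instruction with immediate 0 for that case.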
Dong Bo has updated the pull request incrementally with one additional commit since the last revision: refactor tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2472/files - new: https://git.openjdk.java.net/jdk/pull/2472/files/9290f27e..24d6e9f8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=09-10 Stats: 422 lines in 1 file changed: 16 ins; 222 del; 184 mod Patch: https://git.openjdk.java.net/jdk/pull/2472.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472 PR: https://git.openjdk.java.net/jdk/pull/2472 From dongbo at openjdk.java.net Fri Feb 26 06:13:39 2021 From: dongbo at openjdk.java.net (Dong Bo) Date: Fri, 26 Feb 2021 06:13:39 GMT Subject: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v11] In-Reply-To: <9rtmEMrsPaA73FDA-KB7H0S0CRdBePGwnI5FcDY-OLI=.425249e2-b590-4a16-b9b8-8d7b5ecd2800@github.com> References: <8kMxMFAYtb0B-yUVEt-HLfhji3Gj-gog8OHvWW_tKfw=.f7c9422b-3574-4c31-9489-7286ee98332f@github.com> <9rtmEMrsPaA73FDA-KB7H0S0CRdBePGwnI5FcDY-OLI=.425249e2-b590-4a16-b9b8-8d7b5ecd2800@github.com> Message-ID: On Thu, 25 Feb 2021 01:44:27 GMT, Dong Bo wrote: >>> > Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions? >>> >>> I'm not seeing `sra` used anywhere. >>> >>> The problem I see with the tests is that the methods are large. This causes C2 to do a lot of spilling. Also, because the resuling code is intertwined and complex, it's very hard to debug. 
>>> >>> It would be far better to do something like this: >>> >>> ``` >>> void long_shift_add(long arrLongs[][], LongVector vba, LongVector vbb, int i) { >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 37)).intoArray(arrLongs[op], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 64)).intoArray(arrLongs[op + 1], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 99)).intoArray(arrLongs[op + 2], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 128)).intoArray(arrLongs[op + 3], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 157)).intoArray(arrLongs[op + 4], 2 * i); >>> vba.add(vbb.lanewise(VectorOperators.LSHR, 192)).intoArray(arrLongs[op + 5], 2 * i); >>> } >>> ``` >> >> >> Weird, I took a look at the the assembly, `ssra` did accessed by the tests on our server: >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 &> assembly_vlen64.txt >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 &> assembly_vlen128.txt >> $ cat assembly_vlen*.txt | grep "ssra" >> 02c0 ssra V18, V17, #37 # vector (2D) >> 02c8 ssra V19, V17, #0 # vector (2D) >> 02d0 ssra V20, V17, #35 # vector (2D) >> 0308 ssra V18, V17, #29 # vector (2D) >> 0644 ssra V18, V17, #37 # vector (2D) >> 064c ssra V19, V17, #0 # vector (2D) >> 0654 ssra V20, V17, #35 # vector (2D) >> 0674 ssra V18, V17, #29 # vector (2D) >> 0798 ssra V18, V17, #37 # vector (2D) >> 07a0 ssra V19, V17, #0 # vector (2D) >> 07a8 ssra V20, V17, #35 # vector (2D) >> 07e0 ssra V18, V17, #29 # vector (2D) >> 0x0000ffff83f7e500: ssra v18.2d, v17.2d, #37 ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} >> 
0x0000ffff83f7e510: ssra v20.2d, v17.2d, #35 ;*iand {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7e548: ssra v18.2d, v17.2d, #29 ;*if_icmpne {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7e884: ssra v18.2d, v17.2d, #37 >> 0x0000ffff83f7e894: ssra v20.2d, v17.2d, #35 >> 0x0000ffff83f7e8b4: ssra v18.2d, v17.2d, #29 >> 0x0000ffff83f7e9d8: ssra v18.2d, v17.2d, #37 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7e9e8: ssra v20.2d, v17.2d, #35 ;*invokestatic broadcastInt {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f7ea20: ssra v18.2d, v17.2d, #29 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} >> 284 ssra V18, V17, #9 # vector (4S) >> 28c ssra V19, V17, #0 # vector (4S) >> 294 ssra V20, V17, #15 # vector (4S) >> 0x0000ffff83f822c4: ssra v18.4s, v17.4s, #9 ;*invokedynamic {reexecute=0 rethrow=0 return_oop=0} >> 0x0000ffff83f822d4: ssra v20.4s, v17.4s, #15 >> 284 ssra V18, V17, #1 # vector (8H) >> 28c ssra V19, V17, #8 # vector (8H) >> ... 
>> >> Also injected error to `sshr+add` by: >> --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp >> +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp >> @@ -545,7 +545,7 @@ public: >> #define WRAP(INSN) \ >> void INSN(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, int shift) { \ >> if (shift == 0) { \ >> - Assembler::addv(Vd, T, Vd, Vn); \ >> + Assembler::subv(Vd, T, Vd, Vn); \ >> } else { \ >> Assembler::INSN(Vd, T, Vn, shift); \ >> } \ >> The `shift+add` tests failed as expected: >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 64 >> WARNING: Using incubator modules: jdk.incubator.vector >> warning: using incubating module(s): jdk.incubator.vector >> 1 warning >> Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: >> type SHORT index 19, operation ASHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 21, operation ASHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 23, operation ASHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 25, operation LSHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 27, operation LSHR_AND_ACCUMULATE, vector length 64. >> type SHORT index 29, operation LSHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 19, operation ASHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 21, operation ASHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 23, operation ASHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 25, operation LSHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 27, operation LSHR_AND_ACCUMULATE, vector length 64. >> type INTEGER index 29, operation LSHR_AND_ACCUMULATE, vector length 64. >> ... 
>> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:CompileThreshold=1000 -XX:-TieredCompilation test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java 128 >> WARNING: Using incubator modules: jdk.incubator.vector >> warning: using incubating module(s): jdk.incubator.vector >> 1 warning >> Exception in thread "main" java.lang.RuntimeException: Test Failed, failed tests: >> type LONG index 49, operation ASHR_AND_ACCUMULATE, vector length 128. >> type LONG index 51, operation ASHR_AND_ACCUMULATE, vector length 128. >> type LONG index 53, operation ASHR_AND_ACCUMULATE, vector length 128. >> type LONG index 55, operation LSHR_AND_ACCUMULATE, vector length 128. >> type LONG index 57, operation LSHR_AND_ACCUMULATE, vector length 128. >> type LONG index 59, operation LSHR_AND_ACCUMULATE, vector length 128. >> type SHORT index 49, operation ASHR_AND_ACCUMULATE, vector length 128. >> type SHORT index 51, operation ASHR_AND_ACCUMULATE, vector length 128. >> type SHORT index 53, operation ASHR_AND_ACCUMULATE, vector length 128. >> ... >> >> Anyway, I extracted the operations you suggested into `shift_op_*` methods. >> I performed the error-injection experiments with the new tests on Kunpeng916 and re-checked the assembly output; the results look good.
>> >> The test commands I used to run the newest tests are: >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=64 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen64.txt >> $ ./build/linux-aarch64-server-fastdebug/images/jdk/bin/java --add-modules jdk.incubator.vector -XX:-TieredCompilation -XX:CompileThreshold=1000 -Dvlen=128 -XX:CompileCommand=print,compiler/vectorapi/TestVectorShiftImm.shift_* test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java &> assembly_vlen128.txt >> $ cat assembly_vlen64.txt | grep ssra; cat assembly_vlen128.txt | grep ssra > >> _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ >> >> On 24/02/2021 07:33, Dong Bo wrote: >> >> > Weird, I took a look at the assembly; `ssra` was accessed by the tests on our server: >> >> I don't doubt it, but the test code is so very complex that it can >> fall foul of heuristics given slightly changed circumstances. That's >> why good test cases are as simple as possible, and allow no room for >> variations because they do only one thing. Precise targeting should >> be the goal of HotSpot back-end test cases. >> > > Understood, thanks. :-) > Does the newest version address the concern?
> I extracted the `shift`/`shift+add` operations into separate methods, mostly as suggested in previous comments, something like: > static int shift_op_long_ASHR_and_ADD(LongVector vba, LongVector vbb, long arrLongs[][], int end, int ind) { > vba.add(vbb.lanewise(VectorOperators.ASHR, 37)).intoArray(arrLongs[end++], ind); > vba.add(vbb.lanewise(VectorOperators.ASHR, 64)).intoArray(arrLongs[end++], ind); > vba.add(vbb.lanewise(VectorOperators.ASHR, 99)).intoArray(arrLongs[end++], ind); > vba.add(vbb.lanewise(VectorOperators.ASHR, 128)).intoArray(arrLongs[end++], ind); > vba.add(vbb.lanewise(VectorOperators.ASHR, 157)).intoArray(arrLongs[end++], ind); > vba.add(vbb.lanewise(VectorOperators.ASHR, 192)).intoArray(arrLongs[end++], ind); > return end; > } > _Mailing list message from [Andrew Haley](mailto:aph at redhat.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at openjdk.java.net):_ > > > I don't doubt it, but the test code is so very complex that it can > fall foul of heuristics given slightly changed circumstances. That's > why good test cases are as simple as possible, and allow no room for > variations because they do only one thing. Precise targeting should > be the goal of HotSpot back-end test cases. > Test updated, all the operations to test are put in overloaded functions, `shift_with_op` and `shift_with_op_and_add`, repeatedly called and tested by `shift` and `shift_and_accumulate` respectively with a loop.
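As a reading aid for the shift counts used in these tests: the Vector API reduces a lanewise shift count modulo the lane width, so counts such as 64, 99 or 157 on `long` lanes do not reach the assembler unchanged. The following is a minimal scalar sketch of that reduction, not code from the test; the class and method names (`ShiftAccumulateModel`, `effectiveShift`, `ashrAndAdd`) are hypothetical and exist only for illustration.

```java
// Scalar model of "arithmetic shift right and accumulate" on a single lane.
// All names here are hypothetical; nothing in this class comes from
// TestVectorShiftImm.java or from HotSpot.
public class ShiftAccumulateModel {

    // The Vector API reduces a lanewise shift count modulo the lane width in
    // bits (laneBits is assumed to be a power of two: 8, 16, 32 or 64).
    public static int effectiveShift(int shift, int laneBits) {
        return shift & (laneBits - 1);
    }

    // One 64-bit lane of ASHR followed by add: acc + (src >> (shift mod 64)).
    public static long ashrAndAdd(long acc, long src, int shift) {
        return acc + (src >> effectiveShift(shift, Long.SIZE));
    }

    public static void main(String[] args) {
        // Shift counts taken from the shift_op_long_ASHR_and_ADD example.
        int[] shifts = {37, 64, 99, 128, 157, 192};
        for (int s : shifts) {
            System.out.println(s + " -> #" + effectiveShift(s, Long.SIZE));
        }
    }
}
```

Under this model the six counts reduce to 37, 0, 35, 0, 29 and 0, which would be consistent with the `#37`, `#35` and `#29` immediates on the `.2d` `ssra` instructions in the disassembly quoted earlier in this thread. The zero cases appear to correspond to the `shift == 0` branch of the `WRAP` macro, which emits a plain add instead.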
Commands below are used to verify that `ssra` is accessed: $ cp hsdis-aarch64.so ./build/linux-aarch64-server-fastdebug/images/jdk/lib/ $ jtreg -verbose:all -J-Djavatest.maxOutputSize=50000000 test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java | grep ssra The tests do use `ssra` on our two different platforms, Kunpeng916 and Kunpeng920: 2b4 + ssra V16, V20, #17 # vector (2S) 2bc + ssra V17, V20, #0 # vector (2S) 2c4 + ssra V18, V20, #21 # vector (2S) 310 + ssra V16, V20, #12 # vector (2S) 0x0000ffff9894e0f4: ssra v16.2s, v20.2s, #17 ;*invokespecial fromArray0Template {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff9894e104: ssra v18.2s, v20.2s, #21 ;*invokestatic arrayAddress {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff9894e150: ssra v16.2s, v20.2s, #12 ;*checkcast {reexecute=0 rethrow=0 return_oop=0} 2b0 + ssra V16, V20, #9 # vector (4H) 2b8 + ssra V17, V20, #0 # vector (4H) 2c0 + ssra V18, V20, #11 # vector (4H) 0x0000ffff9894ae70: ssra v16.4h, v20.4h, #9 ;*invokevirtual vspecies {reexecute=0 rethrow=0 return_oop=0} 0x0000ffff9894ae80: ssra v18.4h, v20.4h, #11 ;*invokestatic arrayAddress {reexecute=0 rethrow=0 return_oop=0} : # out( N1802 ) <- 298in( R29, + R25B70, ssra V16 ) #7 + a4c ssra V17 + , spill V20, #0 # vector (8B) mov ssra a4cV18R1 + , spill V20R11 -> [sp, #12], # spill size = 32, ssra V16, V20, d08#3 # vector (8B)B150 ... Any comments? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2472 From stuefe at openjdk.java.net Fri Feb 26 06:46:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 26 Feb 2021 06:46:39 GMT Subject: RFR: JDK-8262074: Consolidate the default value of MetaspaceSize [v2] In-Reply-To: References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> Message-ID: On Thu, 25 Feb 2021 23:38:11 GMT, Coleen Phillimore wrote: > This looks good. I appreciate the more conservative approach and doing this as a cleanup. Thank you Coleen! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2675 From stuefe at openjdk.java.net Fri Feb 26 06:50:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 26 Feb 2021 06:50:41 GMT Subject: RFR: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java [v3] In-Reply-To: <0tUORQcqrtsTukh_IrT_E_ARsvkR4ijDNHOJ0zB5rwI=.7f622b0e-4a5d-4fb4-8637-79edd2a6f88b@github.com> References: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> <0tUORQcqrtsTukh_IrT_E_ARsvkR4ijDNHOJ0zB5rwI=.7f622b0e-4a5d-4fb4-8637-79edd2a6f88b@github.com> Message-ID: On Thu, 25 Feb 2021 23:51:00 GMT, Coleen Phillimore wrote: > Yes this makes sense. I use NativeCallStack sometimes for debugging things and don't need the hash code, so this is good. Yes, I do too. Also, I had some vague thoughts once about using it in UL to provide callstacks at log sites (was actually Robins idea). Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/2672 From stuefe at openjdk.java.net Fri Feb 26 06:50:42 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 26 Feb 2021 06:50:42 GMT Subject: Integrated: JDK-8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java In-Reply-To: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> References: <4pPyMfRC1i30Q_MxXBo8QE_RYmKd3CYfzWr2M4K-c5w=.e8d85c7a-e4c9-42f2-9bce-55600c0e0ec9@github.com> Message-ID: On Mon, 22 Feb 2021 08:48:53 GMT, Thomas Stuefe wrote: > Since JDK-8261302, the test runtime/NMT/CheckForProperDetailStackTrace.java fails with > java.lang.RuntimeException: 'NativeCallStack::NativeCallStack' found in stdout > > -- > > `NativeCallStack` contains a hash code. Before JDK-8261302, that hash code was calculated lazily in a non-inline hashcode getter. With JDK-8261302, the hash code calculation was moved into the `NativeCallStack` constructor and the getter was made inline. 
> > The `NativeCallStack` constructor fills itself via `os::get_native_stack()`. Before JDK-8261302, that call was the last call in the constructor and hence was sometimes optimized into a tail call. Whether or not it's a tail call matters since it affects the number of stack frames the stack walker has to skip. Therefore, the constructor contains code to predict tail-call-ness: > > #if (defined(_NMT_NOINLINE_) || defined(_WINDOWS) || !defined(_LP64) || defined(PPC64)) > // Not a tail call. > toSkip++; > #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64)) > // Mac OS X slowdebug builds have this odd behavior where NativeCallStack::NativeCallStack > // appears as two frames, so we need to skip an extra frame. > toSkip++; > #endif // Special-case for BSD. > #endif // Not a tail call. > > This prediction was now off since the hash code calculation happened at the end of the callstack. This causes the test error, since on some platforms (eg Linux x64) we now think we have a tail call when we don't, which means we do not skip enough frames, and the NMT output contains call frames like "NativeCallStack::NativeCallStack()", which trips the test. > > ----------- > > Fix: > > This fix moves the hash code calculation completely out of NativeCallStack. There is no reason why NativeCallStack should have a hash code. It mainly exists as a convenience to place it in a hash map. The patch moves the hash code calculation up into MallocSiteTableEntry. > > This has the advantage of only having to pay for a hash code when you need it - in theory, one may use NativeCallStack in places other than NMT, where it is unnecessary. > > I considered other options: > - modify `os::get_native_stack()` to also calculate a hash in addition to capturing the stack, and return it in a caller provided variable. That would have left this call to be the tail call.
However, it seemed less clean - we have two implementations of this function, as well as other, non-capturing, NativeCallStack constructors, which would have to be modified. It also would have made `os::get_native_stack()` less general purpose. > - Leave it as it is and just always skip frames: Seemed attractive, but I did not want to touch the tail-call prediction code and play whack-a-mole with platform-specific test errors. > > --------------- > > Tests: GA, manual test, nightlies at SAP This pull request has now been integrated. Changeset: 722142ee Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/722142ee Stats: 39 lines in 6 files changed: 13 ins; 17 del; 9 mod 8261520: JDK-8261302 breaks runtime/NMT/CheckForProperDetailStackTrace.java Reviewed-by: zgu, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/2672 From iignatyev at openjdk.java.net Fri Feb 26 07:06:09 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 26 Feb 2021 07:06:09 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan [v3] In-Reply-To: References: Message-ID: <73x-SSPaGvumOYkK9ymFXhvfUYkXZ3pigta78XC42gY=.990e4662-2ad5-45c2-ba6e-bfd3a27667be@github.com> On Fri, 26 Feb 2021 00:01:54 GMT, Mikhailo Seledtsov wrote: >> 8256417: Exclude TestJFRWithJMX test from running with PodMan > > Mikhailo Seledtsov has updated the pull request incrementally with three additional commits since the last revision: > > - Updated copyright years > - Reverted changes to VMProps.java since they are not in use > - Reverted changes to VMProps.java since they are not in use Marked as reviewed by iignatyev (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From rehn at openjdk.java.net Fri Feb 26 09:30:16 2021 From: rehn at openjdk.java.net (Robbin Ehn) Date: Fri, 26 Feb 2021 09:30:16 GMT Subject: RFR: 8262443: GenerateOopMap::do_interpretation can spin for a long time.
Message-ID: <28Qx7h9l5ubaDYe_QeS8uRIv_XTctt7Kog8BLx-_0Y8=.37a9d5f0-f1ae-4c7d-b92e-64a62fd12ed6@github.com> With Safepoint/Handshake timeout enabled, in rare cases this method spins for a long time, blocking safepoints/handshakes, so the timeout (with a long delay) is triggered. In some cases we are in native while executing this method and in others in vm. That's why there is a check for state in vm. Tested with other changes in t-1-7, this specific case of timeout is no longer an issue. This change-set passes T1 standalone. ------------- Commit messages: - Go to blocked when loop Changes: https://git.openjdk.java.net/jdk/pull/2742/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2742&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262443 Stats: 11 lines in 2 files changed: 6 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2742.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2742/head:pull/2742 PR: https://git.openjdk.java.net/jdk/pull/2742 From thartmann at openjdk.java.net Fri Feb 26 10:49:42 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 26 Feb 2021 10:49:42 GMT Subject: RFR: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set [v9] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 08:56:14 GMT, Xin Liu wrote: >> Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. >> Correct TypeInstPtr::dump2 to make sure it only emits klass name once. >> Remove the comment because Klass::oop_print_on() has emitted the address of oop.
>> >> Before: >> 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String >> {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' >> - string: "a" >> :Constant:exact * >> >> After: >> 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set > > update comments based on the review feedbacks. > move the unittest to test_stringUtil.cpp. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2178 From xliu at openjdk.java.net Fri Feb 26 10:49:42 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Fri, 26 Feb 2021 10:49:42 GMT Subject: Integrated: 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 08:47:13 GMT, Xin Liu wrote: > Add a flag _suppress_cr to outputStream. outstream objects won't emit any CR if it's set. > Correct TypeInstPtr::dump2 to make sure it only emits klass name once. > Remove the comment because Klass::oop_print_on() has emitted the address of oop. > > Before: > 689 ConP === 0 [[ 821 ]] Oop:java/lang/Stringjava.lang.String > {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' > - string: "a" > :Constant:exact * > > After: > 689 ConP === 0 [[ 821 ]] Oop:java.lang.String {0x000000010159d7c8} - klass: public final synchronized 'java/lang/String' - string: "a":Constant:exact * This pull request has now been integrated. 
Changeset: 76032781 Author: Xin Liu Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/76032781 Stats: 55 lines in 2 files changed: 51 ins; 1 del; 3 mod 8260198: TypeInstPtr::dump2() emits multiple lines if Verbose is set Reviewed-by: thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/2178 From akozlov at openjdk.java.net Fri Feb 26 12:26:18 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 26 Feb 2021 12:26:18 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v19] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 89 commits: - Merge branch 'master' into jdk-macos - Merge remote-tracking branch 'upstream/jdk/master' into jdk-macos - Re-do safefetch.hpp - Merge remote-tracking branch 'origin/jdk/8261075-stubroutines-inline' into jdk-macos - stubRoutines.inline.hpp -> safefetch.hpp - Update copyrights - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline - Merge remote-tracking branch 'upstream/jdk/master' into 8261075-stubroutines-inline - Extract SafeFetch32/N to stubRoutines.inline.hpp - Revert "Extract SafeFetch32/N to stubRoutines.inline.hpp" This reverts commit b873c25f31dd21349d140b790713cc9ccb5f2dc0. - ... and 79 more: https://git.openjdk.java.net/jdk/compare/d7efb4cc...74687c0b ------------- Changes: https://git.openjdk.java.net/jdk/pull/2200/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=18 Stats: 2953 lines in 74 files changed: 2862 ins; 27 del; 64 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From akozlov at openjdk.java.net Fri Feb 26 12:53:11 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 26 Feb 2021 12:53:11 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v20] In-Reply-To: References: Message-ID: <4ipGJ8KKE0_KPuWfExCo1jo3Tg9iewQkSKuZTcntoNE=.8c1c3dcb-4db2-4e0a-b42f-6db5f78c6406@github.com> > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. > > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. 
It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #10 from VladimirKempik/pull/2200 Fix build after merge with master - Fix build after merge with master ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/74687c0b..241aedee Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=19 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=18-19 Stats: 7 lines in 1 file changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From vkempik at openjdk.java.net Fri Feb 26 12:55:50 2021 From: vkempik at openjdk.java.net (Vladimir Kempik) Date: Fri, 26 Feb 2021 12:55:50 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 23:07:08 GMT, Daniel D. 
Daugherty wrote: >> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> support macos_aarch64 in hsdis > > src/java.base/macosx/native/libjli/java_md_macosx.m line 210: > >> 208: if (preferredJVM == NULL) { >> 209: #if defined(__i386__) >> 210: preferredJVM = "client"; > > #if defined(__i386__) > preferredJVM = "client"; > Not your bug, but Oracle/OpenJDK never supported 32-bit __i386__ on macOS. Hello I thought the openjdk7 supported 32-bit macos at some point in time ------------- PR: https://git.openjdk.java.net/jdk/pull/2200 From hseigel at openjdk.java.net Fri Feb 26 14:16:47 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 26 Feb 2021 14:16:47 GMT Subject: RFR: 8262426: Change TRAPS to Thread* for find_constrained_instance_or_arrays_klass() Message-ID: Please review this small fix to change the last parameter to find_constrained_instance_or_arrays_klass() from TRAPS to Thread*. TRAPS is not needed because the method does not throw exceptions. The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. 
Thanks, Harold ------------- Commit messages: - 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass() Changes: https://git.openjdk.java.net/jdk/pull/2746/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2746&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262426 Stats: 5 lines in 4 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2746.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2746/head:pull/2746 PR: https://git.openjdk.java.net/jdk/pull/2746 From daniel.daugherty at oracle.com Fri Feb 26 14:37:11 2021 From: daniel.daugherty at oracle.com (daniel.daugherty at oracle.com) Date: Fri, 26 Feb 2021 09:37:11 -0500 Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: References: Message-ID: <77712482-850c-3945-2c4e-7865544a412b@oracle.com> On 2/26/21 7:55 AM, Vladimir Kempik wrote: > On Tue, 2 Feb 2021 23:07:08 GMT, Daniel D. Daugherty wrote: > >>> Anton Kozlov has updated the pull request incrementally with one additional commit since the last revision: >>> >>> support macos_aarch64 in hsdis >> src/java.base/macosx/native/libjli/java_md_macosx.m line 210: >> >>> 208: if (preferredJVM == NULL) { >>> 209: #if defined(__i386__) >>> 210: preferredJVM = "client"; >> #if defined(__i386__) >> preferredJVM = "client"; >> Not your bug, but Oracle/OpenJDK never supported 32-bit __i386__ on macOS. > Hello > I thought the openjdk7 supported 32-bit macos at some point in time The macOS porting project supported 32-bit macOS, but when the code was integrated into OpenJDK7 I don't believe that 32-bit macOS was supported. I could be wrong... it was quite a while ago... 
Dan > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/2200 From erik.joelsson at oracle.com Fri Feb 26 15:06:39 2021 From: erik.joelsson at oracle.com (erik.joelsson at oracle.com) Date: Fri, 26 Feb 2021 07:06:39 -0800 Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v9] In-Reply-To: <77712482-850c-3945-2c4e-7865544a412b@oracle.com> References: <77712482-850c-3945-2c4e-7865544a412b@oracle.com> Message-ID: On 2021-02-26 06:37, daniel.daugherty at oracle.com wrote: > On 2/26/21 7:55 AM, Vladimir Kempik wrote: >> On Tue, 2 Feb 2021 23:07:08 GMT, Daniel D. Daugherty >> wrote: >> >>>> Anton Kozlov has updated the pull request incrementally with one >>>> additional commit since the last revision: >>>> >>>> support macos_aarch64 in hsdis >>> src/java.base/macosx/native/libjli/java_md_macosx.m line 210: >>> >>>> 208: if (preferredJVM == NULL) { >>>> 209: #if defined(__i386__) >>>> 210: preferredJVM = "client"; >>> #if defined(__i386__) >>> preferredJVM = "client"; >>> Not your bug, but Oracle/OpenJDK never supported 32-bit __i386__ on >>> macOS. >> Hello >> I thought the openjdk7 supported 32-bit macos at some point in time > > The macOS porting project supported 32-bit macOS, but when the code > was integrated into OpenJDK7 I don't believe that 32-bit macOS was > supported. I could be wrong... it was quite a while ago... > AFAIK, OpenJDK never supported 32bit macos, but there are certainly leftovers here and there indicating that the original source did at some point. In the build system we cleaned that up a long time ago. /Erik From coleenp at openjdk.java.net Fri Feb 26 15:50:42 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 26 Feb 2021 15:50:42 GMT Subject: RFR: 8262402: Make CATCH macro assert not fatal In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 00:24:41 GMT, David Holmes wrote: >> Hopefully, this is a trivial change.
Tested with tier1 on linux, macosx, and windows, product and debug. > > Looks good and trivial. > > Thanks, > David Thanks David. ------------- PR: https://git.openjdk.java.net/jdk/pull/2736 From coleenp at openjdk.java.net Fri Feb 26 15:50:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 26 Feb 2021 15:50:44 GMT Subject: Integrated: 8262402: Make CATCH macro assert not fatal In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 23:58:16 GMT, Coleen Phillimore wrote: > Hopefully, this is a trivial change. Tested with tier1 on linux, macosx, and windows, product and debug. This pull request has now been integrated. Changeset: d06d6f51 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/d06d6f51 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8262402: Make CATCH macro assert not fatal Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/2736 From coleenp at openjdk.java.net Fri Feb 26 15:55:38 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 26 Feb 2021 15:55:38 GMT Subject: RFR: 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass() In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 14:12:05 GMT, Harold Seigel wrote: > Please review this small fix to change the last parameter to find_constrained_instance_or_arrays_klass() from TRAPS to Thread*. TRAPS is not needed because the method does not throw exceptions. > > The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. > > Thanks, Harold This looks good and trivial. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2746 From hseigel at openjdk.java.net Fri Feb 26 15:59:41 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 26 Feb 2021 15:59:41 GMT Subject: RFR: 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass() In-Reply-To: References: Message-ID: <7k_fqiZv2p_eRynkwwZodqwGauVk086pE5A4TdqLUXM=.c5dd976c-64bb-4562-a6fc-13580e9f54eb@github.com> On Fri, 26 Feb 2021 15:52:23 GMT, Coleen Phillimore wrote: >> Please review this small fix to change the last parameter to find_constrained_instance_or_arrays_klass() from TRAPS to Thread*. TRAPS is not needed because the method does not throw exceptions. >> >> The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. >> >> Thanks, Harold > > This looks good and trivial. Thanks Coleen for reviewing this! ------------- PR: https://git.openjdk.java.net/jdk/pull/2746 From hseigel at openjdk.java.net Fri Feb 26 15:59:41 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 26 Feb 2021 15:59:41 GMT Subject: Integrated: 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass() In-Reply-To: References: Message-ID: <882srXBbEheC9p6QhwU4Ccn9WERZHn9lUqFLsByXN0Y=.6595c564-cb87-445d-83ac-7ab4749b82a6@github.com> On Fri, 26 Feb 2021 14:12:05 GMT, Harold Seigel wrote: > Please review this small fix to change the last parameter to find_constrained_instance_or_arrays_klass() from TRAPS to Thread*. TRAPS is not needed because the method does not throw exceptions. > > The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. > > Thanks, Harold This pull request has now been integrated. 
Changeset: 05c11bcb Author: Harold Seigel URL: https://git.openjdk.java.net/jdk/commit/05c11bcb Stats: 5 lines in 4 files changed: 0 ins; 1 del; 4 mod 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass() Reviewed-by: coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/2746 From stuefe at openjdk.java.net Fri Feb 26 16:07:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 26 Feb 2021 16:07:41 GMT Subject: Integrated: JDK-8262074: Consolidate the default value of MetaspaceSize In-Reply-To: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> References: <2CoSJFr7zv4Q38bybB1v6-MLLePLJs-kv_897GOuudk=.92c86e36-18e6-4415-8ab1-6889993e976a@github.com> Message-ID: <7JLpodvI8CQ3aIGEjjJZom8elAqnAFCLlQvpi3miDAY=.e025e2f5-55b0-43b7-9433-8829ee587a92@github.com> On Mon, 22 Feb 2021 16:08:20 GMT, Thomas Stuefe wrote: > I was looking at whether the default values for MetaspaceSize (the initial threshold to start off a metaspace-motivated GC) still make sense after JEP-387. > > The default is dependent on compiler tier and bitness. It is also spread across all platforms. > > In addition to that, it also may get modified after Metaspace::ergo_initialize() in client-compiler-emulation-mode: > > https://github.com/openjdk/jdk/blob/2b00367e1154feb2c05b84a11d62fb5750e46acf/src/hotspot/share/compiler/compilerDefinitions.cpp#L194-L196 > > which is unexpected and causes confusion (eg JDK-8261907, JDK-8261907). > > The reasons for this seem to originate from PermGen times: > https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2021-February/045536.html > > ---- > > Today, MetaspaceSize defaults to: > > - no compiler (eg Zero): **4M** (32bit) **5.19M** (64bit) > - C1-only build: **12M** > - C1+C2 build (standard): **16M** (32bit) **20,75M** (64bit) > > I was surprised to see that they do not depend on any compiler *runtime* switches. It only depends on build time decisions. 
> > --- > > How much do we use? I analyzed a simple java app to see the difference VM settings make on initial metaspace consumption. Committed space, used in brackets: > > > (Note: (used) committed > CDS on: > > 64bit: (181,58 KB) 384 KB (a) > 64bit tier1 only: (170,04 KB) 384 KB > 64bit Xint: (16,62 KB) 256 KB > > 32bit (178 KB) 256 KB > 32bit tier1 only: (144 KB) 256 KB > 32bit Xint: (11 KB) (b) 128 KB > > CDS off: > > 64bit: (5,06 MB) 5.62 MB > 64bit tier1 only: (5,00 MB) 5,56 MB > 64bit Xint: (4,84 MB) 5.44 MB > > 32bit (3,69 MB) 3.75 MB > 32bit tier1 only: (3,65 MB) 3.75 MB > 32bit Xint: (3,52 MB) 3.62 MB > > Class space on/off > > CDS off, 64bit, +CompressedClassPointers: 5.44M > CDS off, 64bit, -CompressedClassPointers: 5.38M > > > _Notes: > (a) Since JEP-387, with CDS=on, we pay very little committed footprint upfront (384K). For comparison, JDK 15 commits here 5.75M. > (b) The seemingly high difference between Xint and C1+C2 - 11K vs 178K - is misleading: All initial classes get compiled, but since most of their metadata live in CDS, not in Metaspace, all we allocate at the start are MethodCounters. Hence, with -Xint, we almost allocate nothing. That changes as soon as we start loading application classes._ > > Conclusions: > - CDS=off increases metaspace footprint by a flat amount, in my case ~5MB, which makes sense. > - Running with (any) compiler has not much influence once we start using Metaspace for real. The difference between C1-only and C1+C2 is negligible, the difference between Xint and C1+C2 amounts to about 2% with respect to initial metaspace consumption. > - Running with or without compressed Klass pointers makes not much difference. With class space, we pay for certain overhead twice, but at this early stage this is not noticeable. > - The difference between 64bit and 32bit is more like 1.4-1.5, not the 1.3 factor we currently assume > > ----- > > Proposal: > > 1) I propose to make MetaspaceSize independent from compiler.
For one, if the intention was to have a lower threshold with compilers deactivated, that has never worked. E.g. on 64bit we always had a threshold of 20.75MB regardless of Xint/TieredStopAtLevel. Even if it worked, the compiler does not make that much difference in metaspace footprint. > > 2) I propose to slightly lower MetaspaceSize - on 32bit from 16M to 14M, on 64bit from 20.75M to 20M. This takes the slightly lower metaspace footprint since JEP 387 into account (less waste) and the scale I found to be higher than 1.3. > > This is all very cautious. For the standard VM, very little changes, so this is mainly a cleanup patch. We could probably tune MetaspaceSize down to much lower levels. And/or make size it differently depending on UseSharedSpaces. However, atm I don't have time to hunt regressions due to too early GCs. > > ----- > > Tests: GA, nightlies at SAP This pull request has now been integrated. Changeset: c9e91897 Author: Thomas Stuefe URL: https://git.openjdk.java.net/jdk/commit/c9e91897 Stats: 28 lines in 13 files changed: 0 ins; 27 del; 1 mod 8262074: Consolidate the default value of MetaspaceSize Reviewed-by: iklam, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/2675 From mseledtsov at openjdk.java.net Fri Feb 26 16:23:38 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 26 Feb 2021 16:23:38 GMT Subject: RFR: 8256417: Exclude TestJFRWithJMX test from running with PodMan [v3] In-Reply-To: <73x-SSPaGvumOYkK9ymFXhvfUYkXZ3pigta78XC42gY=.990e4662-2ad5-45c2-ba6e-bfd3a27667be@github.com> References: <73x-SSPaGvumOYkK9ymFXhvfUYkXZ3pigta78XC42gY=.990e4662-2ad5-45c2-ba6e-bfd3a27667be@github.com> Message-ID: On Fri, 26 Feb 2021 07:02:36 GMT, Igor Ignatyev wrote: >> Mikhailo Seledtsov has updated the pull request incrementally with three additional commits since the last revision: >> >> - Updated copyright years >> - Reverted changes to VMProps.java since they are not in use >> - Reverted changes to VMProps.java since 
they are not in use > > Marked as reviewed by iignatyev (Reviewer). Thank you Igor. ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From mseledtsov at openjdk.java.net Fri Feb 26 16:23:40 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 26 Feb 2021 16:23:40 GMT Subject: Integrated: 8256417: Exclude TestJFRWithJMX test from running with PodMan In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 19:14:48 GMT, Mikhailo Seledtsov wrote: > 8256417: Exclude TestJFRWithJMX test from running with PodMan This pull request has now been integrated. Changeset: 07061fc7 Author: Mikhailo Seledtsov URL: https://git.openjdk.java.net/jdk/commit/07061fc7 Stats: 15 lines in 2 files changed: 12 ins; 1 del; 2 mod 8256417: Exclude TestJFRWithJMX test from running with PodMan Reviewed-by: iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/2726 From hseigel at openjdk.java.net Fri Feb 26 18:28:54 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 26 Feb 2021 18:28:54 GMT Subject: RFR: 8262028: Make InstanceKlass::implementor return InstanceKlass Message-ID: <7ME0ALE4x-SV0Lh8Yrb04OaUCIWcNmft6jkALr3CdyQ=.89947c5e-b039-4fd3-9387-576295a7f9f7@github.com> Please review this small fix to change the parameter and return types from Klass* to InstanceKlass* in the InstanceKlass::*implementor() functions. The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. 
Thanks, Harold ------------- Commit messages: - 8262028: Make InstanceKlass::implementor return InstanceKlass Changes: https://git.openjdk.java.net/jdk/pull/2755/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2755&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262028 Stats: 36 lines in 4 files changed: 0 ins; 0 del; 36 mod Patch: https://git.openjdk.java.net/jdk/pull/2755.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2755/head:pull/2755 PR: https://git.openjdk.java.net/jdk/pull/2755 From stuefe at openjdk.java.net Fri Feb 26 18:46:54 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 26 Feb 2021 18:46:54 GMT Subject: RFR: JDK-8262472: Buffer overflow in UNICODE::as_utf8 for zero length output buffer Message-ID: This one is trivial and probably inconsequential, but lets fix it anyway. There is a buffer overflow in both variants of UNICODE::as_utf8, where in case of truncation due to a zero length output buffer the terminating zero still gets written. Added fix + gtest. Ran gtest. ------------- Commit messages: - start Changes: https://git.openjdk.java.net/jdk/pull/2753/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2753&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262472 Stats: 62 lines in 2 files changed: 61 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2753.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2753/head:pull/2753 PR: https://git.openjdk.java.net/jdk/pull/2753 From akozlov at openjdk.java.net Fri Feb 26 19:17:12 2021 From: akozlov at openjdk.java.net (Anton Kozlov) Date: Fri, 26 Feb 2021 19:17:12 GMT Subject: RFR: 8253795: Implementation of JEP 391: macOS/AArch64 Port [v21] In-Reply-To: References: Message-ID: > Please review the implementation of JEP 391: macOS/AArch64 Port. > > It's heavily based on existing ports to linux/aarch64, macos/x86_64, and windows/aarch64. 
> > Major changes are in: > * src/hotspot/cpu/aarch64: support of the new calling convention (subtasks JDK-8253817, JDK-8253818) > * src/hotspot/os_cpu/bsd_aarch64: copy of os_cpu/linux_aarch64 with necessary adjustments (JDK-8253819) > * src/hotspot/share, test/hotspot/gtest: support of write-xor-execute (W^X), required on macOS/AArch64 platform. It's implemented with pthread_jit_write_protect_np provided by Apple. The W^X mode is local to a thread, so W^X mode change relates to the java thread state change (for java threads). In most cases, JVM executes in write-only mode, except when calling a generated stub like SafeFetch, which requires a temporary switch to execute-only mode. The same execute-only mode is enabled when a java thread executes in java or native states. This approach of managing W^X mode turned out to be simple and efficient enough. > * src/jdk.hotspot.agent: serviceability agent implementation (JDK-8254941) Anton Kozlov has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/jdk/jdk-macos' into jdk-macos - Minor fixes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2200/files - new: https://git.openjdk.java.net/jdk/pull/2200/files/241aedee..663cb4a1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=20 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2200&range=19-20 Stats: 85 lines in 5 files changed: 0 ins; 80 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2200.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2200/head:pull/2200 PR: https://git.openjdk.java.net/jdk/pull/2200 From coleenp at openjdk.java.net Fri Feb 26 19:23:38 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 26 Feb 2021 19:23:38 GMT Subject: RFR: 8262028: Make InstanceKlass::implementor return InstanceKlass In-Reply-To: 
<7ME0ALE4x-SV0Lh8Yrb04OaUCIWcNmft6jkALr3CdyQ=.89947c5e-b039-4fd3-9387-576295a7f9f7@github.com> References: <7ME0ALE4x-SV0Lh8Yrb04OaUCIWcNmft6jkALr3CdyQ=.89947c5e-b039-4fd3-9387-576295a7f9f7@github.com> Message-ID: <9Y83LhAuQV81eeDE48fLutd1QgWwrzpfMEY419-ka6I=.398b05a1-4b5c-4600-b8d1-478897927203@github.com> On Fri, 26 Feb 2021 18:23:34 GMT, Harold Seigel wrote: > Please review this small fix to change the parameter and return types from Klass* to InstanceKlass* in the InstanceKlass::*implementor() functions. > > The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. > > Thanks, Harold Looks good! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2755 From ccheung at openjdk.java.net Fri Feb 26 19:58:50 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Fri, 26 Feb 2021 19:58:50 GMT Subject: RFR: 8262028: Make InstanceKlass::implementor return InstanceKlass In-Reply-To: <7ME0ALE4x-SV0Lh8Yrb04OaUCIWcNmft6jkALr3CdyQ=.89947c5e-b039-4fd3-9387-576295a7f9f7@github.com> References: <7ME0ALE4x-SV0Lh8Yrb04OaUCIWcNmft6jkALr3CdyQ=.89947c5e-b039-4fd3-9387-576295a7f9f7@github.com> Message-ID: <7IG8pAkfKPim25dji_Lbs9u_JIpHaRqrteK4ARanZuU=.e06841f6-a1ea-40ee-833f-09109ad8aa33@github.com> On Fri, 26 Feb 2021 18:23:34 GMT, Harold Seigel wrote: > Please review this small fix to change the parameter and return types from Klass* to InstanceKlass* in the InstanceKlass::*implementor() functions. > > The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. > > Thanks, Harold Looks good. ------------- Marked as reviewed by ccheung (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2755 From vlivanov at openjdk.java.net Fri Feb 26 20:29:47 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 26 Feb 2021 20:29:47 GMT Subject: RFR: 8262028: Make InstanceKlass::implementor return InstanceKlass In-Reply-To: <7ME0ALE4x-SV0Lh8Yrb04OaUCIWcNmft6jkALr3CdyQ=.89947c5e-b039-4fd3-9387-576295a7f9f7@github.com> References: <7ME0ALE4x-SV0Lh8Yrb04OaUCIWcNmft6jkALr3CdyQ=.89947c5e-b039-4fd3-9387-576295a7f9f7@github.com> Message-ID: On Fri, 26 Feb 2021 18:23:34 GMT, Harold Seigel wrote: > Please review this small fix to change the parameter and return types from Klass* to InstanceKlass* in the InstanceKlass::*implementor() functions. > > The fix was tested with Mach5 tiers 1 and 2 on Linux, Mac OS, and Windows, and tiers 3-5 on Linux x64. > > Thanks, Harold Looks good! ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2755 From ccheung at openjdk.java.net Fri Feb 26 22:07:40 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Fri, 26 Feb 2021 22:07:40 GMT Subject: RFR: 8259070: Add jcmd option to dump CDS In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 00:03:40 GMT, Yumin Qi wrote: > Hi, Please review > > Added jcmd option for dumping CDS archive during application runtime. Before this change, user has to dump shared archive in two steps: first run application with > `java -XX:DumpLoadedClassList= .... ` > to collect shareable class names and saved in file `` , then > `java -Xshare:dump -XX:SharedClassListFile= -XX:SharedArchiveFile= ...` > With this change, user can use jcmd to dump CDS without going through above steps. Also user can choose a moment during the app runtime to dump an archive. > The bug is associated with the CSR: https://bugs.openjdk.java.net/browse/JDK-8259798 which has been approved. 
> New added jcmd option: > `jcmd VM.cds static_dump ` > or > `jcmd VM.cds dynamic_dump ` > To dump dynamic archive, requires start app with newly added flag `-XX:+RecordDynamicDumpInfo`, with this flag, some information related to dynamic dump like loader constraints will be recorded. Note the dumping process changed some object memory locations so for dumping dynamic archive, can only done once for a running app. For static dump, user can dump multiple times against same process. > The file name is optional, if the file name is not supplied, the file name will take format of `java_pid_static.jsa` or `java_pid_dynamic.jsa` for static and dynamic respectively. The `` is the application process ID. > > Tests: tier1,tier2,tier3,tier4 > > Thanks > Yumin Looks like a good usability enhancement to CDS. Some comments below... Thanks, Calvin Below are my comments... src/hotspot/share/memory/metaspaceShared.cpp line 783: > 781: char* start = buffer + strlen(buffer); > 782: snprintf(start, buff_len, "%s ", arg); > 783: } Maybe move the above function to the StringUtils class under share/utilities? Use `os::snprintf()` instead of `snprintf()`? src/hotspot/share/memory/metaspaceShared.cpp line 788: > 786: // The existing file will be overwritten. > 787: char filename[JVM_MAXPATHLEN]; > 788: const char* file = file_name; Is the variable at line 788 necessary? Could you just pass filename to callees? src/hotspot/share/memory/metaspaceShared.cpp line 801: > 799: file = filename; > 800: } > 801: } This block of code is very similar to lines 813 - 821 below. Maybe factor it into another function? src/hotspot/share/memory/metaspaceShared.cpp line 831: > 829: DumpClassListCLDClosure(fileStream* f) : CLDClosure() { _stream = f; } > 830: ~DumpClassListCLDClosure() { > 831: delete _stream; // The file need close since in child process it will be used. Can you clarify the above comment? 
src/hotspot/share/memory/metaspaceShared.cpp line 856: > 854: char classlist_name[JVM_MAXPATHLEN]; > 855: > 856: os::snprintf(classlist_name, sizeof(classlist_name), "%s.classlist", file); I think the `file` contains the ".jsa" suffix. So the `classlist_name` would be .jsa.classlist. Maybe only the prefix of `file` should be passed into cmd_dump_static() and cmd_dump_dynamic() and have the functions append the suffix for the names for the classlist and archive? test/hotspot/jtreg/runtime/cds/appcds/jcmd/JCmdTest.java line 213: > 211: if (!cdsEnabled) { > 212: System.out.println("CDS is not available for this JDK, skip the test."); > 213: return; Should throw SkippedException instead. ------------- Changes requested by ccheung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2737 From iklam at openjdk.java.net Fri Feb 26 23:06:42 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 26 Feb 2021 23:06:42 GMT Subject: RFR: 8259070: Add jcmd option to dump CDS In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 00:03:40 GMT, Yumin Qi wrote: > Hi, Please review > > Added jcmd option for dumping CDS archive during application runtime. Before this change, user has to dump shared archive in two steps: first run application with > `java -XX:DumpLoadedClassList= .... ` > to collect shareable class names and saved in file `` , then > `java -Xshare:dump -XX:SharedClassListFile= -XX:SharedArchiveFile= ...` > With this change, user can use jcmd to dump CDS without going through above steps. Also user can choose a moment during the app runtime to dump an archive. > The bug is associated with the CSR: https://bugs.openjdk.java.net/browse/JDK-8259798 which has been approved. > New added jcmd option: > `jcmd VM.cds static_dump ` > or > `jcmd VM.cds dynamic_dump ` > To dump dynamic archive, requires start app with newly added flag `-XX:+RecordDynamicDumpInfo`, with this flag, some information related to dynamic dump like loader constraints will be recorded. 
Note the dumping process changed some object memory locations so for dumping dynamic archive, can only done once for a running app. For static dump, user can dump multiple times against same process.
> The file name is optional, if the file name is not supplied, the file name will take format of `java_pid_static.jsa` or `java_pid_dynamic.jsa` for static and dynamic respectively. The `` is the application process ID.
>
> Tests: tier1,tier2,tier3,tier4
>
> Thanks
> Yumin

Changes requested by iklam (Reviewer).

src/hotspot/share/memory/dynamicArchive.cpp line 347:

> 345: if (Arguments::GetSharedDynamicArchivePath() == NULL) {
> 346: if (!RecordDynamicDumpInfo) {
> 347: // If run with -XX:+RecordDynamicDumpInfo, DynamicDumpSharedSpaces will be turned on,

Is this check needed? It looks like `MetaspaceShared::cmd_dump_dynamic` will not call `DynamicArchive::dump()` unless the path was set up correctly.

src/hotspot/share/memory/metaspaceShared.cpp line 811:

> 809: if (!RecordDynamicDumpInfo) {
> 810: output->print_cr("Please run with -Xshare:auto -XX:+RecordDynamicDumpInfo dumping dynamic archive!");
> 811: return;

There are several error conditions:

(1) CDS is not configured for this VM build. In this case, `INCLUDE_CDS` is false. The DumpSharedArchiveDCmd should be placed inside `#if INCLUDE_CDS`, so the user won't be able to issue `jcmd VM.cds` at all, and we will never get here.

(2) CDS is configured, but the JVM process is not running with CDS enabled. This could have several causes:

- The JVM does not have a built-in archive.
- The user has specified `-XX:SharedArchiveFile=foo.jsa`, but foo.jsa doesn't exist.
- The shared archive exists, but has failed to map.
- `-Xshare:off` is specified on the command line.

I think all of the above can be checked inside metaspace.cpp. Note that if you have specified `DynamicDumpSharedSpaces` but the base archive fails to map, you will get a similar error.
    if (RecordDynamicDumpInfo && !UseSharedSpaces) {
      vm_exit_during_initialization("RecordDynamicDumpInfo is unsupported when base CDS archive is not loaded ");
    }

(3) `jcmd VM.cds dynamic_dump` is used on a JVM process without RecordDynamicDumpInfo:

We can do the check here, but I think we can change the error message to be more specific:

    if (!RecordDynamicDumpInfo) {
      output->print_cr("Unable to dump dynamic shared archive. "
                       "Please restart the JVM with -XX:+RecordDynamicDumpInfo");
    }

Note that the user can get to (3) with incorrect command-line options such as `java -Xshare:off .....`, but we don't need to list all those conditions here. Instead, if the user follows the suggestion of (3) and adds `-XX:+RecordDynamicDumpInfo` to the command line, they will then get the error message of (2) and will know how to proceed further.

src/hotspot/share/memory/metaspaceShared.cpp line 795:

> 793: if (file_name ==nullptr) {
> 794: os::snprintf(filename, sizeof(filename), "java_pid%d_static.jsa", os::current_process_id());
> 795: file = filename;

I think the above `os::snprintf` can be combined for both the dynamic and static cases:

    int n = os::snprintf(filename, sizeof(filename), "java_pid%d_%s.jsa", os::current_process_id(),
                         (is_static ? "static" : "dynamic"));
    assert(n < sizeof(filename), "should not truncate");

The `snprintf` man page says "a return value of size or more means that the output was truncated."

src/hotspot/share/memory/metaspaceShared.cpp line 799:

> 797: if (strstr(file_name, ".jsa") == nullptr) {
> 798: os::snprintf(filename, sizeof(filename), "%s.jsa", file_name);
> 799: file = filename;

This could potentially overflow the buffer. I think it's best to just leave `file_name` alone. If the user doesn't want the `.jsa` extension, that's fine. Similarly, we don't add `.jsa` to `-XX:ArchiveClassesAtExit` or `-XX:SharedArchiveFile`.
src/hotspot/share/memory/metaspaceShared.cpp line 856:

> 854: char classlist_name[JVM_MAXPATHLEN];
> 855:
> 856: os::snprintf(classlist_name, sizeof(classlist_name), "%s.classlist", file);

Need to check for truncation. We should also add a test case for very long file names specified in "jcmd VM.cds ...."

src/hotspot/share/memory/metaspaceShared.cpp line 868:

> 866: return;
> 867: }
> 868: os::snprintf(exec_path, sizeof(exec_path),

Need to check for buffer overflow. I think it's better to use a stringStream so you don't need to worry about buffer allocation and overflow.

Also, if any of the arguments you append to the command line contain space characters, `os::fork_and_exec` is not going to work. I think it's best to check for space characters (and maybe other special characters such as single quote, double quote, $, etc) and return an error (something like `"special character "%c" in the command-line is not supported"`)

src/hotspot/share/memory/metaspaceShared.cpp line 881:

> 879: }
> 880: char* buff_start = exec_path + strlen(exec_path);
> 881: snprintf(buff_start, sizeof(exec_path), " -cp %s %s", app_class_path, java_command);

Do we need to pass `java_command`?

src/hotspot/share/memory/metaspaceShared.cpp line 889:

> 887: if (DynamicArchive::has_been_dumped_once()) {
> 888: output->print_cr("Dynamic dump has been done, and should only be done once.");
> 889: return;

How about "Dynamic dump cannot be done more than once"?

src/hotspot/share/memory/metaspaceShared.cpp line 898:

> 896: ArchiveClassesAtExit = file;
> 897: if (Arguments::init_shared_archive_paths()) {
> 898: DynamicArchive::dump();

Instead of modifying the global `ArchiveClassesAtExit`, I think it's better to change `DynamicArchive::dump()` to take an argument of the file to write. `SharedDynamicArchivePath` should be changed to be used only when mapping the dynamic archive (not for writing).
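The special-character check proposed in the comment on line 868 could look roughly like the sketch below. It is an illustration only: the function name `first_unsupported_char` and the exact set of rejected characters are assumptions, not something taken from the patch or from HotSpot.

```cpp
#include <cstring>

// Hypothetical helper: returns the first character in arg that a naive,
// whitespace-splitting command line would mishandle, or '\0' if arg is safe.
// The rejected set is an assumption chosen for illustration.
char first_unsupported_char(const char* arg) {
  static const char kRejected[] = " \t\n'\"$`\\";
  const char* hit = std::strpbrk(arg, kRejected);
  return hit != nullptr ? *hit : '\0';
}
```

A caller could run this over each argument before building the command string and, on a non-zero result, print an error along the lines of the message suggested in the review instead of launching the child.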
src/hotspot/share/oops/instanceKlass.cpp line 4249:

> 4247: if (is_hidden() || unsafe_anonymous_host() != NULL) {
> 4248: return false;
> 4249: }

Maybe `InstanceKlass::log_to_classlist` should be refactored to use this function?

Also, I think a better name would be `bool InstanceKlass::can_be_logged_to_classlist()`.

src/hotspot/share/runtime/arguments.cpp line 3138:

> 3136: FLAG_SET_DEFAULT(DynamicDumpSharedSpaces, false);
> 3137: } else {
> 3138: FLAG_SET_DEFAULT(DynamicDumpSharedSpaces, true);

I think this will be more readable:

    if (ArchiveClassesAtExit != NULL || RecordDynamicDumpInfo) {
      FLAG_SET_DEFAULT(DynamicDumpSharedSpaces, true);
    } else {
      FLAG_SET_DEFAULT(DynamicDumpSharedSpaces, false);
    }

BTW, what happens if you specify `-XX:-DynamicDumpSharedSpaces -XX:+RecordDynamicDumpInfo`?

src/hotspot/share/runtime/arguments.cpp line 3525:

> 3523: os::free(SharedDynamicArchivePath);
> 3524: SharedDynamicArchivePath = nullptr;
> 3525: }

Is this necessary?

src/hotspot/share/runtime/globals.hpp line 1896:

> 1894: \
> 1895: product(bool, RecordDynamicDumpInfo, false, \
> 1896: "Record class info for jcmd Dynamic dump") \

"Record class info for jcmd VM.cds dynamic_dump"?

src/hotspot/share/services/diagnosticCommand.cpp line 1084:

> 1082:
> 1083: DumpSharedArchiveDCmd::DumpSharedArchiveDCmd(outputStream* output, bool heap) :
> 1084: DCmdWithParser(output, heap),

Should be inside `#if INCLUDE_CDS`

src/hotspot/share/memory/metaspaceShared.cpp line 789:

> 787: char filename[JVM_MAXPATHLEN];
> 788: const char* file = file_name;
> 789: assert(strcmp(cmd, "static_dump") == 0 || strcmp(cmd, "dynamic_dump") == 0, "Sanity check");

Since the caller of this function already performed the string validity check, I think it's better to pass `bool is_static` as a parameter and not pass `cmd`.
src/hotspot/share/memory/metaspaceShared.cpp line 863: > 861: MutexLocker lock(ClassLoaderDataGraph_lock); > 862: DumpClassListCLDClosure collect_classes(stream); > 863: ClassLoaderDataGraph::loaded_cld_do(&collect_classes); Need to close the stream. ------------- PR: https://git.openjdk.java.net/jdk/pull/2737 From coleenp at openjdk.java.net Fri Feb 26 23:23:39 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 26 Feb 2021 23:23:39 GMT Subject: RFR: 8262443: GenerateOopMap::do_interpretation can spin for a long time. In-Reply-To: <28Qx7h9l5ubaDYe_QeS8uRIv_XTctt7Kog8BLx-_0Y8=.37a9d5f0-f1ae-4c7d-b92e-64a62fd12ed6@github.com> References: <28Qx7h9l5ubaDYe_QeS8uRIv_XTctt7Kog8BLx-_0Y8=.37a9d5f0-f1ae-4c7d-b92e-64a62fd12ed6@github.com> Message-ID: On Fri, 26 Feb 2021 08:50:38 GMT, Robbin Ehn wrote: > With Safepoint/Handshake timeout enabled in rare cases this methods spins for a long time, blocking safepoints/handshakes, so timeout (with a long delay) is triggered. > > In some cases we are in native while executing this method and in some in vm. > That's why there is an check for state in vm. > > Tested with other changes in t-1-7 this specific case of timeout is no longer an issue. > This change-set passes T1 stand alone. This seems legit. This can be called by the compiler thread while in native, or during GC or during the rewriter if jsr/ret is found. I assume the last case is what you observed? ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2742 From ysuenaga at openjdk.java.net Sat Feb 27 04:58:59 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Sat, 27 Feb 2021 04:58:59 GMT Subject: RFR: 8262491: AArch64: CPU description should contain compatible board list Message-ID: HotSpot generates CPU description when it is started. 
We can see it `jdk.CPUInformation` JFR event as below: $ jfr print --events jdk.CPUInformation raspi4.jfr jdk.CPUInformation { startTime = 22:57:13.521 cpu = "AArch64" description = "AArch64 0x41:0x0:0xd08:3, simd, crc" sockets = 4 cores = 4 hwThreads = 4 } `description` contains "AArch64", it is fixed value, we cannot guess the process was run on what machine (SoC). In Linux, we can use `compatible`property in device tree to guess the machine. The 'compatible' property contains a sorted list of strings starting with the exact name of the machine, followed by an optional list of boards it is compatible with sorted from most compatible to least. After this change, we can get the description as below: jdk.CPUInformation { startTime = 00:32:49.767 cpu = "AArch64" description = "raspberrypi,4-model-b brcm,bcm2711 0x41:0x0:0xd08:3, simd, crc" sockets = 4 cores = 4 hwThreads = 4 } In Linux on AMD64, we can see as following, then we can guess the CPU model from it. The same should do for AArch64. jdk.CPUInformation { startTime = 17:28:03.907 cpu = "AMD (null) (HT) SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 SSE4A AMD64" description = "Brand: AMD Ryzen 3 3300X 4-Core Processor , Vendor: AuthenticAMD Family: (0x17), Model: (0x71), Stepping: 0x0 Ext. family: 0x8, Ext. model: 0x7, Type: 0x0, Signature: 0x00870f10 Features: ebx: 0x01020800, ecx: 0xfed83203, edx: 0x178bfbff Ext. 
features: eax: 0x00870f10, ebx: 0x20000000, ecx: 0x004003f3, edx: 0x2fd3fbff Supports: On-Chip FPU, Virtual Mode Extensions, Debugging Extensions, Page Size Extensions, Time Stamp Counter, Model Specific Registers, Physical Address Extension, Machine Check Exceptions, CMPXCHG8B Instruction, On-Chip APIC, Fast System Call, Memory Type Range Registers, Page Global Enable, Machine Check Architecture, Conditional Mov Instruction, Page Attribute Table, 36-bit Page Size Extension, CLFLUSH Instruction, Intel Architecture MMX Technology, Fast Float Point Save and Restore, Streaming SIMD extensions, Streaming SIMD extensions 2, Hyper Threading, Streaming SIMD Extensions 3, PCLMULQDQ, Supplemental Streaming SIMD Extensions 3, Fused Multiply-Add, CMPXCHG16B, Streaming SIMD extensions 4.1, Streaming SIMD extensions 4.2, MOVBE, Popcount instruction, AESNI, XSAVE, OSXSAVE, AVX, F16C, LAHF/SAHF instruction support, Core multi-processor leagacy mode, Advanced Bit Manipulations: LZCNT, SSE4A: MOVNTSS, MOVNTSD, EXTRQ, INSERTQ, Misaligned SSE mode, SYSCALL/SYSRET, Execute Disab le Bit, RDTSCP, Intel 64 Architecture" sockets = 1 cores = 2 hwThreads = 2 } ------------- Commit messages: - 8262491: AArch64: CPU description should contain compatible board list Changes: https://git.openjdk.java.net/jdk/pull/2759/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2759&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262491 Stats: 23 lines in 1 file changed: 21 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2759.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2759/head:pull/2759 PR: https://git.openjdk.java.net/jdk/pull/2759 From stuefe at openjdk.java.net Sat Feb 27 05:50:40 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 27 Feb 2021 05:50:40 GMT Subject: RFR: 8259070: Add jcmd option to dump CDS In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 00:03:40 GMT, Yumin Qi wrote: > Hi, Please review > > Added jcmd option for 
dumping CDS archive during application runtime. Before this change, user has to dump shared archive in two steps: first run application with > `java -XX:DumpLoadedClassList= .... ` > to collect shareable class names and saved in file `` , then > `java -Xshare:dump -XX:SharedClassListFile= -XX:SharedArchiveFile= ...` > With this change, user can use jcmd to dump CDS without going through above steps. Also user can choose a moment during the app runtime to dump an archive. > The bug is associated with the CSR: https://bugs.openjdk.java.net/browse/JDK-8259798 which has been approved. > New added jcmd option: > `jcmd VM.cds static_dump ` > or > `jcmd VM.cds dynamic_dump ` > To dump dynamic archive, requires start app with newly added flag `-XX:+RecordDynamicDumpInfo`, with this flag, some information related to dynamic dump like loader constraints will be recorded. Note the dumping process changed some object memory locations so for dumping dynamic archive, can only done once for a running app. For static dump, user can dump multiple times against same process. > The file name is optional, if the file name is not supplied, the file name will take format of `java_pid_static.jsa` or `java_pid_dynamic.jsa` for static and dynamic respectively. The `` is the application process ID. > > Tests: tier1,tier2,tier3,tier4 > > Thanks > Yumin Hi Yumin, This is a very useful addition. My biggest concern with this patch though is the use of `os::fork_and_exec()` in regular, non-fatal situations. I had a look at that function and I think that is very unsafe. - it does not close file descriptors, so the child vm will inherit all file descriptors not opened with FD_CLOEXEC. In Runtime.exec, we do this whole dance around safely closing off all unused file descriptors, which is missing here. Note that even though we mostly open all fds with CLOEXEC, this does not matter, since user code may not do that. 
- It uses vfork "if available", which is probably always, but that may be okay since the child exec's right away. Still, vfork makes me nervous.
- Weirdly enough, it always spawns the child program via one indirection using a shell; so there is always the shell between you and your spawned VM, and you probably won't get the return code of your child VM process back.

Until now, os::fork_and_exec() was only used in fatal situations where the VM was about to die anyway, and where it did not really matter if it worked or not. If we now want to use it in regular situations, we need to give that thing an overhaul and make it a first-class fork API, and also test it better than it is today. The file descriptor issue has to be addressed at the very least.

I really would consider rewriting the whole thing using posix_spawn. Since JDK15 I think posix_spawn is the default for Runtime.exec, so we know it works well. I would also do this in a separate RFE.

Alternatively, you could call into java and use Runtime.exec().

Further remarks inline.

Thanks, Thomas

src/hotspot/share/memory/metaspaceShared.cpp line 883:

> 881: snprintf(buff_start, sizeof(exec_path), " -cp %s %s", app_class_path, java_command);
> 882: output->print_cr("%s", exec_path);
> 883: os::fork_and_exec(exec_path);

Apart from what I wrote above, how do you handle fork/exec errors? There are a number of things that can go wrong here. At the very least I would scan the return code. And the return code of the child VM.

-------------

Changes requested by stuefe (Reviewer).
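To make the posix_spawn suggestion and the return-code concern in this review concrete, here is a minimal sketch of spawning a child without a shell and collecting its exit status. It is not the proposed HotSpot change: `spawn_and_wait` is a hypothetical name, and the sketch does not address the inherited file descriptor problem raised above, which would still need `FD_CLOEXEC` or `posix_spawn_file_actions_*` handling in a real implementation.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>

extern char** environ;

// Hypothetical helper: runs argv[0] (looked up on PATH) with the given
// argument vector, without an intermediate shell, and waits for it.
// Returns the child's exit status (0..255), or -1 if the child could not
// be spawned, could not be waited for, or was terminated by a signal.
int spawn_and_wait(char* const argv[]) {
  pid_t pid;
  int rc = posix_spawnp(&pid, argv[0], nullptr, nullptr, argv, environ);
  if (rc != 0) {
    return -1;  // rc is an errno-style value describing the spawn failure
  }
  int status = 0;
  pid_t w;
  do {
    w = waitpid(pid, &status, 0);  // retry if interrupted by a signal
  } while (w == -1 && errno == EINTR);
  if (w == -1 || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```

Because the arguments are passed as a vector, spaces or quotes inside a single argument are not reinterpreted by a shell, and the value returned is the exit status of the spawned program itself.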
PR: https://git.openjdk.java.net/jdk/pull/2737 From stuefe at openjdk.java.net Sat Feb 27 05:50:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 27 Feb 2021 05:50:41 GMT Subject: RFR: 8259070: Add jcmd option to dump CDS In-Reply-To: References: Message-ID: <8BIT_U1LoH-XHUajzOHKBe7xSETc4go9iVbaAlPcTlg=.6d34cc08-ed03-400e-9176-2d809bf1e6a5@github.com> On Fri, 26 Feb 2021 22:01:09 GMT, Calvin Cheung wrote: >> Hi, Please review >> >> Added jcmd option for dumping CDS archive during application runtime. Before this change, user has to dump shared archive in two steps: first run application with >> `java -XX:DumpLoadedClassList= .... ` >> to collect shareable class names and saved in file `` , then >> `java -Xshare:dump -XX:SharedClassListFile= -XX:SharedArchiveFile= ...` >> With this change, user can use jcmd to dump CDS without going through above steps. Also user can choose a moment during the app runtime to dump an archive. >> The bug is associated with the CSR: https://bugs.openjdk.java.net/browse/JDK-8259798 which has been approved. >> New added jcmd option: >> `jcmd VM.cds static_dump ` >> or >> `jcmd VM.cds dynamic_dump ` >> To dump dynamic archive, requires start app with newly added flag `-XX:+RecordDynamicDumpInfo`, with this flag, some information related to dynamic dump like loader constraints will be recorded. Note the dumping process changed some object memory locations so for dumping dynamic archive, can only done once for a running app. For static dump, user can dump multiple times against same process. >> The file name is optional, if the file name is not supplied, the file name will take format of `java_pid_static.jsa` or `java_pid_dynamic.jsa` for static and dynamic respectively. The `` is the application process ID. 
>>
>> Tests: tier1,tier2,tier3,tier4
>>
>> Thanks
>> Yumin
>
> src/hotspot/share/memory/metaspaceShared.cpp line 783:
>
>> 781: char* start = buffer + strlen(buffer);
>> 782: snprintf(start, buff_len, "%s ", arg);
>> 783: }
>
> Maybe move the above function to the StringUtils class under share/utilities?
> Use `os::snprintf()` instead of `snprintf()`?

The calculation is also wrong, this would overflow. You need:

    char* start = buffer + strlen(buffer);
    snprintf(start, buff_len - (start - buffer), "%s ", arg);

- and maybe add an assert that strlen(buf) < bufflen.
- and as Ioi wrote, I'd use either one of os::snprintf or jio_snprintf since both guarantee zero termination on truncation.
- or, just use strncat()

-------------

PR: https://git.openjdk.java.net/jdk/pull/2737

From stuefe at openjdk.java.net Sat Feb 27 05:50:41 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Sat, 27 Feb 2021 05:50:41 GMT
Subject: RFR: 8259070: Add jcmd option to dump CDS
In-Reply-To: 
References: 
Message-ID: <2ARkZBqUR_xMiSXQYcq-rmOVxB62xIiIjzsJo0ZB9Xo=.1e29924e-5aaf-4356-ac6a-5b4b46c177ee@github.com>

On Fri, 26 Feb 2021 22:15:12 GMT, Ioi Lam wrote:

>> Hi, Please review
>>
>> Added jcmd option for dumping CDS archive during application runtime. Before this change, user has to dump shared archive in two steps: first run application with
>> `java -XX:DumpLoadedClassList= .... `
>> to collect shareable class names and saved in file `` , then
>> `java -Xshare:dump -XX:SharedClassListFile= -XX:SharedArchiveFile= ...`
>> With this change, user can use jcmd to dump CDS without going through above steps. Also user can choose a moment during the app runtime to dump an archive.
>> The bug is associated with the CSR: https://bugs.openjdk.java.net/browse/JDK-8259798 which has been approved.
>> New added jcmd option:
>> `jcmd VM.cds static_dump `
>> or
>> `jcmd VM.cds dynamic_dump `
>> To dump dynamic archive, requires start app with newly added flag `-XX:+RecordDynamicDumpInfo`, with this flag, some information related to dynamic dump like loader constraints will be recorded. Note the dumping process changed some object memory locations so for dumping dynamic archive, can only done once for a running app. For static dump, user can dump multiple times against same process.
>> The file name is optional, if the file name is not supplied, the file name will take format of `java_pid_static.jsa` or `java_pid_dynamic.jsa` for static and dynamic respectively. The `` is the application process ID.
>>
>> Tests: tier1,tier2,tier3,tier4
>>
>> Thanks
>> Yumin
>
> src/hotspot/share/memory/metaspaceShared.cpp line 799:
>
>> 797: if (strstr(file_name, ".jsa") == nullptr) {
>> 798: os::snprintf(filename, sizeof(filename), "%s.jsa", file_name);
>> 799: file = filename;
>
> This could potentially overflow the buffer. I think it's best to just leave `file_name` alone. If the user doesn't want the `.jsa` extension, that's fine. Similarly, we don't add `.jsa` to `-XX:ArchiveClassesAtExit` or `-XX:SharedArchiveFile`.

How would it overflow? But I agree, I would not add jsa extension if the user did not specify one. I dislike when programs do that.
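
The two snprintf questions in this thread (the remaining-space calculation when appending, and whether a bounded "%s.jsa" write can overflow) can be illustrated with a small sketch. It uses the standard `snprintf`, which writes at most the given number of bytes and always NUL-terminates when that number is non-zero, rather than `os::snprintf` or `jio_snprintf`. Both `append_arg` and `add_jsa_suffix` are hypothetical names made up for this example and are not the functions from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: appends "arg " to the NUL-terminated string already
// in buffer without writing past buff_len bytes in total.
// Returns true if the whole argument fit, false if it was truncated.
bool append_arg(char* buffer, size_t buff_len, const char* arg) {
  size_t used = strlen(buffer);
  assert(used < buff_len && "existing content must fit in the buffer");
  size_t remaining = buff_len - used;  // space left, including the NUL
  int n = snprintf(buffer + used, remaining, "%s ", arg);
  // snprintf returns the length it wanted to write, excluding the NUL,
  // so a value >= remaining means the output was cut short.
  return n >= 0 && static_cast<size_t>(n) < remaining;
}

// Hypothetical helper: writes name followed by ".jsa" into out. An overlong
// name is truncated to fit out_len, not written past the end of out, and the
// return value reports whether truncation happened.
bool add_jsa_suffix(char* out, size_t out_len, const char* name) {
  int n = snprintf(out, out_len, "%s.jsa", name);
  return n >= 0 && static_cast<size_t>(n) < out_len;
}
```

Checking the return value lets the caller reject an overlong user-supplied name instead of silently using a cut-off path.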

-------------

PR: https://git.openjdk.java.net/jdk/pull/2737

From iklam at openjdk.java.net Sat Feb 27 18:14:40 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Sat, 27 Feb 2021 18:14:40 GMT
Subject: RFR: 8259070: Add jcmd option to dump CDS
In-Reply-To: <2ARkZBqUR_xMiSXQYcq-rmOVxB62xIiIjzsJo0ZB9Xo=.1e29924e-5aaf-4356-ac6a-5b4b46c177ee@github.com>
References: <2ARkZBqUR_xMiSXQYcq-rmOVxB62xIiIjzsJo0ZB9Xo=.1e29924e-5aaf-4356-ac6a-5b4b46c177ee@github.com>
Message-ID: 

On Sat, 27 Feb 2021 05:19:01 GMT, Thomas Stuefe wrote:

>> src/hotspot/share/memory/metaspaceShared.cpp line 799:
>>
>>> 797: if (strstr(file_name, ".jsa") == nullptr) {
>>> 798: os::snprintf(filename, sizeof(filename), "%s.jsa", file_name);
>>> 799: file = filename;
>>
>> This could potentially overflow the buffer. I think it's best to just leave `file_name` alone. If the user doesn't want the `.jsa` extension, that's fine. Similarly, we don't add `.jsa` to `-XX:ArchiveClassesAtExit` or `-XX:SharedArchiveFile`.
>
> How would it overflow? But I agree, I would not add jsa extension if the user did not specify one. I dislike when programs do that.

`file_name` is user input that comes from the jcmd, so it can be arbitrarily long and exceed JVM_MAXPATHLEN characters.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2737

From iklam at openjdk.java.net Sat Feb 27 18:23:39 2021
From: iklam at openjdk.java.net (Ioi Lam)
Date: Sat, 27 Feb 2021 18:23:39 GMT
Subject: RFR: 8259070: Add jcmd option to dump CDS
In-Reply-To: 
References: 
Message-ID: 

On Sat, 27 Feb 2021 05:48:04 GMT, Thomas Stuefe wrote:

> I really would consider rewriting the whole thing using posix_spawn. Since JDK15 I think posix_spawn is the default for Runtime.exec, so we know it works well. I would also do this in a separate RFE.
>
> Alternatively, you could call into java and use Runtime.exec().

I think we should call into Java and use `Runtime.exec()`.
Running a subprocess is very complicated, so we shouldn't try to duplicate the code in the VM.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2737

From thomas.stuefe at gmail.com Sun Feb 28 05:36:41 2021
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Sun, 28 Feb 2021 06:36:41 +0100
Subject: RFR: 8259070: Add jcmd option to dump CDS
In-Reply-To: 
References: <2ARkZBqUR_xMiSXQYcq-rmOVxB62xIiIjzsJo0ZB9Xo=.1e29924e-5aaf-4356-ac6a-5b4b46c177ee@github.com>
Message-ID: 

Oh right, then it could get truncated, but should not overflow.

On Sat, Feb 27, 2021 at 7:15 PM Ioi Lam wrote:

> On Sat, 27 Feb 2021 05:19:01 GMT, Thomas Stuefe
> wrote:
>
> >> src/hotspot/share/memory/metaspaceShared.cpp line 799:
> >>
> >>> 797: if (strstr(file_name, ".jsa") == nullptr) {
> >>> 798: os::snprintf(filename, sizeof(filename), "%s.jsa",
> file_name);
> >>> 799: file = filename;
> >>
> >> This could potentially overflow the buffer. I think it's best to just
> leave `file_name` alone. If the user doesn't want the `.jsa` extension,
> that's fine. Similarly, we don't add `.jsa` to `-XX:ArchiveClassesAtExit`
> or `-XX:SharedArchiveFile`.
> >
> > How would it overflow? But I agree, I would not add jsa extension if the
> user did not specify one. I dislike when programs do that.
>
> `file_name` is user input that comes from the jcmd, so it can be
> arbitrarily long and exceed JVM_MAXPATHLEN characters.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/2737
>

From dholmes at openjdk.java.net Sun Feb 28 22:36:38 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Sun, 28 Feb 2021 22:36:38 GMT
Subject: RFR: 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass()
In-Reply-To: <7k_fqiZv2p_eRynkwwZodqwGauVk086pE5A4TdqLUXM=.c5dd976c-64bb-4562-a6fc-13580e9f54eb@github.com>
References: <7k_fqiZv2p_eRynkwwZodqwGauVk086pE5A4TdqLUXM=.c5dd976c-64bb-4562-a6fc-13580e9f54eb@github.com>
Message-ID: 

On Fri, 26 Feb 2021 15:55:28 GMT, Harold Seigel wrote:

>> This looks good and trivial.
>
> Thanks Coleen for reviewing this!

Hi Harold,

When we remove TRAPS we should replace with "Thread* thread", not "Thread* THREAD". The THREAD variable is only used for traps-related exception processing. (I'll be cleaning these up as part of my TRAPS/JavaThread work anyway).

Cheers,
David

-------------

PR: https://git.openjdk.java.net/jdk/pull/2746

From david.holmes at oracle.com Sun Feb 28 22:40:29 2021
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 1 Mar 2021 08:40:29 +1000
Subject: RFR: 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass()
In-Reply-To: 
References: <7k_fqiZv2p_eRynkwwZodqwGauVk086pE5A4TdqLUXM=.c5dd976c-64bb-4562-a6fc-13580e9f54eb@github.com>
Message-ID: <593e9dcb-f466-db91-eced-bd75f8172af4@oracle.com>

On 1/03/2021 8:36 am, David Holmes wrote:
> On Fri, 26 Feb 2021 15:55:28 GMT, Harold Seigel wrote:
>
>>> This looks good and trivial.
>>
>> Thanks Coleen for reviewing this!
>
> Hi Harold,
>
> When we remove TRAPS we should replace with "Thread* thread", not "Thread* THREAD". The THREAD variable is only used for traps-related exception processing. (I'll be cleaning these up as part of my TRAPS/JavaThread work anyway).

I messed that in 8261127 as well.

David

> Cheers,
> David
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/2746
>