From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:23:11 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:23:11 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: <1ZjEPMgcx8a8qIfAMhBlFfQ1S0PUjbbv9r-uyF8cORc=.ad8ee7f3-603e-4ffd-b4e0-a12df3cf8ff3@github.com> References: <2uRU0b0fCTLTdN6jsB9mNpM_3BEgFZz7q4xHWnNG79I=.16186f49-c220-4bf7-aee1-c18f820e92a5@github.com> <1ZjEPMgcx8a8qIfAMhBlFfQ1S0PUjbbv9r-uyF8cORc=.ad8ee7f3-603e-4ffd-b4e0-a12df3cf8ff3@github.com> Message-ID: On Fri, 27 Aug 2021 17:59:10 GMT, Vladimir Kozlov wrote: >> I'm opting for having these tests in subfolders of `irTests` separated by type of optimization. But should we go with `compiler/irTests/*` or `/compiler/c2/irTests/*`? > > I agree to do cleanup and **correctly** separate c1/c2/shared tests (as compiler tests cleanup RFE). > > If we all agree with that then the answer for last question is `/compiler/c2/irTests/*` I moved the tests to `/compiler/c2/irTests/`. Please let me know if the split into subfolders I did is reasonable. ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:23:11 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:23:11 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> > Hi, can I please get some reviews for this Pull Request? Here is a summary of the changes: > > - Add tests, using the new IR-based test framework, for several of the Ideal transformations on Add, Sub, Mul, Div, Loop nodes and some simple Scalar Replacement transformations. > - Add more default IR regex's to IR-based test framework. 
> - Changes to Sub, Div and Add Ideal nodes so that transformations on Int and Long types are the same whenever possible.
> - Changes to Sub*Node, Div*Node and Add*Node Ideal methods to fix some bugs and include new transformations.
> - New JTREG "ir_transformations" test group under test/hotspot/jtreg.

John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision:

 - Fix merge mistake.
 - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265
 - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests.
 - 8273197: ProblemList 2 jtools tests due to JDK-8273187
   8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188

   Reviewed-by: naoto
 - 8262186: Call X509KeyManager.chooseClientAlias once for all key types

   Reviewed-by: xuelei
 - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet

   Reviewed-by: ayang
 - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302

   Reviewed-by: jiefu, serb
 - 8273092: Sort classlist in JDK image

   Reviewed-by: redestad, ihse, dfuchs
 - 8273144: Remove unused top level "Sample Collection Set Candidates" logging

   Reviewed-by: iwalulya, ayang
 - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null

   Co-authored-by: Jan Lahoda
   Co-authored-by: Vicente Romero
   Reviewed-by: jlahoda
 - ...
and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5135/files - new: https://git.openjdk.java.net/jdk/pull/5135/files/ac430bf7..463102e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5135&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5135&range=02-03 Stats: 2570 lines in 18 files changed: 1410 ins; 1146 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/5135.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5135/head:pull/5135 PR: https://git.openjdk.java.net/jdk/pull/5135 From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:23:11 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:23:11 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: On Fri, 20 Aug 2021 08:59:12 GMT, Christian Hagedorn wrote: >> Thank you @chhagedorn , I think this is a good idea. I'll follow your suggestion and transform some tests into `custom run tests`. > > Great, thanks! Btw, you can merge and now use `RunInfo.getRandom().XX()` for a handy access to random values (if needed) as the PR for JDK-8272567 was integrated in the meantime. Hi, again @chhagedorn. I added some `custom run tests` to tests that seemed more "complex". Please let me know if there are others that you think I should add. 
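
For readers following the `custom run tests` discussion: the value-checking half of such a test can be sketched in plain Java. This is only a rough illustration and does not use the IR test framework or `RunInfo`; the class name `IdentityCheck`, the method name, and the chosen identities are assumptions made here, not code or transformations taken from the PR. It checks a few wraparound-safe algebraic identities on `int` and `long` for corner cases and seeded random inputs.

```java
import java.util.Random;

/** Plain-Java sketch (hypothetical names); it does not use the IR test framework. */
public class IdentityCheck {

    // Corner-case operands that random sampling is unlikely to hit.
    private static final int[] INT_EDGES = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
    private static final long[] LONG_EDGES = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE};

    // Two identities that hold under two's-complement wraparound.
    static boolean intIdentitiesHold(int x, int y) {
        return x - (x + y) == -y && (x - y) + y == x;
    }

    static boolean longIdentitiesHold(long x, long y) {
        return x - (x + y) == -y && (x - y) + y == x;
    }

    /** Checks all edge-value pairs, then `iterations` seeded random pairs. */
    public static boolean holdsForRandomInputs(long seed, int iterations) {
        for (int x : INT_EDGES) {
            for (int y : INT_EDGES) {
                if (!intIdentitiesHold(x, y)) return false;
            }
        }
        for (long x : LONG_EDGES) {
            for (long y : LONG_EDGES) {
                if (!longIdentitiesHold(x, y)) return false;
            }
        }
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            if (!intIdentitiesHold(random.nextInt(), random.nextInt())) return false;
            if (!longIdentitiesHold(random.nextLong(), random.nextLong())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsForRandomInputs(42L, 10_000)); // prints "true"
    }
}
```

A fixed seed keeps a failure reproducible; in the framework itself the random source would presumably come from `RunInfo.getRandom()` as mentioned above.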
-------------

PR: https://git.openjdk.java.net/jdk/pull/5135

From ccheung at openjdk.java.net Wed Sep 1 00:25:52 2021
From: ccheung at openjdk.java.net (Calvin Cheung)
Date: Wed, 1 Sep 2021 00:25:52 GMT
Subject: RFR: 8270489: Support archived heap objects in EpsilonGC [v6]
In-Reply-To:
References: <8uAZQGqz-B1JzhrfMGkEz2r-jeZKLbZluxKy3BeLW6c=.5165b6ba-f6ac-48c6-ad78-12210b2e51bd@github.com>
Message-ID:

On Tue, 31 Aug 2021 23:55:22 GMT, Ioi Lam wrote:

>> **Overview:**
>>
>> This is the first step for supporting archived heap objects for non-G1 collectors. We are doing it for EpsilonGC first to iron out the API between GC and CDS. Also we can implement most of the common code (such as copying archived objects into heap), without impacting the overall system stability.
>>
>> - Only G1 can write archive heap objects into the CDS archive.
>> - Archived objects are "mapped" by G1, but the mapping operation is quite complex.
>> - All other collectors will "load" the archive objects, which is much simpler to implement. The trade off is a small start-up penalty and no heap sharing.
>>
>> Most of the loading code is implemented in heapShared.cpp. The collectors just need to implement the following two `CollectedHeap` APIs in
>>
>>
>>     virtual bool can_load_archived_objects();
>>     virtual HeapWord* allocate_loaded_archive_space(size_t size); // typically return a block in old gen
>>
>>
>> **Implementation:**
>>
>> - Allocate (from the old gen) a buffer that's large enough to contain all the archived heap objects.
>> - Inside the CDS archive file, the heap objects are usually divided into 2~4 disjoint regions (there are gaps between them).
>> - Copy every region into the buffer consecutively, without any gaps.
>> - Relocate all the oop fields in all the copied objects, taking the gap removal into account.
>> - The archived strings may be relocated by a full GC, but the CDS "shared string table" cannot handle relocation, so we copy the archived strings into the dynamic string table. >> >> **Benchmarking:** >> >> We can see significant start-up improvement because the module graph can be loaded from CDS. >> >> >> $ perf stat -r 40 java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx256m -version >> >> Before: 43.1ms >> After: 30.2ms >> >> >> Testing: >> >> - Some general clean up of the test cases. >> - Added support for `-vmoptions:-Dtest.cds.runtime.options=-XX:+UnlockExperimentalVMOptions,-XX:+UseEpsilonGC`: we dump the CDS archive with G1 so we have an archived heap, but run with EpsilonGC to test the new loading code. >> - Added a mach5 task to run all CDS tests with the above config. Incompatible test cases are tagged with `@require vm.gc == null`. See changes in CDSOption.java and VMProps.java > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @calvinccheung comments The v05 webrev looks good. ------------- Marked as reviewed by ccheung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5074 From github.com+2249648+johntortugo at openjdk.java.net Wed Sep 1 00:26:50 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Wed, 1 Sep 2021 00:26:50 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> References: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> Message-ID: On Wed, 1 Sep 2021 00:23:11 GMT, John Tortugo wrote: >> Hi, can I please get some reviews for this Pull Request? 
Here is a summary of the changes:
>>
>> - Add tests, using the new IR-based test framework, for several of the Ideal transformations on Add, Sub, Mul, Div, Loop nodes and some simple Scalar Replacement transformations.
>> - Add more default IR regex's to IR-based test framework.
>> - Changes to Sub, Div and Add Ideal nodes so that transformations on Int and Long types are the same whenever possible.
>> - Changes to Sub*Node, Div*Node and Add*Node Ideal methods to fix some bugs and include new transformations.
>> - New JTREG "ir_transformations" test group under test/hotspot/jtreg.
>
> John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision:
>
>  - Fix merge mistake.
>  - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265
>  - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests.
>  - 8273197: ProblemList 2 jtools tests due to JDK-8273187
>    8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188
>
>    Reviewed-by: naoto
>  - 8262186: Call X509KeyManager.chooseClientAlias once for all key types
>
>    Reviewed-by: xuelei
>  - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet
>
>    Reviewed-by: ayang
>  - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302
>
>    Reviewed-by: jiefu, serb
>  - 8273092: Sort classlist in JDK image
>
>    Reviewed-by: redestad, ihse, dfuchs
>  - 8273144: Remove unused top level "Sample Collection Set Candidates" logging
>
>    Reviewed-by: iwalulya, ayang
>  - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null
>
>    Co-authored-by: Jan Lahoda
>    Co-authored-by: Vicente Romero
>    Reviewed-by: jlahoda
>  - ... and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2

Hi folks, I'd appreciate it if you could take a look again.
I addressed the comments so far and here is a summary of the latest changes: - Moved all tests to directory `/compiler/c2/irTests` under hotspot JTREG group. - Added a few `custom run tests` for testing some corner cases. - Renamed some tests files and some test methods. - Added new tests for testing the changes introduced by [JDK-8270823](https://bugs.openjdk.java.net/browse/JDK-8270823) ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From iklam at openjdk.java.net Wed Sep 1 01:53:12 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 1 Sep 2021 01:53:12 GMT Subject: RFR: 8270489: Support archived heap objects in EpsilonGC [v7] In-Reply-To: <8uAZQGqz-B1JzhrfMGkEz2r-jeZKLbZluxKy3BeLW6c=.5165b6ba-f6ac-48c6-ad78-12210b2e51bd@github.com> References: <8uAZQGqz-B1JzhrfMGkEz2r-jeZKLbZluxKy3BeLW6c=.5165b6ba-f6ac-48c6-ad78-12210b2e51bd@github.com> Message-ID: > **Overview:** > > This is the first step for supporting archived heap objects for non-G1 collectors. We are doing it for EpsilonGC first to iron out the API between GC and CDS. Also we can implement most of the common code (such as copying archived objects into heap), without impacting the overall system stability. > > - Only G1 can write archive heap objects into the CDS archive. > - Archived objects are "mapped" by G1, but the mapping operation is quite complex. > - All other collectors will "load" the archive objects, which is much simpler to implement. The trade off is a small start-up penalty and no heap sharing. > > Most of the loading code is implemented in heapShared.cpp. The collectors just need to implement the following two `CollectedHeap` APIs in > > > virtual bool can_load_archived_objects(); > virtual HeapWord* allocate_loaded_archive_space(size_t size); // typically return a block in old gen > > > **Implementation:** > > - Allocate (from the old gen) a buffer that's large enough to contain all the archived heap objects. 
> - Inside the CDS archive file, the heap objects are usually divided into 2~4 disjoint regions (there are gaps between them).
> - Copy every region into the buffer consecutively, without any gaps.
> - Relocate all the oop fields in all the copied objects, taking the gap removal into account.
> - The archived strings may be relocated by a full GC, but the CDS "shared string table" cannot handle relocation, so we copy the archived strings into the dynamic string table.
>
> **Benchmarking:**
>
> We can see significant start-up improvement because the module graph can be loaded from CDS.
>
>
>     $ perf stat -r 40 java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx256m -version
>
>     Before: 43.1ms
>     After: 30.2ms
>
>
> Testing:
>
> - Some general clean up of the test cases.
> - Added support for `-vmoptions:-Dtest.cds.runtime.options=-XX:+UnlockExperimentalVMOptions,-XX:+UseEpsilonGC`: we dump the CDS archive with G1 so we have an archived heap, but run with EpsilonGC to test the new loading code.
> - Added a mach5 task to run all CDS tests with the above config. Incompatible test cases are tagged with `@require vm.gc == null`. See changes in CDSOption.java and VMProps.java

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
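
The loading steps in the Implementation list quoted above (pack the disjoint regions into one buffer, then adjust pointers for the removed gaps) can be sketched with a toy model. This is a rough illustration, not the heapShared.cpp code: addresses are plain `long` values, and the class `GapRemovalSketch` and its methods are hypothetical names introduced here.

```java
/** Toy model of "copy regions without gaps, then relocate pointers" (hypothetical names). */
public class GapRemovalSketch {

    /**
     * For regions given by archive-time start addresses and sizes, returns the
     * start address of each region once they are packed back to back at bufferBase.
     */
    public static long[] packedStarts(long[] oldStarts, long[] sizes, long bufferBase) {
        long[] newStarts = new long[oldStarts.length];
        long next = bufferBase;
        for (int i = 0; i < oldStarts.length; i++) {
            newStarts[i] = next;   // region i begins where region i-1 ended
            next += sizes[i];
        }
        return newStarts;
    }

    /** Translates an archive-time pointer to its address in the packed buffer. */
    public static long relocate(long oldPointer, long[] oldStarts, long[] sizes, long[] newStarts) {
        for (int i = 0; i < oldStarts.length; i++) {
            if (oldPointer >= oldStarts[i] && oldPointer < oldStarts[i] + sizes[i]) {
                // Keep the offset within the region, swap the region base.
                return newStarts[i] + (oldPointer - oldStarts[i]);
            }
        }
        throw new IllegalArgumentException("pointer outside all archived regions: " + oldPointer);
    }

    public static void main(String[] args) {
        long[] oldStarts = {0x1000, 0x3000};   // two regions with a gap between them
        long[] sizes = {0x100, 0x80};
        long[] newStarts = packedStarts(oldStarts, sizes, 0x9000);
        long moved = relocate(0x3010, oldStarts, sizes, newStarts);
        System.out.println(Long.toHexString(moved)); // prints "9110"
    }
}
```

Each pointer keeps its offset inside its region and only the region base changes, which is why the per-region deltas differ once the gaps are gone.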
The pull request now contains 10 commits: - Merge branch 'master' into 8270489-archived-heap-objects-for-epsilon-gc - @calvinccheung comments - @shipilev review - fixed whitespaces - @shipilev review -- add verbose param to allocate_work() - Merge branch 'master' into 8270489-archived-heap-objects-for-epsilon-gc - @shipilev comments - @iignatev comments - fixed whitespaces - Add/update test cases - 8270489: Support archived heap objects in EpsilonGC ------------- Changes: https://git.openjdk.java.net/jdk/pull/5074/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5074&range=06 Stats: 805 lines in 31 files changed: 685 ins; 34 del; 86 mod Patch: https://git.openjdk.java.net/jdk/pull/5074.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5074/head:pull/5074 PR: https://git.openjdk.java.net/jdk/pull/5074 From fmatte at openjdk.java.net Wed Sep 1 05:56:10 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 05:56:10 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v2] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: <3s5tEfAOoagk3KUNawQSUcD8XC0KFHaJliJ9coABlTs=.40ec1088-4aaa-4808-943e-3f69ea0ac82a@github.com> On Tue, 31 Aug 2021 14:54:56 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. 
>> The fix is to check if it is is_oop() and call the mov appropriately. >> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Thanks, updated the patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Wed Sep 1 05:56:09 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 05:56:09 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v3] In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. 
> > No issues found in local testing and Mach5 tier1-3 Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: 8272563: Possible assertion failure in CardTableBarrierSetC1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5164/files - new: https://git.openjdk.java.net/jdk/pull/5164/files/c023f4bc..5c1e1d49 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5164.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5164/head:pull/5164 PR: https://git.openjdk.java.net/jdk/pull/5164 From dholmes at openjdk.java.net Wed Sep 1 06:15:45 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 1 Sep 2021 06:15:45 GMT Subject: RFR: 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 In-Reply-To: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> References: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> Message-ID: On Tue, 31 Aug 2021 23:07:51 GMT, Jie Fu wrote: > 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 Hi Jie, Seems fine, but please add an explanatory comment in the JBS issue and/or the PR description so that reviewers know what the problem and solution are. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5322 From jiefu at openjdk.java.net Wed Sep 1 06:23:43 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 1 Sep 2021 06:23:43 GMT Subject: RFR: 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 In-Reply-To: References: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> Message-ID: On Wed, 1 Sep 2021 06:13:08 GMT, David Holmes wrote: > Hi Jie, > > Seems fine, but please add an explanatory comment in the JBS issue and/or the PR description so that reviewers know what the problem and solution are. > > Thanks, > David Thanks @dholmes-ora . Just too busy this morning and won't do it again. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/5322 From iveresov at openjdk.java.net Wed Sep 1 06:55:45 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 1 Sep 2021 06:55:45 GMT Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v3] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 05:56:09 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. 
>>
>> No issues found in local testing and Mach5 tier1-3
>
> Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision:
>
>   8272563: Possible assertion failure in CardTableBarrierSetC1

src/hotspot/share/gc/shared/c1/cardTableBarrierSetC1.cpp line 72:

> 70:     LIR_Opr addr_opr = LIR_OprFact::address(new LIR_Address(addr, addr->type()));
> 71:     __ leal(addr_opr, tmp);
> 72:     __ move(addr, tmp);

You don't need the move anymore.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5164

From fmatte at openjdk.java.net Wed Sep 1 06:55:45 2021
From: fmatte at openjdk.java.net (Fairoz Matte)
Date: Wed, 1 Sep 2021 06:55:45 GMT
Subject: RFR: 8272563: Possible assertion failure in CardTableBarrierSetC1 [v3]
In-Reply-To:
References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com>
Message-ID:

On Wed, 1 Sep 2021 06:49:47 GMT, Igor Veresov wrote:

>> Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8272563: Possible assertion failure in CardTableBarrierSetC1
>
> src/hotspot/share/gc/shared/c1/cardTableBarrierSetC1.cpp line 72:
>
>> 70:     LIR_Opr addr_opr = LIR_OprFact::address(new LIR_Address(addr, addr->type()));
>> 71:     __ leal(addr_opr, tmp);
>> 72:     __ move(addr, tmp);
>
> You don't need the move anymore.

ok, will remove that.
------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From iveresov at openjdk.java.net Wed Sep 1 07:10:07 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 1 Sep 2021 07:10:07 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v4] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 07:06:21 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. >> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Marked as reviewed by iveresov (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Wed Sep 1 07:10:07 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 07:10:07 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v4] In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. 
> > No issues found in local testing and Mach5 tier1-3 Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: 8272563: Possible assertion failure in CardTableBarrierSetC1 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5164/files - new: https://git.openjdk.java.net/jdk/pull/5164/files/5c1e1d49..359cbf3d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5164&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5164.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5164/head:pull/5164 PR: https://git.openjdk.java.net/jdk/pull/5164 From iveresov at openjdk.java.net Wed Sep 1 07:10:08 2021 From: iveresov at openjdk.java.net (Igor Veresov) Date: Wed, 1 Sep 2021 07:10:08 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v3] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 05:56:09 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. 
>> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Looks good! ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From ayang at openjdk.java.net Wed Sep 1 07:53:47 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 1 Sep 2021 07:53:47 GMT Subject: RFR: 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 In-Reply-To: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> References: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> Message-ID: On Tue, 31 Aug 2021 23:07:51 GMT, Jie Fu wrote: > 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 Marked as reviewed by ayang (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5322 From jiefu at openjdk.java.net Wed Sep 1 07:58:49 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 1 Sep 2021 07:58:49 GMT Subject: RFR: 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 In-Reply-To: References: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> Message-ID: <_zs6QBbI8INu8NKrQZnuBaBgd4Z1r7LOH878aPLqWuM=.1af01508-03f5-43a5-ad8f-5598d9c9f0fb@github.com> On Wed, 1 Sep 2021 07:51:06 GMT, Albert Mingkun Yang wrote: >> 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk . 
------------- PR: https://git.openjdk.java.net/jdk/pull/5322 From jiefu at openjdk.java.net Wed Sep 1 07:58:50 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 1 Sep 2021 07:58:50 GMT Subject: Integrated: 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 In-Reply-To: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> References: <-Odd9NyApeMHE3VC6K9bHC7cSoKFJfwQtKj9IA7mDyc=.e8274c9e-c191-4041-989b-108b19d78167@github.com> Message-ID: On Tue, 31 Aug 2021 23:07:51 GMT, Jie Fu wrote: > 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 This pull request has now been integrated. Changeset: f1c5e26e Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/f1c5e26e48ca2db0fc2b7ad2cf1bda4853bdeea9 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8273206: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails after JDK-8159979 Reviewed-by: dholmes, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/5322 From shade at openjdk.java.net Wed Sep 1 08:20:19 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 1 Sep 2021 08:20:19 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: > Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. 
>
> Sample runs:
>
>
> TEST                                              TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:tier2                      425   425     0     0
>
> real 11m45.244s
> user 433m48.960s
> sys 38m13.606s
>
>
>
> TEST                                              TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:tier3                       80    80     0     0
>
> real 35m19.031s
> user 418m45.607s
> sys 5m41.748s
>
>
> These also hook up properly to global `tier2` and `tier3`.
>
> Additional testing:
> - [x] Linux x86_64 fastdebug `hotspot:tier2`
> - [x] Linux x86_64 fastdebug `hotspot:tier3`

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Add newly emerged tier2_gc_epsilon to tier2 as well
 - Merge branch 'master' into JDK-8272914-hs-tier2-3
 - Cleaner test group definitions
 - Filter InvocationTests in tier3, leaving the existing group alone
 - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups

-------------

Changes:
 - all: https://git.openjdk.java.net/jdk/pull/5241/files
 - new: https://git.openjdk.java.net/jdk/pull/5241/files/832dcc3a..cc352547

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5241&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5241&range=02-03

Stats: 10101 lines in 330 files changed: 5674 ins; 2103 del; 2324 mod
Patch: https://git.openjdk.java.net/jdk/pull/5241.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5241/head:pull/5241

PR: https://git.openjdk.java.net/jdk/pull/5241

From shade at openjdk.java.net Wed Sep 1 08:20:22 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 1 Sep 2021 08:20:22 GMT
Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v3]
In-Reply-To:
References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com>
Message-ID:

On Wed, 25 Aug 2021 10:17:03 GMT, Aleksey Shipilev wrote:

>> Currently, no tests run in `hotspot:tier2`
and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. >> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Cleaner test group definitions Also added `tier2_gc_epsilon`, now that it is in master. ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From thartmann at openjdk.java.net Wed Sep 1 10:12:50 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 1 Sep 2021 10:12:50 GMT Subject: RFR: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check [v4] In-Reply-To: References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 1 Sep 2021 07:10:07 GMT, Fairoz Matte wrote: >> This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp >> >> The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >>> __ move(addr, tmp); >> However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. 
If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. >> The fix is to check if it is is_oop() and call the mov appropriately. >> >> No issues found in local testing and Mach5 tier1-3 > > Fairoz Matte has updated the pull request incrementally with one additional commit since the last revision: > > 8272563: Possible assertion failure in CardTableBarrierSetC1 Looks good to me too! ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5164 From fmatte at openjdk.java.net Wed Sep 1 10:15:52 2021 From: fmatte at openjdk.java.net (Fairoz Matte) Date: Wed, 1 Sep 2021 10:15:52 GMT Subject: Integrated: 8272563: assert(is_double_stack() && !is_virtual()) failed: type check In-Reply-To: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> References: <6X_l0Bp30HEmNGg9L3ov-n68XuRo9JrW9uvj6pSwqjk=.46f88a66-7256-4647-a74f-27b976c1200e@github.com> Message-ID: On Wed, 18 Aug 2021 12:37:00 GMT, Fairoz Matte wrote: > This patch is proposed by the submitter of the bug - ugawa at ci.i.u-tokyo.ac.jp > > The method CardTableBarrierSetC1::post_barrier generates a move LIR when TwoOperandLIRForm flag is true to move the address to be marked in the card table to a temporary register. >> __ move(addr, tmp); > However, this code only guarantees that `addr` is a valid register for LIR, which can be a virtual register. If the virtual register for `addr` is spilled to the stack by chance, the `move(addr, tmp)` is compiled to a memory-to-register which causes an assertion failure because a memory-to-register move requires their arguments to have the same size. > The fix is to check if it is is_oop() and call the mov appropriately. 
> > No issues found in local testing and Mach5 tier1-3 This pull request has now been integrated. Changeset: a58cf165 Author: Fairoz Matte Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/a58cf16509f3120d69fc18bd4c2c49e9ad590f73 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8272563: assert(is_double_stack() && !is_virtual()) failed: type check Reviewed-by: thartmann, iveresov ------------- PR: https://git.openjdk.java.net/jdk/pull/5164 From shade at openjdk.java.net Wed Sep 1 11:12:20 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 1 Sep 2021 11:12:20 GMT Subject: RFR: 8263375: Support stack watermarks in Zero VM [v2] In-Reply-To: References: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> Message-ID: On Mon, 12 Jul 2021 11:59:46 GMT, Aleksey Shipilev wrote: >> src/hotspot/cpu/zero/zeroInterpreter_zero.cpp line 205: >> >>> 203: // Notify the stack watermarks machinery that we are unwinding. >>> 204: // Should do this before resetting the frame anchor. >>> 205: stack_watermark_unwind_check(thread); >> >> I wonder if this should maybe move down a bit to where we inspect the reason we left the interpreter loop. There are multiple reasons and only some involve unwinding. I'm thinking BytecodeInterpreter::return_from_method and BytecodeInterpreter::do_osr >> >> Regarding BytecodeInterpreter::throwing_exception the current contract for exception handing is that an unwind handler is called *after* unwinding instead. We have some exception handler function in the interpreter runtime that gets called after unwinding with an exception into an interpreted frame. Hopefully that still gets called when using zero. Worth double checking. > > Thanks! I'll take a look after I am back from extended time off. > > Zero does not actually do OSR anymore (Zero gradually eroded to interpreter-only mode), so only return_from_method might need handling. 
I'll see what happens in throwing_exception case; worst case I think I can set up a one-off frame anchor and call the unwind handler with it. OK, I think for `throwing_exception` case you mean `InterpreterRuntime::exception_handler_for_exception` that is called to do `StackWatermarkSet::after_unwind`. That thing is [called](https://github.com/openjdk/jdk/blob/0e14bf70cf6e482a2ec7d13ed37df0bee911740d/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L2492) from Zero during exception throwing, similar to the places in `TemplateInterpreter`-s for other arches. New commit now only calls stack watermark check for `return_from_method` and `do_osr`. ------------- PR: https://git.openjdk.java.net/jdk/pull/4728 From shade at openjdk.java.net Wed Sep 1 11:12:17 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 1 Sep 2021 11:12:17 GMT Subject: RFR: 8263375: Support stack watermarks in Zero VM [v2] In-Reply-To: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> References: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> Message-ID: <2_bXvSnzzLC-SSViugPyNvmp3FpBuBqRmxgNzL4BJjY=.734d30e2-37a8-433a-9320-3e03b71aaa8d@github.com> > Zero VM supports most of GCs. Since JDK 16, Shenandoah uses stack watermarks, so Zero has to support those if Shenandoah+Zero support is to remain. This PR adds the stack watermark support in Zero VM. This should also be useful as other projects, notably Loom, mature and depend on stack watermarks. > > Zero already calls into Hotspot safepoint machinery to do things, and it seems only the hooks for `on_iteration` and `on_unwind` are missing. AFAICS, Zero only has on-return safepoints, renamed it to be more precise. > > @fisk, do you see any obvious problems with this patch? 
> > Additional testing: > - [x] Linux x86_64 Zero `hotspot_gc_shenandoah` now passes Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Notify only when unwinding messages are received - Merge branch 'master' into JDK-8263375-zero-stack-watermarks - Revert debugging - 8263375: Support stack watermarks in Zero VM ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4728/files - new: https://git.openjdk.java.net/jdk/pull/4728/files/d68b31a8..d809aad2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4728&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4728&range=00-01 Stats: 87518 lines in 1860 files changed: 67554 ins; 10358 del; 9606 mod Patch: https://git.openjdk.java.net/jdk/pull/4728.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4728/head:pull/4728 PR: https://git.openjdk.java.net/jdk/pull/4728 From ayang at openjdk.java.net Wed Sep 1 15:05:32 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 1 Sep 2021 15:05:32 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type Message-ID: Simple change on return types of Ticks API. The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. 
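The unit mismatch described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use HotSpot's `Ticks` classes: `Span`, `as_milliseconds`, `as_microseconds` and `report_usecs` are hypothetical names, and `std::chrono` stands in for the real timer, so treat it as an assumption-laden illustration rather than the actual patch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical stand-in for an elapsed time span; not the HotSpot type.
using Span = std::chrono::nanoseconds;

// Whole milliseconds in the span (truncating toward zero).
inline std::int64_t as_milliseconds(Span s) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(s).count();
}

// Whole microseconds in the span (truncating toward zero).
inline std::int64_t as_microseconds(Span s) {
  return std::chrono::duration_cast<std::chrono::microseconds>(s).count();
}

// The label says "usecs", so the value printed next to it must be
// obtained in microseconds; using as_milliseconds() here would
// under-report the time by a factor of 1000.
inline std::string report_usecs(Span s) {
  assert(s >= Span::zero());
  return "elapsed: " + std::to_string(as_microseconds(s)) + " usecs";
}
```

With a span of 2.5 ms, the millisecond accessor yields 2 while the microsecond accessor yields 2500, which is why pairing a millisecond value with a `usecs` label looks like a bug.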
Test: tier1

-------------

Commit messages:
 - tick-double

Changes: https://git.openjdk.java.net/jdk/pull/5332/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5332&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273239
  Stats: 33 lines in 5 files changed: 0 ins; 0 del; 33 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5332.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5332/head:pull/5332

PR: https://git.openjdk.java.net/jdk/pull/5332

From eosterlund at openjdk.java.net Wed Sep 1 16:16:20 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 1 Sep 2021 16:16:20 GMT
Subject: RFR: 8263375: Support stack watermarks in Zero VM [v2]
In-Reply-To: <2_bXvSnzzLC-SSViugPyNvmp3FpBuBqRmxgNzL4BJjY=.734d30e2-37a8-433a-9320-3e03b71aaa8d@github.com>
References: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com>
 <2_bXvSnzzLC-SSViugPyNvmp3FpBuBqRmxgNzL4BJjY=.734d30e2-37a8-433a-9320-3e03b71aaa8d@github.com>
Message-ID: <4yhfvYDB5XYRSz5JTSTZ0UCj4n2eAVZgXtmWPMsIGos=.b7041352-2e9e-4437-aa7a-08df72844f57@github.com>

On Wed, 1 Sep 2021 11:12:17 GMT, Aleksey Shipilev wrote:

>> Zero VM supports most of GCs. Since JDK 16, Shenandoah uses stack watermarks, so Zero has to support those if Shenandoah+Zero support is to remain. This PR adds the stack watermark support in Zero VM. This should also be useful as other projects, notably Loom, mature and depend on stack watermarks.
>> 
>> Zero already calls into Hotspot safepoint machinery to do things, and it seems only the hooks for `on_iteration` and `on_unwind` are missing. AFAICS, Zero only has on-return safepoints, renamed it to be more precise.
>> 
>> @fisk, do you see any obvious problems with this patch?
>> 
>> Additional testing:
>>  - [x] Linux x86_64 Zero `hotspot_gc_shenandoah` now passes
> 
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Notify only when unwinding messages are received > - Merge branch 'master' into JDK-8263375-zero-stack-watermarks > - Revert debugging > - 8263375: Support stack watermarks in Zero VM Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/4728 From iklam at openjdk.java.net Wed Sep 1 16:53:55 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 1 Sep 2021 16:53:55 GMT Subject: RFR: 8270489: Support archived heap objects in EpsilonGC [v4] In-Reply-To: References: <8uAZQGqz-B1JzhrfMGkEz2r-jeZKLbZluxKy3BeLW6c=.5165b6ba-f6ac-48c6-ad78-12210b2e51bd@github.com> Message-ID: On Mon, 16 Aug 2021 05:28:16 GMT, Aleksey Shipilev wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - @shipilev review -- add verbose param to allocate_work() >> - Merge branch 'master' into 8270489-archived-heap-objects-for-epsilon-gc >> - @shipilev comments >> - @iignatev comments >> - fixed whitespaces >> - Add/update test cases >> - 8270489: Support archived heap objects in EpsilonGC > > Epsilon parts look good to me. Thanks @shipilev and @calvinccheung for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5074 From iklam at openjdk.java.net Wed Sep 1 16:53:59 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 1 Sep 2021 16:53:59 GMT Subject: Integrated: 8270489: Support archived heap objects in EpsilonGC In-Reply-To: <8uAZQGqz-B1JzhrfMGkEz2r-jeZKLbZluxKy3BeLW6c=.5165b6ba-f6ac-48c6-ad78-12210b2e51bd@github.com> References: <8uAZQGqz-B1JzhrfMGkEz2r-jeZKLbZluxKy3BeLW6c=.5165b6ba-f6ac-48c6-ad78-12210b2e51bd@github.com> Message-ID: On Tue, 10 Aug 2021 19:57:16 GMT, Ioi Lam wrote: > **Overview:** > > This is the first step for supporting archived heap objects for non-G1 collectors. We are doing it for EpsilonGC first to iron out the API between GC and CDS. Also we can implement most of the common code (such as copying archived objects into heap), without impacting the overall system stability. > > - Only G1 can write archive heap objects into the CDS archive. > - Archived objects are "mapped" by G1, but the mapping operation is quite complex. > - All other collectors will "load" the archive objects, which is much simpler to implement. The trade off is a small start-up penalty and no heap sharing. > > Most of the loading code is implemented in heapShared.cpp. The collectors just need to implement the following two `CollectedHeap` APIs in > > > virtual bool can_load_archived_objects(); > virtual HeapWord* allocate_loaded_archive_space(size_t size); // typically return a block in old gen > > > **Implementation:** > > - Allocate (from the old gen) a buffer that's large enough to contain all the archived heap objects. > - Inside the CDS archive file, the heap objects are usually divided into 2~4 disjoint regions (there are gaps between them). > - Copy every region in to the buffer consecutively, without any gaps. > - Relocate all the oop fields in all the copied objects, taking into account of the gap removal. 
> - The archived strings may be relocated by a full GC, but the CDS "shared string table" cannot handle relocation, so we copy the archived strings into the dynamic string table. > > **Benchmarking:** > > We can see significant start-up improvement because the module graph can be loaded from CDS. > > > $ perf stat -r 40 java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx256m -version > > Before: 43.1ms > After: 30.2ms > > > Testing: > > - Some general clean up of the test cases. > - Added support for `-vmoptions:-Dtest.cds.runtime.options=-XX:+UnlockExperimentalVMOptions,-XX:+UseEpsilonGC`: we dump the CDS archive with G1 so we have an archived heap, but run with EpsilonGC to test the new loading code. > - Added a mach5 task to run all CDS tests with the above config. Incompatible test cases are tagged with `@require vm.gc == null`. See changes in CDSOption.java and VMProps.java This pull request has now been integrated. Changeset: 655ea6d4 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/655ea6d42ae94d96a03b1f008aad264a1ee4f173 Stats: 805 lines in 31 files changed: 685 ins; 34 del; 86 mod 8270489: Support archived heap objects in EpsilonGC Reviewed-by: shade, ccheung ------------- PR: https://git.openjdk.java.net/jdk/pull/5074 From coleenp at openjdk.java.net Wed Sep 1 17:05:26 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 1 Sep 2021 17:05:26 GMT Subject: RFR: 8272788: Nonleaf ranked locks should not be safepoint_check_never [v4] In-Reply-To: References: Message-ID: > I moved nonleaf ranked locks to be leaf (or leaf+something). Many of the leaf locks are safepoint_check_never. Segregating this rank into safepoint checking and non-safepoint checking is left for a future RFE. > Tested with tier1-3. Tier 4-6 testing in progress. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Revert NonJavaThreads_lock rank. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5203/files - new: https://git.openjdk.java.net/jdk/pull/5203/files/2b3b1c47..816d88fb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5203&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5203&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5203.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5203/head:pull/5203 PR: https://git.openjdk.java.net/jdk/pull/5203 From pchilanomate at openjdk.java.net Wed Sep 1 17:39:33 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Wed, 1 Sep 2021 17:39:33 GMT Subject: RFR: 8272788: Nonleaf ranked locks should not be safepoint_check_never [v3] In-Reply-To: References: Message-ID: <29Skv-NBNco50uC8ZPOJZeX_D-ofejyOL7g1p3IdrMs=.7dc91776-6a3a-4a64-886e-29ccc6707a58@github.com> On Tue, 31 Aug 2021 21:52:12 GMT, Coleen Phillimore wrote: >> I moved nonleaf ranked locks to be leaf (or leaf+something). Many of the leaf locks are safepoint_check_never. Segregating this rank into safepoint checking and non-safepoint checking is left for a future RFE. >> Tested with tier1-3. Tier 4-6 testing in progress. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Fix merge error. > - Merge branch 'master' into nonleaf > - Remove JfrSream_lock and rerun JFR tests. > - 8272788: Nonleaf ranked locks should not be safepoint_check_neve Hi Coleen, Changes look good to me. By inspecting the calls made after acquiring locks JvmtiTagMap_lock, CompiledIC_lock and VtableStubs_lock it seems the new lower rank of these locks should be fine although it's not straightforward. I guess we can always fix it if we find some path where a higher order rank needs to be acquired, but at least we know all the special locks are still lower than leaf. 
Only comment about NonJavaThreadsList_lock.

Thanks,
Patricio

src/hotspot/share/runtime/mutexLocker.cpp line 270:

> 268: 
> 269:   def(Threads_lock , PaddedMonitor, barrier, true, _safepoint_check_always); // Used for safepoint protocol.
> 270:   def(NonJavaThreadsList_lock , PaddedMutex, leaf+1, true, _safepoint_check_never);

Why do we need to change this rank? We now assert a lock should be _safepoint_check_always if the rank is >= nonleaf, but this is barrier so it should be good.

-------------

Marked as reviewed by pchilanomate (Committer).

PR: https://git.openjdk.java.net/jdk/pull/5203

From shade at openjdk.java.net Wed Sep 1 17:45:35 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 1 Sep 2021 17:45:35 GMT
Subject: RFR: 8263375: Support stack watermarks in Zero VM [v2]
In-Reply-To: <4yhfvYDB5XYRSz5JTSTZ0UCj4n2eAVZgXtmWPMsIGos=.b7041352-2e9e-4437-aa7a-08df72844f57@github.com>
References: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com>
 <2_bXvSnzzLC-SSViugPyNvmp3FpBuBqRmxgNzL4BJjY=.734d30e2-37a8-433a-9320-3e03b71aaa8d@github.com>
 <4yhfvYDB5XYRSz5JTSTZ0UCj4n2eAVZgXtmWPMsIGos=.b7041352-2e9e-4437-aa7a-08df72844f57@github.com>
Message-ID:

On Wed, 1 Sep 2021 16:10:40 GMT, Erik Österlund wrote:

>> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>> 
>> - Notify only when unwinding messages are received
>> - Merge branch 'master' into JDK-8263375-zero-stack-watermarks
>> - Revert debugging
>> - 8263375: Support stack watermarks in Zero VM
> 
> Looks good.

Thanks, @fisk! I suppose there are no other interested parties for this kind of patch, so I'll integrate this soon.
------------- PR: https://git.openjdk.java.net/jdk/pull/4728 From coleenp at openjdk.java.net Wed Sep 1 17:55:33 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 1 Sep 2021 17:55:33 GMT Subject: RFR: 8272788: Nonleaf ranked locks should not be safepoint_check_never [v3] In-Reply-To: <29Skv-NBNco50uC8ZPOJZeX_D-ofejyOL7g1p3IdrMs=.7dc91776-6a3a-4a64-886e-29ccc6707a58@github.com> References: <29Skv-NBNco50uC8ZPOJZeX_D-ofejyOL7g1p3IdrMs=.7dc91776-6a3a-4a64-886e-29ccc6707a58@github.com> Message-ID: On Wed, 1 Sep 2021 17:27:33 GMT, Patricio Chilano Mateo wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Fix merge error. >> - Merge branch 'master' into nonleaf >> - Remove JfrSream_lock and rerun JFR tests. >> - 8272788: Nonleaf ranked locks should not be safepoint_check_neve > > src/hotspot/share/runtime/mutexLocker.cpp line 270: > >> 268: >> 269: def(Threads_lock , PaddedMonitor, barrier, true, _safepoint_check_always); // Used for safepoint protocol. >> 270: def(NonJavaThreadsList_lock , PaddedMutex, leaf+1, true, _safepoint_check_never); > > Why do we need to change this rank? We now assert a lock should be _safepoint_check_always if the rank is >= nonleaf, but this is barrier so it should be good. This was leftover from a different change where I was trying to move all the _safepoint_check_never locks to leaf level. In any case, I reverted this change and am retesting tier1-3. ------------- PR: https://git.openjdk.java.net/jdk/pull/5203 From never at openjdk.java.net Wed Sep 1 18:10:41 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Wed, 1 Sep 2021 18:10:41 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI Message-ID: This evacuates all JVMCI related methods and fields into a separately declared struct. 
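As a rough illustration of the refactoring pattern described above (moving feature-specific fields and their accessors out of a large thread class into a separately declared struct), here is a minimal sketch. All names in it (`FeatureThreadData`, `ToyThread`, `feature_data`) are hypothetical and much simpler than the real JVMCI fields; it only shows the shape of the change, not the actual patch.

```cpp
#include <cassert>

// Hypothetical struct that gathers one feature's per-thread state.
// Before such a refactoring, these members would sit directly in the
// thread class alongside many unrelated fields.
struct FeatureThreadData {
  // -1 is used here as an assumed "nothing pending" marker.
  int _pending_deoptimization = -1;

  bool has_pending_deoptimization() const { return _pending_deoptimization != -1; }
  int pending_deoptimization() const { return _pending_deoptimization; }
  void set_pending_deoptimization(int reason) {
    assert(reason >= 0);
    _pending_deoptimization = reason;
  }
  void clear_pending_deoptimization() { _pending_deoptimization = -1; }
};

// Toy thread class: it now holds a single member of the struct type and
// exposes it through one accessor, instead of declaring each field itself.
class ToyThread {
 public:
  FeatureThreadData& feature_data() { return _feature_data; }
  const FeatureThreadData& feature_data() const { return _feature_data; }

 private:
  FeatureThreadData _feature_data;
};
```

Callers then go through the accessor, for example `thread.feature_data().set_pending_deoptimization(7)`, which keeps the feature's state and helpers in one place and leaves the thread class itself smaller.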
------------- Commit messages: - 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI Changes: https://git.openjdk.java.net/jdk/pull/5339/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5339&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8137018 Stats: 547 lines in 16 files changed: 282 ins; 207 del; 58 mod Patch: https://git.openjdk.java.net/jdk/pull/5339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5339/head:pull/5339 PR: https://git.openjdk.java.net/jdk/pull/5339 From coleenp at openjdk.java.net Wed Sep 1 18:42:30 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 1 Sep 2021 18:42:30 GMT Subject: RFR: 8272788: Nonleaf ranked locks should not be safepoint_check_never [v3] In-Reply-To: References: <29Skv-NBNco50uC8ZPOJZeX_D-ofejyOL7g1p3IdrMs=.7dc91776-6a3a-4a64-886e-29ccc6707a58@github.com> Message-ID: On Wed, 1 Sep 2021 17:52:49 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/mutexLocker.cpp line 270: >> >>> 268: >>> 269: def(Threads_lock , PaddedMonitor, barrier, true, _safepoint_check_always); // Used for safepoint protocol. >>> 270: def(NonJavaThreadsList_lock , PaddedMutex, leaf+1, true, _safepoint_check_never); >> >> Why do we need to change this rank? We now assert a lock should be _safepoint_check_always if the rank is >= nonleaf, but this is barrier so it should be good. > > This was leftover from a different change where I was trying to move all the _safepoint_check_never locks to leaf level. In any case, I reverted this change and am retesting tier1-3. Tier1-3 passes by reverting that change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5203 From coleenp at openjdk.java.net Wed Sep 1 18:42:30 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 1 Sep 2021 18:42:30 GMT Subject: RFR: 8272788: Nonleaf ranked locks should not be safepoint_check_never [v4] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 17:05:26 GMT, Coleen Phillimore wrote: >> I moved nonleaf ranked locks to be leaf (or leaf+something). Many of the leaf locks are safepoint_check_never. Segregating this rank into safepoint checking and non-safepoint checking is left for a future RFE. >> Tested with tier1-3. Tier 4-6 testing in progress. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Revert NonJavaThreads_lock rank. Thanks for reviewing Patricio. Yes, these locks were less difficult to inspect their usage and pretty well tested so this smaller change seems safe in the pursuit of a bigger change. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/5203 From coleenp at openjdk.java.net Wed Sep 1 18:42:31 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 1 Sep 2021 18:42:31 GMT Subject: Integrated: 8272788: Nonleaf ranked locks should not be safepoint_check_never In-Reply-To: References: Message-ID: <8HZBg9H7PoKvtuKDqMsQxifBfJDXTrIIUE9NKdNtauU=.9d30634f-3894-4073-9d98-dba36e9e6c0c@github.com> On Fri, 20 Aug 2021 16:41:40 GMT, Coleen Phillimore wrote: > I moved nonleaf ranked locks to be leaf (or leaf+something). Many of the leaf locks are safepoint_check_never. Segregating this rank into safepoint checking and non-safepoint checking is left for a future RFE. > Tested with tier1-3. Tier 4-6 testing in progress. This pull request has now been integrated. 
Changeset: 9689f615 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/9689f615206e96f17ffc1fe7a8efeee0a90c904b Stats: 19 lines in 4 files changed: 3 ins; 13 del; 3 mod 8272788: Nonleaf ranked locks should not be safepoint_check_never Reviewed-by: eosterlund, pchilanomate ------------- PR: https://git.openjdk.java.net/jdk/pull/5203 From coleenp at openjdk.java.net Wed Sep 1 19:37:31 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 1 Sep 2021 19:37:31 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 18:03:11 GMT, Tom Rodriguez wrote: > This evacuates all JVMCI related methods and fields into a separately declared struct. Looks awesome, Thanks! src/hotspot/share/jvmci/jvmci.hpp line 183: > 181: }; > 182: > 183: There seems to be a crazy amount of whitespace here. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5339 From wuyan at openjdk.java.net Thu Sep 2 02:27:29 2021 From: wuyan at openjdk.java.net (Wu Yan) Date: Thu, 2 Sep 2021 02:27:29 GMT Subject: RFR: 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words [v2] In-Reply-To: References: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> Message-ID: On Tue, 20 Jul 2021 06:36:10 GMT, Wang Huang wrote: >> It is found that the comments of `MacroAssembler::fill_words` is not right here. Let's fix that. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix comments Could you do me a favor to review the patch? @theRealAph @nick-arm Thank you. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4809 From dholmes at openjdk.java.net Thu Sep 2 02:44:26 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 2 Sep 2021 02:44:26 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang wrote: > Simple change on return types of Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. > > Test: tier1 Sorry Albert but I don't see this as an improvement at all. When people ask for the time in ms/us/ns they expect integral values and forcing all the clients to do the casting is counter productive. If there is code that wants a time in say ms + us + ns then they can call seconds() and do the conversion themselves IMO. Cheers, David ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From ngasson at openjdk.java.net Thu Sep 2 02:49:27 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Thu, 2 Sep 2021 02:49:27 GMT Subject: RFR: 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words [v2] In-Reply-To: References: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> Message-ID: On Tue, 20 Jul 2021 06:36:10 GMT, Wang Huang wrote: >> It is found that the comments of `MacroAssembler::fill_words` is not right here. Let's fix that. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix comments Looks OK. ------------- Marked as reviewed by ngasson (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/4809 From dholmes at openjdk.java.net Thu Sep 2 02:58:29 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 2 Sep 2021 02:58:29 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang wrote: > Simple change on return types of Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. > > Test: tier1 src/hotspot/share/jfr/periodic/jfrThreadCPULoadEvent.cpp line 135: > 133: } > 134: log_trace(jfr)("Measured CPU usage for %d threads in %.3f milliseconds", number_of_threads, > 135: (double)(JfrTicks::now() - event_time).milliseconds()); I think this one is a simple bug - the wrong format specifier is being used. I don't think this expects to see e.g. 3.6 milliseconds. But confirm with JFR folk. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From dholmes at openjdk.java.net Thu Sep 2 04:46:28 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 2 Sep 2021 04:46:28 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 18:03:11 GMT, Tom Rodriguez wrote: > This evacuates all JVMCI related methods and fields into a separately declared struct. Nice refactoring! A few style nits but nothing of consequence. Thanks, David src/hotspot/share/jvmci/jvmci.cpp line 405: > 403: } > 404: > 405: Nit: there are a few double-blank lines between definitions when one is normal. src/hotspot/share/jvmci/jvmci.hpp line 198: > 196: > 197: // Communicates the DeoptReason and DeoptAction of the uncommon trap > 198: int _pending_deoptimization; Nit: Why the extra large alignment spacing of all the declarations? (I'm not a fan of such alignment as it is too hard to maintain - and too hard to type in the first place!) 
src/hotspot/share/jvmci/jvmci.hpp line 249: > 247: void set_jvmci_reserved_oop0(oop value) { > 248: _jvmci_reserved_oop0 = value; > 249: } Nit: why is this and following definitions multi-line when the preceding ones (of similar size) are single line? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5339 From njian at openjdk.java.net Thu Sep 2 07:38:43 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 2 Sep 2021 07:38:43 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v6] In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Merge with master - More comments from Andrew. - Add missing part - Address Andrew's comments - 8267356: AArch64: Vector API SVE codegen support This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: 1. Code generation for Vector API c2 IR nodes with SVE. 2. 
Non-max vector size support with SVE, e.g. using *128Vector APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. 3. Some more SVE assemblers (and tests) used by the codegen part. Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. ------------- Changes: https://git.openjdk.java.net/jdk/pull/4122/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=05 Stats: 5761 lines in 13 files changed: 4576 ins; 195 del; 990 mod Patch: https://git.openjdk.java.net/jdk/pull/4122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4122/head:pull/4122 PR: https://git.openjdk.java.net/jdk/pull/4122 From shade at openjdk.java.net Thu Sep 2 08:03:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 08:03:39 GMT Subject: Integrated: 8263375: Support stack watermarks in Zero VM In-Reply-To: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> References: <044dfg2EqIGmN7EM-PX2JkjpYwITRDIiA9qZqPHlFHA=.ead4f0b8-e21c-41eb-9fdb-7954cfb548e0@github.com> Message-ID: On Thu, 8 Jul 2021 16:48:26 GMT, Aleksey Shipilev wrote: > Zero VM supports most of GCs. Since JDK 16, Shenandoah uses stack watermarks, so Zero has to support those if Shenandoah+Zero support is to remain. This PR adds the stack watermark support in Zero VM. This should also be useful as other projects, notably Loom, mature and depend on stack watermarks. > > Zero already calls into Hotspot safepoint machinery to do things, and it seems only the hooks for `on_iteration` and `on_unwind` are missing. AFAICS, Zero only has on-return safepoints, renamed it to be more precise. > > @fisk, do you see any obvious problems with this patch? 
> > Additional testing: > - [x] Linux x86_64 Zero `hotspot_gc_shenandoah` now passes This pull request has now been integrated. Changeset: 857a930b Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/857a930bde8b53f77a23737f4ca6ff8f3da2af66 Stats: 58 lines in 5 files changed: 37 ins; 14 del; 7 mod 8263375: Support stack watermarks in Zero VM Reviewed-by: eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/4728 From aph at openjdk.java.net Thu Sep 2 08:11:35 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 2 Sep 2021 08:11:35 GMT Subject: RFR: 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words [v2] In-Reply-To: References: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> Message-ID: On Tue, 20 Jul 2021 06:36:10 GMT, Wang Huang wrote: >> It is found that the comments of `MacroAssembler::fill_words` is not right here. Let's fix that. > > Wang Huang has updated the pull request incrementally with one additional commit since the last revision: > > fix comments Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4809 From egahlin at openjdk.java.net Thu Sep 2 09:04:33 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Thu, 2 Sep 2021 09:04:33 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: <6OjbxEnzLKV0xx7OVlDiHvHK0e_7PV6Kv1JPQ2wZJZ4=.003eb500-43d3-4ea5-9b53-eaddcc63c030@github.com> On Thu, 2 Sep 2021 02:55:43 GMT, David Holmes wrote: >> Simple change on return types of Ticks API. >> >> The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. 
>> >> Test: tier1 > > src/hotspot/share/jfr/periodic/jfrThreadCPULoadEvent.cpp line 135: > >> 133: } >> 134: log_trace(jfr)("Measured CPU usage for %d threads in %.3f milliseconds", number_of_threads, >> 135: (double)(JfrTicks::now() - event_time).milliseconds()); > > I think this one is a simple bug - the wrong format specifier is being used. I don't think this expects to see e.g. 3.6 milliseconds. But confirm with JFR folk. I don't know what the original intent was, but it seems fine to print a fraction now. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From ayang at openjdk.java.net Thu Sep 2 09:27:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 2 Sep 2021 09:27:39 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 02:41:07 GMT, David Holmes wrote: > When people ask for the time in ms/us/ns they expect integral values Both situations exist: some callers expect fractional values, others integral ones. Standardizing the APIs means the caller can switch between different units, get the same amount of info, and decide how to treat the sub-unit part. > If there is code that wants a time in say ms + us + ns then they can call seconds() and do the conversion themselves IMO. Yes; there are many `seconds() * 1000.0` in `g1CollectedHeap.cpp` for instance, which could be simplified with the new APIs. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From kbarrett at openjdk.java.net Thu Sep 2 10:48:26 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 2 Sep 2021 10:48:26 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang wrote: > Simple change on return types of Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`.
> > Test: tier1 Converting nanosecond time values to double can be information losing. I think some helper functions to provide floating point values could be useful, but I think the existing functions should not be changed as proposed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From dholmes at openjdk.java.net Thu Sep 2 10:48:26 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 2 Sep 2021 10:48:26 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang wrote: > Simple change on return types of Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. > > Test: tier1 The JFR team introduced the extended form of these API's 6 years ago, and a more reduced version 8 years ago. Any changes to these API's should be approved by that team IMO. I'm more inclined to expect an API that produces integral values than fractions as the latter suggest you wanted greater resolution so you should have selected the function that provided that greater resolution. YMMV. David ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From shade at openjdk.java.net Thu Sep 2 10:56:44 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 10:56:44 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v10] In-Reply-To: References: Message-ID: <2njoOMmi_-cpJX7cNcVhH2Z8b5aZxi8icbw5RGt_IMg=.08a21ecd-dd98-4727-a495-8525b46b5bd1@github.com> > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. 
> > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. > > The reader side is much more interesting, because we generally want "consume", but it is not available. We can do "acquire", but it regresses performance all too much. The close inspection of the code reveals we need "acquire" on many paths, but not on the most critical one: heap updates. This must explain why current weaker reader side was never seen to fail, and this also opens a way to get `acquire`-in-lieu-of-`consume` without the observable performance penalty. > > The relaxation in forwardee installation improves concurrent evacuation quite visibly. See for example GC cycle times with SPECjvm2008, Compiler.sunflow on AArch64: > > Before: > > > [info][gc,stats] Concurrent Evacuation = 3.421 s (a = 21247 us) (n = 161) > [info][gc,stats] Concurrent Evacuation = 3.584 s (a = 21080 us) (n = 170) > [info][gc,stats] Concurrent Evacuation = 3.226 s (a = 21088 us) (n = 153) > [info][gc,stats] Concurrent Evacuation = 3.270 s (a = 20827 us) (n = 157) > [info][gc,stats] Concurrent Evacuation = 3.339 s (a = 20742 us) (n = 161) > > > After: > > [info][gc,stats] Concurrent Evacuation = 3.109 s (a = 18617 us) (n = 167) > [info][gc,stats] Concurrent Evacuation = 3.027 s (a = 18918 us) (n = 160) > [info][gc,stats] Concurrent Evacuation = 2.862 s (a = 17669 us) (n = 162) > [info][gc,stats] Concurrent Evacuation = 2.858 s (a = 17425 us) (n = 164) > [info][gc,stats] Concurrent Evacuation = 2.883 s (a = 17685 us) (n = 163) > > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah > - [x] Linux AArch64 `tier1` with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge 
or a rebase. The pull request now contains ten commits: - Merge branch 'master' into JDK-8261492-shenandoah-forwardee-memord - Doing acquires on most paths, and relaxed on the path that matters: heap update - Even more discussion - Additional discussion and corner cases - Merge branch 'master' into JDK-8261492-shenandoah-forwardee-memord - Add TODO - "acquire" is too slow on aarch64, and does not seem neccessary anyway - Merge branch 'master' into JDK-8261492-shenandoah-forwardee-memord - 8261492: Shenandoah: reconsider forwardee accesses memory ordering ------------- Changes: https://git.openjdk.java.net/jdk/pull/2496/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=09 Stats: 174 lines in 13 files changed: 104 ins; 30 del; 40 mod Patch: https://git.openjdk.java.net/jdk/pull/2496.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496 PR: https://git.openjdk.java.net/jdk/pull/2496 From ayang at openjdk.java.net Thu Sep 2 12:28:28 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 2 Sep 2021 12:28:28 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 10:44:21 GMT, David Holmes wrote: > Any changes to these API's should be approved by that team IMO. I checked with Markus before sending out this PR. > I'm more inclined to expect an API that produces integral values than fractions Only `nanoseconds()` can return an integral value without discarding any info. Converting to other units requires some floating calculation; the fractional part is either dropped behind the API (in `master`) or controlled by the caller (in this patch). > Converting nanosecond time values to double can be information losing. True. Such loss comes from the following conversion (from `ticks.hpp`). (Note: this kind of loss is different from the one discarding the fractional part on API boundary.) 
template <typename TimeSource, const int unit> inline double conversion(typename TimeSource::Type& value) { return (double)value * ((double)unit / (double)TimeSource::frequency()); } I am not sure how significant this loss is in practice; all callers of `seconds()` (the most used API among the four) suffer from this loss. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From david.holmes at oracle.com Thu Sep 2 13:19:18 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 2 Sep 2021 23:19:18 +1000 Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> On 2/09/2021 10:28 pm, Albert Mingkun Yang wrote: > On Thu, 2 Sep 2021 10:44:21 GMT, David Holmes wrote: > >> Any changes to these API's should be approved by that team IMO. > > I checked with Markus before sending out this PR. It would be good if Markus could review it then. >> I'm more inclined to expect an API that produces integral values than fractions > > Only `nanoseconds()` can return an integral value without discarding any info. Converting to other units requires some floating calculation; the fractional part is either dropped behind the API (in `master`) or controlled by the caller (in this patch). I expect there to be "discarded" information. I'm asking for how many milliseconds have "elapsed". If I want to know about fractional milliseconds I should ask how many microseconds or nanoseconds have elapsed instead. When we ask for the current time as "milliseconds since the epoch" we expect an integral number at that resolution; the fact that there could be additional microseconds and nanoseconds is immaterial. David ----- >> Converting nanosecond time values to double can be information losing. > > True. Such loss comes from the following conversion (from `ticks.hpp`). (Note: this kind of loss is different from the one discarding the fractional part on API boundary.)
> > > template <typename TimeSource, const int unit> > inline double conversion(typename TimeSource::Type& value) { > return (double)value * ((double)unit / (double)TimeSource::frequency()); > } > > > I am not sure how significant this loss is in practice; all callers of `seconds()` (the most used API among the four) suffer from this loss. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5332 > From coleenp at openjdk.java.net Thu Sep 2 13:43:36 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 2 Sep 2021 13:43:36 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. >> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups Does this mean that we can run this locally as: make test TEST=hotspot:tier2 or 3 on the command line. I like this but is there a way to run all of them locally with one command line? ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From dnsimon at openjdk.java.net Thu Sep 2 13:46:27 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 2 Sep 2021 13:46:27 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 18:03:11 GMT, Tom Rodriguez wrote: > This evacuates all JVMCI related methods and fields into a separately declared struct. Marked as reviewed by dnsimon (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5339 From dnsimon at openjdk.java.net Thu Sep 2 13:46:28 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 2 Sep 2021 13:46:28 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 04:25:23 GMT, David Holmes wrote: >> This evacuates all JVMCI related methods and fields into a separately declared struct. > > src/hotspot/share/jvmci/jvmci.hpp line 198: > >> 196: >> 197: // Communicates the DeoptReason and DeoptAction of the uncommon trap >> 198: int _pending_deoptimization; > > Nit: Why the extra large alignment spacing of all the declarations? (I'm not a fan of such alignment as it is too hard to maintain - and too hard to type in the first place!) That probably comes from a time before we nicely commented each JVMCI field ;-) I agree that there's no need for the alignment now. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5339 From coleenp at openjdk.java.net Thu Sep 2 13:50:35 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 2 Sep 2021 13:50:35 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. >> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups Can we have a line that runs all three? Like tier123? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 13:50:31 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 13:50:31 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: <_EzxyS8XNtQPnRyvQBIwQqVMLbC_wyHNKiPxkFSaxDY=.dc967d31-62e2-465a-8f7d-3ea59b9b33e3@github.com> On Thu, 2 Sep 2021 13:40:10 GMT, Coleen Phillimore wrote: > Does this mean that we can run this locally as: > make test TEST=hotspot:tier2 or 3 on the command line. > I like this but is there a way to run all of them locally with one command line? Yes, you can run these with `make test TEST=hotspot:tier2`. Also, they will run along with the existing jdk, langtools, and jaxp tier2 tests if you run `make test TEST=tier2`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 13:55:27 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 13:55:27 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Thu, 2 Sep 2021 13:47:08 GMT, Coleen Phillimore wrote: > Can we have a line that runs all three? Like tier123? Ah. That would be unconventional -- there are no such things in the jdk, langtools, or jaxp test groups. If you are willing to give up tiered test profiles, the test profile you are looking for is probably `hotspot_all`?
------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From coleenp at openjdk.java.net Thu Sep 2 14:02:35 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 2 Sep 2021 14:02:35 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. >> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups So it's not just like tier123 = :tier1 :tier2 :tier3 ? The reason I'm asking is that I run tier1 all the time locally before sending it to some machine land, and if things move out, I don't want to miss them. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 14:02:35 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 14:02:35 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Thu, 2 Sep 2021 13:56:09 GMT, Coleen Phillimore wrote: > The reason I'm asking is that I run tier1 all the time locally before sending it to some machine land, and if things move out, I don't want to miss them. I understand the use case, I have a similar one. I solved it for my workflow with a little bash script that runs `tier1`, `tier2`, etc. The additional bonus of scripting it outside the test group is that you can run some lower tiers with additional `TEST_VM_OPTS`. So, I would prefer to leave this outside the test groups. ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From markus.gronlund at oracle.com Thu Sep 2 14:43:08 2021 From: markus.gronlund at oracle.com (Markus Gronlund) Date: Thu, 2 Sep 2021 14:43:08 +0000 Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> References: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> Message-ID: Hi Albert, As we discussed briefly offline, IIRC the reason for providing these return types was that they were modelled on the precedents in runtime/os.hpp: static jlong javaTimeMillis(); static jlong javaTimeNanos(); static void javaTimeNanos_info(jvmtiTimerInfo *info_ptr); static void javaTimeSystemUTC(jlong &seconds, jlong &nanos); static void run_periodic_checks(); // Returns the elapsed time in seconds since the vm started.
static double elapsedTime(); static jlong elapsed_counter(); Since I don't have any call sites that use the Ticks conversion methods, which I think are mostly used by the GC code, and I do not remember the exact reasoning, I thought your effort to standardize on "double" was ok. But now I think that David and others have brought good arguments to this PR highlighting benefits in having the sub-second representations as integrals, so I now believe we should perhaps keep the return types as is. Maybe it is possible to approach this instead along the lines suggested by Kim, by introducing a few helper functions to provide floating point values? Thanks Markus -----Original Message----- From: hotspot-dev On Behalf Of David Holmes Sent: 2 September 2021 15:19 To: Albert Mingkun Yang ; hotspot-dev at openjdk.java.net Subject: Re: RFR: 8273239: Standardize Ticks APIs return type On 2/09/2021 10:28 pm, Albert Mingkun Yang wrote: > On Thu, 2 Sep 2021 10:44:21 GMT, David Holmes wrote: > >> Any changes to these API's should be approved by that team IMO. > > I checked with Markus before sending out this PR. It would be good if Markus could review it then. >> I'm more inclined to expect an API that produces integral values than >> fractions > > Only `nanoseconds()` can return an integral value without discarding any info. Converting to other units requires some floating calculation; the fractional part is either dropped behind the API (in `master`) or controlled by the caller (in this patch). I expect there to be "discarded" information. I'm asking for how many milliseconds have "elapsed". If I want to know about fractional milliseconds I should ask how many microseconds or nanoseconds have elapsed instead. When we ask for the current time as "milliseconds since the epoch" we expect an integral number at that resolution; the fact that there could be additional microseconds and nanoseconds is immaterial.
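The helper-function approach raised in this thread (keep the integral accessors, add separate functions for callers that want a fraction) could look roughly like the sketch below. It is only an illustration under stated assumptions: `SpanSketch`, `milliseconds_as_double`, `microseconds_as_double` and `round_trips_through_double` are hypothetical names and are not part of HotSpot's `ticks.hpp`. The last function illustrates the separate concern that a 64-bit nanosecond count does not always survive a conversion to `double`, which has a 53-bit significand.

```cpp
#include <cstdint>

// Hypothetical stand-in for a tick-based duration; NOT HotSpot's Tickspan.
struct SpanSketch {
  int64_t nanos;  // elapsed time, in nanoseconds

  // Integral accessors: the sub-unit remainder is discarded by truncating division.
  int64_t nanoseconds()  const { return nanos; }
  int64_t microseconds() const { return nanos / 1000; }
  int64_t milliseconds() const { return nanos / 1000000; }
};

// Hypothetical free helpers for callers that explicitly want the fractional part.
inline double milliseconds_as_double(const SpanSketch& s) {
  return static_cast<double>(s.nanos) / 1e6;
}

inline double microseconds_as_double(const SpanSketch& s) {
  return static_cast<double>(s.nanos) / 1e3;
}

// Returns true if the nanosecond count is exactly representable as a double.
// Intended for non-negative counts far below INT64_MAX; counts above 2^53
// (about 104 days of nanoseconds) may be rounded by the conversion.
inline bool round_trips_through_double(int64_t nanos) {
  return static_cast<int64_t>(static_cast<double>(nanos)) == nanos;
}
```

With this shape, a call site such as the JFR log statement quoted earlier would choose `milliseconds()` with an integral format specifier or `milliseconds_as_double()` with `%.3f`, so the caller makes the decision explicitly rather than the accessor's return type making it.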
David ----- >> Converting nanosecond time values to double can be information losing. > > True. Such loss comes from the following conversion (from > `ticks.hpp`). (Note: this kind of loss is different from the one > discarding the fractional part on API boundary.) > > > template <typename TimeSource, const int unit> inline double > conversion(typename TimeSource::Type& value) { > return (double)value * ((double)unit / > (double)TimeSource::frequency()); } > > > I am not sure how significant this loss is in practice; all callers of `seconds()` (the most used API among the four) suffer from this loss. > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5332 > From coleenp at openjdk.java.net Thu Sep 2 15:35:30 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 2 Sep 2021 15:35:30 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`.
The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups Ok! Sounds good. Thought I'd try. :) ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5241 From iignatyev at openjdk.java.net Thu Sep 2 15:49:13 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 2 Sep 2021 15:49:13 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. >> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups LGTM, I think it also makes sense to introduce `tier4` group (in all test suites) to catch the tests which aren't part of `tier[1-3]`. -- Igor ------------- Marked as reviewed by iignatyev (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 15:49:17 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 15:49:17 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. >> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups Thanks ;) All right then, I'll integrate. ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 15:49:18 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 15:49:18 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Thu, 2 Sep 2021 15:41:07 GMT, Igor Ignatyev wrote: > I think it also makes sense to introduce `tier4` group (in all test suites) to catch the tests which aren't part of `tier[1-3]`. Good idea, I'll draft up the patch. ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From dcubed at openjdk.java.net Thu Sep 2 15:49:22 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Thu, 2 Sep 2021 15:49:22 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. 
>> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups Will this work: `make test TEST=hotspot:tier1,hotspot:tier2,hotspot:tier3`? ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 15:49:22 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 15:49:22 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Thu, 2 Sep 2021 15:43:25 GMT, Daniel D. Daugherty wrote: > Will this work: `make test TEST=hotspot:tier1,hotspot:tier2,hotspot:tier3`? 
Seems not:

    $ CONF=linux-aarch64-server-fastdebug make images run-test TEST=jdk:tier1,jdk:tier2
    Building targets 'images run-test' in configuration 'linux-aarch64-server-fastdebug'
    Unknown test selection: 'jdk:tier1,jdk:tier2'

-------------

PR: https://git.openjdk.java.net/jdk/pull/5241

From shade at openjdk.java.net Thu Sep 2 15:49:23 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 2 Sep 2021 15:49:23 GMT
Subject: Integrated: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups
In-Reply-To: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com>
References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com>
Message-ID:

On Tue, 24 Aug 2021 17:32:58 GMT, Aleksey Shipilev wrote:

> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`.
>
> Sample runs:
>
> TEST                             TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:tier2     425   425     0     0
>
> real 11m45.244s
> user 433m48.960s
> sys 38m13.606s
>
> TEST                             TOTAL  PASS  FAIL ERROR
> jtreg:test/hotspot/jtreg:tier3      80    80     0     0
>
> real 35m19.031s
> user 418m45.607s
> sys 5m41.748s
>
> These also hook up properly to global `tier2` and `tier3`.
>
> Additional testing:
> - [x] Linux x86_64 fastdebug `hotspot:tier2`
> - [x] Linux x86_64 fastdebug `hotspot:tier3`

This pull request has now been integrated.
Changeset: 5ee5dd9b Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/5ee5dd9b18fb5adc563a65bd1a29779eda675d61 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod 8272914: Create hotspot:tier2 and hotspot:tier3 test groups Reviewed-by: dholmes, coleenp, iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From iignatyev at openjdk.java.net Thu Sep 2 16:02:22 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Thu, 2 Sep 2021 16:02:22 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Wed, 1 Sep 2021 08:20:19 GMT, Aleksey Shipilev wrote: >> Currently, no tests run in `hotspot:tier2` and `hotspot:tier3` groups, yet some groups have the "tier2" and "tier3" in their names. As we move tests from `hotspot:tier1`, they need to land in higher tiers. Therefore, we need `hotspot:tier2` and `hotspot:tier3`. >> >> Sample runs: >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier2 425 425 0 0 >> >> real 11m45.244s >> user 433m48.960s >> sys 38m13.606s >> >> >> >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg:tier3 80 80 0 0 >> >> real 35m19.031s >> user 418m45.607s >> sys 5m41.748s >> >> >> These also hook up properly to global `tier2` and `tier3`. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `hotspot:tier2` >> - [x] Linux x86_64 fastdebug `hotspot:tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Add newly emerged tier2_gc_epsilon to tier2 as well > - Merge branch 'master' into JDK-8272914-hs-tier2-3 > - Cleaner test group definitions > - Filter InvocationTests in tier3, leaving the existing group alone > - 8272914: Create hotspot:tier2 and hotspot:tier3 test groups that's b/c the delimiter is a space, not a comma, i.e. `make test TEST="hotspot:tier1 hotspot:tier2 hotspot:tier3"` works. ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 16:02:22 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 16:02:22 GMT Subject: RFR: 8272914: Create hotspot:tier2 and hotspot:tier3 test groups [v4] In-Reply-To: References: <9aYAUWwQ9ay7S65EIVoEU0eDvIEEh5NFxx6yBtp9tbk=.0f57298c-5e68-4c3a-9237-9e5568225eee@github.com> Message-ID: On Thu, 2 Sep 2021 15:56:12 GMT, Igor Ignatyev wrote: > that's b/c the delimiter is a space, not a comma, i.e. `make test TEST="hotspot:tier1 hotspot:tier2 hotspot:tier3"` works. Oh! TIL. Nice trick. ------------- PR: https://git.openjdk.java.net/jdk/pull/5241 From shade at openjdk.java.net Thu Sep 2 16:06:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 2 Sep 2021 16:06:45 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v11] In-Reply-To: References: Message-ID: > Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy. > > For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough. 
> > The reader side is much more interesting, because we generally want "consume", but it is not available. We can do "acquire", but it regresses performance all too much. The close inspection of the code reveals we need "acquire" on many paths, but not on the most critical one: heap updates. This must explain why current weaker reader side was never seen to fail, and this also opens a way to get `acquire`-in-lieu-of-`consume` without the observable performance penalty. > > The relaxation in forwardee installation improves concurrent evacuation quite visibly. See for example GC cycle times with SPECjvm2008, Compiler.sunflow on AArch64: > > Before: > > > [info][gc,stats] Concurrent Evacuation = 3.421 s (a = 21247 us) (n = 161) > [info][gc,stats] Concurrent Evacuation = 3.584 s (a = 21080 us) (n = 170) > [info][gc,stats] Concurrent Evacuation = 3.226 s (a = 21088 us) (n = 153) > [info][gc,stats] Concurrent Evacuation = 3.270 s (a = 20827 us) (n = 157) > [info][gc,stats] Concurrent Evacuation = 3.339 s (a = 20742 us) (n = 161) > > > After: > > [info][gc,stats] Concurrent Evacuation = 3.109 s (a = 18617 us) (n = 167) > [info][gc,stats] Concurrent Evacuation = 3.027 s (a = 18918 us) (n = 160) > [info][gc,stats] Concurrent Evacuation = 2.862 s (a = 17669 us) (n = 162) > [info][gc,stats] Concurrent Evacuation = 2.858 s (a = 17425 us) (n = 164) > [info][gc,stats] Concurrent Evacuation = 2.883 s (a = 17685 us) (n = 163) > > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah > - [x] Linux AArch64 `tier1` with Shenandoah Aleksey Shipilev has updated the pull request incrementally with three additional commits since the last revision: - More natural order of arguments - Move the fwdptr-related updaters to ShenandoahForwarding - Avoid acq_rel that is promoted to seq_cst on ARM <8.3 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2496/files - new: 
https://git.openjdk.java.net/jdk/pull/2496/files/0dacba04..40941bfc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=10 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=09-10 Stats: 193 lines in 8 files changed: 116 ins; 67 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2496.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496 PR: https://git.openjdk.java.net/jdk/pull/2496 From kbarrett at openjdk.java.net Thu Sep 2 18:39:38 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 2 Sep 2021 18:39:38 GMT Subject: RFR: 8272807: Permit use of memory concurrent with pretouch Message-ID: Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215. Please review this change which adds os::touch_memory, which is similar to os::pretouch_memory but allows concurrent access to the memory while it is being touched. This is accomplished by using an atomic add of zero as the operation for touching the memory, ensuring the virtual location is backed by physical memory while not changing any values being read or written by other threads. While I was there, fixed some other lurking issues in os::pretouch_memory. There was a potential overflow in the iteration that has been fixed. And if the range arguments weren't page aligned then the last page might not get touched. The latter was even mentioned in the function's description. Both of those have been fixed by careful alignment and some extra checks. The resulting code is a little more complicated, but more robust and complete. Similarly added TouchTask, which is similar to PretouchTask. Again here, there is some cleaning up to avoid potential overflows and such. - The chunk size is computed using the page size after possible adjustment for UseTransparentHugePages. We want a chunk size that reflects the actual number of touches that will be performed. 
- The chunk claim is now done using a CAS that won't exceed the range end. The old atomic-fetch-and-add and check the result, which is performed by each worker thread, could lead to overflow. The old code has a test for overflow, but since pointer-arithmetic overflow is UB that's not reliable. - The old calculation of num_chunks for parallel touching could also potentially overflow. Testing: mach5 tier1-3 ------------- Commit messages: - touch task - add touch_memory Changes: https://git.openjdk.java.net/jdk/pull/5353/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5353&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272807 Stats: 194 lines in 4 files changed: 122 ins; 8 del; 64 mod Patch: https://git.openjdk.java.net/jdk/pull/5353.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5353/head:pull/5353 PR: https://git.openjdk.java.net/jdk/pull/5353 From hohensee at amazon.com Thu Sep 2 19:04:35 2021 From: hohensee at amazon.com (Hohensee, Paul) Date: Thu, 2 Sep 2021 19:04:35 +0000 Subject: [UNVERIFIED SENDER] RE: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> Message-ID: <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com> I haven't been following this thread, so please forgive redundancy. For the fully concurrent collectors such as Shenandoah, ZGC, and C4, we want to be able to measure intervals that may be shorter than a millisecond. Azul uses seconds-as-doubles to do this in their MXBean APIs (see https://docs.azul.com/prime/MXBeans), but given that Hotspot has access to nanotime counters and that a long can hold ~272 years of nanoseconds, I'd very much like for Hotspot to standardize on nanoseconds internally and make millis and seconds available as convenience methods. Parenthetically, we're working on a merge of Azul's MXBeans into com.sun.management MXBeans for the purpose of monitoring concurrent collectors, in our case Shenandoah. 
Azul has been using their interface in production for at least a decade, which is the primary reason we're using their paradigm. The new APIs are a strict superset of what's there, and it's looking like it should be no problem to integrate support for non-concurrent collectors. We've gone with the above scheme and standardized on nanos with millis and seconds as convenience methods.

Thanks,
Paul

-----Original Message-----
From: hotspot-dev on behalf of Markus Gronlund
Date: Thursday, September 2, 2021 at 7:45 AM
To: David Holmes, Albert Mingkun Yang, "hotspot-dev at openjdk.java.net"
Subject: [EXTERNAL] [UNVERIFIED SENDER] RE: RFR: 8273239: Standardize Ticks APIs return type

Hi Albert,

As we talked a little bit offline, IIRC the reason for providing these return types was that they were modelled on the precedents in runtime/os.hpp:

    static jlong javaTimeMillis();
    static jlong javaTimeNanos();
    static void javaTimeNanos_info(jvmtiTimerInfo *info_ptr);
    static void javaTimeSystemUTC(jlong &seconds, jlong &nanos);
    static void run_periodic_checks();

    // Returns the elapsed time in seconds since the vm started.
    static double elapsedTime();
    static jlong elapsed_counter();

Since I don't have any call sites that use the Ticks conversion methods, which I think are mostly used by the GC code, and I do not remember the exact reasoning, I thought your effort to standardize on "double" was OK.

But now I think that David and others have brought good arguments to this PR, highlighting the benefits of having the sub-second representations as integrals, so I now believe we should perhaps keep the return types as is.

Maybe it is possible to approach this instead along the lines suggested by Kim, by introducing a few helper functions to provide floating point values?
Thanks
Markus

-----Original Message-----
From: hotspot-dev On Behalf Of David Holmes
Sent: 2 September 2021 15:19
To: Albert Mingkun Yang; hotspot-dev at openjdk.java.net
Subject: Re: RFR: 8273239: Standardize Ticks APIs return type

On 2/09/2021 10:28 pm, Albert Mingkun Yang wrote:
> On Thu, 2 Sep 2021 10:44:21 GMT, David Holmes wrote:
>
>> Any changes to these API's should be approved by that team IMO.
>
> I checked with Markus before sending out this PR.

It would be good if Markus could review it then.

>> I'm more inclined to expect an API that produces integral values than
>> fractions
>
> Only `nanoseconds()` can return an integral value without discarding any info. Converting to other units requires some floating calculation; the fractional part is either dropped behind the API (in `master`) or controlled by the caller (in this patch).

I expect there to be "discarded" information. I'm asking for how many milliseconds have "elapsed". If I want to know about fractional milliseconds I should ask how many microseconds or nanoseconds have elapsed instead. When we ask for the current time as "milliseconds since the epoch" we expect an integral number at that resolution; the fact there could be additional microseconds and nanoseconds is immaterial.

David
-----

>> Converting nanosecond time values to double can be information losing.
>
> True. Such loss comes from the following conversion (from
> `ticks.hpp`). (Note: this kind of loss is different from the one
> discarding the fractional part on API boundary.)
>
>
> template inline double
> conversion(typename TimeSource::Type& value) {
> return (double)value * ((double)unit /
> (double)TimeSource::frequency()); }
>
>
> I am not sure how significant this loss is in practice; all callers of `seconds()` (the most used API among the four) suffer from this loss.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/5332
>

From kim.barrett at oracle.com Thu Sep 2 21:40:00 2021
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 2 Sep 2021 21:40:00 +0000
Subject: [UNVERIFIED SENDER] RFR: 8273239: Standardize Ticks APIs return type
In-Reply-To: <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com>
References: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com>
Message-ID:

[resending from the correct account, so it gets to the mailing list and the PR.]

> On Sep 2, 2021, at 3:04 PM, Hohensee, Paul wrote:
>
> I haven't been following this thread, so please forgive redundancy.
>
> For the fully concurrent collectors such as Shenandoah, ZGC, and C4, we want to be able to measure intervals that may be shorter than a millisecond. Azul uses seconds-as-doubles to do this in their MXBean APIs (see https://docs.azul.com/prime/MXBeans), but given that Hotspot has access to nanotime counters and that a long can hold ~272 years of nanoseconds, I'd very much like for Hotspot to standardize on nanoseconds internally and make millis and seconds available as convenience methods.

There's been some effort in GC code to use the Ticks utility (which I think ends up being nanoseconds on all supported platforms) internally, and convert to other types for logging and other API boundaries that require other types. See, for example, https://bugs.openjdk.java.net/browse/JDK-8208390

I thought that had already been done, but apparently not. Maybe there was just an unfinished prototype? Quite possibly it got too big for one change set.
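
To make the trade-off discussed in this thread concrete, here is a minimal sketch, not HotSpot code, of keeping an integral nanosecond count internally and converting only at API boundaries. `NanoSpan` and `survives_double_round_trip` are hypothetical names invented for this illustration; they are not part of `ticks.hpp`. The second helper demonstrates the `uint64_t`-to-`double` information loss raised in the review: a `double` has a 53-bit significand, so nanosecond counts above 2^53 may no longer convert exactly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, NOT HotSpot's Ticks/Tickspan. It only illustrates
// storing an exact nanosecond count and converting on demand.
struct NanoSpan {
  uint64_t nanos;  // exact elapsed time in nanoseconds

  // Integral conversions: the sub-unit remainder is truncated.
  uint64_t nanoseconds()  const { return nanos; }
  uint64_t microseconds() const { return nanos / 1000u; }
  uint64_t milliseconds() const { return nanos / 1000000u; }

  // Floating-point convenience: the cast to double may round once
  // nanos exceeds 2^53.
  double seconds() const { return static_cast<double>(nanos) / 1e9; }
};

// Returns true when converting `value` to double and back preserves it.
inline bool survives_double_round_trip(uint64_t value) {
  // Keeps the cast back to uint64_t in range even if the double rounds up.
  assert(value < (uint64_t{1} << 63));
  return static_cast<uint64_t>(static_cast<double>(value)) == value;
}
```

For scale, 2^53 ns is roughly 104 days, so every integral nanosecond count below that converts to `double` exactly, although a subsequent division into seconds can still round.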
From ayang at openjdk.java.net Thu Sep 2 21:42:35 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 2 Sep 2021 21:42:35 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com> References: <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com> Message-ID: On Thu, 2 Sep 2021 19:05:57 GMT, Hohensee, Paul wrote: > When we ask for the current time as "milliseconds since the epoch" we expect an integral number at that resolution I don't think there's an established convention on what the expectation is, integral vs fractional; both scenarios can exist. Taking the expect-for-integral logic to the extreme, one could argue `seconds()` should return `uint64_t` as well. > Maybe it is possible to approach this instead along the lines suggested by Kim, by introducing a few helper functions to provide floating point values? Here are my two propositions: (more are welcome) 1. A consistent return type (`double`) for s/ms/us, but not ns. ns is special because it's the only unit without info loss. Callers expect integral values for s/ms/us can easily discard the fractional part. 2. Keep the existing return-type API, and add two more methods for ms/us returning `double`. Note in both propositions, the implementation of `uint64_t nanoseconds()` will be different from `master`, which performs unnecessary double conversion. The new impl should address Kim's concern of `uint64_t-to-double` info loss. 
    // option 1
    // s/ms/us do the double conversion&calculation
    // ns return the underlying counter
    double seconds();
    double milliseconds();
    double microseconds();
    uint64_t nanoseconds();

    // option 2
    // s do double conversion&calculation
    // ms/us do double conversion&calculation and discard fractional
    // ns return the underlying counter
    double seconds();
    uint64_t milliseconds();
    uint64_t microseconds();
    uint64_t nanoseconds();
    // ms/us do double conversion&calculation
    double milliseconds_fractional();
    double microseconds_fractional();

-------------

PR: https://git.openjdk.java.net/jdk/pull/5332

From serb at openjdk.java.net Thu Sep 2 22:59:31 2021
From: serb at openjdk.java.net (Sergey Bylokhov)
Date: Thu, 2 Sep 2021 22:59:31 GMT
Subject: Integrated: 8272805: Avoid looking up standard charsets
In-Reply-To:
References:
Message-ID:

On Sun, 22 Aug 2021 02:53:44 GMT, Sergey Bylokhov wrote:

> This is the continuation of JDK-8233884, JDK-8271456, and JDK-8272120.
>
> In many places standard charsets are looked up via their names, for example:
> absolutePath.getBytes("UTF-8");
>
> This could be done more efficiently (up to 20x faster) with use of java.nio.charset.StandardCharsets:
> absolutePath.getBytes(StandardCharsets.UTF_8);
>
> The latter variant also makes the code cleaner, as it is known not to throw UnsupportedEncodingException, in contrast to the former variant.
>
> This change includes:
> * demo/utils
> * jdk.xx packages
> * Some places that were missed in the previous changes. I found them by tracing the calls to Charset.forName() while executing tier1,2,3 and desktop tests.
>
> Some performance discussion: https://github.com/openjdk/jdk/pull/5063
>
> Code excluded in this fix: the Xerces library (should be fixed upstream), J2DBench (should be compatible with 1.4), some code in the network area (the changes there are not straightforward, will do it later).
>
> Tested by the tier1/tier2/tier3 tests on Linux/Windows/macOS.

This pull request has now been integrated.
Changeset: 7fff22af Author: Sergey Bylokhov URL: https://git.openjdk.java.net/jdk/commit/7fff22afe711c8c04dbf4cf5b4938d40632e4987 Stats: 364 lines in 53 files changed: 98 ins; 128 del; 138 mod 8272805: Avoid looking up standard charsets Reviewed-by: weijun, naoto, dfuchs, azvegint, erikj ------------- PR: https://git.openjdk.java.net/jdk/pull/5210 From dholmes at openjdk.java.net Fri Sep 3 02:17:29 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 3 Sep 2021 02:17:29 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang wrote: > Simple change on return types of Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. > > Test: tier1 I'd vote for option 2. I'd also agree that `seconds()` is the inconsistent part of the API. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From thartmann at openjdk.java.net Fri Sep 3 06:53:26 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 3 Sep 2021 06:53:26 GMT Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. In-Reply-To: References: Message-ID: <3LtxAQNwkKMjejT0RPXO8yV_jOBaL8qW8RPB9KfI1Es=.71ffb105-1df6-44e2-a196-10261a23fdee@github.com> On Fri, 27 Aug 2021 15:19:42 GMT, Patric Hedlin wrote: > Please review this change that addresses an issue originally found in JDK-8259590, > where the error message should read > `fatal error: DEBUG MESSAGE: verify_oop: r10: broken oop in decode_heap_oop` > but is lost since the code generated refers to a fixed (string) address, a string > (address) no longer available when **CodeStrings** have been propagated (copied) > between code buffers/blobs. > > Background > > **CodeStrings** used to be **CodeComments**. 
The solution to JDK-8008555 introduced > this change and added a new use-case for **CodeStrings**, not only as comments, but > as strings with a fixed address used in the code generated with support for debug > string printouts. > > The changes introduced with JDK-8255208 breaks the fixed address assumption made > in the code generated with support for debug string printouts. Some of the (necessary) > move semantics have been replaced by copy semantics when **CodeStrings** are > propagated between code buffers/blobs (i.e. **CodeBuffer** and **CodeBlob** objects). > > Additional issue addressed > > + Broken printout (i.e. missing remarks) when multiple stubs are generated within > the same primary buffer. > > The following steps have been taken to provide fixed address (debug) strings. > > + Introduce a simple _gtest_ for **CodeString/s** support with **CodeBuffer/CodeBlob**. > + Split **CodeStrings** into two different abstractions; strings associated with an offset, > and strings with a fixed address. > - Let the first be _Assembly Code Remarks_, **AsmRemarks**, providing a simple > 1:N mapping, _offset_ -> _remark_. > - Let the second be _Debug Printout Strings_, **DbgStrings**, supporting a fixed > address assumption such that: > for each string A and B, if A = B -> &A = &B. > - Use a reference counting scheme to ensure that both types of strings are > deallocated properly, when no longer in use. > + Remove **CodeStrings** from **Stub** interface. > - Replace with internal use in the interpreter code, propagating the code > (assembly) remarks directly to the final codelet. > + Remove **CodeBuffer** self destruction, overwriting memory before all > deconstructors have been executed. > - Replace with sentinel deconstructor to do the bidding. > + Stub code generated into a single common **CodeBuffer** (holding a number > of stubs) will not print the assembly remarks correctly (except for the very first > stub code section). 
> - Introduce a displacement to correct the offset. > + Remove old **CodeString/s** implementation. > > Testing > > Tier1-3 in debug mode, using debug strings, _without_ collecting remarks. > Tier1-3 in debug mode, using debug strings, _with_ collecting remarks (regardless of options). > Manual inspection of results (linux-x64 and linux-aarch64 only) for the following command line options: > `-XX:+PrintSignatureHandlers -XX:+PrintInterpreter -XX:+PrintStubCode -XX:+PrintAssembly` Hard to review but looks good to me. src/hotspot/share/asm/codeBuffer.hpp line 284: > 282: // the generated assembly code are unique, i.e. there is very little gain in > 283: // trying to share the strings between the different offsets tracked in a > 284: // buffer (or blob). Noticed some double whitespaces in many of the code comments, for example "assembly__code", "different__offsets". ------------- PR: https://git.openjdk.java.net/jdk/pull/5281 From phedlin at openjdk.java.net Fri Sep 3 08:19:27 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Fri, 3 Sep 2021 08:19:27 GMT Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. In-Reply-To: <3LtxAQNwkKMjejT0RPXO8yV_jOBaL8qW8RPB9KfI1Es=.71ffb105-1df6-44e2-a196-10261a23fdee@github.com> References: <3LtxAQNwkKMjejT0RPXO8yV_jOBaL8qW8RPB9KfI1Es=.71ffb105-1df6-44e2-a196-10261a23fdee@github.com> Message-ID: On Fri, 3 Sep 2021 06:43:30 GMT, Tobias Hartmann wrote: >> Please review this change that addresses an issue originally found in JDK-8259590, >> where the error message should read >> `fatal error: DEBUG MESSAGE: verify_oop: r10: broken oop in decode_heap_oop` >> but is lost since the code generated refers to a fixed (string) address, a string >> (address) no longer available when **CodeStrings** have been propagated (copied) >> between code buffers/blobs. >> >> Background >> >> **CodeStrings** used to be **CodeComments**. 
The solution to JDK-8008555 introduced >> this change and added a new use-case for **CodeStrings**, not only as comments, but >> as strings with a fixed address used in the code generated with support for debug >> string printouts. >> >> The changes introduced with JDK-8255208 breaks the fixed address assumption made >> in the code generated with support for debug string printouts. Some of the (necessary) >> move semantics have been replaced by copy semantics when **CodeStrings** are >> propagated between code buffers/blobs (i.e. **CodeBuffer** and **CodeBlob** objects). >> >> Additional issue addressed >> >> + Broken printout (i.e. missing remarks) when multiple stubs are generated within >> the same primary buffer. >> >> The following steps have been taken to provide fixed address (debug) strings. >> >> + Introduce a simple _gtest_ for **CodeString/s** support with **CodeBuffer/CodeBlob**. >> + Split **CodeStrings** into two different abstractions; strings associated with an offset, >> and strings with a fixed address. >> - Let the first be _Assembly Code Remarks_, **AsmRemarks**, providing a simple >> 1:N mapping, _offset_ -> _remark_. >> - Let the second be _Debug Printout Strings_, **DbgStrings**, supporting a fixed >> address assumption such that: >> for each string A and B, if A = B -> &A = &B. >> - Use a reference counting scheme to ensure that both types of strings are >> deallocated properly, when no longer in use. >> + Remove **CodeStrings** from **Stub** interface. >> - Replace with internal use in the interpreter code, propagating the code >> (assembly) remarks directly to the final codelet. >> + Remove **CodeBuffer** self destruction, overwriting memory before all >> deconstructors have been executed. >> - Replace with sentinel deconstructor to do the bidding. >> + Stub code generated into a single common **CodeBuffer** (holding a number >> of stubs) will not print the assembly remarks correctly (except for the very first >> stub code section). 
>> - Introduce a displacement to correct the offset. >> + Remove old **CodeString/s** implementation. >> >> Testing >> >> Tier1-3 in debug mode, using debug strings, _without_ collecting remarks. >> Tier1-3 in debug mode, using debug strings, _with_ collecting remarks (regardless of options). >> Manual inspection of results (linux-x64 and linux-aarch64 only) for the following command line options: >> `-XX:+PrintSignatureHandlers -XX:+PrintInterpreter -XX:+PrintStubCode -XX:+PrintAssembly` > > src/hotspot/share/asm/codeBuffer.hpp line 284: > >> 282: // the generated assembly code are unique, i.e. there is very little gain in >> 283: // trying to share the strings between the different offsets tracked in a >> 284: // buffer (or blob). > > Noticed some double whitespaces in many of the code comments, for example "assembly__code", "different__offsets". Comments are (fully) justified for readability (within limits, not using hyphenation), same as text in books, newspapers and reports. ------------- PR: https://git.openjdk.java.net/jdk/pull/5281 From sakatakui at oss.nttdata.com Fri Sep 3 08:41:42 2021 From: sakatakui at oss.nttdata.com (Koichi Sakata) Date: Fri, 3 Sep 2021 17:41:42 +0900 Subject: Regarding options of error and dump file paths In-Reply-To: <8f51414c-86c1-49de-7b5f-4af0fae556aa@oracle.com> References: <92708e25-331f-f832-144b-eb00e2b0a4ac@oss.nttdata.com> <8f51414c-86c1-49de-7b5f-4af0fae556aa@oracle.com> Message-ID: Hi David, I?m sorry for the late reply. Thank you for your great advice. > Having an explicit option override the default directory option is a > good idea, but I'm not sure it is that clear cut. If you can specify a > relative directory and file name for a given dump file, might you not > want that to be relative to the specified default path, rather than > relative to the pwd? I occasionally want to use a relative path from the specified default path. 
Such usage might make it unclear where files are written and would complicate the fix, so we should probably prohibit relative paths when the default path is used. We can settle the specification once we have gathered detailed expectations. > And we actually have quite a lot of potential output files from: > - GC (heap dumps) > - JIT (replay files) > - hs_err files > - JFR (a number of files) > - jcmd/dcmd dumps? > - Unified logging? > > I think figuring out the exact details of how this should work, and > interact with all the different files involved may be more involved than > just prepending a path component. I completely agree with you. Enabling the new option will take a lot of work on our side, but I believe it will improve convenience for users. Making it easy to gather error-related files in one place helps us troubleshoot. Not many users set all of these path options. If they use the new option, all they have to do is send the files in that directory to their support personnel. In addition, it will become easier for them to keep files even in container environments. > I also think I would need to hear much greater demand, with detailed > usage expectations, before supporting this. I think so, too. I'd like to hear various people's points of view. Regards, Koichi On 2021/08/26 15:23, David Holmes wrote: > Hi Koichi, > > On 23/08/2021 1:29 pm, Koichi Sakata wrote: >> Hi all, >> >> I'm writing to get feedback on my idea about options for error and >> dump file paths. >> >> First of all, we can specify several options related to error and dump >> files. For example, the HeapDumpPath option sets the heap dump file >> and the ErrorFile option sets the hs_error file. >> >> I've felt inconvenience about that because we need to write all path >> options to put those files in a specific directory. I also recognize >> that they are outputted in the working directory when I run an >> application with no options. But I'd like to keep them in any >> directory.
So the new option that sets the directory where those files >> are outputted would be useful, I think. >> >> The new option helps us especially to run applications on containers >> like Docker, Kubernetes etc. If we run them without those existing >> options on containers, files will be put in the local directory of >> each container. We lose files after we operate the container such as >> deleting it. The option enables us to keep certainly all error and >> dump files if we just specify the path of the persistent volume for >> the new option. >> >> As a concrete example, when we specify -XX:ErrorAndDumpPath=/foo/bar/ >> (This option name is tentative), -XX:+HeapDumpOnOutOfMemoryError and >> -XX:StartFlightRecording etc., files are generated in the /foo/bar >> directory. From my point of view, the option will deal with the >> following files: >> - heap dump file (java_pid%p.hprof) >> - error log file (hs_err_pid%p.log) >> - JFR emergency dumps (hs_err_pid%p.jfr, hs_oom_pid%p.jfr, >> hs_soe_pid%p.jfr) >> - replay file (replay_pid%p.log) >> >> The existing path options should override the new option. If I set >> -XX:ErrorAndDumpPath=/foo/bar/ and -XX:HeapDumpPath=/foo/baz/, a heap >> dump file will be in the /foo/baz directory and other files will be >> created in the /foo/bar. >> >> I would like to hear your point of view. If some people agree to this >> idea, I will write a patch. > > My initial reaction was that this seemed something better handled in a > launch script because I figured if you had complex needs in relation to > where these files were being placed, then you'd use a launch script to > help manage that anyway. > > But I can see there would be some convenience to controlling the output > directory without also having to restate the default file names. > > Having an explicit option override the default directory option is a > good idea, but I'm not sure it is that clear cut. 
If you can specify a > relative directory and file name for a given dump file, might you not > want that to be relative to the specified default path, rather than > relative to the pwd? > > And we actually have quite a lot of potential output files from: > - GC (heap dumps) > - JIT (replay files) > - hs_err files > - JFR (a number of files) > - jcmd/dcmd dumps? > - Unified logging? > > I think figuring out the exact details of how this should work, and > interact with all the different files involved may be more involved than > just prepending a path component. > > I also think I would need to hear much greater demand, with detailed > usage expectations, before supporting this. > > Just my 2c. > > Cheers, > David > ----- > >> Regards, >> Koichi From shade at openjdk.java.net Fri Sep 3 09:17:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 3 Sep 2021 09:17:42 GMT Subject: RFR: 8273314: Add tier4 test groups Message-ID: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested creating tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` from `hotspot:tier4`, because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress).

Sample run:

==============================
Test summary
==============================
   TEST                              TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier4      426   425     1     0 <<
>> jtreg:test/jdk:tier4               2891  2885     4     2 <<
   jtreg:test/langtools:tier4            0     0     0     0
   jtreg:test/jaxp:tier4                 0     0     0     0
==============================

real    64m13.994s
user    1462m1.213s
sys     39m38.032s

There are interesting test failures on my machine, which I would address separately.
------------- Commit messages: - Add tier4 test groups Changes: https://git.openjdk.java.net/jdk/pull/5357/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5357&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273314 Stats: 22 lines in 4 files changed: 22 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5357.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5357/head:pull/5357 PR: https://git.openjdk.java.net/jdk/pull/5357 From shade at openjdk.java.net Fri Sep 3 10:47:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 3 Sep 2021 10:47:43 GMT Subject: RFR: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory Message-ID: $ CONF=linux-x86_64-server-fastdebug make run-test TEST=containers/docker/TestJFREvents.java STDERR: stdout: []; stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. ] exitValue = 137 java.lang.RuntimeException: Expected to get exit value of [0] at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:489) at TestJFREvents.testContainerInfo(TestJFREvents.java:110) at TestJFREvents.containerInfoTestCase(TestJFREvents.java:89) at TestJFREvents.main(TestJFREvents.java:74) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:568) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) at java.base/java.lang.Thread.run(Thread.java:833) `exitValue = 137` suggests the container was killed by OOM killer. The failing configuration is with `64m`, and it is apparently too low. 
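
For readers unfamiliar with the convention behind that diagnosis, here is a minimal sketch of how `exitValue = 137` maps to a kill signal. It assumes the common shell-style encoding in which a process terminated by signal N is reported with exit status 128 + N; the helper name `signal_from_exit_value` is hypothetical and is not part of `TestJFREvents.java` or the test library.

```cpp
#include <signal.h>  // SIGKILL (POSIX)

// Hypothetical helper, for illustration only.
// Returns the signal number encoded in a shell-style exit value
// (128 + signal number), or 0 if the value does not follow that scheme.
int signal_from_exit_value(int exit_value) {
  return (exit_value > 128 && exit_value < 256) ? exit_value - 128 : 0;
}

// True when the exit value corresponds to SIGKILL, which is the signal
// the kernel OOM killer delivers to its victim.
bool looks_like_sigkill(int exit_value) {
  return signal_from_exit_value(exit_value) == SIGKILL;
}
```

Under this assumption 137 decodes to signal 9 (SIGKILL), which is consistent with, though not proof of, an OOM kill inside the memory-limited container.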
Additional testing: - [x] Affected test now passes (5 runs out of 5 tries) - [x] `containers/docker` tests pass ------------- Commit messages: - Try larger memories Changes: https://git.openjdk.java.net/jdk/pull/5359/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5359&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273318 Stats: 7 lines in 1 file changed: 1 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5359.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5359/head:pull/5359 PR: https://git.openjdk.java.net/jdk/pull/5359 From david.holmes at oracle.com Fri Sep 3 11:49:58 2021 From: david.holmes at oracle.com (David Holmes) Date: Fri, 3 Sep 2021 21:49:58 +1000 Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. In-Reply-To: References: <3LtxAQNwkKMjejT0RPXO8yV_jOBaL8qW8RPB9KfI1Es=.71ffb105-1df6-44e2-a196-10261a23fdee@github.com> Message-ID: <890465b9-1907-2529-051d-e39fecf84d7b@oracle.com> On 3/09/2021 6:19 pm, Patric Hedlin wrote: > On Fri, 3 Sep 2021 06:43:30 GMT, Tobias Hartmann wrote: > >> src/hotspot/share/asm/codeBuffer.hpp line 284: >> >>> 282: // the generated assembly code are unique, i.e. there is very little gain in >>> 283: // trying to share the strings between the different offsets tracked in a >>> 284: // buffer (or blob). >> >> Noticed some double whitespaces in many of the code comments, for example "assembly__code", "different__offsets". > > Comments are (fully) justified for readability (within limits, not using hyphenation), same as text in books, newspapers and reports. It is not at all usual to justify comments in this way, and not part of the hotspot style. I would also imagine it is very tedious to write them like that. 
Cheers, David ----- > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5281 > From stefan.karlsson at oracle.com Fri Sep 3 12:19:04 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 3 Sep 2021 14:19:04 +0200 Subject: [UNVERIFIED SENDER] RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com> Message-ID: On 2021-09-02 23:40, Kim Barrett wrote: > [resending from the correct account, so it gets to the mailing list and the PR.] > >> On Sep 2, 2021, at 3:04 PM, Hohensee, Paul wrote: >> >> I haven't been following this thread, so please forgive redundancy. >> >> For the fully concurrent collectors such as Shenandoah, ZGC, and C4, we want to be able to measure intervals that may be shorter than a millisecond. Azul uses seconds-as-doubles to do this in their MXBean APIs (see https://docs.azul.com/prime/MXBeans), but given that Hotspot has access to nanotime counters and that a long can hold ~272 years of nanoseconds, I'd very much like for Hotspot to standardize on nanoseconds internally and make millis and seconds available as convenience methods. > There?s been some effort in GC code to use the Ticks utility (which I think ends up being nanoseconds > on all supported platforms) internally, and convert to other types for logging and other API boundaries > that require other types. See, for example, > https://bugs.openjdk.java.net/browse/JDK-8208390 > I thought that had already been done, but apparently not. Maybe there was just an unfinished prototype? > Quite possibly it got too big for one change set. This might not be what you were thinking about, but could be interesting to think about anyways. At some point I created a prototype to use a chrono-like API to solve the numerous bugs we've had when accidentally mixing seconds and milliseconds. 
See: https://www.cplusplus.com/reference/chrono/ https://cr.openjdk.java.net/~stefank/prototype/durations/ That in itself was good enough to find conversion bugs. On top of this I started to convert the G1 code to use "duration" (backed by nanos), and only convert to seconds and milliseconds at the boundaries. I don't have that patch anymore, but from internal mails I see that these were my thoughts at that point: "I ... didn't like that because G1 pervasively uses doubles and does a lot of statistics on these double values, so using Duration<> forces me to uses duration_casts in many places in G1 (to allow lossy conversions). I'll reimplement that part so that I can show that part more clearly and so that we can discuss if there are better ways around those problems. " Priorities changed and we're now at C++14. Would it make sense to investigate using chrono? StefanK > > From thartmann at openjdk.java.net Fri Sep 3 12:50:23 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 3 Sep 2021 12:50:23 GMT Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. In-Reply-To: References: <3LtxAQNwkKMjejT0RPXO8yV_jOBaL8qW8RPB9KfI1Es=.71ffb105-1df6-44e2-a196-10261a23fdee@github.com> Message-ID: On Fri, 3 Sep 2021 08:16:07 GMT, Patric Hedlin wrote: >> src/hotspot/share/asm/codeBuffer.hpp line 284: >> >>> 282: // the generated assembly code are unique, i.e. there is very little gain in >>> 283: // trying to share the strings between the different offsets tracked in a >>> 284: // buffer (or blob). >> >> Noticed some double whitespaces in many of the code comments, for example "assembly__code", "different__offsets". > > Comments are (fully) justified for readability (within limits, not using hyphenation), same as text in books, newspapers and reports. 
As we discussed off-thread, I would prefer to not manually enforce block text layout by inserting extra whitespace because it is tedious and increases the burden on anyone later modifying (parts of) that comment. Although I do understand that this layout might be easier to read for some people, I think it's the IDE's responsibility to display the comment in the user preferred way. If at all, line length restrictions and such should be part of the HotSpot Style Guide to guarantee consistency. ------------- PR: https://git.openjdk.java.net/jdk/pull/5281 From phedlin at openjdk.java.net Fri Sep 3 14:50:06 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Fri, 3 Sep 2021 14:50:06 GMT Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. [v2] In-Reply-To: References: Message-ID: > Please review this change that addresses an issue originally found in JDK-8259590, > where the error message should read > `fatal error: DEBUG MESSAGE: verify_oop: r10: broken oop in decode_heap_oop` > but is lost since the code generated refers to a fixed (string) address, a string > (address) no longer available when **CodeStrings** have been propagated (copied) > between code buffers/blobs. > > Background > > **CodeStrings** used to be **CodeComments**. The solution to JDK-8008555 introduced > this change and added a new use-case for **CodeStrings**, not only as comments, but > as strings with a fixed address used in the code generated with support for debug > string printouts. > > The changes introduced with JDK-8255208 breaks the fixed address assumption made > in the code generated with support for debug string printouts. Some of the (necessary) > move semantics have been replaced by copy semantics when **CodeStrings** are > propagated between code buffers/blobs (i.e. **CodeBuffer** and **CodeBlob** objects). > > Additional issue addressed > > + Broken printout (i.e. missing remarks) when multiple stubs are generated within > the same primary buffer. 
> > The following steps have been taken to provide fixed address (debug) strings. > > + Introduce a simple _gtest_ for **CodeString/s** support with **CodeBuffer/CodeBlob**. > + Split **CodeStrings** into two different abstractions; strings associated with an offset, > and strings with a fixed address. > - Let the first be _Assembly Code Remarks_, **AsmRemarks**, providing a simple > 1:N mapping, _offset_ -> _remark_. > - Let the second be _Debug Printout Strings_, **DbgStrings**, supporting a fixed > address assumption such that: > for each string A and B, if A = B -> &A = &B. > - Use a reference counting scheme to ensure that both types of strings are > deallocated properly, when no longer in use. > + Remove **CodeStrings** from **Stub** interface. > - Replace with internal use in the interpreter code, propagating the code > (assembly) remarks directly to the final codelet. > + Remove **CodeBuffer** self destruction, overwriting memory before all > deconstructors have been executed. > - Replace with sentinel deconstructor to do the bidding. > + Stub code generated into a single common **CodeBuffer** (holding a number > of stubs) will not print the assembly remarks correctly (except for the very first > stub code section). > - Introduce a displacement to correct the offset. > + Remove old **CodeString/s** implementation. > > Testing > > Tier1-3 in debug mode, using debug strings, _without_ collecting remarks. > Tier1-3 in debug mode, using debug strings, _with_ collecting remarks (regardless of options). > Manual inspection of results (linux-x64 and linux-aarch64 only) for the following command line options: > `-XX:+PrintSignatureHandlers -XX:+PrintInterpreter -XX:+PrintStubCode -XX:+PrintAssembly` Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: Remove disturbing white-space. 
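
The fixed-address property quoted above (for each string A and B, if A = B then &A = &B) together with reference counting can be illustrated with a small sketch. This is not the **DbgStrings** implementation from the PR; the class `DebugStringTable` and its methods are hypothetical names chosen for the example, and it uses `std::map` purely because its nodes are never relocated.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Illustrative only: NOT the HotSpot DbgStrings implementation.
class DebugStringTable {
 public:
  // Returns a stable address for `text`. Inserting equal text again
  // yields the same pointer and bumps the reference count.
  const char* insert(const std::string& text) {
    auto it = _entries.find(text);
    if (it == _entries.end()) {
      it = _entries.emplace(text, 0).first;
    }
    ++it->second;
    return it->first.c_str();
  }

  // Drops one reference; the string is deallocated only when the last
  // reference is released.
  void release(const std::string& text) {
    auto it = _entries.find(text);
    if (it != _entries.end() && --it->second == 0) {
      _entries.erase(it);
    }
  }

  std::size_t size() const { return _entries.size(); }

 private:
  // std::map never moves its nodes, so the address of each key's
  // character data stays fixed for as long as the entry exists.
  std::map<std::string, std::size_t> _entries;
};
```

In this sketch, generated code could embed the pointer returned by `insert("verify_oop: r10: broken oop in decode_heap_oop")`, and copying the table's owner would call `insert` again rather than duplicating the characters, so the embedded address remains valid until every owner has called `release`.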
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5281/files - new: https://git.openjdk.java.net/jdk/pull/5281/files/f2eeefc9..22cceded Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5281&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5281&range=00-01 Stats: 12 lines in 4 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/5281.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5281/head:pull/5281 PR: https://git.openjdk.java.net/jdk/pull/5281 From kim.barrett at oracle.com Fri Sep 3 17:29:13 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 3 Sep 2021 17:29:13 +0000 Subject: [UNVERIFIED SENDER] RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com> Message-ID: > On Sep 3, 2021, at 8:19 AM, Stefan Karlsson wrote: > > On 2021-09-02 23:40, Kim Barrett wrote: >> [resending from the correct account, so it gets to the mailing list and the PR.] >> >>> On Sep 2, 2021, at 3:04 PM, Hohensee, Paul wrote: >>> >>> I haven't been following this thread, so please forgive redundancy. >>> >>> For the fully concurrent collectors such as Shenandoah, ZGC, and C4, we want to be able to measure intervals that may be shorter than a millisecond. Azul uses seconds-as-doubles to do this in their MXBean APIs (see https://docs.azul.com/prime/MXBeans), but given that Hotspot has access to nanotime counters and that a long can hold ~272 years of nanoseconds, I'd very much like for Hotspot to standardize on nanoseconds internally and make millis and seconds available as convenience methods. >> There's been some effort in GC code to use the Ticks utility (which I think ends up being nanoseconds >> on all supported platforms) internally, and convert to other types for logging and other API boundaries >> that require other types.
See, for example, >> https://bugs.openjdk.java.net/browse/JDK-8208390 >> I thought that had already been done, but apparently not. Maybe there was just an unfinished prototype? >> Quite possibly it got too big for one change set. > > This might not be what you were thinking about, but could be interesting to think about anyways. > > At some point I created a prototype to use a chrono-like API to solve the numerous bugs we've had when accidentally mixing seconds and milliseconds. See: Yes, I remember this. I liked the idea, but at the time we didn't have C++11/14 available. Now we do. > https://www.cplusplus.com/reference/chrono/ > https://cr.openjdk.java.net/~stefank/prototype/durations/ > > That in itself was good enough to find conversion bugs. On top of this I started to convert the G1 code to use "duration" (backed by nanos), and only convert to seconds and milliseconds at the boundaries. I don't have that patch anymore, but from internal mails I see that these were my thoughts at that point: > > "I ... didn't like that because G1 pervasively uses doubles and does a lot of statistics on these double values, so using Duration<> forces me to uses duration_casts in many places in G1 (to allow lossy conversions). I'll reimplement that part so that I can show that part more clearly and so that we can discuss if there are better ways around those problems. " The current use of Ticks has essentially the same problem. Except we're writing out the conversions all over the place, rather than having them nicely packaged up. I find that frequently irritating, but nobody (including me) has gotten so irritated as to do something about it. > Priorities changed and we're now at C++14. Would it make sense to investigate using chrono? Unless and until we address the issue of JFR using the so-called fast-unordered clock (JDK-8211240, which is confidential for annoying reasons and needs to be resubmitted in the open.
Summary is "Systematic investigation regarding latency for os::elapsed_counter() vs rdtsc()".), we need to use the paired form of Ticks. But I think the underlying implementation of Ticks could use std::chrono. We may (probably) want to use our own clock sources rather than whatever the C++ standard library provides us, in order to be sure we're consistent with other code, including Java. I don't remember how hard it is to define a clock source, though I'm guessing it's not hard. From kim.barrett at oracle.com Fri Sep 3 17:37:48 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 3 Sep 2021 17:37:48 +0000 Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com> Message-ID: > On Sep 2, 2021, at 5:42 PM, Albert Mingkun Yang wrote: > Here are my two propositions: (more are welcome) > > 1. A consistent return type (`double`) for s/ms/us, but not ns. ns is special because it's the only unit without info loss. Callers expect integral values for s/ms/us can easily discard the fractional part. I was thinking about suggesting this, but it seemed a little odd. Also - There are currently no callers of microseconds(). - Both integral and double milliseconds have real uses. (Timeouts and such are often integral milliseconds.) - Many of the current uses of double milliseconds by GC would be better off remaining as unconverted Ticks, with conversions occurring later. > 2. Keep the existing return-type API, and add two more methods for ms/us returning `double`. > > Note in both propositions, the implementation of `uint64_t nanoseconds()` will be different from `master`, which performs unnecessary double conversion. The new impl should address Kim's concern of `uint64_t-to-double` info loss. I forgot that there was a conversion through double internally in Ticks. That would be a good thing to nuke. 
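
As a rough illustration of the std::chrono direction discussed in this thread, the sketch below defines a custom clock type with a nanosecond-backed duration and shows where conversions are implicit versus explicit. It is a sketch under stated assumptions, not the Ticks implementation: `SketchClock`, `as_millis_double`, and `as_millis_integral` are hypothetical names, and the clock delegates to `std::chrono::steady_clock` only as a placeholder for whatever counter the VM would actually read.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical clock type. A real one would read the VM's own counter
// instead of delegating to the standard library.
struct SketchClock {
  using rep        = std::int64_t;
  using period     = std::nano;
  using duration   = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<SketchClock>;
  static constexpr bool is_steady = true;

  static time_point now() noexcept {
    // Placeholder time source (assumption for this sketch).
    auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
    return time_point(std::chrono::duration_cast<duration>(since_epoch));
  }
};

// Floating-point durations accept implicit (possibly lossy) conversion,
// so statistics code working in doubles needs no duration_cast.
using MillisDouble = std::chrono::duration<double, std::milli>;

inline double as_millis_double(SketchClock::duration d) {
  return MillisDouble(d).count();
}

// Integral targets require an explicit, truncating duration_cast, which
// makes the precision loss visible at the API boundary.
inline std::int64_t as_millis_integral(SketchClock::duration d) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
}
```

With this shape, values stay in integral nanoseconds internally and are converted only at boundaries such as logging or timeouts, which is the property both proposals in the thread are trying to preserve.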
From dcubed at openjdk.java.net Fri Sep 3 18:09:39 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 3 Sep 2021 18:09:39 GMT Subject: RFR: 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) operation In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 21:18:53 GMT, Leonid Mesnik wrote: > monitors_iterate make several checks which often are true before filter monitor by a thread. It might take a lot of time when there are a lot of threads. So it makes sense to first check thread and only then other conditions. Changes requested by dcubed (Reviewer). src/hotspot/share/runtime/synchronizer.cpp line 981: > 979: if (mid->owner() != thread) { > 980: return; > 981: } The `iter` is processing the in-use-list and you're bailing the iteration when you run into an ObjectMonitor that is not owned by `thread`, but that doesn't mean that there's not an ObjectMonitor owned by `thread` later on in the in-use-list. So I could see you doing a `continue` here, but not a `return`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5194 From lmesnik at openjdk.java.net Fri Sep 3 18:09:39 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Fri, 3 Sep 2021 18:09:39 GMT Subject: RFR: 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) operation Message-ID: monitors_iterate make several checks which often are true before filter monitor by a thread. It might take a lot of time when there are a lot of threads. So it makes sense to first check thread and only then other conditions. ------------- Commit messages: - Merge branch 'master' of https://github.com/openjdk/jdk into 8265489 - fixed check - Merge branch 'master' of https://github.com/openjdk/jdk into 8265489 - 8265489: RunThese24H.java times out because of long ObjectSynchronizer::monitors_iterate(...) 
operation Changes: https://git.openjdk.java.net/jdk/pull/5194/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5194&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8265489 Stats: 55 lines in 5 files changed: 8 ins; 15 del; 32 mod Patch: https://git.openjdk.java.net/jdk/pull/5194.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5194/head:pull/5194 PR: https://git.openjdk.java.net/jdk/pull/5194 From iignatyev at openjdk.java.net Fri Sep 3 18:34:40 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 3 Sep 2021 18:34:40 GMT Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Fri, 3 Sep 2021 09:10:20 GMT, Aleksey Shipilev wrote: > During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). > > Sample run: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>> jtreg:test/jdk:tier4 2891 2885 4 2 << > jtreg:test/langtools:tier4 0 0 0 0 > jtreg:test/jaxp:tier4 0 0 0 0 > ============================== > > real 64m13.994s > user 1462m1.213s > sys 39m38.032s > > > There are interesting test failures on my machine, which I would address separately. 
> <...> I have excluded `vmTestbase` and `hotspot:tier4`<...> I have also excluded `applications` from `hotspot:tier4` <...> assuming the goal of tier4 is to catch the rest of the tests, I don't think we should exclude `vmTestbase`, `applications` or any other tests from tier4. unless you also want to create tier5 for them. -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From shade at openjdk.java.net Fri Sep 3 18:43:28 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 3 Sep 2021 18:43:28 GMT Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Fri, 3 Sep 2021 18:32:21 GMT, Igor Ignatyev wrote: > > <...> I have excluded `vmTestbase` and `hotspot:tier4`<...> I have also excluded `applications` from `hotspot:tier4` <...> > > assuming the goal of tier4 is to catch the rest of the tests, I don't think we should exclude `vmTestbase`, `applications` or any other tests from tier4. unless you also want to create tier5 for them. Apart from practicality of using `tier4`, I think `vmTestbase` and `applications` are separate test suites in their own right. `tier4` is catching all the assorted Hotspot tests that are not part of larger suites. Makes sense? ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From never at openjdk.java.net Fri Sep 3 20:02:36 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Fri, 3 Sep 2021 20:02:36 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI [v2] In-Reply-To: References: Message-ID: > This evacuates all JVMCI related methods and fields into a separately declared struct. 
Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: Review cleanups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5339/files - new: https://git.openjdk.java.net/jdk/pull/5339/files/472073a7..9af54d4b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5339&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5339&range=00-01 Stats: 58 lines in 3 files changed: 9 ins; 22 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/5339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5339/head:pull/5339 PR: https://git.openjdk.java.net/jdk/pull/5339 From never at openjdk.java.net Fri Sep 3 20:02:43 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Fri, 3 Sep 2021 20:02:43 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 04:22:00 GMT, David Holmes wrote: >> Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: >> >> Review cleanups > > src/hotspot/share/jvmci/jvmci.cpp line 405: > >> 403: } >> 404: >> 405: > > Nit: there are a few double-blank lines between definitions when one is normal. fixed. > src/hotspot/share/jvmci/jvmci.hpp line 249: > >> 247: void set_jvmci_reserved_oop0(oop value) { >> 248: _jvmci_reserved_oop0 = value; >> 249: } > > Nit: why is this and following definitions multi-line when the preceding ones (of similar size) are single line? I've flattened them out. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5339 From never at openjdk.java.net Fri Sep 3 20:02:37 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Fri, 3 Sep 2021 20:02:37 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI In-Reply-To: References: Message-ID: <0Vo8Od7dx8XO3U7bbm9xf-B_qHVSs4OcIDyQd-aUFvo=.2ae912d9-7d4f-4b02-a39d-7c454e0373ed@github.com> On Wed, 1 Sep 2021 18:03:11 GMT, Tom Rodriguez wrote: > This evacuates all JVMCI related methods and fields into a separately declared struct. I pushed a commit that addresses all the comments and cleans up the header a bit. Testing of this commit was clean. Hopefully I haven't introduced any new things which should be addressed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5339 From never at openjdk.java.net Fri Sep 3 20:02:46 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Fri, 3 Sep 2021 20:02:46 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI [v2] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 19:34:28 GMT, Coleen Phillimore wrote: >> Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: >> >> Review cleanups > > src/hotspot/share/jvmci/jvmci.hpp line 183: > >> 181: }; >> 182: >> 183: > > There seems to be a crazy amount of whitespace here. I'd meant to go back and pretty this up. I was just dumping things in there as I migrated them from elsewhere. I've grouped them with some short comments and tried to clean it up. I also fixed some visibility issues and removed the use of friend for JavaThread.
------------- PR: https://git.openjdk.java.net/jdk/pull/5339 From never at openjdk.java.net Fri Sep 3 20:02:49 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Fri, 3 Sep 2021 20:02:49 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI [v2] In-Reply-To: References: Message-ID: On Thu, 2 Sep 2021 13:41:24 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmci.hpp line 198: >> >>> 196: >>> 197: // Communicates the DeoptReason and DeoptAction of the uncommon trap >>> 198: int _pending_deoptimization; >> >> Nit: Why the extra large alignment spacing of all the declarations? (I'm not a fan of such alignment as it is too hard to maintain - and too hard to type in the first place!) > > That probably comes from a time before we nicely commented each JVMCI field ;-) > I agree that there's no need for the alignment now. I removed the extra indenting. ------------- PR: https://git.openjdk.java.net/jdk/pull/5339 From dholmes at openjdk.java.net Fri Sep 3 22:41:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 3 Sep 2021 22:41:51 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI [v2] In-Reply-To: References: Message-ID: <-C_ILfxkP5jfexZX0PD60lkTwymI8GOJOODHbai91kc=.ca813d54-cd1c-49d1-a9f1-7700b6cae467@github.com> On Fri, 3 Sep 2021 20:02:36 GMT, Tom Rodriguez wrote: >> This evacuates all JVMCI related methods and fields into a separately declared struct. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Review cleanups Looks fine. Thanks for the updates. David ------------- Marked as reviewed by dholmes (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5339 From serb at openjdk.java.net Sat Sep 4 02:54:46 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Sat, 4 Sep 2021 02:54:46 GMT Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: <4ZMjEBOeBv6RVtgEhcfQ1sMVUyL8t2v4W4sc3iGY3r0=.d88a0dbc-d8d0-497b-bde7-9d5de283d11f@github.com> On Fri, 3 Sep 2021 09:10:20 GMT, Aleksey Shipilev wrote: > During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). > > Sample run: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>> jtreg:test/jdk:tier4 2891 2885 4 2 << > jtreg:test/langtools:tier4 0 0 0 0 > jtreg:test/jaxp:tier4 0 0 0 0 > ============================== > > real 64m13.994s > user 1462m1.213s > sys 39m38.032s > > > There are interesting test failures on my machine, which I would address separately. It looks like the results above do not include the headful tests; did you filter them out?
>> jtreg:test/jdk:tier4 2891 2885 4 2 << ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From iignatyev at openjdk.java.net Sat Sep 4 03:16:52 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Sat, 4 Sep 2021 03:16:52 GMT Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Fri, 3 Sep 2021 18:40:14 GMT, Aleksey Shipilev wrote: > > > <...> I have excluded `vmTestbase` and `hotspot:tier4`<...> I have also excluded `applications` from `hotspot:tier4` <...> > > > > > > assuming the goal of tier4 is to catch the rest of the tests, I don't think we should exclude `vmTestbase`, `applications` or any other tests from tier4. unless you also want to create tier5 for them. > > Apart from practicality of using `tier4`, I think `vmTestbase` and `applications` are separate test suites in their own right. `tier4` is catching all the assorted Hotspot tests that are not part of larger suites. Makes sense? to some extent. I agree that `applications` tests can/should be seen as a separate test suite, yet I differ on `vmTestbase` as the end goal for `vmTestbase` tests is (and always was) to become just another test within hotspot jtreg test suite, hence I don't think we should treat them any different than other jtreg tests. there is also a plan (which I need to formalize and share w/ a broader audience) to rearrange `vmTestbase` tests so they will be placed within the corresponding component subdirectories, which would bring us closer to the end goal and at the same time make it slightly harder to select all `vmTestbase` tests. 
-- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From peter.kessler at os.amperecomputing.com Sat Sep 4 05:37:20 2021 From: peter.kessler at os.amperecomputing.com (Peter Kessler (Open Source)) Date: Sat, 4 Sep 2021 05:37:20 +0000 Subject: AArch64: Implementing spin pauses with ISB Message-ID: <18BCCFEC-A0CE-4271-99AA-8AF50ECF4D57@amperemail.onmicrosoft.com> Why is this not also an issue for C code? Shouldn't SpinPause() call a function from the C library, possibly a (GCC) built-in function implemented with a choice of NOP, YIELD, ISB, or whatever as appropriate on the architecture? The original post referred to improvements in MySQL and MongoDB, neither of which is written in the Java programming language. The Java platform intrinsic would need to pick an implementation, if calling the C library function was too time-consuming, or not time-consuming enough. In addition to making the choice of implementation switchable, it would be good to get the default setting of that switch from vm_version_aarch64.cpp so vendors could set it appropriately for their implementations. ... peter On 8/27/21, 01:23, "hotspot-dev on behalf of Andrew Haley" wrote: On 8/25/21 10:16 PM, Astigeevich, Evgeny wrote: > IMHO, we've only scratched the surface of it. The problem is not > well modelled by existing public benchmarks. Yes, it is application > dependent at some level. In case of Thread.onSpinWait it depends on > how an application implements spin loops. Applications having spin > loops with several iterations would benefit from short onSpinWait > (this is what we've got in customers' benchmarks). Applications > calling onSpinWait only couple times would benefit from longer > onSpinWait. "How heavy thread contention should be, what other > places", these are still open questions. To answer them we need to > detect the issues which is the problem itself. What we currently > use is the trial-and-error approach. OK, thanks.
I'm happy to approve a patch that switches NOP, PAUSE, and ISB. However, this all sounds a bit like cargo cult science to me. :-) Some thoughts. In a high-contention case this is related to backoff in transactional memory, where Dave Dice et al's TL2 (see "What really makes transactions faster?") uses exponential*random backoff, like Ethernet. The wisdom there is that you keep increasing the backoff until it is about half the time that parking would take, then actually park. That sounds intuitively reasonable. But we don't need to do anything more if just adding an ISB is good enough, at least for now. Let's do that, and see how it goes. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph-open at littlepinkcloud.com Sat Sep 4 10:30:36 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Sat, 4 Sep 2021 11:30:36 +0100 Subject: AArch64: Implementing spin pauses with ISB In-Reply-To: <18BCCFEC-A0CE-4271-99AA-8AF50ECF4D57@amperemail.onmicrosoft.com> References: <18BCCFEC-A0CE-4271-99AA-8AF50ECF4D57@amperemail.onmicrosoft.com> Message-ID: On 9/4/21 6:37 AM, Peter Kessler (Open Source) wrote: > Why is this not also an issue for C code? Shouldn't SpinPause() > call a function from the C library, possibly a (GCC) built-in > function implemented with a choice of NOP, YIELD, ISB, or whatever > as appropriate on the architecture? Ouch. I can't think of any good reason why it should, and we'd have to dump many registers of state to call it. Let C provide its builtins, and we'll provide ours. > The original post referred to improvements in MySQL and MongoDB, > neither of which is written in the Java programming language. The > Java platform intrinsic would need to pick an implementation, if > calling the C library function was too time-consuming, or not > time-consuming enough. Yep. Right now ISB looks like a decent-enough choice.
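For reference, the exponential*random backoff idea mentioned earlier in this thread (grow the backoff until it reaches about half the cost of parking, then park) can be sketched roughly as below. This is only an illustration, not HotSpot code: `SpinBackoff`, `park_cost` and the unit "pause iterations" are hypothetical names and assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not part of HotSpot): decides how many spin-pause
// iterations to perform on each failed attempt, or returns 0 to signal
// "stop spinning and park".
struct SpinBackoff {
  std::uint64_t park_cost;   // assumed cost of parking, in pause iterations
  std::uint64_t window = 1;  // current upper bound for the random spin count
  std::mt19937_64 rng;       // source of randomness for the jitter

  explicit SpinBackoff(std::uint64_t park_cost_iters, std::uint64_t seed = 42)
      : park_cost(park_cost_iters), rng(seed) {}

  // Returns the number of pause iterations for this attempt, or 0 once the
  // window has grown past half the assumed parking cost.
  std::uint64_t next() {
    if (window > park_cost / 2) {
      return 0;  // the caller should park instead of spinning further
    }
    std::uniform_int_distribution<std::uint64_t> dist(1, window);
    std::uint64_t spins = dist(rng);  // random part of the backoff
    assert(spins >= 1 && spins <= window);
    window *= 2;                      // exponential part of the backoff
    return spins;
  }
};
```

A caller would execute the returned number of pause instructions (NOP, YIELD or ISB, whichever is selected) before retrying, and fall back to parking when `next()` returns 0.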
> In addition to making the choice of implementation switchable, it > would be good to get the default setting of that switch from > vm_version_aarch64.cpp so vendors could set it appropriately for > their implementations. I'm assuming that is what will happen. However, I'm still intrigued by the possibility that the best solution is actually implementation as much as hardware dependent, particularly with many threads and high contention. And what we really need may be randomized exponential backoff, which is adaptive rather than fixed. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dholmes at openjdk.java.net Sun Sep 5 23:29:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 5 Sep 2021 23:29:50 GMT Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Fri, 3 Sep 2021 09:10:20 GMT, Aleksey Shipilev wrote: > During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
> > Sample run: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>> jtreg:test/jdk:tier4 2891 2885 4 2 << > jtreg:test/langtools:tier4 0 0 0 0 > jtreg:test/jaxp:tier4 0 0 0 0 > ============================== > > real 64m13.994s > user 1462m1.213s > sys 39m38.032s > > > There are interesting test failures on my machine, which I would address separately. Hi Aleksey, This seems rather arbitrary and subjective to me. The tier 1-3 groupings were driven by existing tier 1-3 notions. But here the definition of tier 4 as "all the rest except ..." is not really a well-defined meaning for tier 4. I don't see that it adds any value. Perhaps there is a need for a group that is "everything except tier 1, tier 2, tier 3, applications, ..." but I wouldn't call that tier 4. Cheers, David ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From neliasso at openjdk.java.net Mon Sep 6 09:15:44 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 6 Sep 2021 09:15:44 GMT Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. [v2] In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 14:50:06 GMT, Patric Hedlin wrote: >> Please review this change that addresses an issue originally found in JDK-8259590, >> where the error message should read >> `fatal error: DEBUG MESSAGE: verify_oop: r10: broken oop in decode_heap_oop` >> but is lost since the code generated refers to a fixed (string) address, a string >> (address) no longer available when **CodeStrings** have been propagated (copied) >> between code buffers/blobs. >> >> Background >> >> **CodeStrings** used to be **CodeComments**. The solution to JDK-8008555 introduced >> this change and added a new use-case for **CodeStrings**, not only as comments, but >> as strings with a fixed address used in the code generated with support for debug >> string printouts. 
>> >> The changes introduced with JDK-8255208 breaks the fixed address assumption made >> in the code generated with support for debug string printouts. Some of the (necessary) >> move semantics have been replaced by copy semantics when **CodeStrings** are >> propagated between code buffers/blobs (i.e. **CodeBuffer** and **CodeBlob** objects). >> >> Additional issue addressed >> >> + Broken printout (i.e. missing remarks) when multiple stubs are generated within >> the same primary buffer. >> >> The following steps have been taken to provide fixed address (debug) strings. >> >> + Introduce a simple _gtest_ for **CodeString/s** support with **CodeBuffer/CodeBlob**. >> + Split **CodeStrings** into two different abstractions; strings associated with an offset, >> and strings with a fixed address. >> - Let the first be _Assembly Code Remarks_, **AsmRemarks**, providing a simple >> 1:N mapping, _offset_ -> _remark_. >> - Let the second be _Debug Printout Strings_, **DbgStrings**, supporting a fixed >> address assumption such that: >> for each string A and B, if A = B -> &A = &B. >> - Use a reference counting scheme to ensure that both types of strings are >> deallocated properly, when no longer in use. >> + Remove **CodeStrings** from **Stub** interface. >> - Replace with internal use in the interpreter code, propagating the code >> (assembly) remarks directly to the final codelet. >> + Remove **CodeBuffer** self destruction, overwriting memory before all >> deconstructors have been executed. >> - Replace with sentinel deconstructor to do the bidding. >> + Stub code generated into a single common **CodeBuffer** (holding a number >> of stubs) will not print the assembly remarks correctly (except for the very first >> stub code section). >> - Introduce a displacement to correct the offset. >> + Remove old **CodeString/s** implementation. >> >> Testing >> >> Tier1-3 in debug mode, using debug strings, _without_ collecting remarks. 
>> Tier1-3 in debug mode, using debug strings, _with_ collecting remarks (regardless of options). >> Manual inspection of results (linux-x64 and linux-aarch64 only) for the following command line options: >> `-XX:+PrintSignatureHandlers -XX:+PrintInterpreter -XX:+PrintStubCode -XX:+PrintAssembly` > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Remove disturbing white-space. src/hotspot/share/compiler/disassembler.cpp line 650: > 648: _nm->print_block_comment(st, p); > 649: } > 650: else if (_codeBlob != nullptr) { This isn't code that you have changed - but still - why is _nm and _codeblob separate? print_block_comment is virtual and nmethod is a codeblob. ------------- PR: https://git.openjdk.java.net/jdk/pull/5281 From stefan.karlsson at oracle.com Mon Sep 6 09:50:32 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 6 Sep 2021 11:50:32 +0200 Subject: [UNVERIFIED SENDER] RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: <12d34936-f4d6-5a56-c693-4e18bcf8638c@oracle.com> <5610D974-4D73-48E6-B73A-2A5512774410@amazon.com> Message-ID: <719adc2d-ef36-f102-7d13-22a35be12017@oracle.com> On 2021-09-03 19:29, Kim Barrett wrote: >> On Sep 3, 2021, at 8:19 AM, Stefan Karlsson wrote: >> >> On 2021-09-02 23:40, Kim Barrett wrote: >>> [resending from the correct account, so it gets to the mailing list and the PR.] >>> >>>> On Sep 2, 2021, at 3:04 PM, Hohensee, Paul wrote: >>>> >>>> I haven't been following this thread, so please forgive redundancy. >>>> >>>> For the fully concurrent collectors such as Shenandoah, ZGC, and C4, we want to be able to measure intervals that may be shorter than a millisecond. 
Azul uses seconds-as-doubles to do this in their MXBean APIs (see https://docs.azul.com/prime/MXBeans), but given that Hotspot has access to nanotime counters and that a long can hold ~292 years of nanoseconds, I'd very much like for Hotspot to standardize on nanoseconds internally and make millis and seconds available as convenience methods. >>> There's been some effort in GC code to use the Ticks utility (which I think ends up being nanoseconds >>> on all supported platforms) internally, and convert to other types for logging and other API boundaries >>> that require other types. See, for example, >>> https://bugs.openjdk.java.net/browse/JDK-8208390 >>> I thought that had already been done, but apparently not. Maybe there was just an unfinished prototype? >>> Quite possibly it got too big for one change set. >> This might not be what you were thinking about, but could be interesting to think about anyways. >> >> At some point I created a prototype to use a chrono-like API to solve the numerous bugs we've had when accidentally mixing seconds and milliseconds. See: > Yes, I remember this. I liked the idea, but at the time we didn't have C++11/14 available. Now we do. > >> https://www.cplusplus.com/reference/chrono/ >> https://cr.openjdk.java.net/~stefank/prototype/durations/ >> >> That in itself was good enough to find conversion bugs. On top of this I started to convert the G1 code to use "duration" (backed by nanos), and only convert to seconds and milliseconds at the boundaries. I don't have that patch anymore, but from internal mails I see that these were my thoughts at that point: >> >> "I ... didn't like that because G1 pervasively uses doubles and does a lot of statistics on these double values, so using Duration<> forces me to use duration_casts in many places in G1 (to allow lossy conversions). I'll reimplement that part so that I can show that part more clearly and so that we can discuss if there are better ways around those problems."
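As an illustration of the "nanoseconds internally, convert at the boundaries" approach described above, here is a minimal std::chrono sketch. It is not the prototype referred to in the thread; the aliases `Nanos`, `MillisF`, `SecondsF` and the helper functions are hypothetical names chosen for the example. It relies on the standard rule that converting an integer-based duration to a floating-point-based one is implicit, while truncating to a coarser integer unit needs an explicit `duration_cast`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Nanosecond-backed duration used internally (an assumption of this sketch).
using Nanos = std::chrono::duration<std::int64_t, std::nano>;
// Floating-point views intended for statistics and logging boundaries.
using MillisF = std::chrono::duration<double, std::milli>;
using SecondsF = std::chrono::duration<double>;

// Integer-to-floating-point conversion is implicit: no duration_cast needed.
inline double to_millis(Nanos d) { return MillisF(d).count(); }
inline double to_seconds(Nanos d) { return SecondsF(d).count(); }

// Converting to a coarser integer unit truncates, so the cast is explicit.
inline std::int64_t to_whole_millis(Nanos d) {
  assert(d.count() >= 0);  // this sketch only expects non-negative intervals
  return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
}
```

Code that keeps doing statistics on doubles could take `MillisF` or `SecondsF` values directly, which may reduce the number of explicit casts; whether that fits G1's usage is not something this sketch can show.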
> The current use of Ticks has essentially the same problem. Except we're > writing out the conversions all over the place, rather than having them > nicely packaged up. I find that frequently irritating, but nobody (including > me) has gotten so irritated as to do something about it. > >> Priorities changed and we're now at C++14. Would it make sense to investigate using chrono? > Unless and until we address the issue of JFR using the so-called > fast-unordered clock (JDK-8211240, which is confidential for annoying > reasons and needs to be resubmitted in the open. Summary is "Systematic > investigation regarding latency for os::elapsed_counter() vs rdtsc()".), we > need to use the paired form of Ticks. But I think the underlying > implementation of Ticks could use std::chrono. I had hoped for the opposite. Provide a clock source for Ticks, and use std::chrono throughout the JVM (or at least GC). > > We may (probably) want to use our own clock sources rather than whatever the > C++ standard library provides us, in order to be sure we're consistent with > other code, including Java. I don't remember how hard it is to define a > clock source, though I'm guessing it's not hard. In the prototype I set up two clock sources (elapsed time, elapsed counter). I think something very similar to that could be plugged into std::chrono. I tried to prototype a Ticks clock source, but hit problems with trying to convert durations based on Ticks to other types of durations: https://gist.github.com/stefank/e7128e72e3fd120384444337e96f21cc Maybe I'm missing something, or maybe chrono isn't easily adapted to use a non-primitive time representation, I'm not sure. This might point in the direction you mentioned, to instead use chrono inside Ticks. I'm not sure how much that would help the GC code. We would still have a mixed usage of two different time APIs.
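For readers unfamiliar with what "defining a clock source" for std::chrono involves, the sketch below shows the general shape of a type meeting the standard Clock requirements. It is not the prototype or the gist linked above: `CounterClock` and `fake_counter_ns()` are hypothetical, and the sketch assumes a plain 64-bit nanosecond count, so it sidesteps the non-primitive (paired) Ticks representation that caused the conversion problems described here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in for a VM counter such as an elapsed-nanoseconds source. This is a
// hypothetical hook for the sketch, not a HotSpot function.
inline std::atomic<std::int64_t>& fake_counter_ns() {
  static std::atomic<std::int64_t> counter{0};
  return counter;
}

// A type with the members the standard Clock requirements ask for:
// rep, period, duration, time_point, is_steady and a static now().
struct CounterClock {
  using rep = std::int64_t;
  using period = std::nano;
  using duration = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<CounterClock, duration>;
  static constexpr bool is_steady = true;

  static time_point now() noexcept {
    const rep ticks = fake_counter_ns().load(std::memory_order_relaxed);
    assert(ticks >= 0);  // the stand-in counter is expected to be non-negative
    return time_point(duration(ticks));
  }
};

// Elapsed time since 'start', converted at the boundary to floating-point ms.
inline double elapsed_millis(CounterClock::time_point start) {
  std::chrono::duration<double, std::milli> ms = CounterClock::now() - start;
  return ms.count();
}
```

With a clock like this, time points and durations taken from different clock types do not mix implicitly, which is the kind of unit and source confusion the chrono-style API is meant to catch at compile time.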
StefanK From ngasson at openjdk.java.net Mon Sep 6 10:15:38 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 6 Sep 2021 10:15:38 GMT Subject: RFR: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 10:41:20 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make run-test TEST=containers/docker/TestJFREvents.java > > STDERR: > stdout: []; > stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. > ] > exitValue = 137 > > java.lang.RuntimeException: Expected to get exit value of [0] > > at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:489) > at TestJFREvents.testContainerInfo(TestJFREvents.java:110) > at TestJFREvents.containerInfoTestCase(TestJFREvents.java:89) > at TestJFREvents.main(TestJFREvents.java:74) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > `exitValue = 137` suggests the container was killed by OOM killer. The failing configuration is with `64m`, and it is apparently too low. > > Additional testing: > - [x] Affected test now passes (5 runs out of 5 tries) > - [x] `containers/docker` tests pass I've also seen this failure on some AArch64 machines and this patch fixes it for me too. ------------- Marked as reviewed by ngasson (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5359 From shade at openjdk.java.net Mon Sep 6 11:13:51 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 11:13:51 GMT Subject: RFR: 8273378: Shenandoah: Remove the remaining uses of os::is_MP Message-ID: JDK-8188764 removed many uses of `os::is_MP`, effectively defaulting it to `true`, but some Shenandoah code still has it. This is a simple omission. All current uses on x86 already imply lock prefix, so this is a cleanup, and not a functional change. Additional testing: - [x] Linux x86_64 `hotspot_gc_shenandoah` ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5378/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5378&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273378 Stats: 9 lines in 1 file changed: 6 ins; 3 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5378.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5378/head:pull/5378 PR: https://git.openjdk.java.net/jdk/pull/5378 From aph at openjdk.java.net Mon Sep 6 12:08:43 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 6 Sep 2021 12:08:43 GMT Subject: RFR: 8273378: Shenandoah: Remove the remaining uses of os::is_MP In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:03:43 GMT, Aleksey Shipilev wrote: > JDK-8188764 removed many uses of `os::is_MP`, effectively defaulting it to `true`, but some Shenandoah code still has it. This is a simple omission. All current uses on x86 already imply lock prefix, so this is a cleanup, and not a functional change. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` Marked as reviewed by aph (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5378 From shade at openjdk.java.net Mon Sep 6 13:18:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 13:18:06 GMT Subject: RFR: 8273314: Add tier4 test groups [v2] In-Reply-To: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: > During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). > > Sample run: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>> jtreg:test/jdk:tier4 2891 2885 4 2 << > jtreg:test/langtools:tier4 0 0 0 0 > jtreg:test/jaxp:tier4 0 0 0 0 > ============================== > > real 64m13.994s > user 1462m1.213s > sys 39m38.032s > > > There are interesting test failures on my machine, which I would address separately. 
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Drop exceptions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5357/files - new: https://git.openjdk.java.net/jdk/pull/5357/files/a0753adf..afb77062 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5357&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5357&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5357.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5357/head:pull/5357 PR: https://git.openjdk.java.net/jdk/pull/5357 From shade at openjdk.java.net Mon Sep 6 13:22:03 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 13:22:03 GMT Subject: RFR: 8273314: Add tier4 test groups [v3] In-Reply-To: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: > During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). > > Sample run: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>> jtreg:test/jdk:tier4 2891 2885 4 2 << > jtreg:test/langtools:tier4 0 0 0 0 > jtreg:test/jaxp:tier4 0 0 0 0 > ============================== > > real 64m13.994s > user 1462m1.213s > sys 39m38.032s > > > There are interesting test failures on my machine, which I would address separately. 
Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Drop applications and fix the comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5357/files - new: https://git.openjdk.java.net/jdk/pull/5357/files/afb77062..160c13c7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5357&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5357&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5357.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5357/head:pull/5357 PR: https://git.openjdk.java.net/jdk/pull/5357 From neliasso at openjdk.java.net Mon Sep 6 14:18:39 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Mon, 6 Sep 2021 14:18:39 GMT Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. [v2] In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 14:50:06 GMT, Patric Hedlin wrote: >> Please review this change that addresses an issue originally found in JDK-8259590, >> where the error message should read >> `fatal error: DEBUG MESSAGE: verify_oop: r10: broken oop in decode_heap_oop` >> but is lost since the code generated refers to a fixed (string) address, a string >> (address) no longer available when **CodeStrings** have been propagated (copied) >> between code buffers/blobs. >> >> Background >> >> **CodeStrings** used to be **CodeComments**. The solution to JDK-8008555 introduced >> this change and added a new use-case for **CodeStrings**, not only as comments, but >> as strings with a fixed address used in the code generated with support for debug >> string printouts. >> >> The changes introduced with JDK-8255208 breaks the fixed address assumption made >> in the code generated with support for debug string printouts. 
Some of the (necessary) >> move semantics have been replaced by copy semantics when **CodeStrings** are >> propagated between code buffers/blobs (i.e. **CodeBuffer** and **CodeBlob** objects). >> >> Additional issue addressed >> >> + Broken printout (i.e. missing remarks) when multiple stubs are generated within >> the same primary buffer. >> >> The following steps have been taken to provide fixed address (debug) strings. >> >> + Introduce a simple _gtest_ for **CodeString/s** support with **CodeBuffer/CodeBlob**. >> + Split **CodeStrings** into two different abstractions; strings associated with an offset, >> and strings with a fixed address. >> - Let the first be _Assembly Code Remarks_, **AsmRemarks**, providing a simple >> 1:N mapping, _offset_ -> _remark_. >> - Let the second be _Debug Printout Strings_, **DbgStrings**, supporting a fixed >> address assumption such that: >> for each string A and B, if A = B -> &A = &B. >> - Use a reference counting scheme to ensure that both types of strings are >> deallocated properly, when no longer in use. >> + Remove **CodeStrings** from **Stub** interface. >> - Replace with internal use in the interpreter code, propagating the code >> (assembly) remarks directly to the final codelet. >> + Remove **CodeBuffer** self destruction, overwriting memory before all >> deconstructors have been executed. >> - Replace with sentinel deconstructor to do the bidding. >> + Stub code generated into a single common **CodeBuffer** (holding a number >> of stubs) will not print the assembly remarks correctly (except for the very first >> stub code section). >> - Introduce a displacement to correct the offset. >> + Remove old **CodeString/s** implementation. >> >> Testing >> >> Tier1-3 in debug mode, using debug strings, _without_ collecting remarks. >> Tier1-3 in debug mode, using debug strings, _with_ collecting remarks (regardless of options). 
>> Manual inspection of results (linux-x64 and linux-aarch64 only) for the following command line options: >> `-XX:+PrintSignatureHandlers -XX:+PrintInterpreter -XX:+PrintStubCode -XX:+PrintAssembly` > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Remove disturbing white-space. A very nice cleanup! Looks good! ------------- Marked as reviewed by neliasso (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5281 From zgu at openjdk.java.net Mon Sep 6 14:26:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 6 Sep 2021 14:26:38 GMT Subject: RFR: 8273378: Shenandoah: Remove the remaining uses of os::is_MP In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:03:43 GMT, Aleksey Shipilev wrote: > JDK-8188764 removed many uses of `os::is_MP`, effectively defaulting it to `true`, but some Shenandoah code still has it. This is a simple omission. All current uses on x86 already imply lock prefix, so this is a cleanup, and not a functional change. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` LGTM ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5378 From phedlin at openjdk.java.net Mon Sep 6 14:46:49 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 6 Sep 2021 14:46:49 GMT Subject: RFR: 8264207: CodeStrings does not honour fixed address assumption. [v2] In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 14:50:06 GMT, Patric Hedlin wrote: >> Please review this change that addresses an issue originally found in JDK-8259590, >> where the error message should read >> `fatal error: DEBUG MESSAGE: verify_oop: r10: broken oop in decode_heap_oop` >> but is lost since the code generated refers to a fixed (string) address, a string >> (address) no longer available when **CodeStrings** have been propagated (copied) >> between code buffers/blobs. >> >> Background >> >> **CodeStrings** used to be **CodeComments**. 
The solution to JDK-8008555 introduced >> this change and added a new use-case for **CodeStrings**, not only as comments, but >> as strings with a fixed address used in the code generated with support for debug >> string printouts. >> >> The changes introduced with JDK-8255208 breaks the fixed address assumption made >> in the code generated with support for debug string printouts. Some of the (necessary) >> move semantics have been replaced by copy semantics when **CodeStrings** are >> propagated between code buffers/blobs (i.e. **CodeBuffer** and **CodeBlob** objects). >> >> Additional issue addressed >> >> + Broken printout (i.e. missing remarks) when multiple stubs are generated within >> the same primary buffer. >> >> The following steps have been taken to provide fixed address (debug) strings. >> >> + Introduce a simple _gtest_ for **CodeString/s** support with **CodeBuffer/CodeBlob**. >> + Split **CodeStrings** into two different abstractions; strings associated with an offset, >> and strings with a fixed address. >> - Let the first be _Assembly Code Remarks_, **AsmRemarks**, providing a simple >> 1:N mapping, _offset_ -> _remark_. >> - Let the second be _Debug Printout Strings_, **DbgStrings**, supporting a fixed >> address assumption such that: >> for each string A and B, if A = B -> &A = &B. >> - Use a reference counting scheme to ensure that both types of strings are >> deallocated properly, when no longer in use. >> + Remove **CodeStrings** from **Stub** interface. >> - Replace with internal use in the interpreter code, propagating the code >> (assembly) remarks directly to the final codelet. >> + Remove **CodeBuffer** self destruction, overwriting memory before all >> deconstructors have been executed. >> - Replace with sentinel deconstructor to do the bidding. >> + Stub code generated into a single common **CodeBuffer** (holding a number >> of stubs) will not print the assembly remarks correctly (except for the very first >> stub code section). 
>> - Introduce a displacement to correct the offset. >> + Remove old **CodeString/s** implementation. >> >> Testing >> >> Tier1-3 in debug mode, using debug strings, _without_ collecting remarks. >> Tier1-3 in debug mode, using debug strings, _with_ collecting remarks (regardless of options). >> Manual inspection of results (linux-x64 and linux-aarch64 only) for the following command line options: >> `-XX:+PrintSignatureHandlers -XX:+PrintInterpreter -XX:+PrintStubCode -XX:+PrintAssembly` > > Patric Hedlin has updated the pull request incrementally with one additional commit since the last revision: > > Remove disturbing white-space. Thanks for reviewing. ------------- PR: https://git.openjdk.java.net/jdk/pull/5281 From phedlin at openjdk.java.net Mon Sep 6 14:46:51 2021 From: phedlin at openjdk.java.net (Patric Hedlin) Date: Mon, 6 Sep 2021 14:46:51 GMT Subject: Integrated: 8264207: CodeStrings does not honour fixed address assumption. In-Reply-To: References: Message-ID: On Fri, 27 Aug 2021 15:19:42 GMT, Patric Hedlin wrote: > Please review this change that addresses an issue originally found in JDK-8259590, > where the error message should read > `fatal error: DEBUG MESSAGE: verify_oop: r10: broken oop in decode_heap_oop` > but is lost since the code generated refers to a fixed (string) address, a string > (address) no longer available when **CodeStrings** have been propagated (copied) > between code buffers/blobs. > > Background > > **CodeStrings** used to be **CodeComments**. The solution to JDK-8008555 introduced > this change and added a new use-case for **CodeStrings**, not only as comments, but > as strings with a fixed address used in the code generated with support for debug > string printouts. > > The changes introduced with JDK-8255208 breaks the fixed address assumption made > in the code generated with support for debug string printouts. 
Some of the (necessary) > move semantics have been replaced by copy semantics when **CodeStrings** are > propagated between code buffers/blobs (i.e. **CodeBuffer** and **CodeBlob** objects). > > Additional issue addressed > > + Broken printout (i.e. missing remarks) when multiple stubs are generated within > the same primary buffer. > > The following steps have been taken to provide fixed address (debug) strings. > > + Introduce a simple _gtest_ for **CodeString/s** support with **CodeBuffer/CodeBlob**. > + Split **CodeStrings** into two different abstractions; strings associated with an offset, > and strings with a fixed address. > - Let the first be _Assembly Code Remarks_, **AsmRemarks**, providing a simple > 1:N mapping, _offset_ -> _remark_. > - Let the second be _Debug Printout Strings_, **DbgStrings**, supporting a fixed > address assumption such that: > for each string A and B, if A = B -> &A = &B. > - Use a reference counting scheme to ensure that both types of strings are > deallocated properly, when no longer in use. > + Remove **CodeStrings** from **Stub** interface. > - Replace with internal use in the interpreter code, propagating the code > (assembly) remarks directly to the final codelet. > + Remove **CodeBuffer** self destruction, overwriting memory before all > deconstructors have been executed. > - Replace with sentinel deconstructor to do the bidding. > + Stub code generated into a single common **CodeBuffer** (holding a number > of stubs) will not print the assembly remarks correctly (except for the very first > stub code section). > - Introduce a displacement to correct the offset. > + Remove old **CodeString/s** implementation. > > Testing > > Tier1-3 in debug mode, using debug strings, _without_ collecting remarks. > Tier1-3 in debug mode, using debug strings, _with_ collecting remarks (regardless of options). 
> Manual inspection of results (linux-x64 and linux-aarch64 only) for the following command line options: > `-XX:+PrintSignatureHandlers -XX:+PrintInterpreter -XX:+PrintStubCode -XX:+PrintAssembly` This pull request has now been integrated. Changeset: 7bd4f496 Author: Patric Hedlin URL: https://git.openjdk.java.net/jdk/commit/7bd4f496b493b804990615f6ce2cb1b4abd29a86 Stats: 891 lines in 16 files changed: 536 ins; 128 del; 227 mod 8264207: CodeStrings does not honour fixed address assumption. Reviewed-by: redestad, neliasso ------------- PR: https://git.openjdk.java.net/jdk/pull/5281 From shade at openjdk.java.net Mon Sep 6 15:17:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 15:17:43 GMT Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Sat, 4 Sep 2021 03:13:58 GMT, Igor Ignatyev wrote: >>> > <...> I have excluded `vmTestbase` and `hotspot:tier4`<...> I have also excluded `applications` from `hotspot:tier4` <...> >>> >>> assuming the goal of tier4 is to catch the rest of the tests, I don't think we should exclude `vmTestbase`, `applications` or any other tests from tier4. unless you also want to create tier5 for them. >> >> Apart from practicality of using `tier4`, I think `vmTestbase` and `applications` are separate test suites in their own right. `tier4` is catching all the assorted Hotspot tests that are not part of larger suites. Makes sense? > >> > > <...> I have excluded `vmTestbase` and `hotspot:tier4`<...> I have also excluded `applications` from `hotspot:tier4` <...> >> > >> > >> > assuming the goal of tier4 is to catch the rest of the tests, I don't think we should exclude `vmTestbase`, `applications` or any other tests from tier4. unless you also want to create tier5 for them. 
>> >> Apart from practicality of using `tier4`, I think `vmTestbase` and `applications` are separate test suites in their own right. `tier4` is catching all the assorted Hotspot tests that are not part of larger suites. Makes sense? > > to some extent. I agree that `applications` tests can/should be seen as a separate test suite, yet I differ on `vmTestbase` as the end goal for `vmTestbase` tests is (and always was) to become just another test within hotspot jtreg test suite, hence I don't think we should treat them any different than other jtreg tests. there is also a plan (which I need to formalize and share w/ a broader audience) to rearrange `vmTestbase` tests so they will be placed within the corresponding component subdirectories, which would bring us closer to the end goal and at the same time make it slightly harder to select all `vmTestbase` tests. > > -- Igor @iignatev: OK, I reinstated `vmTestbase` and left `applications` out of `hotspot:tier4`. @dholmes-ora: Generally speaking, all `tierX` definitions are rather arbitrary, as there seem to be nothing intrinsic about the tests to be in a particular tier. In other words, what `tierX` consists of is a matter of agreement. I'd say `hotspot:tier4` is "all assorted Hotspot tests that are not application-specific suites" @mrserb: Yes, I ran this on a headless config, so headful tests were skipped, apparently. I'll try to arrange runs on my desktop. Please see new commit. As `hotspot:tier4` is now much longer, it would take me a while to verify everything works there. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From shade at openjdk.java.net Mon Sep 6 16:03:36 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 16:03:36 GMT Subject: RFR: 8273378: Shenandoah: Remove the remaining uses of os::is_MP In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:03:43 GMT, Aleksey Shipilev wrote: > JDK-8188764 removed many uses of `os::is_MP`, effectively defaulting it to `true`, but some Shenandoah code still has it. This is a simple omission. All current uses on x86 already imply lock prefix, so this is a cleanup, and not a functional change. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` Thank you, GHA tests are green, so I am integrating. ------------- PR: https://git.openjdk.java.net/jdk/pull/5378 From shade at openjdk.java.net Mon Sep 6 16:03:37 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 6 Sep 2021 16:03:37 GMT Subject: Integrated: 8273378: Shenandoah: Remove the remaining uses of os::is_MP In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 11:03:43 GMT, Aleksey Shipilev wrote: > JDK-8188764 removed many uses of `os::is_MP`, effectively defaulting it to `true`, but some Shenandoah code still has it. This is a simple omission. All current uses on x86 already imply lock prefix, so this is a cleanup, and not a functional change. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` This pull request has now been integrated. 
Changeset: fc546d6d Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/fc546d6de9a3ed33cf4b04e24e62714332b069cb Stats: 9 lines in 1 file changed: 6 ins; 3 del; 0 mod 8273378: Shenandoah: Remove the remaining uses of os::is_MP Reviewed-by: aph, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/5378 From serb at openjdk.java.net Mon Sep 6 20:43:36 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Mon, 6 Sep 2021 20:43:36 GMT Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: <4ZMjEBOeBv6RVtgEhcfQ1sMVUyL8t2v4W4sc3iGY3r0=.d88a0dbc-d8d0-497b-bde7-9d5de283d11f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> <4ZMjEBOeBv6RVtgEhcfQ1sMVUyL8t2v4W4sc3iGY3r0=.d88a0dbc-d8d0-497b-bde7-9d5de283d11f@github.com> Message-ID: On Sat, 4 Sep 2021 02:51:50 GMT, Sergey Bylokhov wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). >> >> Sample run: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>>> jtreg:test/jdk:tier4 2891 2885 4 2 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 64m13.994s >> user 1462m1.213s >> sys 39m38.032s >> >> >> There are interesting test failures on my machine, which I would address separately. > > it looks like the results above do not include the headful tests did you filter them out? >>> jtreg:test/jdk:tier4 2891 2885 4 2 << > @mrserb: Yes, I ran this on a headless config, so headful tests were skipped, apparently. 
I'll try to arrange runs on my desktop. Then you probably need to skip "printer" tests as well. BTW it will be really good somehow to execute headless tests in tier4 concurrently, and run the headful tests in tier5 sequentially. ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From david.holmes at oracle.com Mon Sep 6 23:04:24 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 7 Sep 2021 09:04:24 +1000 Subject: RFR: 8273314: Add tier4 test groups In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: <25b0cf03-689a-93d8-6fca-35a465a8e631@oracle.com> On 7/09/2021 1:17 am, Aleksey Shipilev wrote: > @dholmes-ora: Generally speaking, all `tierX` definitions are rather arbitrary, as there seem to be nothing intrinsic about the tests to be in a particular tier. In other words, what `tierX` consists of is a matter of agreement. I'd say `hotspot:tier4` is "all assorted Hotspot tests that are not application-specific suites" The difference is that your previous work just consolidated the existing subsystem tier 1-3 definitions, but here you are choosing to define "all the rest" as tier 4. I don't think it is actually helpful/useful to anyone - and it bears no resemblance whatsoever to what we call "tier 4", so that will just lead to unnecessary confusion IMO. 
David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5357 > From whuang at openjdk.java.net Tue Sep 7 01:43:48 2021 From: whuang at openjdk.java.net (Wang Huang) Date: Tue, 7 Sep 2021 01:43:48 GMT Subject: Integrated: 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words In-Reply-To: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> References: <2xPts-aE-Mr-T24nLCj5WZnGieBVx9oVtJ-WzKcU0mM=.ad6e2fe0-db5f-48c9-a604-88b332c50db1@github.com> Message-ID: On Fri, 16 Jul 2021 11:20:45 GMT, Wang Huang wrote: > It is found that the comments of `MacroAssembler::fill_words` is not right here. Let's fix that. This pull request has now been integrated. Changeset: 649c22c5 Author: Wang Huang Committer: Nick Gasson URL: https://git.openjdk.java.net/jdk/commit/649c22c5b17efbc3116ac34739b8d1be39de01be Stats: 21 lines in 1 file changed: 14 ins; 0 del; 7 mod 8270832: Aarch64: Update algorithm annotations for MacroAssembler::fill_words Co-authored-by: Wang Huang Co-authored-by: Miu Zhuojun Co-authored-by: Wu Yan Reviewed-by: ngasson, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/4809 From iignatyev at openjdk.java.net Tue Sep 7 04:20:38 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Tue, 7 Sep 2021 04:20:38 GMT Subject: RFR: 8273314: Add tier4 test groups [v3] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Mon, 6 Sep 2021 13:22:03 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
>> >> Sample run: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>>> jtreg:test/jdk:tier4 2891 2885 4 2 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 64m13.994s >> user 1462m1.213s >> sys 39m38.032s >> >> >> There are interesting test failures on my machine, which I would address separately. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Drop applications and fix the comment > _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ > > On 7/09/2021 1:17 am, Aleksey Shipilev wrote: > > > @dholmes-ora: Generally speaking, all `tierX` definitions are rather arbitrary, as there seem to be nothing intrinsic about the tests to be in a particular tier. In other words, what `tierX` consists of is a matter of agreement. I'd say `hotspot:tier4` is "all assorted Hotspot tests that are not application-specific suites" > > The difference is that your previous work just consolidated the existing > subsystem tier 1-3 definitions, but here you are choosing to define "all > the rest" as tier 4. I don't think it is actually helpful/useful to > anyone - and it bears no resemblance whatsoever to what we call "tier > 4", so that will just lead to unnecessary confusion IMO. @dholmes-ora , although I fully agree that this might lead to some misunderstanding b/w Oracle and non-Oracle folks, I don't see how it's different from the previous patch, which introduced `hotspot:tier2` and `hotspot:tier3`. 
even if we reduce `tierN` to just a set of tests, the test groups added by 8272914 bear as much resemblance to the test sets used in Oracle's tier2-3 as the suggested `hotspot:tier4` groups in this patch to the actual `tier4` definition used in Oracle's internal system, e.g. `hotspot:tier2` group has 0 tests from `test/hotspot/jtreg/compiler`, but Oracle's `tier2` does include a number of `test/hotspot/jtreg/compiler` tests (which aren't part of `:tier1`). I believe that this patch actually moves us closer to a convergence point, as the union of `hotspot:tier1` -- `hotspot:tier4` test groups is very close to the test sets used in hotspot parts of Oracle's `tier1` -- `tier4` definitions. Thanks, -- Igor ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From david.holmes at oracle.com Tue Sep 7 05:05:39 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 7 Sep 2021 15:05:39 +1000 Subject: RFR: 8273314: Add tier4 test groups [v3] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: <6000402c-061e-d786-3367-73a8bb934811@oracle.com> On 7/09/2021 2:20 pm, Igor Ignatyev wrote: > On Mon, 6 Sep 2021 13:22:03 GMT, Aleksey Shipilev wrote: > >>> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
>>> >>> Sample run: >>> >>> >>> ============================== >>> Test summary >>> ============================== >>> TEST TOTAL PASS FAIL ERROR >>>>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>>>> jtreg:test/jdk:tier4 2891 2885 4 2 << >>> jtreg:test/langtools:tier4 0 0 0 0 >>> jtreg:test/jaxp:tier4 0 0 0 0 >>> ============================== >>> >>> real 64m13.994s >>> user 1462m1.213s >>> sys 39m38.032s >>> >>> >>> There are interesting test failures on my machine, which I would address separately. >> >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Drop applications and fix the comment > >> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ >> >> On 7/09/2021 1:17 am, Aleksey Shipilev wrote: >> >>> @dholmes-ora: Generally speaking, all `tierX` definitions are rather arbitrary, as there seem to be nothing intrinsic about the tests to be in a particular tier. In other words, what `tierX` consists of is a matter of agreement. I'd say `hotspot:tier4` is "all assorted Hotspot tests that are not application-specific suites" >> >> The difference is that your previous work just consolidated the existing >> subsystem tier 1-3 definitions, but here you are choosing to define "all >> the rest" as tier 4. I don't think it is actually helpful/useful to >> anyone - and it bears no resemblance whatsoever to what we call "tier >> 4", so that will just lead to unnecessary confusion IMO. > > @dholmes-ora , although I fully agree that this might lead to some misunderstanding b/w Oracle and non-Oracle folks, I don't see how it's different from the previous patch, which introduced `hotspot:tier2` and `hotspot:tier3`. 
Because hotspot:tier2 and hotspot:tier3 simply grouped existing component definitions for tiers 2 and 3:

+tier2 = \
+  :hotspot_tier2_runtime \
+  :hotspot_tier2_runtime_platform_agnostic \
+  :hotspot_tier2_serviceability \
+  :tier2_gc_epsilon \
+  :tier2_gc_shenandoah
+
+tier3 = \
+  :hotspot_tier3_runtime \
+  :tier3_gc_shenandoah

but that is not the case for tier4.

> even if we reduce `tierN` to just a set of tests, the test groups added by 8272914 bear as much resemblance to the test sets used in Oracle's tier2-3 as the suggested `hotspot:tier4` groups in this patch to the actual `tier4` definition used in Oracle's internal system, e.g. `hotspot:tier2` group has 0 tests from `test/hotspot/jtreg/compiler`, but Oracle's `tier2` does include a number of `test/hotspot/jtreg/compiler` tests (which aren't part of `:tier1`). I believe that this patch actually moves us closer to a convergence point, as the union of `hotspot:tier1` -- `hotspot:tier4` test groups is very close to the test sets used in hotspot parts of Oracle's `tier1` -- `tier4` definitions.

We can discuss Oracle's tier definitions privately - I don't see any connection between those and this tier4 however.
Thanks, David ----- > Thanks, > -- Igor > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5357 > From github.com+39413832+weixlu at openjdk.java.net Tue Sep 7 06:20:39 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Tue, 7 Sep 2021 06:20:39 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v8] In-Reply-To: References: <_IqJ7u4Vk7jF8E--2RzWfdnxYXDQr86TIsxA7sh_3WI=.4d2c4cd9-63c8-4921-b5a1-e77d66c10325@github.com> Message-ID: <_cmSjYRldX4qwVo8b-esOjzfMwiW9fLR1l6Zr9-7eEc=.b220343c-cfc8-49d0-98fc-1dde5dc25c08@github.com> On Tue, 31 Aug 2021 13:24:37 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Add TODO > > More work: leave `acquire`-in-lieu-of-`consume` in, and special case the heap update paths to dodge the performance penalty of doing so. Seems to work on x86_64 and AArch64. @shipilev Hi, I have tested this pull request as well as this pull request + `OrderAccess::release();` on specjbb 2015 on AArch64 (Kunpeng 920). Maybe there is a slight improvement on critical-jOPS? Here is the result. 
numactl --cpubind=1 --membind=1 ${build}/images/jdk/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -Xmx100g -Xms75g -Xlog:gc*:gclog_${build}_${i} -jar specjbb2015.jar -m COMPOSITE

base_1:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 28797, critical-jOPS = 21801
base_2:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 30168, critical-jOPS = 21513
base_3:RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 34282, max-jOPS = 28783, critical-jOPS = 21516
OrderAccess_release_1:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 29483, critical-jOPS = 21979
OrderAccess_release_2:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 29483, critical-jOPS = 22096
OrderAccess_release_3:RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 34282, max-jOPS = 30017, critical-jOPS = 22929

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496

From shade at openjdk.java.net Tue Sep 7 08:14:37 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 7 Sep 2021 08:14:37 GMT
Subject: RFR: 8273314: Add tier4 test groups [v3]
In-Reply-To:
References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com>
Message-ID: <4izBbOzjSpoP4EwfJPEILXvLU0fCdI6xy4PTo3mYEtI=.5796ff23-e9f2-4cbe-8c8c-eb825633dd66@github.com>

On Mon, 6 Sep 2021 13:22:03 GMT, Aleksey Shipilev wrote:

>> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress).
>> >> Sample run: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>>> jtreg:test/jdk:tier4 2891 2885 4 2 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 64m13.994s >> user 1462m1.213s >> sys 39m38.032s >> >> >> There are interesting test failures on my machine, which I would address separately. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Drop applications and fix the comment Once again, the disconnect between Oracle and OpenJDK test definitions seems to be the problem for Oracle's side. Rectifying that disconnect might fall under the scope of this PR, but I have to point out that it is a courtesy of upstream open-source project to care about proprietary downstream definitions. More to the point: since `tier4` as "catch-all-the-rest" was Igor's idea, I assumed that would be in agreement with Oracle's test definitions. Following this discussion, it seems I assumed wrong. So it puts me in a weird position to be between two Oracle engineers arguing about proprietary test definitions I cannot really know about, and have no decision power about. For all I care for OpenJDK, we might as well model `tier4` after what Oracle does, as to minimize confusion for Oracle engineers. But then again, I have no idea what Oracle means by `tier4`. So as the alternative, I can postpone this PR until you folks have a coherent view on this, or I can just give up on this PR and re-assign the RFE to Igor, assuming he is willing to work this out. Tell me what you want me to do here. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From shade at openjdk.java.net Tue Sep 7 08:27:36 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Sep 2021 08:27:36 GMT Subject: RFR: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory In-Reply-To: References: Message-ID: On Mon, 6 Sep 2021 10:12:33 GMT, Nick Gasson wrote: >> $ CONF=linux-x86_64-server-fastdebug make run-test TEST=containers/docker/TestJFREvents.java >> >> STDERR: >> stdout: []; >> stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. >> ] >> exitValue = 137 >> >> java.lang.RuntimeException: Expected to get exit value of [0] >> >> at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:489) >> at TestJFREvents.testContainerInfo(TestJFREvents.java:110) >> at TestJFREvents.containerInfoTestCase(TestJFREvents.java:89) >> at TestJFREvents.main(TestJFREvents.java:74) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) >> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.base/java.lang.reflect.Method.invoke(Method.java:568) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) >> at java.base/java.lang.Thread.run(Thread.java:833) >> >> >> `exitValue = 137` suggests the container was killed by OOM killer. The failing configuration is with `64m`, and it is apparently too low. >> >> Additional testing: >> - [x] Affected test now passes (5 runs out of 5 tries) >> - [x] `containers/docker` tests pass > > I've also seen this failure on some AArch64 machines and this patch fixes it for me too. Thanks for review, @nick-arm. I wonder if @mseledts wants to take look as well? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5359 From shade at openjdk.java.net Tue Sep 7 08:36:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 7 Sep 2021 08:36:41 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v8] In-Reply-To: References: <_IqJ7u4Vk7jF8E--2RzWfdnxYXDQr86TIsxA7sh_3WI=.4d2c4cd9-63c8-4921-b5a1-e77d66c10325@github.com> Message-ID: On Tue, 31 Aug 2021 13:24:37 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Add TODO > > More work: leave `acquire`-in-lieu-of-`consume` in, and special case the heap update paths to dodge the performance penalty of doing so. Seems to work on x86_64 and AArch64. > @shipilev Hi, I have tested this pull request as well as this pull request + `OrderAccess::release();` on specjbb 2015 on AArch64 (Kunpeng 920). Maybe there is a slight improvement on critical-jOPS? Here is the result. Thanks for testing. So explicit barrier does seem to result in a slight bump in critical-jOPS. I assume "base" results are this PR? If so, do you have performance results for the current master? In other words, it would be interesting to see three results: baseline (current master), this PR, and this PR + `OrderAccess::release()`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From lkorinth at openjdk.java.net Tue Sep 7 12:34:55 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Tue, 7 Sep 2021 12:34:55 GMT Subject: RFR: 8269537: memset() is called after operator new Message-ID: The basic problem is that we are relying on undefined behaviour, as documented in the code: // This whole business of passing information from ResourceObj::operator new // to the ResourceObj constructor via fields in the "object" is technically UB. 
// But it seems to work within the limitations of HotSpot usage (such as no
// multiple inheritance) with the compilers and compiler options we're using.
// And it gives some possibly useful checking for misuse of ResourceObj.

I am removing the undefined behaviour by passing the type of allocation through a thread-local variable.

This solution has some advantages:
1) it is not UB
2) it is simpler and easier to understand
3) it uses less memory (I could make it use even less if I made the enum `allocation_type` a u8)
4) in the *very* unlikely situation that stack memory (or embedded) already equals the data calculated from the address of the object, the code will also work.

When doing the change, I also renamed `allocated_on_stack()` to `allocated_on_stack_or_embedded()`, which is much harder to misinterpret.

I also disallow "faking" the memory type by explicitly calling `ResourceObj::set_allocation_type`. This forced me to change two places that fake the allocation type of an embedded `GrowableArray` from `STACK_OR_EMBEDDED` to `C_HEAP`. The faking of the type is hard to understand, as a `STACK_OR_EMBEDDED` `GrowableArray` can allocate any type of object. My guess is that `GrowableArray` has changed behaviour, or maybe that it was hard to understand because of the old naming of `allocated_on_stack()`.

I have also tried to update the comments. In doing that I changed not only the comments for this change, but also the *incorrect* advice to always delete objects you allocate with new.
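For readers who want to see the shape of the idea, here is a minimal sketch of handing the allocation type from `operator new` to the constructor through a thread-local variable. It is not the code from the PR: `TrackedObj`, `_pending` and the two-value enum are hypothetical names chosen for illustration, and the sketch ignores cases such as another allocation happening between `operator new` and the constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical, simplified stand-in for the allocation kinds discussed above.
enum class AllocationType { STACK_OR_EMBEDDED, C_HEAP };

class TrackedObj {
  // Set by operator new, consumed by the next constructor on this thread.
  static thread_local AllocationType _pending;
  AllocationType _type;

public:
  static void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) throw std::bad_alloc();
    // Record the allocation kind outside the object, so nothing is
    // written into storage before the constructor runs.
    _pending = AllocationType::C_HEAP;
    return p;
  }

  static void operator delete(void* p) { std::free(p); }

  // Pick up the pending kind and reset it to the default, so that the
  // next stack or embedded object is classified correctly.
  TrackedObj() : _type(_pending) {
    _pending = AllocationType::STACK_OR_EMBEDDED;
  }

  // Copying would silently carry the wrong kind across, so forbid it here.
  TrackedObj(const TrackedObj&) = delete;
  TrackedObj& operator=(const TrackedObj&) = delete;

  bool allocated_on_stack_or_embedded() const {
    return _type == AllocationType::STACK_OR_EMBEDDED;
  }
  bool allocated_on_C_heap() const {
    return _type == AllocationType::C_HEAP;
  }
};

thread_local AllocationType TrackedObj::_pending =
    AllocationType::STACK_OR_EMBEDDED;
```

In this sketch a plain `TrackedObj t;` reports stack-or-embedded, while `new TrackedObj()` reports C heap, and no field of the object is read before it has been initialised by the constructor.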
Testing on debug build tier1-3 Testing on release build tier1 ------------- Commit messages: - 8269537: memset() is called after operator new Changes: https://git.openjdk.java.net/jdk/pull/5387/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5387&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8269537 Stats: 109 lines in 8 files changed: 1 ins; 66 del; 42 mod Patch: https://git.openjdk.java.net/jdk/pull/5387.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5387/head:pull/5387 PR: https://git.openjdk.java.net/jdk/pull/5387 From sgehwolf at openjdk.java.net Tue Sep 7 12:50:41 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Tue, 7 Sep 2021 12:50:41 GMT Subject: RFR: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 10:41:20 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make run-test TEST=containers/docker/TestJFREvents.java > > STDERR: > stdout: []; > stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. 
> ]
> exitValue = 137
>
> java.lang.RuntimeException: Expected to get exit value of [0]
>
> at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:489)
> at TestJFREvents.testContainerInfo(TestJFREvents.java:110)
> at TestJFREvents.containerInfoTestCase(TestJFREvents.java:89)
> at TestJFREvents.main(TestJFREvents.java:74)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:568)
> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
> at java.base/java.lang.Thread.run(Thread.java:833)
>
>
> `exitValue = 137` suggests the container was killed by OOM killer. The failing configuration is with `64m`, and it is apparently too low.
>
> Additional testing:
> - [x] Affected test now passes (5 runs out of 5 tries)
> - [x] `containers/docker` tests pass

LGTM

-------------

Marked as reviewed by sgehwolf (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5359

From aph at openjdk.java.net Tue Sep 7 14:36:53 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 7 Sep 2021 14:36:53 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions
Message-ID:

An interleaved version of AES/GCM.

Performance, now and then:

Apple M1, 3.2 GHz:

Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
AESGCMBench.decrypt                 8192          256              avgt    6   3108.881 ±  119.675  ns/op
AESGCMBench.decryptMultiPart        8192          256              avgt    6   3109.685 ±    4.206  ns/op
AESGCMBench.encrypt                 8192          256              avgt    6   3122.144 ±  113.379  ns/op
AESGCMBench.encryptMultiPart        8192          256              avgt    6   3119.568 ±  192.877  ns/op

AESGCMBench.decrypt                 8192          256              avgt    6  89123.942 ±  111.977  ns/op
AESGCMBench.decryptMultiPart        8192          256              avgt    6  91034.697 ±  161.469  ns/op
AESGCMBench.encrypt                 8192          256              avgt    6  89732.397 ±  106.370  ns/op
AESGCMBench.encryptMultiPart        8192          256              avgt    6  89382.300 ±  139.300  ns/op

Neoverse N1, 2.5GHz:

Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
AESGCMBench.decrypt                 8192          256              avgt    6   6296.575 ±   37.995  ns/op
AESGCMBench.decryptMultiPart        8192          256              avgt    6   7380.326 ±   10.987  ns/op
AESGCMBench.encrypt                 8192          256              avgt    6   6293.090 ±   52.972  ns/op
AESGCMBench.encryptMultiPart        8192          256              avgt    6   6357.536 ±   42.925  ns/op

AESGCMBench.decrypt                 8192          256              avgt    6  48745.085 ±  125.612  ns/op
AESGCMBench.decryptMultiPart        8192          256              avgt    6  45062.599 ± 1548.950  ns/op
AESGCMBench.encrypt                 8192          256              avgt    6  42230.857 ±  520.562  ns/op
AESGCMBench.encryptMultiPart        8192          256              avgt    6  45124.171 ± 1417.927  ns/op

A note about the implementation for the reviewers:

Unrolled and hand-scheduled intrinsics are often written in a way that I don't find satisfactory. Often they are a conglomeration of copy-and-paste programming and C macros, which makes them hard to understand and hard to maintain. I won't name any names, but there are many examples to be found in free software across the Internet.

I spent a while thinking about a structured way to develop and implement them, and I think I've got something better. The idea is that you transform a pre-existing implementation into a generator for the interleaved version. The transformation shouldn't be too hard to do, but more importantly it should be possible for a reader to verify that the interleaved and unrolled version performs the same function.

A generator takes the form of a subclass of `KernelGenerator`. The core idea is that the programmer defines the base case of the intrinsic and a method to generate a clone of it, shifted to a different set of registers. `KernelGenerator` will then generate several interleaved copies of the function, with each one using a different set of registers.

The subclass must implement three methods: `length()`, which is the number of instruction bundles in the intrinsic, `generate(int n)` which emits the nth instruction bundle in the intrinsic, and `next()` which takes an instance of the generator and returns a version of it, shifted to a new set of registers.

As an example, here's the inner loop of AES encryption:

(Some details elided for clarity.)

    BIND(L_aes_loop);
    ld1(v0, T16B, post(from, 16));

    br(Assembler::CC, L_rounds_44);
    br(Assembler::EQ, L_rounds_52);

    aes_round(v0, v17);
    aes_round(v0, v18);
    BIND(L_rounds_52);
    aes_round(v0, v19);
    aes_round(v0, v20);
    BIND(L_rounds_44);
    ...

The generator for the unrolled version looks like:

    virtual void generate(int index) {
      switch (index) {
      case 0:
        ld1(_data, T16B, _from); // get 16 bytes of input
        break;
      case 1:
        if (_once) {
          cmpw(_keylen, 52);
          br(Assembler::LO, _rounds_44);
          br(Assembler::EQ, _rounds_52);
        }
        break;
      case 2:  aes_round(_data, _subkeys + 0);  break;
      case 3:  aes_round(_data, _subkeys + 1);  break;
      case 4:
        if (_once)  bind(_rounds_52);
        break;
      case 5:  aes_round(_data, _subkeys + 2);  break;
      case 6:  aes_round(_data, _subkeys + 3);  break;
      case 7:
        if (_once)  bind(_rounds_44);
        break;
      ...

The job of converting a single inline intrinsic is, as you can see, not much more than adding a switch statement. Some instructions should only be emitted once, rather than several times, such as the labels and branches. (You can use a list of C++ lambdas rather than a switch statement to do the same thing, very LISP, but that seems a bit of a sledgehammer. YMMV.)

I believe that this approach will be more maintainable and easier to understand than other approaches we've seen. Also, the number of unrolls is just a number that can be tweaked as required.
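
For readers who want to see the shape of this contract outside HotSpot, here is a minimal, self-contained sketch; it is not the code from the PR. The names `KernelGeneratorSketch`, `AesRoundsSketch`, `unroll()`, `emit()`, `vreg()` and `generate_interleaved()` are hypothetical and exist only for this illustration. Instructions are recorded as text rather than emitted through the assembler, and the interleaving order (bundle 0 of every copy, then bundle 1 of every copy, and so on) is an assumption about how such a driver could work, so the real `KernelGenerator` may well differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the HotSpot assembler: each "instruction" is
// simply recorded as a line of text.
static std::vector<std::string> g_code;
static void emit(const std::string& insn) { g_code.push_back(insn); }
static std::string vreg(int n) { return "v" + std::to_string(n); }

// Sketch of the generator contract (all names here are assumptions).
class KernelGeneratorSketch {
public:
  virtual ~KernelGeneratorSketch() = default;
  virtual int length() const = 0;        // number of instruction bundles
  virtual void generate(int index) = 0;  // emit bundle number `index`
  // Return a copy of this generator shifted to a new set of registers.
  virtual std::unique_ptr<KernelGeneratorSketch> next() const = 0;

  // Emit `unrolls` interleaved copies of the kernel.
  void unroll(int unrolls) {
    assert(unrolls >= 1);
    std::vector<std::unique_ptr<KernelGeneratorSketch>> copies;  // copies 1..n-1
    const KernelGeneratorSketch* prev = this;
    for (int i = 1; i < unrolls; i++) {
      copies.push_back(prev->next());
      prev = copies.back().get();
    }
    for (int bundle = 0; bundle < length(); bundle++) {
      generate(bundle);                                  // copy 0 (this)
      for (auto& copy : copies) copy->generate(bundle);  // shifted copies
    }
  }
};

// Toy kernel: load a block, compare the key length once, run two rounds.
class AesRoundsSketch : public KernelGeneratorSketch {
  int _data;     // vector register holding this copy's data block
  int _subkeys;  // first round-key register, shared by all copies
  bool _once;    // true only for the first copy
public:
  AesRoundsSketch(int data, int subkeys, bool once = true)
    : _data(data), _subkeys(subkeys), _once(once) {}

  int length() const override { return 4; }

  void generate(int index) override {
    switch (index) {
    case 0: emit("ld1 " + vreg(_data) + ", [from], #16"); break;
    case 1: if (_once) emit("cmp keylen, #52"); break;  // emitted once only
    case 2: emit("aes_round " + vreg(_data) + ", " + vreg(_subkeys + 0)); break;
    case 3: emit("aes_round " + vreg(_data) + ", " + vreg(_subkeys + 1)); break;
    }
  }

  std::unique_ptr<KernelGeneratorSketch> next() const override {
    // Shift only the data register; the round keys are shared.
    return std::make_unique<AesRoundsSketch>(_data + 1, _subkeys, false);
  }
};

// Convenience wrapper: returns the recorded instruction stream.
std::vector<std::string> generate_interleaved(int unrolls) {
  g_code.clear();
  AesRoundsSketch(0, 17).unroll(unrolls);
  return g_code;
}
```

Calling `generate_interleaved(2)` records the load for `v0` and `v1` back to back, then the single compare, then each round for both registers in turn. Changing the argument is all it takes to change the unroll factor in this sketch, which is the property the description above is after.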

-------------

Commit messages:
 - Cosmetics
 - Cosmetics
 - Enable AES on Apple
 - Rebase

Changes: https://git.openjdk.java.net/jdk/pull/5390/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8271567
  Stats: 1352 lines in 7 files changed: 1127 ins; 210 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5390.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390

PR: https://git.openjdk.java.net/jdk/pull/5390

From aph at openjdk.java.net Tue Sep 7 14:40:02 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Tue, 7 Sep 2021 14:40:02 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v2]
In-Reply-To:
References:
Message-ID:

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  Fix includes

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5390/files
  - new: https://git.openjdk.java.net/jdk/pull/5390/files/eb797c34..9692272e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=00-01

  Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5390.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390

PR: https://git.openjdk.java.net/jdk/pull/5390

From shade at openjdk.java.net Tue Sep 7 15:14:49 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 7 Sep 2021 15:14:49 GMT
Subject: RFR: 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests
Message-ID: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com>

The current `vmTestbase/metaspace/stressHierarchy` suite (part of the `vmTestbase_vm_metaspace` group) contains about 15 tests, each running exclusively. There seems to be no reason to run them exclusively, though: they complete in reasonable time, are single-threaded, and consume the usual amount of memory. There is no evidence in JBS that they ever timed out without a reason, and their history unfortunately predates OpenJDK, so we cannot see why they were not concurrent from day one.

We should consider enabling parallelism for `vmTestbase/metaspace/stressHierarchy` and get improved test performance. Currently it is blocked by `TEST.properties` files with `exclusiveAccess.dirs` directives in them.

Note there are other exclusive tests in `vmTestbase_vm_metaspace`, but those seem to be the hard stress tests: pushing GC to the limits, or running many threads, etc.

Motivational test time improvements below.

Before:

$ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s
...
00:24:53 ==============================
00:24:53 Test summary
00:24:53 ==============================
00:24:53    TEST                                              TOTAL  PASS  FAIL ERROR
00:24:53    jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace     25    25     0     0
00:24:53 ==============================
00:24:53 TEST SUCCESS
00:24:53
00:24:53 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug'

real 24m53.389s
user 53m2.029s
sys 1m1.849s

After:

$ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s
...
00:04:04 ==============================
00:04:04 Test summary
00:04:04 ==============================
00:04:04    TEST                                              TOTAL  PASS  FAIL ERROR
00:04:04    jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace     25    25     0     0
00:04:04 ==============================
00:04:04 TEST SUCCESS
00:04:04
00:04:04 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug'

real 4m4.574s
user 56m10.582s
sys 1m4.725s

-------------

Commit messages:
 - Remove TEST.properties

Changes: https://git.openjdk.java.net/jdk/pull/5391/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5391&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273438
  Stats: 15 lines in 15 files changed: 0 ins; 15 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5391.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5391/head:pull/5391

PR: https://git.openjdk.java.net/jdk/pull/5391

From simonis at openjdk.java.net Tue Sep 7 15:34:51 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Tue, 7 Sep 2021 15:34:51 GMT
Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow
Message-ID:

If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (e.g. AIOOBE, NullPointerException, ...) and replace them with a static, pre-allocated exception without any stack trace. However, we can actually do better.
Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method/line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stack frames for all callers up to the inlining depth of the method in question.

For the attached JTreg test, we get the following exception in interpreter mode:

java.lang.NullPointerException: Cannot read the array length because "" is null
 at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
 at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
 at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
 at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)

Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:

java.lang.NullPointerException

After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:

java.lang.NullPointerException
 at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)

and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):

java.lang.NullPointerException
 at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
 at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
 at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)

The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once
reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.

## Implementation details

- Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`).
- Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them.
- In order to avoid a memory leak we have to release these global JNI handles once an nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles.
- While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()`) and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but did not add it to the corresponding stats and printing routines, so I've added that section as well.
- The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds.
- The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions.
- Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
- Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. ------------- Commit messages: - 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow Changes: https://git.openjdk.java.net/jdk/pull/5392/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273392 Stats: 538 lines in 12 files changed: 417 ins; 6 del; 115 mod Patch: https://git.openjdk.java.net/jdk/pull/5392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 PR: https://git.openjdk.java.net/jdk/pull/5392 From aph at openjdk.java.net Tue Sep 7 16:47:56 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 7 Sep 2021 16:47:56 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v3] In-Reply-To: References: Message-ID: > An interleaved version of AES/GCM. > > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ? 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ? 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ? 113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ? 192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ? 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ? 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ? 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ? 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ? 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ? 
10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ? 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ? 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ? 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ? 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ? 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ? 1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many exampled to be found in free software across the Internet, > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version. The transformation shouldn't be too hard to > do, but more importantly it should be possible for a reader to verify > that the interleaved and unrolled version performs the same function. > > A generator takes the form of a subclass of `KernelGenerator`. The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers. `KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers. 
> > As an example, here's the inner loop of AES encryption: > > (Some details elided for clarity.) > > > BIND(L_aes_loop); > ld1(v0, T16B, post(from, 16)); > > br(Assembler::CC, L_rounds_44); > br(Assembler::EQ, L_rounds_52); > > aes_round(v0, v17); > aes_round(v0, v18); > BIND(L_rounds_52); > aes_round(v0, v19); > aes_round(v0, v20); > BIND(L_rounds_44); > ... > > > The generator for the unrolled version looks like: > > > virtual void generate(int index) { > switch (index) { > case 0: > ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input > break; > case 1: > if (_once) { > cmpw(_keylen, 52); > br(Assembler::LO, _rounds_44); > br(Assembler::EQ, _rounds_52); > } > break; > case 2: aes_round(_data, _subkeys + 0); break; > case 3: aes_round(_data, _subkeys + 1); break; > case 4: > if (_once) bind(_rounds_52); > break; > case 5: aes_round(_data, _subkeys + 2); break; > case 6: aes_round(_data, _subkeys + 3); break; > case 7: > if (_once) bind(_rounds_44); > break; > ... > > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. (You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. Also, the number of > unrolls is just a number that can be tweaked as required. Andrew Haley has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8271567-aarch64-gcm-rebase' of https://github.com/theRealAph/jdk into JDK-8271567-aarch64-gcm-rebase - Remove VLA in order to appease Microsoft C++ compiler. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/9692272e..e5ea9b3d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From ayang at openjdk.java.net Tue Sep 7 17:27:00 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 7 Sep 2021 17:27:00 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type [v2] In-Reply-To: References: Message-ID: > Simple change on return types of Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. > > Test: tier1 Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: template ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5332/files - new: https://git.openjdk.java.net/jdk/pull/5332/files/faceab9a..2fab83d6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5332&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5332&range=00-01 Stats: 152 lines in 7 files changed: 75 ins; 53 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/5332.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5332/head:pull/5332 PR: https://git.openjdk.java.net/jdk/pull/5332 From ayang at openjdk.java.net Tue Sep 7 17:27:00 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 7 Sep 2021 17:27:00 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 14:38:52 GMT, Albert Mingkun Yang wrote: > Simple change on return types of 
Ticks API. > > The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. > > Test: tier1 Small variant of option 2: template methods of return-type overloading. The default return-type is the same as before, but callers expecting real number can do sth like `milliseconds()`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From mseledtsov at openjdk.java.net Tue Sep 7 21:17:13 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Tue, 7 Sep 2021 21:17:13 GMT Subject: RFR: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 10:41:20 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make run-test TEST=containers/docker/TestJFREvents.java > > STDERR: > stdout: []; > stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. > ] > exitValue = 137 > > java.lang.RuntimeException: Expected to get exit value of [0] > > at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:489) > at TestJFREvents.testContainerInfo(TestJFREvents.java:110) > at TestJFREvents.containerInfoTestCase(TestJFREvents.java:89) > at TestJFREvents.main(TestJFREvents.java:74) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > `exitValue = 137` suggests the container was killed by OOM killer. 
The failing configuration is with `64m`, and it is apparently too low. > > Additional testing: > - [x] Affected test now passes (5 runs out of 5 tries) > - [x] `containers/docker` tests pass Changes look good to me. ------------- Marked as reviewed by mseledtsov (Committer). PR: https://git.openjdk.java.net/jdk/pull/5359 From coleenp at openjdk.java.net Tue Sep 7 22:43:06 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 7 Sep 2021 22:43:06 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI [v2] In-Reply-To: References: Message-ID: <50eruztdDqli-4Hzttcj13d8Ka_nKYk19bg24--h7fM=.7e8634d8-a64d-4838-8fdc-ea2a7a74799c@github.com> On Fri, 3 Sep 2021 20:02:36 GMT, Tom Rodriguez wrote: >> This evacuates all JVMCI related methods and fields into a separately declared struct. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Review cleanups Update looks fine. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5339 From iklam at openjdk.java.net Tue Sep 7 23:32:06 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 7 Sep 2021 23:32:06 GMT Subject: RFR: 8269537: memset() is called after operator new In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 12:25:54 GMT, Leo Korinth wrote: > The basic problem is that we are relying on undefined behaviour, as documented in the code: > > // This whole business of passing information from ResourceObj::operator new > // to the ResourceObj constructor via fields in the "object" is technically UB. > // But it seems to work within the limitations of HotSpot usage (such as no > // multiple inheritance) with the compilers and compiler options we're using. > // And it gives some possibly useful checking for misuse of ResourceObj. > > > I am removing the undefined behaviour by passing the type of allocation through a thread local variable. 
> > This solution has some advantages: > 1) it is not UB > 2) it is simpler and easier to understand > 3) it uses less memory (I could make it use even less if I made the enum `allocation_type` a u8) > 4) in the *very* unlikely situation that stack memory (or embedded) already equals the data calculated from the address of the object, the code will also work. > > When doing the change, I also updated `allocated_on_stack()` to the new name `allocated_on_stack_or_embedded()` which is much harder to misinterpret. > > I also disallow to "fake" the memory type by explicitly calling `ResourceObj::set_allocation_type`. > > This forced me to change two places that is faking the allocation type of an embedded `GrowableArray` from `STACK_OR_EMBEDDED` to `C_HEAP`. The faking of the type is hard to understand as a `STACK_OR_EMBEDDED` `GrowableArray` can allocate any type of object. My guess is that `GrowableArray` has changed behaviour, or maybe that it was hard to understand because the old naming of `allocated_on_stack()`. > > I have also tried to update the comments. In doing that I not only changed the comments for this change, but also for the *incorrect* advice to always delete object you allocate with new. > > Testing on debug build tier1-3 > Testing on release build tier1 Changes requested by iklam (Reviewer). src/hotspot/share/memory/allocation.hpp line 439: > 437: void* operator new(size_t size, const std::nothrow_t& nothrow_constant) throw() { > 438: address res = (address)resource_allocate_bytes(size, AllocFailStrategy::RETURN_NULL); > 439: DEBUG_ONLY(if (res != NULL) _thread_last_allocated = RESOURCE_AREA;) Maybe we should also guard against the possibility of nested allocations, which may trash `_thread_last_allocated`? #define PUSH_RESOURCE_OBJ_ALLOC_TYPE(t) \ assert(_thread_last_allocated == STACK_OR_EMBEDDED, "must not be nested"); \ DEBUG_ONLY(_thread_last_allocated = t); \ ... 
if (res != NULL) { PUSH_RESOURCE_OBJ_ALLOC_TYPE(RESOURCE_AREA); } Similarly, the `ResourceObj` constructor should use a corresponding `POP_RESOURCE_OBJ_ALLOC_TYPE` macro. ------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From dholmes at openjdk.java.net Wed Sep 8 00:19:10 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 8 Sep 2021 00:19:10 GMT Subject: RFR: 8273314: Add tier4 test groups [v3] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Mon, 6 Sep 2021 13:22:03 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). >> >> Sample run: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 426 425 1 0 << >>>> jtreg:test/jdk:tier4 2891 2885 4 2 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 64m13.994s >> user 1462m1.213s >> sys 39m38.032s >> >> >> There are interesting test failures on my machine, which I would address separately. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Drop applications and fix the comment Hi Aleksey, I've discussed this with Igor and while I don't agree with the rationale I won't "block it". 
Cheers, David ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From svkamath at openjdk.java.net Wed Sep 8 00:26:25 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Wed, 8 Sep 2021 00:26:25 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 Message-ID: Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. ------------- Commit messages: - Merge master - JDK 8273297: AES/GCM non AVX512 + VAES CPU's suffer after 8267125 - changes to make sure that ghash_long_swap_mask and counter_mask_addr calls are not duplicated - Merge branch 'master' of https://git.openjdk.java.net/jdk into aes-gcm - Moved declaration in vmStructs.cpp to other AESCrypt declarations - comment update - rewiew update - Merge branch 'aes-gcm' of github.com:smita-kamath/jdk into aes-gcm - changed file property of GaloisCounterMode.java - Merge branch 'master' of https://git.openjdk.java.net/jdk into aes-gcm - ... 
and 10 more: https://git.openjdk.java.net/jdk/compare/d6d6c069...4628dc3a Changes: https://git.openjdk.java.net/jdk/pull/5402/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273297 Stats: 66 lines in 9 files changed: 18 ins; 2 del; 46 mod Patch: https://git.openjdk.java.net/jdk/pull/5402.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5402/head:pull/5402 PR: https://git.openjdk.java.net/jdk/pull/5402 From dcubed at openjdk.java.net Wed Sep 8 03:36:08 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 8 Sep 2021 03:36:08 GMT Subject: RFR: 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) operation In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 01:26:01 GMT, Daniel D. Daugherty wrote: >> monitors_iterate make several checks which often are true before filter monitor by a thread. It might take a lot of time when there are a lot of threads. So it makes sense to first check thread and only then other conditions. > > src/hotspot/share/runtime/synchronizer.cpp line 981: > >> 979: if (mid->owner() != thread) { >> 980: return; >> 981: } > > The `iter` is processing the in-use-list and you're bailing the iteration > when you run into an ObjectMonitor that is not owned by `thread`, but > that doesn't mean that there's not an ObjectMonitor owned by `thread` > later on in the in-use-list. > > So I could see you doing a `continue` here, but not a `return`. Thanks for resolving the above comment. ------------- PR: https://git.openjdk.java.net/jdk/pull/5194 From dcubed at openjdk.java.net Wed Sep 8 03:41:04 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Wed, 8 Sep 2021 03:41:04 GMT Subject: RFR: 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) 
operation In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 21:18:53 GMT, Leonid Mesnik wrote: > monitors_iterate make several checks which often are true before filter monitor by a thread. It might take a lot of time when there are a lot of threads. So it makes sense to first check thread and only then other conditions. Moving the thread check from the closure's do_monitor() call into monitors_iterate() as early as possible is a good idea. Do you have any measurements to show how much this helps? I'm okay if you don't and I'd be happy waiting to see if it makes a difference with some of those Tier8 timeouts... ------------- Marked as reviewed by dcubed (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5194 From never at openjdk.java.net Wed Sep 8 05:29:34 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Wed, 8 Sep 2021 05:29:34 GMT Subject: RFR: 8137018: [JVMCI] Encapsulate new Thread fields for JVMCI [v3] In-Reply-To: References: Message-ID: > This evacuates all JVMCI related methods and fields into a separately declared struct. 
Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: Remove extra space ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5339/files - new: https://git.openjdk.java.net/jdk/pull/5339/files/9af54d4b..1c3433a4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5339&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5339&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5339/head:pull/5339 PR: https://git.openjdk.java.net/jdk/pull/5339 From shade at openjdk.java.net Wed Sep 8 07:47:12 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 8 Sep 2021 07:47:12 GMT Subject: RFR: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory In-Reply-To: References: Message-ID: <1TGdX3pQ9g4VZmEWbh0DMjkFSrpGpvqSe1d2Gr1AoRo=.da9d7f0c-0faf-4732-ab16-b06a01270afd@github.com> On Fri, 3 Sep 2021 10:41:20 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make run-test TEST=containers/docker/TestJFREvents.java > > STDERR: > stdout: []; > stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. 
> ] > exitValue = 137 > > java.lang.RuntimeException: Expected to get exit value of [0] > > at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:489) > at TestJFREvents.testContainerInfo(TestJFREvents.java:110) > at TestJFREvents.containerInfoTestCase(TestJFREvents.java:89) > at TestJFREvents.main(TestJFREvents.java:74) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > `exitValue = 137` suggests the container was killed by OOM killer. The failing configuration is with `64m`, and it is apparently too low. > > Additional testing: > - [x] Affected test now passes (5 runs out of 5 tries) > - [x] `containers/docker` tests pass Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/5359 From shade at openjdk.java.net Wed Sep 8 07:47:13 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 8 Sep 2021 07:47:13 GMT Subject: Integrated: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory In-Reply-To: References: Message-ID: On Fri, 3 Sep 2021 10:41:20 GMT, Aleksey Shipilev wrote: > $ CONF=linux-x86_64-server-fastdebug make run-test TEST=containers/docker/TestJFREvents.java > > STDERR: > stdout: []; > stderr: [WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. 
> ] > exitValue = 137 > > java.lang.RuntimeException: Expected to get exit value of [0] > > at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:489) > at TestJFREvents.testContainerInfo(TestJFREvents.java:110) > at TestJFREvents.containerInfoTestCase(TestJFREvents.java:89) > at TestJFREvents.main(TestJFREvents.java:74) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > > `exitValue = 137` suggests the container was killed by OOM killer. The failing configuration is with `64m`, and it is apparently too low. > > Additional testing: > - [x] Affected test now passes (5 runs out of 5 tries) > - [x] `containers/docker` tests pass This pull request has now been integrated. Changeset: 7d24a334 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/7d24a3342129d4c71fad0d8d50d20758291d64b7 Stats: 7 lines in 1 file changed: 1 ins; 0 del; 6 mod 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory Reviewed-by: ngasson, sgehwolf, mseledtsov ------------- PR: https://git.openjdk.java.net/jdk/pull/5359 From aph at openjdk.java.net Wed Sep 8 09:01:34 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 8 Sep 2021 09:01:34 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v4] In-Reply-To: References: Message-ID: > An interleaved version of AES/GCM. 
> > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ± 192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many examples to be found in free software across the Internet. > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version.
The transformation shouldn't be too hard to > do, but more importantly it should be possible for a reader to verify > that the interleaved and unrolled version performs the same function. > > A generator takes the form of a subclass of `KernelGenerator`. The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers. `KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers. > > As an example, here's the inner loop of AES encryption: > > (Some details elided for clarity.) > > > BIND(L_aes_loop); > ld1(v0, T16B, post(from, 16)); > > br(Assembler::CC, L_rounds_44); > br(Assembler::EQ, L_rounds_52); > > aes_round(v0, v17); > aes_round(v0, v18); > BIND(L_rounds_52); > aes_round(v0, v19); > aes_round(v0, v20); > BIND(L_rounds_44); > ... > > > The generator for the unrolled version looks like: > > > virtual void generate(int index) { > switch (index) { > case 0: > ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input > break; > case 1: > if (_once) { > cmpw(_keylen, 52); > br(Assembler::LO, _rounds_44); > br(Assembler::EQ, _rounds_52); > } > break; > case 2: aes_round(_data, _subkeys + 0); break; > case 3: aes_round(_data, _subkeys + 1); break; > case 4: > if (_once) bind(_rounds_52); > break; > case 5: aes_round(_data, _subkeys + 2); break; > case 6: aes_round(_data, _subkeys + 3); break; > case 7: > if (_once) bind(_rounds_44); > break; > ... > > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. 
Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. (You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. Also, the number of > unrolls is just a number that can be tweaked as required. Andrew Haley has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Fix includes ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/e5ea9b3d..fd052771 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=02-03 Stats: 1448 lines in 1 file changed: 0 ins; 1448 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Wed Sep 8 09:16:42 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 8 Sep 2021 09:16:42 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v5] In-Reply-To: References: Message-ID: > An interleaved version of AES/GCM. > > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ±
192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many examples to be found in free software across the Internet. > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version. The transformation shouldn't be too hard to > do, but more importantly it should be possible for a reader to verify > that the interleaved and unrolled version performs the same function. > > A generator takes the form of a subclass of `KernelGenerator`. The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers.
`KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers. > > As an example, here's the inner loop of AES encryption: > > (Some details elided for clarity.) > > > BIND(L_aes_loop); > ld1(v0, T16B, post(from, 16)); > > br(Assembler::CC, L_rounds_44); > br(Assembler::EQ, L_rounds_52); > > aes_round(v0, v17); > aes_round(v0, v18); > BIND(L_rounds_52); > aes_round(v0, v19); > aes_round(v0, v20); > BIND(L_rounds_44); > ... > > > The generator for the unrolled version looks like: > > > virtual void generate(int index) { > switch (index) { > case 0: > ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input > break; > case 1: > if (_once) { > cmpw(_keylen, 52); > br(Assembler::LO, _rounds_44); > br(Assembler::EQ, _rounds_52); > } > break; > case 2: aes_round(_data, _subkeys + 0); break; > case 3: aes_round(_data, _subkeys + 1); break; > case 4: > if (_once) bind(_rounds_52); > break; > case 5: aes_round(_data, _subkeys + 2); break; > case 6: aes_round(_data, _subkeys + 3); break; > case 7: > if (_once) bind(_rounds_44); > break; > ... > > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. (You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. 
Also, the number of > unrolls is just a number that can be tweaked as required. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix rebase ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/fd052771..ffc23ae2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=03-04 Stats: 1448 lines in 1 file changed: 1448 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Wed Sep 8 09:54:46 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 8 Sep 2021 09:54:46 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v6] In-Reply-To: References: Message-ID: > An interleaved version of AES/GCM. > > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ± 192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ±
10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many examples to be found in free software across the Internet. > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version. The transformation shouldn't be too hard to > do, but more importantly it should be possible for a reader to verify > that the interleaved and unrolled version performs the same function. > > A generator takes the form of a subclass of `KernelGenerator`. The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers. `KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers.
> > As an example, here's the inner loop of AES encryption: > > (Some details elided for clarity.) > > > BIND(L_aes_loop); > ld1(v0, T16B, post(from, 16)); > > br(Assembler::CC, L_rounds_44); > br(Assembler::EQ, L_rounds_52); > > aes_round(v0, v17); > aes_round(v0, v18); > BIND(L_rounds_52); > aes_round(v0, v19); > aes_round(v0, v20); > BIND(L_rounds_44); > ... > > > The generator for the unrolled version looks like: > > > virtual void generate(int index) { > switch (index) { > case 0: > ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input > break; > case 1: > if (_once) { > cmpw(_keylen, 52); > br(Assembler::LO, _rounds_44); > br(Assembler::EQ, _rounds_52); > } > break; > case 2: aes_round(_data, _subkeys + 0); break; > case 3: aes_round(_data, _subkeys + 1); break; > case 4: > if (_once) bind(_rounds_52); > break; > case 5: aes_round(_data, _subkeys + 2); break; > case 6: aes_round(_data, _subkeys + 3); break; > case 7: > if (_once) bind(_rounds_44); > break; > ... > > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. (You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. Also, the number of > unrolls is just a number that can be tweaked as required. 
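The generator scheme described in the note above can be sketched in standalone C++. This is only an illustration under stated assumptions: it records instruction names as strings instead of emitting AArch64 code, and `ToyAesKernel`, `unroll(int)`, `interleave_demo` and the `_out` log are hypothetical names that do not come from the patch. Only `length()`, `generate(int)`, `next()` and the `_once` flag are taken from the description.

```cpp
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for the KernelGenerator described in the PR text.
class KernelGenerator {
public:
  virtual ~KernelGenerator() = default;
  virtual int length() const = 0;        // number of instruction bundles
  virtual void generate(int index) = 0;  // emit bundle number `index`
  // Return a copy of this generator shifted to a new set of registers.
  virtual std::unique_ptr<KernelGenerator> next() const = 0;

  // Hypothetical driver: emit bundle 0 of every copy, then bundle 1, and so on,
  // so that the copies end up interleaved.
  void unroll(int unrolls) {
    std::vector<std::unique_ptr<KernelGenerator>> clones;  // owns the shifted copies
    std::vector<KernelGenerator*> copies{this};
    for (int i = 1; i < unrolls; i++) {
      clones.push_back(copies.back()->next());
      copies.push_back(clones.back().get());
    }
    for (int bundle = 0; bundle < length(); bundle++) {
      for (KernelGenerator* copy : copies) {
        copy->generate(bundle);
      }
    }
  }
};

// Toy kernel (assumption): a load, one label and one AES round per copy.
class ToyAesKernel : public KernelGenerator {
  std::vector<std::string>& _out;  // log standing in for the code buffer
  int _data;                       // index of the data register (0 means v0)
  bool _once;                      // true only for the copy that emits labels
public:
  ToyAesKernel(std::vector<std::string>& out, int data, bool once)
    : _out(out), _data(data), _once(once) {}

  int length() const override { return 3; }

  void generate(int index) override {
    const std::string reg = "v" + std::to_string(_data);
    switch (index) {
      case 0: _out.push_back("ld1 " + reg); break;
      case 1: if (_once) _out.push_back("bind L_rounds_44"); break;
      case 2: _out.push_back("aes_round " + reg); break;
    }
  }

  std::unique_ptr<KernelGenerator> next() const override {
    // The clone uses the next register and never re-emits the label.
    return std::unique_ptr<KernelGenerator>(new ToyAesKernel(_out, _data + 1, false));
  }
};

// Build the interleaved instruction log for the given unroll factor.
std::vector<std::string> interleave_demo(int unrolls) {
  std::vector<std::string> out;
  ToyAesKernel first(out, 0, true);
  first.unroll(unrolls);
  return out;
}
```

With `interleave_demo(2)` the log reads `ld1 v0`, `ld1 v1`, `bind L_rounds_44`, `aes_round v0`, `aes_round v1`: each bundle is emitted for every copy before the next bundle starts, and the label appears once.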
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Clean up ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/ffc23ae2..9fc11725 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=04-05 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From shade at openjdk.java.net Wed Sep 8 10:17:13 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 8 Sep 2021 10:17:13 GMT Subject: RFR: 8273483: Zero: Clear pending JNI exception check in native method handler Message-ID: If you run Zero with existing tier1 test, then it would fail like this: $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/jni/checked/TestCheckedJniExceptionCheck.java stdout: [TEST STARTED testSingleCallNoCheck start WARNING in native method: JNI call made without checking exceptions when required to from CallVoidMethod at java.lang.Object.getClass(java.base/Native Method) at java.io.PrintStream.println(java.base/PrintStream.java:1035) at TestCheckedJniExceptionCheck.testSingleCallNoCheck(TestCheckedJniExceptionCheck.java:82) at TestCheckedJniExceptionCheck.test(TestCheckedJniExceptionCheck.java:66) at TestCheckedJniExceptionCheck.main(TestCheckedJniExceptionCheck.java:203) testSingleCallNoCheck end In other words, there is a warning from the native call to Object.getClass from the test println itself, which it does not expect. This is because Zero does not clear the pending JNI exception check flag. All other (template) interpreter implementation do clear it in native call handlers. So the test rightfully reports the excess warning. 
Additional testing: - [x] Linux x86_64 Zero, `runtime/jni` tests now pass ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5411/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5411&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273483 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5411.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5411/head:pull/5411 PR: https://git.openjdk.java.net/jdk/pull/5411 From aph at openjdk.java.net Wed Sep 8 10:30:09 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 8 Sep 2021 10:30:09 GMT Subject: RFR: 8273483: Zero: Clear pending JNI exception check in native method handler In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 10:08:50 GMT, Aleksey Shipilev wrote: > If you run Zero with existing tier1 test, then it would fail like this: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/jni/checked/TestCheckedJniExceptionCheck.java > > stdout: [TEST STARTED > testSingleCallNoCheck start > WARNING in native method: JNI call made without checking exceptions when required to from CallVoidMethod > at java.lang.Object.getClass(java.base/Native Method) > at java.io.PrintStream.println(java.base/PrintStream.java:1035) > at TestCheckedJniExceptionCheck.testSingleCallNoCheck(TestCheckedJniExceptionCheck.java:82) > at TestCheckedJniExceptionCheck.test(TestCheckedJniExceptionCheck.java:66) > at TestCheckedJniExceptionCheck.main(TestCheckedJniExceptionCheck.java:203) > testSingleCallNoCheck end > > > In other words, there is a warning from the native call to Object.getClass from the test println itself, which it does not expect. This is because Zero does not clear the pending JNI exception check flag. All other (template) interpreter implementation do clear it in native call handlers. So the test rightfully reports the excess warning. 
> > Additional testing: > - [x] Linux x86_64 Zero, `runtime/jni` tests now pass Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5411 From shade at openjdk.java.net Wed Sep 8 10:48:17 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 8 Sep 2021 10:48:17 GMT Subject: RFR: 8273486: Zero: Handle DiagnoseSyncOnValueBasedClasses VM option Message-ID: JDK-8257027 added a diagnostic option to check for synchronization on value-based classes. Zero does not support it, so it fails the relevant test: $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java STDERR: stdout: []; stderr: [Exception in thread "main" java.lang.RuntimeException: synchronization on value based class did not fail at SyncOnValueBasedClassTest$FatalTest.main(SyncOnValueBasedClassTest.java:128) ] exitValue = 1 java.lang.RuntimeException: 'fatal error: Synchronizing on object' missing from stdout/stderr Template interpreters implement this check by going to the slow path that calls `InterpreterRuntime::monitorenter`. Zero already goes to that path when `UseHeavyMonitors` is enabled, so we can simply enable it when lock diagnostics are requested. This would cost us zero (pun intended) when the diagnostic option is disabled.
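The cost argument can be illustrated with a toy model. This is a sketch, not the patch: `VmFlags`, `apply_startup_adjustments` and `monitorenter_path` are hypothetical names, and where the real change flips the flag is an assumption. The point it shows is that the diagnostic option is consulted once at startup, so the per-lock decision still reads only the heavy-monitors flag.

```cpp
// Hypothetical stand-ins for the two VM options mentioned above.
struct VmFlags {
  bool use_heavy_monitors = false;               // models UseHeavyMonitors
  int diagnose_sync_on_value_based_classes = 0;  // models DiagnoseSyncOnValueBasedClasses
};

// One-time adjustment: requesting lock diagnostics forces the slow path on.
void apply_startup_adjustments(VmFlags& flags) {
  if (flags.diagnose_sync_on_value_based_classes != 0) {
    flags.use_heavy_monitors = true;
  }
}

enum class LockPath { Fast, Slow };

// Per-lock decision: unchanged, it only looks at the heavy-monitors flag,
// so nothing is added to the fast path when diagnostics are off.
LockPath monitorenter_path(const VmFlags& flags) {
  return flags.use_heavy_monitors ? LockPath::Slow : LockPath::Fast;
}
```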
Additional testing: - [x] Linux x86_64 Zero, affected test now passes ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5412/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5412&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273486 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5412.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5412/head:pull/5412 PR: https://git.openjdk.java.net/jdk/pull/5412 From lkorinth at openjdk.java.net Wed Sep 8 11:27:37 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 8 Sep 2021 11:27:37 GMT Subject: RFR: 8269537: memset() is called after operator new [v2] In-Reply-To: References: Message-ID: <46QUL4v_Nlhi-YNhkjeNQwjnnVTQqjZtYFtHvYrSork=.70f6d52d-e873-4bb3-85a2-393596235cfc@github.com> > The basic problem is that we are relying on undefined behaviour, as documented in the code: > > // This whole business of passing information from ResourceObj::operator new > // to the ResourceObj constructor via fields in the "object" is technically UB. > // But it seems to work within the limitations of HotSpot usage (such as no > // multiple inheritance) with the compilers and compiler options we're using. > // And it gives some possibly useful checking for misuse of ResourceObj. > > > I am removing the undefined behaviour by passing the type of allocation through a thread local variable. > > This solution has some advantages: > 1) it is not UB > 2) it is simpler and easier to understand > 3) it uses less memory (I could make it use even less if I made the enum `allocation_type` a u8) > 4) in the *very* unlikely situation that stack memory (or embedded) already equals the data calculated from the address of the object, the code will also work. > > When doing the change, I also updated `allocated_on_stack()` to the new name `allocated_on_stack_or_embedded()` which is much harder to misinterpret. 
> > I also disallow "faking" the memory type by explicitly calling `ResourceObj::set_allocation_type`. > > This forced me to change two places that were faking the allocation type of an embedded `GrowableArray` from `STACK_OR_EMBEDDED` to `C_HEAP`. The faking of the type is hard to understand as a `STACK_OR_EMBEDDED` `GrowableArray` can allocate any type of object. My guess is that `GrowableArray` has changed behaviour, or maybe that it was hard to understand because of the old naming of `allocated_on_stack()`. > > I have also tried to update the comments. In doing that I not only changed the comments for this change, but also for the *incorrect* advice to always delete objects you allocate with new. > > Testing on debug build tier1-3 > Testing on release build tier1 Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: First update * Change backing type of ResourceObj::allocation_type to be u8. Also remove no longer needed mask and explicit zero value of STACK_OR_EMBEDDED value. * Now setting allocation type with set_type() with assert.
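The hand-off through a thread-local variable can be sketched as follows. This is a minimal illustration, not the HotSpot code: `TrackedObj`, `AllocType`, `take_type` and the exact shape of `set_type` are assumptions, and the real change keeps this bookkeeping in debug builds only. The sketch shows `operator new` recording the allocation type, the constructor consuming it, and an assert that every change goes from or to `STACK_OR_EMBEDDED`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

enum class AllocType : std::uint8_t { STACK_OR_EMBEDDED, RESOURCE_AREA, C_HEAP };

class TrackedObj {
  // Set by operator new, consumed by the constructor on the same thread.
  static thread_local AllocType _thread_last_allocated;
  AllocType _type;

  // Every change must start from or end at STACK_OR_EMBEDDED; two operator new
  // calls in a row without a constructor in between would trip the assert.
  static void set_type(AllocType to) {
    assert(_thread_last_allocated == AllocType::STACK_OR_EMBEDDED ||
           to == AllocType::STACK_OR_EMBEDDED);
    _thread_last_allocated = to;
  }

  // Read the recorded type and reset it for the next object.
  static AllocType take_type() {
    AllocType t = _thread_last_allocated;
    set_type(AllocType::STACK_OR_EMBEDDED);
    return t;
  }

public:
  void* operator new(std::size_t size) {
    void* p = ::operator new(size);
    set_type(AllocType::C_HEAP);
    return p;
  }
  void operator delete(void* p) { ::operator delete(p); }

  // No operator new ran for stack or embedded objects, so they see the default.
  TrackedObj() : _type(take_type()) {}

  bool allocated_on_stack_or_embedded() const {
    return _type == AllocType::STACK_OR_EMBEDDED;
  }
  bool allocated_on_c_heap() const { return _type == AllocType::C_HEAP; }
};

thread_local AllocType TrackedObj::_thread_last_allocated = AllocType::STACK_OR_EMBEDDED;
```

Because the constructor never reads the object's own uninitialized memory, the sketch avoids the undefined behaviour described in the PR text.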
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5387/files - new: https://git.openjdk.java.net/jdk/pull/5387/files/31633583..d8acedb0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5387&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5387&range=00-01 Stats: 15 lines in 2 files changed: 7 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/5387.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5387/head:pull/5387 PR: https://git.openjdk.java.net/jdk/pull/5387 From lkorinth at openjdk.java.net Wed Sep 8 11:37:14 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 8 Sep 2021 11:37:14 GMT Subject: RFR: 8269537: memset() is called after operator new [v2] In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 23:29:10 GMT, Ioi Lam wrote: >> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: >> >> First update >> >> * Change backing type of ResourceObj::allocation_type to be u8. Also remove no longer needed mask and explicit zero value of STACK_OR_EMBEDDED value. >> >> * Now setting allocation type with set_type() with assert. > > src/hotspot/share/memory/allocation.hpp line 439: > >> 437: void* operator new(size_t size, const std::nothrow_t& nothrow_constant) throw() { >> 438: address res = (address)resource_allocate_bytes(size, AllocFailStrategy::RETURN_NULL); >> 439: DEBUG_ONLY(if (res != NULL) _thread_last_allocated = RESOURCE_AREA;) > > Maybe we should also guard against the possibility of nested allocations, which may trash `_thread_last_allocated`? > > > #define PUSH_RESOURCE_OBJ_ALLOC_TYPE(t) \ > assert(_thread_last_allocated == STACK_OR_EMBEDDED, "must not be nested"); \ > DEBUG_ONLY(_thread_last_allocated = t); \ > > ... > if (res != NULL) { > PUSH_RESOURCE_OBJ_ALLOC_TYPE(RESOURCE_AREA); > } > > > Similarly, the `ResourceObj` constructor should use a corresponding `POP_RESOURCE_OBJ_ALLOC_TYPE` macro. 
I added a `set_type` method that ensures that `_thread_last_allocated` always transitions via `STACK_OR_EMBEDDED`. I did *not* create a PUSH/POP macro pair because I believe it would give the false impression that we are doing a stack operation. Other than that I also made `allocation_type` use a `u8` as backing type. I also removed the now unused `allocation_mask` and the now unimportant detail that `STACK_OR_EMBEDDED = 0`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From lkorinth at openjdk.java.net Wed Sep 8 11:43:05 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 8 Sep 2021 11:43:05 GMT Subject: RFR: 8269537: memset() is called after operator new [v2] In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 11:34:21 GMT, Leo Korinth wrote: >> src/hotspot/share/memory/allocation.hpp line 439: >> >>> 437: void* operator new(size_t size, const std::nothrow_t& nothrow_constant) throw() { >>> 438: address res = (address)resource_allocate_bytes(size, AllocFailStrategy::RETURN_NULL); >>> 439: DEBUG_ONLY(if (res != NULL) _thread_last_allocated = RESOURCE_AREA;) >> >> Maybe we should also guard against the possibility of nested allocations, which may trash `_thread_last_allocated`? >> >> >> #define PUSH_RESOURCE_OBJ_ALLOC_TYPE(t) \ >> assert(_thread_last_allocated == STACK_OR_EMBEDDED, "must not be nested"); \ >> DEBUG_ONLY(_thread_last_allocated = t); \ >> >> ... >> if (res != NULL) { >> PUSH_RESOURCE_OBJ_ALLOC_TYPE(RESOURCE_AREA); >> } >> >> >> Similarly, the `ResourceObj` constructor should use a corresponding `POP_RESOURCE_OBJ_ALLOC_TYPE` macro. > > I added a `set_type` method that ensures that `_thread_last_allocated` always transitions via `STACK_OR_EMBEDDED`. I did *not* create a PUSH/POP macro pair because I believe it would give the false impression that we are doing a stack operation. > > Other than that I also made `allocation_type` use a `u8` as backing type.
I also removed the now unused `allocation_mask` and the now unimportant detail that `STACK_OR_EMBEDDED = 0`. Hmm, u8 was not what I was thinking, I will change that to a uint8_t in the next update... ------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From lkorinth at openjdk.java.net Wed Sep 8 13:00:06 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 8 Sep 2021 13:00:06 GMT Subject: RFR: 8269537: memset() is called after operator new [v2] In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 11:40:25 GMT, Leo Korinth wrote: >> I added a `set_type` method that ensures that the `_thread_last_allocated` always transition over a `STACK_OR_EMBEDDED`. I did *not* create a PUSH/POP macro pair because i believe it would give the false impression that we are doing a stack operation. >> >> Other than that I also made `allocation_type` use a `u8` as backing type. I also removed the now unused `allocation_mask` and the now unimportant detail that `STACK_OR_EMBEDDED = 0`. > > Hmm, u8 was not what I was thinking, I will change that to a uint8_t in the next update... I hit the new assert when not on Linux, I guess it has to do with the initialization of the thread local variable. ------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From stefank at openjdk.java.net Wed Sep 8 14:07:04 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Wed, 8 Sep 2021 14:07:04 GMT Subject: RFR: 8272807: Permit use of memory concurrent with pretouch In-Reply-To: References: Message-ID: <6RDTlrQqg36KCzXUAD_O5bQhrgYnTsBwLkIaaltpQZ4=.9d011747-8966-4991-9173-85a492172bbe@github.com> On Thu, 2 Sep 2021 18:33:56 GMT, Kim Barrett wrote: > Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215. > > Please review this change which adds os::touch_memory, which is similar to > os::pretouch_memory but allows concurrent access to the memory while it is > being touched. 
This is accomplished by using an atomic add of zero as the > operation for touching the memory, ensuring the virtual location is backed > by physical memory while not changing any values being read or written by > other threads. > > While I was there, fixed some other lurking issues in os::pretouch_memory. > There was a potential overflow in the iteration that has been fixed. And if > the range arguments weren't page aligned then the last page might not get > touched. The latter was even mentioned in the function's description. Both > of those have been fixed by careful alignment and some extra checks. The > resulting code is a little more complicated, but more robust and complete. > > Similarly added TouchTask, which is similar to PretouchTask. Again here, > there is some cleaning up to avoid potential overflows and such. > > - The chunk size is computed using the page size after possible adjustment > for UseTransparentHugePages. We want a chunk size that reflects the actual > number of touches that will be performed. > > - The chunk claim is now done using a CAS that won't exceed the range end. > The old atomic-fetch-and-add and check the result, which is performed by > each worker thread, could lead to overflow. The old code has a test for > overflow, but since pointer-arithmetic overflow is UB that's not reliable. > > - The old calculation of num_chunks for parallel touching could also > potentially overflow. > > Testing: > mach5 tier1-3 I think it would be prudent to separate this PR into two separate PRs. 1) For the changes to the pretouch loop. * I'm not sure we need all the added safeguarding, which, as you say, complicate the code. I think a few asserts would be good enough. I don't think the function needs to: - work if the caller pass in end < start. Sounds like an assert. - Safeguard that end + page doesn't overflow. Do we ever hand out pages at the end of the virtual address range? 
There's a lot of HotSpot code that assumes that we never get that kind of memory, so is it worth protecting against this compared to the extra complexity it gives? Previously, I could take a glance at that function and understand it. Now I need to pause and stare at it a bit. 2) For the new infrastructure. * I'm not sure we should add TouchTask until we have a concrete use-case (with prototype/PR) for it. Without that, it is hard to gauge the usability of this feature. How do other threads proceed concurrently with the worker threads? What is the impact of using TouchTask? One of the concerns that has been brought up by others is that using a WorkGang will block the rest of the JVM from safepointing while the workers are pre-touching. A use-case would clarify this, I think. * Is the usage of template specialization really needed? Wouldn't it be sufficient to pass down a bool parameter instead? I doubt we would see any performance difference by doing that. * It's a little bit confusing that in the context of TouchTask "touch" means "concurrent touch", while in the touch_memory_* it means either concurrent or non-concurrent. Maybe rename TouchTask to something to make this distinction a bit clearer. ------------- Changes requested by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5353 From shade at openjdk.java.net Wed Sep 8 15:24:19 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 8 Sep 2021 15:24:19 GMT Subject: RFR: 8273489: Zero: Handle UseHeavyMonitors on all monitorenter paths Message-ID: While fixing JDK-8273486, I noticed there is one place where we do not call VM slowpath when `UseHeavyMonitors` are requested. That place is `ZeroInterpreter::native_entry`. We should probably implement `UseHeavyMonitors` check on those paths. New code is modeled after existing uses, for example [this](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L583-L593). 
Additional testing: - [x] Linux x86_64 Zero `make bootcycle-images` ------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5416/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5416&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273489 Stats: 7 lines in 1 file changed: 2 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5416.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5416/head:pull/5416 PR: https://git.openjdk.java.net/jdk/pull/5416 From phh at openjdk.java.net Wed Sep 8 16:58:09 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Wed, 8 Sep 2021 16:58:09 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type [v2] In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 17:27:00 GMT, Albert Mingkun Yang wrote: >> Simple change on return types of Ticks API. >> >> The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. >> >> Test: tier1 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > template I'm confused about two things. First, why is ElapsedCounter::frequency() ignored in this patch? Second, why are/were we potentially losing precision by converting everything to double internally and then converting the result to the target type? If nanoseconds are the source of truth, we could do arithmetic in the target type. 
One could replace frequency() by uint64_t nanospertick() { static const uint64_t npt = NANOSECS_PER_SEC / (uint64_t)os::elapsed_frequency(); return npt; } define conversion() to convert ticks to nanos and then do arithmetic in the target type, template inline T conversion(typename TimeSource::Type& value) { return (T)(value * TimeSource::nanospertick()) / (T)nanos_per_unit; } and then define seconds(), millseconds(), and nanoseconds() as template static T seconds(Type value) { return conversion(value); } template static T milliseconds(Type value) { return conversion(value); } template static T microseconds(Type value) { return conversion(value); } template static T nanoseconds(Type value) { return conversion(value); } This approach can lose a result tick fraction for all T other than double, but that currently happens on the final target type cast anyway. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From lmesnik at openjdk.java.net Wed Sep 8 19:35:02 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Wed, 8 Sep 2021 19:35:02 GMT Subject: Integrated: 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) operation In-Reply-To: References: Message-ID: On Thu, 19 Aug 2021 21:18:53 GMT, Leonid Mesnik wrote: > monitors_iterate make several checks which often are true before filter monitor by a thread. It might take a lot of time when there are a lot of threads. So it makes sense to first check thread and only then other conditions. This pull request has now been integrated. Changeset: a5e4def5 Author: Leonid Mesnik URL: https://git.openjdk.java.net/jdk/commit/a5e4def526697d88ff31a5fdb41d823b899372f2 Stats: 55 lines in 5 files changed: 8 ins; 15 del; 32 mod 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) 
operation Reviewed-by: dcubed ------------- PR: https://git.openjdk.java.net/jdk/pull/5194 From lmesnik at openjdk.java.net Wed Sep 8 19:35:01 2021 From: lmesnik at openjdk.java.net (Leonid Mesnik) Date: Wed, 8 Sep 2021 19:35:01 GMT Subject: RFR: 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) operation In-Reply-To: References: Message-ID: <9-ps1-ZSby_wZXqxZ0nBExAYtXmGuCSz2UKMxcMQPiE=.effef28f-6ca3-4373-afbf-af5d7d63c9a6@github.com> On Thu, 19 Aug 2021 21:18:53 GMT, Leonid Mesnik wrote: > monitors_iterate make several checks which often are true before filter monitor by a thread. It might take a lot of time when there are a lot of threads. So it makes sense to first check thread and only then other conditions. I run stress tests several times on linux-x64 with ParallelGC and haven't seen timeout anymore. ------------- PR: https://git.openjdk.java.net/jdk/pull/5194 From minqi at openjdk.java.net Wed Sep 8 19:42:25 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Wed, 8 Sep 2021 19:42:25 GMT Subject: RFR: 8271569: Rename cdsoffsets.cpp to cdsConstants.cpp Message-ID: Changed cdsOffsets.cpp to cdsConstants.cpp, now the offsets and constants are initialized static and searched separately. The offsets array could not use 'constexpr' since g++ on MacOs and VSC++ on Windows complained reinterpret_cast in 'offset_of' should not be used in constexpr initialization. Changed some field access for forming global list first. 
Tests: ter1-4 Thanks Yumin ------------- Commit messages: - 8271569: Rename cdsoffsets.cpp to cdsConstants.cpp Changes: https://git.openjdk.java.net/jdk/pull/5423/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5423&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271569 Stats: 336 lines in 10 files changed: 163 ins; 129 del; 44 mod Patch: https://git.openjdk.java.net/jdk/pull/5423.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5423/head:pull/5423 PR: https://git.openjdk.java.net/jdk/pull/5423 From sviswanathan at openjdk.java.net Wed Sep 8 20:20:32 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Wed, 8 Sep 2021 20:20:32 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files Message-ID: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Fix the copyright header of x86 macroAssembler files to match others. ------------- Commit messages: - 8273512: Fix the copyright header of x86 macroAssembler files Changes: https://git.openjdk.java.net/jdk/pull/5424/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273512 Stats: 45 lines in 11 files changed: 22 ins; 0 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/5424.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5424/head:pull/5424 PR: https://git.openjdk.java.net/jdk/pull/5424 From ayang at openjdk.java.net Wed Sep 8 21:14:05 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 8 Sep 2021 21:14:05 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type [v2] In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 16:54:44 GMT, Paul Hohensee wrote: > First, why is ElapsedCounter::frequency() ignored in this patch? Its value is a constant and unlikely to change in the future, so I went for easier-to-read code over the more extensible approach. 
If you think not hard-coding such knowledge in `ticks` is more beneficial, I can revise it as suggested. > Second, why are/were we potentially losing precision by converting everything to double internally and then converting the result to the target type? In order to avoid truncating the types in the middle of the calculation. It's possible that the final result fits in `T`, but some intermediate values cause overflow for `T`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From kim.barrett at oracle.com Wed Sep 8 22:19:15 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 8 Sep 2021 22:19:15 +0000 Subject: RFR: 8272807: Permit use of memory concurrent with pretouch In-Reply-To: <6RDTlrQqg36KCzXUAD_O5bQhrgYnTsBwLkIaaltpQZ4=.9d011747-8966-4991-9173-85a492172bbe@github.com> References: <6RDTlrQqg36KCzXUAD_O5bQhrgYnTsBwLkIaaltpQZ4=.9d011747-8966-4991-9173-85a492172bbe@github.com> Message-ID: <9A1918F7-AD8E-4A54-9B7E-A8F29886B526@oracle.com> > On Sep 8, 2021, at 10:07 AM, Stefan Karlsson wrote: > > On Thu, 2 Sep 2021 18:33:56 GMT, Kim Barrett wrote: > >> Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215. >> >> Please review this change which adds os::touch_memory, which is similar to >> os::pretouch_memory but allows concurrent access to the memory while it is >> being touched. This is accomplished by using an atomic add of zero as the >> operation for touching the memory, ensuring the virtual location is backed >> by physical memory while not changing any values being read or written by >> other threads. >> > > I think it would be prudent to separate this PR into two separate PRs. I don't see any benefit to that, but I can do it. > 1) For the changes to the pretouch loop. > > * I'm not sure we need all the added safeguarding, which, as you say, complicate the code. I think a few asserts would be good enough. It's not; see below. > I don't think the function needs to: > - work if the caller pass in end < start.
Sounds like an assert. There's already an assert for end < start. The test currently in the code is for start < end, i.e. is the range non-empty. I originally tried asserting a non-empty range, but empty ranges seem to show up sometimes. I didn't try to figure out whether those could be eliminated, though sometimes empty ranges show up pretty naturally and range-based operations typically permit the range to be empty. I found the empty range check up front simplified the analysis of the protected code, since certain kinds of errors simply can't happen because of it. Unless empty ranges are forbidden, something needs to be done somewhere to prevent writing outside the range. > - Safeguard that end + page doesn't overflow. Do we ever hand out pages at the end of the virtual address range? There's a lot of HotSpot code that assumes that we never get that kind of memory, so is it worth protecting against this compared to the extra complexity it gives? Previously, I could take a glance at that function and understand it. Now I need to pause and stare at it a bit. I suspect overflow can't happen on at least some 64-bit platforms, but I don't know of anything that would prevent it on a 32-bit platform. And I had exactly the opposite reaction from you to the old code. I looked at it and immediately wondered what might happen on pointer overflow, and questioned whether I understood the code. And as noted in the PR description, the old code for PretouchTask checks for overflow, but does so incorrectly, such that the check could be optimized away (or lead to other problems) because it would involve UB to fail the check. > 2) For the new infrastructure. > > * I'm not sure we should add TouchTask until we have a concrete use-case (with prototype/PR) for it. Without that, it is hard to gauge the usability of this feature. How do other threads proceed concurrently with the worker threads? What is the impact of using TouchTask?
One of the concerns that has been brought up by others is that using a WorkGang will block the rest of the JVM from safepointing while the workers are pre-touching. A use-case would clarify this, I think. Here is a description of a specific example from my experiments. When the G1 allocator runs out of regions that were pre-allocated for eden and can't GC because of the GCLocker, it tries to allocate a new region, first from the free region list, and then an actually new region (assuming that's possible within the heap size constraints). In the latter case, when AlwaysUsePretouch is true, it currently does a *single-threaded* pretouch of the region and ancillary memory. (It's single-threaded because most of the code path involved is shared with during-GC code that is running under the work gang. That code is a good candidate for os::touch_memory, so adding a gang argument through that call tree that is null to indicate don't pretouch doesn't seem like an improvement.) It could instead do a parallel concurrent touch after making the new region available for other mutator threads to use for allocation. Yes, it blocks safepoints, but that's already true for the existing code, and the existing code may block them for *much* longer due to the unparallelized pretouch. The downside of using a workgang here is the gang threads could be competing with mutator threads for cores. I think parallelizing here is likely beneficial though. I think there's a similar situation in ParallelGC, though I haven't chased through all of that code carefully yet. > * Is the usage of template specialization really needed? Wouldn't it be sufficient to pass down a bool parameter instead? I doubt we would see any performance difference by doing that. The use of a template here seemed pretty straight-forward when I wrote it, but I agree a boolean argument would work just as well and is more obvious to read. I'll change that. I expect the same generated code either way. 
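Two of the points made above, selecting the touch operation with a plain `bool` argument rather than a template, and walking the range without ever forming `end + page`, can be illustrated with a small standalone sketch. It is not the code from the PR: the function name `touch_range` is hypothetical, `__atomic_fetch_add` is the GCC/Clang builtin standing in for HotSpot's own atomics, and the plain store of zero for the non-concurrent variant is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not os::touch_memory or os::pretouch_memory.
// Touches one byte in every page overlapping [start, end) and returns the
// number of touches performed. An empty range is permitted and does nothing.
static std::size_t touch_range(char* start, char* end,
                               std::size_t page_size, bool concurrent) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0 &&
         "page size must be a power of two");
  assert(start <= end && "invalid range");

  std::size_t touches = 0;
  char* cur = start;
  while (cur < end) {
    if (concurrent) {
      // Atomic add of zero: forces the page to be backed without changing
      // any value another thread may be reading or writing.
      __atomic_fetch_add(reinterpret_cast<unsigned char*>(cur), 0, __ATOMIC_RELAXED);
    } else {
      // Exclusive access assumed: a plain store is enough (and destructive).
      *reinterpret_cast<volatile char*>(cur) = 0;
    }
    ++touches;

    // Advance to the next page boundary by comparing distances, so no
    // pointer beyond `end` is ever computed and nothing can wrap around.
    std::size_t remaining = static_cast<std::size_t>(end - cur);
    std::size_t to_next_page =
        page_size - (reinterpret_cast<std::uintptr_t>(cur) & (page_size - 1));
    if (to_next_page >= remaining) break;
    cur += to_next_page;
  }
  return touches;
}
```

Because the first touch happens at `start` itself and later touches at page boundaries, a range that is not page aligned still has its last page touched, which is the other gap in the old loop mentioned in the PR description.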
> * It's a little bit confusing that in the context of TouchTask "touch" means "concurrent touch", while in the touch_memory_* it means either concurrent or non-concurrent. Maybe rename TouchTask to something to make this distinction a bit clearer. I'm open to alternative naming. I thought ConcurrentTouchTask was rather long. My experiments repo currently uses PretouchTask::concurrent_touch, but I don't like that naming either. The rationale for "touch" vs "pretouch" is "touching" is the primary generic concept, and touch_memory can be used anywhere. Meanwhile "pretouching" is now a restricted variant that might have better performance under those limitations. Similarly for TouchTask vs PretouchTask. The static functions are just shared helpers for the API functions; "touch_memory" is the "primary" thing. From dholmes at openjdk.java.net Wed Sep 8 22:47:03 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 8 Sep 2021 22:47:03 GMT Subject: RFR: 8271569: Rename cdsoffsets.cpp to cdsConstants.cpp In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 19:33:51 GMT, Yumin Qi wrote: > Changed cdsOffsets.cpp to cdsConstants.cpp, now the offsets and constants are initialized static and searched separately. > The offsets array could not use 'constexpr' since g++ on MacOs and VSC++ on Windows complained reinterpret_cast in 'offset_of' should not be used in constexpr initialization. Changed some field access for forming global list first. > > Tests: ter1-4 > > Thanks > Yumin Hi Yumin, > Changed cdsOffsets.cpp to cdsConstants.cpp This is not showing up in the PR as a rename but as a new file. So there is no way to know what changes you may have made inside cdsConstants.cpp. The header file change shows as a rename.
David ------------- PR: https://git.openjdk.java.net/jdk/pull/5423 From kim.barrett at oracle.com Wed Sep 8 23:25:20 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 8 Sep 2021 23:25:20 +0000 Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda In-Reply-To: References: Message-ID: <02C95815-9A3C-41AF-A73F-F7597A26DE90@oracle.com> > On Aug 21, 2021, at 4:48 AM, Andrew Haley wrote: > > So here's an oink in the flightment: > > > macroAssembler_aarch64_aes.o: Error: Use of global operators new and delete is not allowed in Hotspot: > U operator delete(void*) > U operator new(unsigned long) > See: /Users/aph/theRealAph-jdk/make/hotspot/lib/CompileJvm.gmk > > > > This happened on MacOS/AArch64, and was caused by an apparently innocuous Lambda. GCC doesn't generate new and delete for this construct, but AArch64 clang does for some reason. And I guess it's true that C++ compilers are free to do this, and even if one compiler doesn't do so today, it might tomorrow. > > What should we do? At least for my application, it doesn't matter if new and delete are used, but in some cases it might. Do we need a blanket prohibition against new and delete, when the programmer has no control over it? Andrew sent me his failing example. The global allocator uses are due to his code using std::function to capture a lambda, not because of the lambda itself. std::function is not approved for use in HotSpot code, and might never be, even if we were to start using other parts of the standard library. (See below.) It only happened with clang, not with gcc. That's probably because of different implementations. Their std::function implementations might have different SBO (Small Buffer Optimization) sizes. Or something about their implementations of lambda might have led to different sized objects. Or code or compiler differences might result in apparent but conditional references that didn't get optimized away under clang under the compilation mode being used. 
Or still other possibilities. std::function *does* do allocation in many cases. And unfortunately, it doesn't support control of the allocator it uses. There's API for that, but it apparently never worked properly, gcc never even implemented it, and allocator awareness for std::function was completely removed (not fixed or deprecated) in C++17. http://open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0302r0.html So without some care, conversion from a lambda to a std::function may attempt to invoke the default global allocator, which doesn't work in HotSpot. And we can't use the allocator support even in C++14, since support for that feature is either incomplete or non-existent in platforms we care about (and probably others too). The allocation can probably be avoided by passing the lambda by reference rather than by value, for example by std::function<...>(std::ref(the_lambda)), as the reference will (probably) fit in the function's expected SBO buffer, whereas the_lambda might be too big. (Typical SBO buffer size seems to be 2-3 pointers.) There are other things to think about here too. For example, the size of a lambda depends on what it captures and how, so whether it's worth capturing the lambda itself by reference or value depends on many low-level and implementation-dependent details. I think we will want a type-erasing callable lambda holder like std::function. Otherwise, some use-cases end up being forced to be templates all the way down, which is probably not great. But do we need such a thing immediately, as part of using lambda at all? I don't think so, but that's something to consider when deciding whether to allow using lambda in HotSpot. Because of restrictions we're imposing on lambda usage, and in particular requiring only downward usage, it should be possible to create such a holder that isn't too complicated either to implement or to use, and also avoids memory allocation.
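One possible shape for such a holder, given purely as an illustration, is a non-owning "function reference": it stores a pointer to the caller's callable plus a pointer to a small trampoline, so it never allocates and is only valid while the callable is alive (which matches downward-only usage). The names `FunctionRef` and `apply_twice` are hypothetical and nothing here is proposed HotSpot code. The sketch assumes C++14 and handles function objects such as lambdas, not plain function pointers.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

template <typename Signature>
class FunctionRef;  // primary template, intentionally undefined

// Non-owning, type-erasing reference to a callable object.
template <typename R, typename... Args>
class FunctionRef<R(Args...)> {
 public:
  // Accept any callable except FunctionRef itself, so that copying a
  // FunctionRef still uses the implicitly generated copy constructor.
  template <typename F,
            typename = typename std::enable_if<
                !std::is_same<typename std::decay<F>::type, FunctionRef>::value>::type>
  FunctionRef(F&& f)
      : _obj(const_cast<void*>(static_cast<const void*>(std::addressof(f)))),
        _invoke([](void* obj, Args... args) -> R {
          // Trampoline: recover the concrete callable type and call it.
          typedef typename std::remove_reference<F>::type Callable;
          return (*static_cast<Callable*>(obj))(std::forward<Args>(args)...);
        }) {}

  R operator()(Args... args) const {
    return _invoke(_obj, std::forward<Args>(args)...);
  }

 private:
  void* _obj;                    // points at the caller's callable; not owned
  R (*_invoke)(void*, Args...);  // captureless lambda converted to a function pointer
};

// Example "downward" consumer: it calls f before returning and never stores
// it, so it does not need to be a template itself.
static int apply_twice(FunctionRef<int(int)> f, int x) {
  return f(f(x));
}
```

A caller can then write `apply_twice([k](int v) { return v + k; }, 1)`: the temporary lambda lives until the end of the full expression, which covers the call. Storing a `FunctionRef` beyond the lifetime of the callable it refers to would dangle, which is the price of avoiding ownership and allocation.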
Having such a facility would also make it very easy to implement a somewhat different form of the ScopeGuard facility provided as an example. Andrew wondered if there might be an ongoing problem that we don't necessarily know whether some language feature allocates memory, now, or might in the future. I think we can make educated guesses. I also think control of memory allocation is a pretty important issue to a lot of C++ developers, so I would not expect the committee or implementors to quietly add allocation points. From dholmes at openjdk.java.net Wed Sep 8 23:47:04 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 8 Sep 2021 23:47:04 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Wed, 8 Sep 2021 20:09:10 GMT, Sandhya Viswanathan wrote: > Fix the copyright header of x86 macroAssembler files to match others. Hi Sandhya, Hotspot files do not have the "Classpath exception". Thanks, David ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5424 From sviswanathan at openjdk.java.net Thu Sep 9 00:10:20 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 00:10:20 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: > Fix the copyright header of x86 macroAssembler files to match others. 
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5424/files - new: https://git.openjdk.java.net/jdk/pull/5424/files/236b4db0..4e7f94d7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=00-01 Stats: 33 lines in 11 files changed: 0 ins; 22 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5424.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5424/head:pull/5424 PR: https://git.openjdk.java.net/jdk/pull/5424 From sviswanathan at openjdk.java.net Thu Sep 9 00:40:01 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 00:40:01 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Wed, 8 Sep 2021 23:44:27 GMT, David Holmes wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> implement review comments > > Hi Sandhya, > > Hotspot files do not have the "Classpath exception". > > Thanks, > David Thanks @dholmes-ora. I have removed the classpath exception. Let me know if the patch looks good to you now. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5424 From dholmes at openjdk.java.net Thu Sep 9 01:26:01 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 9 Sep 2021 01:26:01 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 00:10:20 GMT, Sandhya Viswanathan wrote: >> Fix the copyright header of x86 macroAssembler files to match others. > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > implement review comments Hi Sandhya, Under the assumption that this is indeed the right way to apply Intel copyrights I can approve this PR. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5424 From minqi at openjdk.java.net Thu Sep 9 03:16:02 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Thu, 9 Sep 2021 03:16:02 GMT Subject: RFR: 8271569: Change cdsoffsets.cpp to cdsConstants.cpp In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 22:44:14 GMT, David Holmes wrote: > Hi Yumin, > > > Changed cdsOffsets.cpp to cdsConstants.cpp > > This is not showing up in the PR as a rename but as a new file. So there is no way to know what changes you may have made inside cdsConstants.cpp. The header file change shows as a rename. > > David Hi, David git did not rename the 'old' to 'new' so the patch showed cdsConstants.cpp as a 'new' file. So I will change description as you indicated. 
Thanks Yumin ------------- PR: https://git.openjdk.java.net/jdk/pull/5423 From david.holmes at oracle.com Thu Sep 9 04:00:58 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 9 Sep 2021 14:00:58 +1000 Subject: RFR: 8271569: Change cdsoffsets.cpp to cdsConstants.cpp In-Reply-To: References: Message-ID: <36f4f278-8052-db3e-ae88-793f47dae48b@oracle.com> On 9/09/2021 1:16 pm, Yumin Qi wrote: > On Wed, 8 Sep 2021 22:44:14 GMT, David Holmes wrote: > >> Hi Yumin, >> >>> Changed cdsOffsets.cpp to cdsConstants.cpp >> >> This is not showing up in the PR as a rename but as a new file. So there is no way to know what changes you may have made inside cdsConstants.cpp. The header file change shows as a rename. >> >> David > > Hi, David > git did not rename the 'old' to 'new' so the patch showed cdsConstants.cpp as a 'new' file. So I will change description as you indicated. Thanks ?? I'd rather you try again and do a rename :) Otherwise please advise what, if any, changes were actually made to the "new" file? Thanks, David > Yumin > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5423 > From minqi at openjdk.java.net Thu Sep 9 04:28:01 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Thu, 9 Sep 2021 04:28:01 GMT Subject: RFR: 8271569: Change cdsoffsets.cpp to cdsConstants.cpp In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 22:44:14 GMT, David Holmes wrote: >> Changed cdsOffsets.cpp to cdsConstants.cpp, now the offsets and constants are initialized static and searched separately. >> The offsets array could not use 'constexpr' since g++ on macOS and VSC++ on Windows complained reinterpret_cast in 'offset_of' should not be used in constexpr initialization. Changed some field access for forming global list first. >> >> Tests: tier1-4 >> >> Thanks >> Yumin > > Hi Yumin, > >> Changed cdsOffsets.cpp to cdsConstants.cpp > > This is not showing up in the PR as a rename but as a new file. 
So there is no way to know what changes you may have made inside cdsConstants.cpp. The header file change shows as a rename. > > David @dholmes-ora I see your point here --- let me redo the patch using rename. ------------- PR: https://git.openjdk.java.net/jdk/pull/5423 From minqi at openjdk.java.net Thu Sep 9 04:34:06 2021 From: minqi at openjdk.java.net (Yumin Qi) Date: Thu, 9 Sep 2021 04:34:06 GMT Subject: Withdrawn: 8271569: Change cdsoffsets.cpp to cdsConstants.cpp In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 19:33:51 GMT, Yumin Qi wrote: > Changed cdsOffsets.cpp to cdsConstants.cpp, now the offsets and constants are initialized static and searched separately. > The offsets array could not use 'constexpr' since g++ on macOS and VSC++ on Windows complained reinterpret_cast in 'offset_of' should not be used in constexpr initialization. Changed some field access for forming global list first. > > Tests: tier1-4 > > Thanks > Yumin This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/5423 From thartmann at openjdk.java.net Thu Sep 9 06:20:06 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 06:20:06 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 00:10:20 GMT, Sandhya Viswanathan wrote: >> Fix the copyright header of x86 macroAssembler files to match others. > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > implement review comments What about this one? https://github.com/openjdk/jdk/blob/0417fcf13f7f2159499d325f2b3ace49d2643557/src/hotspot/cpu/aarch64/macroAssembler_aarch64_log.cpp#L2 Other files look good and consistent with the Intel copyright in `src/jdk.incubator.vector/linux/native/libsvml/*`. 
------------- Changes requested by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5424 From thartmann at openjdk.java.net Thu Sep 9 07:36:59 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Thu, 9 Sep 2021 07:36:59 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: On Wed, 1 Sep 2021 00:19:18 GMT, John Tortugo wrote: >> Great, thanks! Btw, you can merge and now use `RunInfo.getRandom().XX()` for a handy access to random values (if needed) as the PR for JDK-8272567 was integrated in the meantime. > > Hi, again @chhagedorn. I added some `custom run tests` to tests that seemed more "complex". Please let me know if there are others that you think I should add. @JohnTortugo just FYI, @chhagedorn is currently on vacation but will be back mid next week. ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From shade at openjdk.java.net Thu Sep 9 09:50:08 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 9 Sep 2021 09:50:08 GMT Subject: RFR: 8273483: Zero: Clear pending JNI exception check in native method handler In-Reply-To: References: Message-ID: <1wRfneo3KZQcTC2Hu5n3WkTuzd6ageEtFM3wnTl5YJA=.670de359-67e0-49aa-8131-203b66bdf514@github.com> On Wed, 8 Sep 2021 10:26:42 GMT, Andrew Haley wrote: >> If you run Zero with existing tier1 test, then it would fail like this: >> >> >> $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/jni/checked/TestCheckedJniExceptionCheck.java >> >> stdout: [TEST STARTED >> testSingleCallNoCheck start >> WARNING in native method: JNI call made without checking exceptions when required to from CallVoidMethod >> at java.lang.Object.getClass(java.base/Native Method) >> at java.io.PrintStream.println(java.base/PrintStream.java:1035) >> at TestCheckedJniExceptionCheck.testSingleCallNoCheck(TestCheckedJniExceptionCheck.java:82) >> at 
TestCheckedJniExceptionCheck.test(TestCheckedJniExceptionCheck.java:66) >> at TestCheckedJniExceptionCheck.main(TestCheckedJniExceptionCheck.java:203) >> testSingleCallNoCheck end >> >> >> In other words, there is a warning from the native call to Object.getClass from the test println itself, which it does not expect. This is because Zero does not clear the pending JNI exception check flag. All other (template) interpreter implementations do clear it in native call handlers. So the test rightfully reports the excess warning. >> >> Additional testing: >> - [x] Linux x86_64 Zero, `runtime/jni` tests now pass > > Marked as reviewed by aph (Reviewer). Thanks for the review, @theRealAph. I'll integrate to get cleaner tier1 runs. ------------- PR: https://git.openjdk.java.net/jdk/pull/5411 From shade at openjdk.java.net Thu Sep 9 09:50:09 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 9 Sep 2021 09:50:09 GMT Subject: Integrated: 8273483: Zero: Clear pending JNI exception check in native method handler In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 10:08:50 GMT, Aleksey Shipilev wrote: > If you run Zero with existing tier1 test, then it would fail like this: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/jni/checked/TestCheckedJniExceptionCheck.java > > stdout: [TEST STARTED > testSingleCallNoCheck start > WARNING in native method: JNI call made without checking exceptions when required to from CallVoidMethod > at java.lang.Object.getClass(java.base/Native Method) > at java.io.PrintStream.println(java.base/PrintStream.java:1035) > at TestCheckedJniExceptionCheck.testSingleCallNoCheck(TestCheckedJniExceptionCheck.java:82) > at TestCheckedJniExceptionCheck.test(TestCheckedJniExceptionCheck.java:66) > at TestCheckedJniExceptionCheck.main(TestCheckedJniExceptionCheck.java:203) > testSingleCallNoCheck end > > > In other words, there is a warning from the native call to Object.getClass from the test println itself, which it does 
not expect. This is because Zero does not clear the pending JNI exception check flag. All other (template) interpreter implementations do clear it in native call handlers. So the test rightfully reports the excess warning. > > Additional testing: > - [x] Linux x86_64 Zero, `runtime/jni` tests now pass This pull request has now been integrated. Changeset: aa931118 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/aa9311182ae88312a70b18afd85939718415b77c Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8273483: Zero: Clear pending JNI exception check in native method handler Reviewed-by: aph ------------- PR: https://git.openjdk.java.net/jdk/pull/5411 From aph-open at littlepinkcloud.com Thu Sep 9 12:02:44 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Thu, 9 Sep 2021 13:02:44 +0100 Subject: RFC: AArch64: Implementing spin pauses with ISB In-Reply-To: <3FA517F5-3339-4C99-B9B3-15D733033D39@amazon.com> References: <3FA517F5-3339-4C99-B9B3-15D733033D39@amazon.com> Message-ID: <5d578cc7-231c-0dba-43e1-05cf44bd11d7@littlepinkcloud.com> On 8/10/21 10:52 PM, Astigeevich, Evgeny wrote: > We'd like to discuss a proposal for implementing spin pauses with the ISB instruction: Hi, this one seems to have gone quiet. I think everyone would be happy with a switch for ISB and YIELD. Do you agree with that idea? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph-open at littlepinkcloud.com Thu Sep 9 12:10:12 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Thu, 9 Sep 2021 13:10:12 +0100 Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda In-Reply-To: <02C95815-9A3C-41AF-A73F-F7597A26DE90@oracle.com> References: <02C95815-9A3C-41AF-A73F-F7597A26DE90@oracle.com> Message-ID: On 9/9/21 12:25 AM, Kim Barrett wrote: > Because of restrictions we're imposing on lambda usage, and in particular > requiring only downward usage, it should be possible to create such a holder > that isn't too complicated either to implement or to use, and also avoids > memory allocation. OK, but for now I guess we can use lambdas in some simple cases that make HotSpot clearer and easier to write. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From pliden at openjdk.java.net Thu Sep 9 12:15:18 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 12:15:18 GMT Subject: RFR: 8273545: Remove Thread::is_GC_task_thread() Message-ID: I propose we remove Thread::is_GC_task_thread(). It's used only in two places (one in ZGC, and one assert in ParallelGC), and those two uses can be replaced by calls to is_Worker_thread() instead. Removing is_GC_task_thread() also allows us to clean out some stuff from WorkGang/GangWorker. 
------------- Commit messages: - 8273545: Remove Thread::is_GC_task_thread() Changes: https://git.openjdk.java.net/jdk/pull/5442/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5442&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273545 Stats: 22 lines in 12 files changed: 0 ins; 16 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5442/head:pull/5442 PR: https://git.openjdk.java.net/jdk/pull/5442 From stefank at openjdk.java.net Thu Sep 9 12:25:03 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 9 Sep 2021 12:25:03 GMT Subject: RFR: 8273545: Remove Thread::is_GC_task_thread() In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 12:07:18 GMT, Per Liden wrote: > I propose we remove Thread::is_GC_task_thread(). It's used only in two places (one in ZGC, and one assert in ParallelGC), and those two uses can be replaced by calls to is_Worker_thread() instead. Removing is_GC_task_thread() also allows us to clean out some stuff from WorkGang/GangWorker. Looks good ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5442 From coleenp at openjdk.java.net Thu Sep 9 12:36:59 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 9 Sep 2021 12:36:59 GMT Subject: RFR: 8273545: Remove Thread::is_GC_task_thread() In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 12:07:18 GMT, Per Liden wrote: > I propose we remove Thread::is_GC_task_thread(). It's used only in two places (one in ZGC, and one assert in ParallelGC), and those two uses can be replaced by calls to is_Worker_thread() instead. Removing is_GC_task_thread() also allows us to clean out some stuff from WorkGang/GangWorker. Looks good to me. So Worker threads could always do operations during a safepoint? They're NamedThreads so they can run across safepoints? Thanks. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5442 From coleenp at openjdk.java.net Thu Sep 9 12:41:02 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 9 Sep 2021 12:41:02 GMT Subject: RFR: 8273489: Zero: Handle UseHeavyMonitors on all monitorenter paths In-Reply-To: References: Message-ID: <4u9Xjz3N3IVB6_mxQimIHdcRy8poIyZaMv6R-lmRas0=.f94343fe-d186-4ddf-9d3b-85d31c6479af@github.com> On Wed, 8 Sep 2021 14:13:08 GMT, Aleksey Shipilev wrote: > While fixing JDK-8273486, I noticed there is one place where we do not call VM slowpath when `UseHeavyMonitors` is requested. That place is `ZeroInterpreter::native_entry`. We should probably implement `UseHeavyMonitors` check on those paths. > > New code is modeled after existing uses, for example [this](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L583-L593). > > Additional testing: > - [x] Linux x86_64 Zero `make bootcycle-images` Makes sense. ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5416 From pliden at openjdk.java.net Thu Sep 9 13:19:01 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 13:19:01 GMT Subject: RFR: 8273545: Remove Thread::is_GC_task_thread() In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 12:33:58 GMT, Coleen Phillimore wrote: > Looks good to me. So Worker threads could always do operations during a safepoint? They're NamedThreads so they can run across safepoints? Thanks. What a worker thread can and cannot do with regard to safepoints depends on the GC, and on which WorkGang instance in that GC. In other words, there's no strong connection between the two that is true for all GCs and all WorkGangs. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5442 From pliden at openjdk.java.net Thu Sep 9 13:22:10 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 13:22:10 GMT Subject: RFR: 8273550: Replace os::cgc_thread/pgc_thread with os::gc_thread Message-ID: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> The os thread types `cgc_thread` and `pgc_thread` might have been treated differently at some point in the past, but today they are not. So I suggest we replace those two types with a single `gc_thread` type. ------------- Commit messages: - 8273550: Replace os::cgc_thread/pgc_thread with os::gc_thread Changes: https://git.openjdk.java.net/jdk/pull/5444/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5444&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273550 Stats: 14 lines in 5 files changed: 0 ins; 9 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/5444.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5444/head:pull/5444 PR: https://git.openjdk.java.net/jdk/pull/5444 From pliden at openjdk.java.net Thu Sep 9 13:42:29 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 13:42:29 GMT Subject: RFR: 8273545: Remove Thread::is_GC_task_thread() [v2] In-Reply-To: References: Message-ID: > I propose we remove Thread::is_GC_task_thread(). It's used only in two places (one in ZGC, and one assert in ParallelGC), and those two uses can be replaced by calls to is_Worker_thread() instead. Removing is_GC_task_thread() also allows us to clean out some stuff from WorkGang/GangWorker. 
Per Liden has updated the pull request incrementally with one additional commit since the last revision: Updated gtests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5442/files - new: https://git.openjdk.java.net/jdk/pull/5442/files/ca94d767..0165c4a9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5442&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5442&range=00-01 Stats: 6 lines in 5 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5442.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5442/head:pull/5442 PR: https://git.openjdk.java.net/jdk/pull/5442 From shade at openjdk.java.net Thu Sep 9 13:52:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 9 Sep 2021 13:52:06 GMT Subject: RFR: 8273545: Remove Thread::is_GC_task_thread() [v2] In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 13:42:29 GMT, Per Liden wrote: >> I propose we remove Thread::is_GC_task_thread(). It's used only in two places (one in ZGC, and one assert in ParallelGC), and those two uses can be replaced by calls to is_Worker_thread() instead. Removing is_GC_task_thread() also allows us to clean out some stuff from WorkGang/GangWorker. > > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Updated gtests Looks fine. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5442 From stefank at openjdk.java.net Thu Sep 9 14:03:04 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 9 Sep 2021 14:03:04 GMT Subject: RFR: 8273550: Replace os::cgc_thread/pgc_thread with os::gc_thread In-Reply-To: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> References: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> Message-ID: On Thu, 9 Sep 2021 13:16:11 GMT, Per Liden wrote: > The os thread types `cgc_thread` and `pgc_thread` might have been treated differently at some point in the past, but today they are not. So I suggest we replace those two types with a single `gc_thread` type. Looks good ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5444 From coleenp at openjdk.java.net Thu Sep 9 14:11:01 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 9 Sep 2021 14:11:01 GMT Subject: RFR: 8273550: Replace os::cgc_thread/pgc_thread with os::gc_thread In-Reply-To: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> References: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> Message-ID: <7hmxu3cRfLNl2a3BpbWQUQ1nLOPH4zmdTTQzTkh8qfo=.81ea64c7-1460-489e-87f8-063bba43f2cd@github.com> On Thu, 9 Sep 2021 13:16:11 GMT, Per Liden wrote: > The os thread types `cgc_thread` and `pgc_thread` might have been treated differently at some point in the past, but today they are not. So I suggest we replace those two types with a single `gc_thread` type. Looks good. Seems trivially correct. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5444 From coleenp at openjdk.java.net Thu Sep 9 15:02:15 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 9 Sep 2021 15:02:15 GMT Subject: RFR: 8273456: Do not hold ttyLock around stack walking Message-ID: This change moves the tty rank back down to near access, and prints stack traces to stringStream to avoid holding the tty lock while trying to take the stackwatermark lock. Tested with tier1-8 (7,8 still in progress but no failures so far). ------------- Commit messages: - 8273456: Do not hold ttyLock around stack walking Changes: https://git.openjdk.java.net/jdk/pull/5445/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5445&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273456 Stats: 126 lines in 5 files changed: 54 ins; 45 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/5445.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5445/head:pull/5445 PR: https://git.openjdk.java.net/jdk/pull/5445 From coleenp at openjdk.java.net Thu Sep 9 15:02:16 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 9 Sep 2021 15:02:16 GMT Subject: RFR: 8273456: Do not hold ttyLock around stack walking In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 14:54:01 GMT, Coleen Phillimore wrote: > This change moves the tty rank back down to near access, and prints stack traces to stringStream to avoid holding the tty lock while trying to take the stackwatermark lock. > Tested with tier1-8 (7,8 still in progress but no failures so far). src/hotspot/share/runtime/deoptimization.cpp line 213: > 211: assert(Universe::heap()->is_in_or_null(result), "must be heap pointer"); > 212: if (TraceDeoptimization) { > 213: ttyLocker ttyl; The change also removes obvious places where ttyLocker isn't needed. 
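A minimal sketch of the general pattern described above (format the stack trace into a private buffer first, then hold the output lock only for the final write). It is not the HotSpot code from this PR: `std::ostringstream` stands in for `stringStream`, `tty_mutex` stands in for the tty lock, and `walk_stack()` is a hypothetical placeholder for a stack walk that may need other locks.

```cpp
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the global tty lock.
static std::mutex tty_mutex;

// Hypothetical stand-in for a stack walk that may itself acquire other locks.
static std::vector<std::string> walk_stack() {
  return {"frame_c", "frame_b", "frame_a"};
}

// Format the whole trace into a private buffer; no output lock is held here,
// so the walk cannot deadlock against the output lock.
std::string format_stack_trace() {
  std::ostringstream st;
  for (const std::string& frame : walk_stack()) {
    st << "  at " << frame << '\n';
  }
  return st.str();
}

// Take the output lock only for the single write of the finished text.
void print_stack_trace(std::ostream& out) {
  const std::string text = format_stack_trace();
  std::lock_guard<std::mutex> guard(tty_mutex);
  out << text;
}
```

Calling `print_stack_trace(std::cout)` writes the buffered trace in one locked step. The trade-off is that nothing is printed if the walk crashes partway through.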
------------- PR: https://git.openjdk.java.net/jdk/pull/5445 From shade at openjdk.java.net Thu Sep 9 17:20:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 9 Sep 2021 17:20:07 GMT Subject: RFR: 8273489: Zero: Handle UseHeavyMonitors on all monitorenter paths In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 14:13:08 GMT, Aleksey Shipilev wrote: > While fixing JDK-8273486, I noticed there is one place where we do not call VM slowpath when `UseHeavyMonitors` is requested. That place is `ZeroInterpreter::native_entry`. We should probably implement `UseHeavyMonitors` check on those paths. > > New code is modeled after existing uses, for example [this](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L583-L593). > > Additional testing: > - [x] Linux x86_64 Zero `make bootcycle-images` > - [x] Linux x86_64 Zero `make bootcycle-images` with `-XX:+UseHeavyMonitors` forced Thanks! I'll integrate now. ------------- PR: https://git.openjdk.java.net/jdk/pull/5416 From shade at openjdk.java.net Thu Sep 9 17:24:01 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 9 Sep 2021 17:24:01 GMT Subject: Integrated: 8273489: Zero: Handle UseHeavyMonitors on all monitorenter paths In-Reply-To: References: Message-ID: <0qWmdE2VtmSDd8WSOKHm6uru2Y095H7Awqh85rHlFnk=.f3f02b2b-a65f-4be1-a81f-079f7224befa@github.com> On Wed, 8 Sep 2021 14:13:08 GMT, Aleksey Shipilev wrote: > While fixing JDK-8273486, I noticed there is one place where we do not call VM slowpath when `UseHeavyMonitors` is requested. That place is `ZeroInterpreter::native_entry`. We should probably implement `UseHeavyMonitors` check on those paths. > > New code is modeled after existing uses, for example [this](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/interpreter/zero/bytecodeInterpreter.cpp#L583-L593). 
> > Additional testing: > - [x] Linux x86_64 Zero `make bootcycle-images` > - [x] Linux x86_64 Zero `make bootcycle-images` with `-XX:+UseHeavyMonitors` forced This pull request has now been integrated. Changeset: e3bda63c Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/e3bda63ce29bac0eaea520d42f4927dda77f83f2 Stats: 7 lines in 1 file changed: 2 ins; 2 del; 3 mod 8273489: Zero: Handle UseHeavyMonitors on all monitorenter paths Reviewed-by: coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/5416 From sviswanathan at openjdk.java.net Thu Sep 9 17:38:23 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 17:38:23 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v3] In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: > Fix the copyright header of x86 macroAssembler files to match others. 
Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5424/files - new: https://git.openjdk.java.net/jdk/pull/5424/files/4e7f94d7..9e1664e5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5424&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5424.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5424/head:pull/5424 PR: https://git.openjdk.java.net/jdk/pull/5424 From sviswanathan at openjdk.java.net Thu Sep 9 17:38:24 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 9 Sep 2021 17:38:24 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v2] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 06:17:22 GMT, Tobias Hartmann wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> implement review comments > > What about this one? > https://github.com/openjdk/jdk/blob/0417fcf13f7f2159499d325f2b3ace49d2643557/src/hotspot/cpu/aarch64/macroAssembler_aarch64_log.cpp#L2 > > Other files look good and consistent with the Intel copyright in `src/jdk.incubator.vector/linux/native/libsvml/*`. Thanks a lot @TobiHartmann. I have corrected that line as well. ------------- PR: https://git.openjdk.java.net/jdk/pull/5424 From pliden at openjdk.java.net Thu Sep 9 19:23:11 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 19:23:11 GMT Subject: Integrated: 8273545: Remove Thread::is_GC_task_thread() In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 12:07:18 GMT, Per Liden wrote: > I propose we remove Thread::is_GC_task_thread(). 
It's used only in two places (one in ZGC, and one assert in ParallelGC), and those two uses can be replaced by calls to is_Worker_thread() instead. Removing is_GC_task_thread() also allows us to clean out some stuff from WorkGang/GangWorker. This pull request has now been integrated. Changeset: 185eacac Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/185eacacdde9de12936520a1cda847f7e541c62f Stats: 28 lines in 17 files changed: 0 ins; 18 del; 10 mod 8273545: Remove Thread::is_GC_task_thread() Reviewed-by: stefank, coleenp, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/5442 From pliden at openjdk.java.net Thu Sep 9 19:23:10 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 19:23:10 GMT Subject: RFR: 8273545: Remove Thread::is_GC_task_thread() [v2] In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 13:42:29 GMT, Per Liden wrote: >> I propose we remove Thread::is_GC_task_thread(). It's used only in two places (one in ZGC, and one assert in ParallelGC), and those two uses can be replaced by calls to is_Worker_thread() instead. Removing is_GC_task_thread() also allows us to clean out some stuff from WorkGang/GangWorker. > > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Updated gtests Thanks all for reviewing! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5442 From pliden at openjdk.java.net Thu Sep 9 19:26:09 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 19:26:09 GMT Subject: Integrated: 8273550: Replace os::cgc_thread/pgc_thread with os::gc_thread In-Reply-To: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> References: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> Message-ID: On Thu, 9 Sep 2021 13:16:11 GMT, Per Liden wrote: > The os thread types `cgc_thread` and `pgc_thread` might have been treated differently at some point in the past, but today they are not. So I suggest we replace those two types with a single `gc_thread` type. This pull request has now been integrated. Changeset: 4020a60c Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/4020a60cbb3db0458262212d46515c8c11492a5b Stats: 14 lines in 5 files changed: 0 ins; 9 del; 5 mod 8273550: Replace os::cgc_thread/pgc_thread with os::gc_thread Reviewed-by: stefank, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/5444 From pliden at openjdk.java.net Thu Sep 9 19:26:09 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 9 Sep 2021 19:26:09 GMT Subject: RFR: 8273550: Replace os::cgc_thread/pgc_thread with os::gc_thread In-Reply-To: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> References: <9iFbMpm6BOPOj8FQRRtuwlvq7A58QshyYNjrfYOBCI0=.0ae6f5ec-1be6-4024-a7af-6e435b466ab7@github.com> Message-ID: On Thu, 9 Sep 2021 13:16:11 GMT, Per Liden wrote: > The os thread types `cgc_thread` and `pgc_thread` might have been treated differently at some point in the past, but today they are not. So I suggest we replace those two types with a single `gc_thread` type. Thanks all for reviewing! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5444 From mseledtsov at openjdk.java.net Thu Sep 9 19:58:05 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Thu, 9 Sep 2021 19:58:05 GMT Subject: RFR: 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests In-Reply-To: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> References: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> Message-ID: On Tue, 7 Sep 2021 15:07:10 GMT, Aleksey Shipilev wrote: > The current `vmTestbase/metaspace/stressHierarchy` test group (part of the vmTestbase_vm_metaspace suite) contains about 15 tests, each running exclusively. There seems to be no reason to run them exclusively, though: they complete in reasonable time, are single-threaded, and consume the usual amount of memory. There is no evidence in JBS that they ever timed out without a reason, and their history unfortunately predates OpenJDK, so we cannot see why they were not concurrent from day one. > > We should consider enabling parallelism for `vmTestbase/metaspace/stressHierarchy` and get improved test performance. Currently it is blocked by `TEST.properties` with `exclusiveAccess.dirs` directives in them. > > Note there are other exclusive tests in `vmTestbase_vm_metaspace`, but those seem to be the hard stress tests: pushing GC to the limits, or doing many threads, etc. > > Motivational test time improvements below. > > Before: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... 
> 00:24:53 ============================== > 00:24:53 Test summary > 00:24:53 ============================== > 00:24:53 TEST TOTAL PASS FAIL ERROR > 00:24:53 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:24:53 ============================== > 00:24:53 TEST SUCCESS > 00:24:53 > 00:24:53 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 24m53.389s > user 53m2.029s > sys 1m1.849s > > > After: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... > 00:04:04 ============================== > 00:04:04 Test summary > 00:04:04 ============================== > 00:04:04 TEST TOTAL PASS FAIL ERROR > 00:04:04 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:04:04 ============================== > 00:04:04 TEST SUCCESS > 00:04:04 > 00:04:04 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 4m4.574s > user 56m10.582s > sys 1m4.725s This looks like a good change to me. Please allow me some time to run multiple stress testing of these tests with exclusiveAccess removed. I should have the results tonight PST, or tomorrow. ------------- PR: https://git.openjdk.java.net/jdk/pull/5391 From github.com+2249648+johntortugo at openjdk.java.net Thu Sep 9 20:53:06 2021 From: github.com+2249648+johntortugo at openjdk.java.net (John Tortugo) Date: Thu, 9 Sep 2021 20:53:06 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 07:33:47 GMT, Tobias Hartmann wrote: >> Hi, again @chhagedorn. I added some `custom run tests` to tests that seemed more "complex". Please let me know if there are others that you think I should add. > > @JohnTortugo just FYI, @chhagedorn is currently on vacation but will be back mid next week. 
Thank you @TobiHartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5135 From phh at openjdk.java.net Thu Sep 9 23:23:12 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Thu, 9 Sep 2021 23:23:12 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type [v2] In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 17:27:00 GMT, Albert Mingkun Yang wrote: >> Simple change on return types of Ticks API. >> >> The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. >> >> Test: tier1 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > template Here's a [webrev](https://cr.openjdk.java.net/~phh/8273239/webrev.00/) that uses is_integral() to compute seconds/millis/micros/nanos at full width for both integral and floating point target types. Passes hotspot tier1 and gtest, running jdk tier1. Generated code looks correct. There are quite a few places where seconds() * 1000.0, etc. could be replaced by uses of milliseconds(), etc. ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From dholmes at openjdk.java.net Thu Sep 9 23:26:03 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 9 Sep 2021 23:26:03 GMT Subject: RFR: 8273456: Do not hold ttyLock around stack walking In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 14:54:01 GMT, Coleen Phillimore wrote: > This change moves the tty rank back down to near access, and prints stack traces to stringStream to avoid holding the tty lock while trying to take the stackwatermark lock. > Tested with tier1-8 (7,8 still in progress but no failures so far). Hi Coleen, My only minor concern with these changes is that we lose information if there is a crash during any of these logging loops. Before you would (should?) see how far we got before a crash, but now there will not be any indication of that. 
But that is not the primary intention of this logging so I think the changes are okay. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5445 From coleenp at openjdk.java.net Fri Sep 10 01:42:01 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 01:42:01 GMT Subject: RFR: 8273456: Do not hold ttyLock around stack walking In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 14:54:01 GMT, Coleen Phillimore wrote: > This change moves the tty rank back down to near access, and prints stack traces to stringStream to avoid holding the tty lock while trying to take the stackwatermark lock. > Tested with tier1-8 (7,8 still in progress but no failures so far). I guess if you crashed you could look at st in the debugger. Thanks for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5445 From github.com+39413832+weixlu at openjdk.java.net Fri Sep 10 02:33:03 2021 From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu) Date: Fri, 10 Sep 2021 02:33:03 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v8] In-Reply-To: References: <_IqJ7u4Vk7jF8E--2RzWfdnxYXDQr86TIsxA7sh_3WI=.4d2c4cd9-63c8-4921-b5a1-e77d66c10325@github.com> Message-ID: On Tue, 7 Sep 2021 08:33:40 GMT, Aleksey Shipilev wrote: >> More work: leave `acquire`-in-lieu-of-`consume` in, and special case the heap update paths to dodge the performance penalty of doing so. Seems to work on x86_64 and AArch64. > >> @shipilev Hi, I have tested this pull request as well as this pull request + `OrderAccess::release();` on specjbb 2015 on AArch64 (Kunpeng 920). Maybe there is a slight improvement on critical-jOPS? Here is the result. > > Thanks for testing. So explicit barrier does seem to result in a slight bump in critical-jOPS. > > I assume "base" results are this PR? If so, do you have performance results for the current master? 
In other words, it would be interesting to see three results: baseline (current master), this PR, and this PR + `OrderAccess::release()`. @shipilev Yes, "base" means this PR in my previous comment. Here is the result of the current master (i.e. revert all commits in this PR). It seems master performs better, so the cost of "acquire" may be really high as you have said. master_1:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 29825, critical-jOPS = 23053 master_2:RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 34282, max-jOPS = 30017, critical-jOPS = 23092 master_3:RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 31780, max-jOPS = 29825, critical-jOPS = 22383 ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From thartmann at openjdk.java.net Fri Sep 10 05:57:07 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 10 Sep 2021 05:57:07 GMT Subject: RFR: 8273512: Fix the copyright header of x86 macroAssembler files [v3] In-Reply-To: References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: On Thu, 9 Sep 2021 17:38:23 GMT, Sandhya Viswanathan wrote: >> Fix the copyright header of x86 macroAssembler files to match others. > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5424 From nick.gasson at arm.com Fri Sep 10 07:37:46 2021 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 10 Sep 2021 15:37:46 +0800 Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions In-Reply-To: References: Message-ID: <8535qchodx.fsf@arm.com> On 07/09/21 22:36 pm, Andrew Haley wrote: > > A generator takes the form of a subclass of `KernelGenerator`.
The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers. `KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers. > Can you include this explanation in the code somewhere? Perhaps as a comment above KernelGenerator. I like the idea but the generate() method is a bit opaque without this. -- Thanks, Nick From shade at openjdk.java.net Fri Sep 10 07:44:58 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 10 Sep 2021 07:44:58 GMT Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v8] In-Reply-To: References: <_IqJ7u4Vk7jF8E--2RzWfdnxYXDQr86TIsxA7sh_3WI=.4d2c4cd9-63c8-4921-b5a1-e77d66c10325@github.com> Message-ID: On Tue, 7 Sep 2021 08:33:40 GMT, Aleksey Shipilev wrote: >> More work: leave `acquire`-in-lieu-of-`consume` in, and special case the heap update paths to dodge the performance penalty of doing so. Seems to work on x86_64 and AArch64. > >> @shipilev Hi, I have tested this pull request as well as this pull request + `OrderAccess::release();` on specjbb 2015 on AArch64 (Kunpeng 920). Maybe there is a slight improvement on critical-jOPS? Here is the result. > > Thanks for testing. So explicit barrier does seem to result in a slight bump in critical-jOPS. > > I assume "base" results are this PR? If so, do you have performance results for the current master? In other words, it would be interesting to see three results: baseline (current master), this PR, and this PR + `OrderAccess::release()`. > @shipilev Yes, "base"
means this PR in my previous comment. Here is the result of the current master (i.e. revert all commits in this PR). It seems master performs better, so the cost of "acquire" may be really high as you have said. (sighs) Thanks for testing. Do you have spare cycles to verify that "acquire" is indeed the culprit for this? It would be simple to check: replace all `mark_acquire()` to just `mark()` in this PR. I am somewhat sure that would not break things very much for the test runs. ------------- PR: https://git.openjdk.java.net/jdk/pull/2496 From ayang at openjdk.java.net Fri Sep 10 10:09:04 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 10 Sep 2021 10:09:04 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type [v2] In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 17:27:00 GMT, Albert Mingkun Yang wrote: >> Simple change on return types of Ticks API. >> >> The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. >> >> Test: tier1 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > template Thank you for the patch; it's easier to discuss over a concrete implementation. > NANOSECS_PER_SEC / (uint64_t)Rdtsc::frequency(); This integer division could cause precision loss when `T` is `double`, right? ------------- PR: https://git.openjdk.java.net/jdk/pull/5332 From pliden at openjdk.java.net Fri Sep 10 12:49:17 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 10 Sep 2021 12:49:17 GMT Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() Message-ID: `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`.
These two classes have no super/sub-class relation. This is confusing and potentially dangerous. It would be reasonable to think that code like this would be correct: if (thread->is_ConcurrentGC_thread()) { conc_thread = static_cast<ConcurrentGCThread*>(thread); ... } but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation. I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`. ------------- Commit messages: - 8273597: Rectify Thread::is_ConcurrentGC_thread() Changes: https://git.openjdk.java.net/jdk/pull/5463/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5463&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273597 Stats: 53 lines in 17 files changed: 6 ins; 26 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/5463.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5463/head:pull/5463 PR: https://git.openjdk.java.net/jdk/pull/5463 From aph at openjdk.java.net Fri Sep 10 13:21:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 10 Sep 2021 13:21:37 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v7] In-Reply-To: References: Message-ID: > An interleaved version of AES/GCM. > > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ±
192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many examples to be found in free software across the Internet. > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version. The transformation shouldn't be too hard to > do, but more importantly it should be possible for a reader to verify > that the interleaved and unrolled version performs the same function. > > A generator takes the form of a subclass of `KernelGenerator`. The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers.
`KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers. > > As an example, here's the inner loop of AES encryption: > > (Some details elided for clarity.) > > > BIND(L_aes_loop); > ld1(v0, T16B, post(from, 16)); > > cmpw(keylen, 44); > br(Assembler::CC, L_rounds_44); > br(Assembler::EQ, L_rounds_52); > > aes_round(v0, v17); > aes_round(v0, v18); > BIND(L_rounds_52); > aes_round(v0, v19); > aes_round(v0, v20); > BIND(L_rounds_44); > ... > > > The generator for the unrolled version looks like: > > > virtual void generate(int index) { > switch (index) { > case 0: > ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input > break; > case 1: > if (_once) { > cmpw(_keylen, 52); > br(Assembler::LO, _rounds_44); > br(Assembler::EQ, _rounds_52); > } > break; > case 2: aes_round(_data, _subkeys + 0); break; > case 3: aes_round(_data, _subkeys + 1); break; > case 4: > if (_once) bind(_rounds_52); > break; > case 5: aes_round(_data, _subkeys + 2); break; > case 6: aes_round(_data, _subkeys + 3); break; > case 7: > if (_once) bind(_rounds_44); > break; > ... > > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. (You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. 
Also, the number of > unrolls is just a number that can be tweaked as required. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/9fc11725..ba4fe416 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=05-06 Stats: 23 lines in 2 files changed: 16 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Fri Sep 10 13:21:37 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 10 Sep 2021 13:21:37 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions In-Reply-To: <8535qchodx.fsf@arm.com> References: <8535qchodx.fsf@arm.com> Message-ID: <4fTufF-QvPVOxVAXP5c3h5lrOVJtroJLMhZPhT7QuVo=.7a9d31a3-6bcc-48e0-ae8b-45f93d2af327@github.com> On Fri, 10 Sep 2021 07:39:33 GMT, Nick Gasson wrote: > Can you include this explanation in the code somewhere? Perhaps as a > comment above KernelGenerator. I like the idea but the generate() > method is a bit opaque without this. Right you are: I'm forever asking committers to do just that. Physician, heal thyself! 
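For readers following the thread, the `length()` / `generate(int n)` / `next()` contract described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `ToyKernelGenerator`, `ToyAesRounds` and the free function `unroll` are made-up names, and the toy records instruction text in a vector instead of emitting AArch64 code through the real assembler.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the KernelGenerator idea: it appends
// instruction text to a vector rather than emitting machine code.
class ToyKernelGenerator {
public:
  virtual ~ToyKernelGenerator() = default;
  // Number of instruction bundles in the kernel.
  virtual int length() const = 0;
  // Emit the index-th bundle of this copy of the kernel.
  virtual void generate(int index, std::vector<std::string>& out) const = 0;
  // Return a clone of this generator, shifted to a new set of registers.
  virtual std::unique_ptr<ToyKernelGenerator> next() const = 0;
};

// Interleave `unrolls` copies: bundle 0 of every copy, then bundle 1, ...
std::vector<std::string> unroll(const ToyKernelGenerator& first, int unrolls) {
  std::vector<std::unique_ptr<ToyKernelGenerator>> clones;  // owns the clones
  std::vector<const ToyKernelGenerator*> gens{&first};
  for (int i = 1; i < unrolls; i++) {
    clones.push_back(gens.back()->next());
    gens.push_back(clones.back().get());
  }
  std::vector<std::string> out;
  for (int bundle = 0; bundle < first.length(); bundle++) {
    for (const ToyKernelGenerator* g : gens) {
      g->generate(bundle, out);
    }
  }
  return out;
}

// Toy kernel: load a block, do a compare once, run one AES round.
class ToyAesRounds : public ToyKernelGenerator {
  int _data;     // number of the (pretend) data register
  int _subkey;   // number of the (pretend) subkey register
  bool _once;    // true only for the first copy
public:
  ToyAesRounds(int data, int subkey, bool once = true)
    : _data(data), _subkey(subkey), _once(once) {}
  int length() const override { return 3; }
  void generate(int index, std::vector<std::string>& out) const override {
    switch (index) {
      case 0: out.push_back("ld1 v" + std::to_string(_data)); break;
      case 1: if (_once) out.push_back("cmpw keylen"); break;  // emitted once
      case 2: out.push_back("aes_round v" + std::to_string(_data) +
                            ", v" + std::to_string(_subkey)); break;
    }
  }
  std::unique_ptr<ToyKernelGenerator> next() const override {
    // Shift only the data register; subkeys are shared between copies.
    return std::make_unique<ToyAesRounds>(_data + 1, _subkey, false);
  }
};
```

Calling `unroll(ToyAesRounds(0, 17), 2)` interleaves two copies bundle by bundle, with the compare appearing only in the first copy, which mirrors the role of `_once` in the quoted generator.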
------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From stefan.karlsson at oracle.com Fri Sep 10 13:38:17 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 10 Sep 2021 15:38:17 +0200 Subject: RFR: 8272807: Permit use of memory concurrent with pretouch In-Reply-To: <9A1918F7-AD8E-4A54-9B7E-A8F29886B526@oracle.com> References: <6RDTlrQqg36KCzXUAD_O5bQhrgYnTsBwLkIaaltpQZ4=.9d011747-8966-4991-9173-85a492172bbe@github.com> <9A1918F7-AD8E-4A54-9B7E-A8F29886B526@oracle.com> Message-ID: On 2021-09-09 00:19, Kim Barrett wrote: >> On Sep 8, 2021, at 10:07 AM, Stefan Karlsson wrote: >> >> On Thu, 2 Sep 2021 18:33:56 GMT, Kim Barrett wrote: >> >>> Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215. >>> >>> Please review this change which adds os::touch_memory, which is similar to >>> os::pretouch_memory but allows concurrent access to the memory while it is >>> being touched. This is accomplished by using an atomic add of zero as the >>> operation for touching the memory, ensuring the virtual location is backed >>> by physical memory while not changing any values being read or written by >>> other threads. >>> >> I think it would be prudent to separate this PR into two separate PRs. > I don't see any benefit to that, but I can do it. The reason why I suggest that this gets split into two PRs is that there seems to be at least two different reasons for changing the code, and by mixing them into one PR it makes the review process unfocused. By mixing cleanups, fixes, and features in one PR, I as a reviewer must also spend more time trying to figure out what changes are necessary for what reason. I'm personally more likely to review smaller PRs that have a clear focus. > >> 1) For the changes to the pretouch loop. >> >> * I'm not sure we need all the added safeguarding, which, as you say, complicate the code. I think a few asserts would be good enough. > It's not; see below.
> >> I don't think the function needs to: >> - work if the caller pass in end < start. Sounds like an assert. > There's already an assert for end < start. > > The test currently in the code is for start < end, i.e. is the range > non-empty. I originally tried asserting a non-empty range, but such ranges > seem to show up sometimes. I didn't try to figure out whether those could be > eliminated, though sometimes empty ranges show up pretty naturally and > range-based operations typically permit the range to be empty. > > I found the empty range check up front simplified the analysis of the > protected code, since certain kinds of errors simply can't happen because of > it. Unless empty ranges are forbidden, something needs to be done somewhere > to prevent writing outside the range. Before continuing the discussion around these checks, I'd like to understand the motivation for some of the changes. I'm specifically looking at the alignment of sizeof(int). I would have expected that all 'start' addresses being passed to the "touch" code to be at least aligned to os::vm_page_size(). Straying away from that seems to add complexity, that I'm not sure is worth having. Do you have a specific use-case that needs this? Or could we simplify the code by adding an assert(is_aligned(start, os::vm_page_size()), "")? If not, could you point me to the code that doesn't conform to that? > >> - Safeguard that end + page doesn't overflow. Do we ever hand out pages at the end of the virtual address range? There's a lot of HotSpot code that assumes that we never get that kind of memory, so is it worth protecting against this compared to the extra complexity it gives? Previously, I could take a glance at that function and understand it. Now I need to pause and stare at it a bit. > I suspect overflow can't happen on at least some 64bit platforms, but I > don't know of anything that would prevent it on a 32bit platform. And I had > exactly the opposite reaction from you to the old code.
I looked at it and > immediately wondered what might happen on pointer overflow, and questioned > whether I understood the code. > > And as noted in the PR description, the old code for PretouchTask checks for > overflow, but does so incorrectly, such that the check could be optimized > away (or lead to other problems) because it would involve UB to fail the check. I don't know of any platform that hands out pages at the top of the address range. (Not saying that I know the details of all platforms). However, maybe this argument is more appealing: If the program got hold of a memory address at the end of the address range, then I think C++'s pointer comparison would break down. There's a section that states: "If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater." So, if we have a 4K byte array at 0xfffff000, then the start address would obviously be 0xfffff000, but "one past the last element" would be 0xfffff000 + 0x1000, which would overflow, likely resulting in the address 0x0 (barring UB issues). If that happens, then it seems like the statement above is at risk. > >> 2) For the new infrastructure. >> >> * I'm not sure we should add TouchTask until we have a concrete use-case (with prototype/PR) for it. Without that, it is hard to gauge the usability of this feature. How do other threads proceed concurrently with the worker threads? What is the impact of using TouchTask? One of the concerns that has been brought up by others is that using a WorkGang will block the rest of the JVM from safepointing while the workers are pre-touching. A use-case would clarify this, I think. > Here is a description of a specific example from my experiments.
When the G1 > allocator runs out of regions that were pre-allocated for eden and can't GC > because of the GCLocker, it tries to allocate a new region, first from the > free region list, and then an actually new region (assuming that's possible > within the heap size constraints). In the latter case, when AlwaysUsePretouch > is true, it currently does a *single-threaded* pretouch of the region and > ancillary memory. (It's single-threaded because most of the code path involved > is shared with during-GC code that is running under the work gang. That code > is a good candidate for os::touch_memory, so adding a gang argument through > that call tree that is null to indicate don't pretouch doesn't seem like an > improvement.) It could instead do a parallel concurrent touch after making the > new region available for other mutator threads to use for allocation. Yes, it > blocks safepoints, but that's already true for the existing code, and the > existing code may block them for *much* longer due to the unparallelized > pretouch. The downside of using a workgang here is the gang threads could be > competing with mutator threads for cores. I think parallelizing here is likely > beneficial though. I think there's a similar situation in ParallelGC, though I > haven't chased through all of that code carefully yet. Thanks for the use-case. Maybe the WorkGang is an easy and good enough approach to start with, even with its use of a safepoint-blocking mechanism. Is this feature going to be turned off by default? I have the feeling that for small enough requests, it will be better to let the allocating thread "concurrent touch" the memory. Other Java threads that allocate small objects, will probably get the paged in memory faster than if we spawn up worker threads. It would also be interesting to see an approach that touches memory in an async thread (though it needs to cooperate with the JVM so that the memory isn't uncommitted). 
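For readers following the thread, the "atomic add of zero" touch under discussion can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `touch_range` and its parameters are made-up names, `__atomic_fetch_add` is the GCC/Clang builtin rather than HotSpot's own atomic API, and the sketch assumes `start` is `int`-aligned and the range length is a multiple of `sizeof(int)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Touch one int per page in [start, end) with an atomic add of zero.
// The add forces the page to be backed by physical memory but never
// changes a value, so other threads may keep using the memory meanwhile.
// Returns the number of locations touched (hypothetical helper, see lead-in).
std::size_t touch_range(char* start, char* end, std::size_t page_size) {
  assert(start <= end);
  assert(page_size > 0);
  const std::size_t size = static_cast<std::size_t>(end - start);
  std::size_t touched = 0;
  // Walk by offset rather than by pointer, so no pointer beyond `end`
  // is ever formed; an empty range simply touches nothing.
  for (std::size_t offset = 0; offset < size; offset += page_size) {
    int* p = reinterpret_cast<int*>(start + offset);
    __atomic_fetch_add(p, 0, __ATOMIC_RELAXED);  // GCC/Clang builtin
    touched++;
  }
  return touched;
}
```

Iterating over an offset instead of comparing `cur + page_size` against `end` is one way to sidestep the pointer-overflow question raised earlier in the thread; whether that matches the PR's actual loop is not something this sketch claims.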
> >> * Is the usage of template specialization really needed? Wouldn't it be sufficient to pass down a bool parameter instead? I doubt we would see any performance difference by doing that. > The use of a template here seemed pretty straight-forward when I wrote it, > but I agree a boolean argument would work just as well and is more obvious > to read. I'll change that. I expect the same generated code either way. > >> * It's a little bit confusing that in the context of TouchTask "touch" means "concurrent touch", while in the touch_memory_* it means either concurrent or non-concurrent. Maybe rename TouchTask to something to make this distinction a bit clearer. > I'm open to alternative naming. I thought ConcurrentTouchTask was rather > long. My experiments repo currently uses PretouchTask::concurrent_touch, but > I don't like that naming either. > > The rationale for "touch" vs "pretouch" is "touching" is the primary generic > concept, and touch_memory can be used anywhere. Meanwhile "pretouching" is now > a restricted variant that might have better performance under those limitations. > Similarly for TouchTask vs PretouchTask. The static functions are just shared > helpers for the API functions; "touch_memory" is the "primary" thing. I think ConcurrentTouchTask is a good and descriptive name that will aid the readability. StefanK > From stefank at openjdk.java.net Fri Sep 10 13:48:39 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 10 Sep 2021 13:48:39 GMT Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 12:39:14 GMT, Per Liden wrote: > `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. These two classes have no super/sub-class relation.
This is confusing and potentially dangerous. > > It would be reasonable to think that code like this would be correct: > > > if (thread->is_ConcurrentGC_thread()) { > conc_thread = static_cast<ConcurrentGCThread*>(thread); > ... > } > > > but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation. > > I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`. Thanks for cleaning this up. src/hotspot/share/code/nmethod.cpp line 1563: > 1561: DEBUG_ONLY(bool called_by_gc = Universe::heap()->is_gc_active() || > 1562: Thread::current()->is_ConcurrentGC_thread() || > 1563: Thread::current()->is_Worker_thread();) Three places use the same condition. Did you consider creating a helper function? src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 122: > 120: _old_pool(NULL), > 121: _workers("GC Thread", > 122: ParallelGCThreads) { } Consider moving this up to the line above, like the G1 code. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5463 From aph at openjdk.java.net Fri Sep 10 14:00:18 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 10 Sep 2021 14:00:18 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: > An interleaved version of AES/GCM. > > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ±
113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ± 192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many examples to be found in free software across the Internet. > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version. The transformation shouldn't be too hard to > do, but more importantly it should be possible for a reader to verify > that the interleaved and unrolled version performs the same function. > > A generator takes the form of a subclass of `KernelGenerator`.
The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers. `KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers. > > As an example, here's the inner loop of AES encryption: > > (Some details elided for clarity.) > > > BIND(L_aes_loop); > ld1(v0, T16B, post(from, 16)); > > cmpw(keylen, 44); > br(Assembler::CC, L_rounds_44); > br(Assembler::EQ, L_rounds_52); > > aes_round(v0, v17); > aes_round(v0, v18); > BIND(L_rounds_52); > aes_round(v0, v19); > aes_round(v0, v20); > BIND(L_rounds_44); > ... > > > The generator for the unrolled version looks like: > > > virtual void generate(int index) { > switch (index) { > case 0: > ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input > break; > case 1: > if (_once) { > cmpw(_keylen, 52); > br(Assembler::LO, _rounds_44); > br(Assembler::EQ, _rounds_52); > } > break; > case 2: aes_round(_data, _subkeys + 0); break; > case 3: aes_round(_data, _subkeys + 1); break; > case 4: > if (_once) bind(_rounds_52); > break; > case 5: aes_round(_data, _subkeys + 2); break; > case 6: aes_round(_data, _subkeys + 3); break; > case 7: > if (_once) bind(_rounds_44); > break; > ... > > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. 
(You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. Also, the number of > unrolls is just a number that can be tweaked as required. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/ba4fe416..9ce21890 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From coleenp at openjdk.java.net Fri Sep 10 14:37:59 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 14:37:59 GMT Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() In-Reply-To: References: Message-ID: <-e2FC1G1xRjJP9pJtKgQV3AhyIfdrtG72Isk3K-Gnng=.956fb2cc-17be-4292-ba20-6a66881be603@github.com> On Fri, 10 Sep 2021 12:39:14 GMT, Per Liden wrote: > `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. These two classes have no super/sub-class relation. This is confusing and potentially dangerous. > > It would be reasonable to think that code like this would be correct: > > > if (thread->is_ConcurrentGC_thread()) { > conc_thread = static_cast<ConcurrentGCThread*>(thread); > ...
> } > > > but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation. > > I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`. Nice cleanup. One comment and suggestion. src/hotspot/share/gc/z/zCollectedHeap.cpp line 84: > 82: virtual void do_thread(Thread* thread) { > 83: if (thread->is_ConcurrentGC_thread()) { > 84: static_cast<ConcurrentGCThread*>(thread)->stop(); Should you have a cast function like JavaThread does? static JavaThread* cast(Thread* t) { assert(t->is_Java_thread(), "incorrect cast to JavaThread"); return static_cast<JavaThread*>(t); } At one point @dholmes replaced all the thread->as_Java_thread() with JavaThread::cast() so this would be consistent with that and nice. ------------- PR: https://git.openjdk.java.net/jdk/pull/5463 From coleenp at openjdk.java.net Fri Sep 10 14:38:00 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 14:38:00 GMT Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 13:42:27 GMT, Stefan Karlsson wrote: >> `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. These two classes have no super/sub-class relation. This is confusing and potentially dangerous. >> >> It would be reasonable to think that code like this would be correct: >> >> >> if (thread->is_ConcurrentGC_thread()) { >> conc_thread = static_cast<ConcurrentGCThread*>(thread); >> ...
>> } >> >> >> but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation. >> >> I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`. > > src/hotspot/share/code/nmethod.cpp line 1563: > >> 1561: DEBUG_ONLY(bool called_by_gc = Universe::heap()->is_gc_active() || >> 1562: Thread::current()->is_ConcurrentGC_thread() || >> 1563: Thread::current()->is_Worker_thread();) > > Three places use the same condition. Did you consider creating a helper function? I wonder if adding a helper function would encourage people to think they're the same again. ------------- PR: https://git.openjdk.java.net/jdk/pull/5463 From eosterlund at openjdk.java.net Fri Sep 10 14:39:47 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 10 Sep 2021 14:39:47 GMT Subject: RFR: 8273456: Do not hold ttyLock around stack walking In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 14:54:01 GMT, Coleen Phillimore wrote: > This change moves the tty rank back down to near access, and prints stack traces to stringStream to avoid holding the tty lock while trying to take the stackwatermark lock. > Tested with tier1-8 (7,8 still in progress but no failures so far). Looks good. ------------- Marked as reviewed by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5445 From coleenp at openjdk.java.net Fri Sep 10 14:57:49 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 14:57:49 GMT Subject: RFR: 8273456: Do not hold ttyLock around stack walking In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 14:54:01 GMT, Coleen Phillimore wrote: > This change moves the tty rank back down to near access, and prints stack traces to stringStream to avoid holding the tty lock while trying to take the stackwatermark lock. > Tested with tier1-8 (7,8 still in progress but no failures so far). Thanks Erik! ------------- PR: https://git.openjdk.java.net/jdk/pull/5445 From coleenp at openjdk.java.net Fri Sep 10 14:57:50 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 14:57:50 GMT Subject: Integrated: 8273456: Do not hold ttyLock around stack walking In-Reply-To: References: Message-ID: On Thu, 9 Sep 2021 14:54:01 GMT, Coleen Phillimore wrote: > This change moves the tty rank back down to near access, and prints stack traces to stringStream to avoid holding the tty lock while trying to take the stackwatermark lock. > Tested with tier1-8 (7,8 still in progress but no failures so far). This pull request has now been integrated. Changeset: 461a467f Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/461a467f91ba19ae35d7833b7d3e74f62f52e19c Stats: 126 lines in 5 files changed: 54 ins; 45 del; 27 mod 8273456: Do not hold ttyLock around stack walking Reviewed-by: dholmes, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/5445 From coleenp at openjdk.java.net Fri Sep 10 15:33:07 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 15:33:07 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint Message-ID: This change checks lock ranking during a safepoint. 
For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. Tests run tier1-8. ------------- Commit messages: - 8273300: Check Mutex ranking during a safepoint Changes: https://git.openjdk.java.net/jdk/pull/5467/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273300 Stats: 43 lines in 13 files changed: 1 ins; 18 del; 24 mod Patch: https://git.openjdk.java.net/jdk/pull/5467.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5467/head:pull/5467 PR: https://git.openjdk.java.net/jdk/pull/5467 From sviswanathan at openjdk.java.net Fri Sep 10 15:43:51 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Fri, 10 Sep 2021 15:43:51 GMT Subject: Integrated: 8273512: Fix the copyright header of x86 macroAssembler files In-Reply-To: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> References: <89boYXfzPR50KNk_ODjVPoaNZpRqnhib7ubqDgw8ftY=.8d25a4fe-59b8-4c7e-9e35-7ad1b9049df4@github.com> Message-ID: <3bfGDIHSM6TQo7-6cKKGqApPEuP-2wt5yPCvfZ-3YYU=.26868b23-b903-4b3f-b491-71764a094ffa@github.com> On Wed, 8 Sep 2021 20:09:10 GMT, Sandhya Viswanathan wrote: > Fix the copyright header of x86 macroAssembler files to match others. This pull request has now been integrated. 
Changeset: e58c12e6 Author: Sandhya Viswanathan URL: https://git.openjdk.java.net/jdk/commit/e58c12e61828485bfffbc9d1b865302b93a94158 Stats: 13 lines in 12 files changed: 0 ins; 0 del; 13 mod 8273512: Fix the copyright header of x86 macroAssembler files Reviewed-by: dholmes, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5424 From eosterlund at openjdk.java.net Fri Sep 10 16:23:53 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 10 Sep 2021 16:23:53 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 15:23:49 GMT, Coleen Phillimore wrote: > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. > Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Fri Sep 10 17:06:57 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 17:06:57 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 15:23:49 GMT, Coleen Phillimore wrote: > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. > Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. 
The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. Thanks Erik! ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From mseledtsov at openjdk.java.net Fri Sep 10 19:52:48 2021 From: mseledtsov at openjdk.java.net (Mikhailo Seledtsov) Date: Fri, 10 Sep 2021 19:52:48 GMT Subject: RFR: 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests In-Reply-To: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> References: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> Message-ID: On Tue, 7 Sep 2021 15:07:10 GMT, Aleksey Shipilev wrote: > The current `vmTestbase/metaspace/stressHierarchy` group (part of the vmTestbase_vm_metaspace suite) contains about 15 tests, each running exclusively. There seems to be no reason to run them exclusively, though: they complete in reasonable time, are single-threaded, and consume the usual amount of memory. There is no evidence in JBS that they ever timed out without a reason, and their history unfortunately predates OpenJDK, so we cannot see why they were not concurrent from day one. > > We should consider enabling parallelism for `vmTestbase/metaspace/stressHierarchy` and get improved test performance. Currently it is blocked by `TEST.properties` with `exclusiveAccess.dirs` directives in them. > > Note there are other exclusive tests in `vmTestbase_vm_metaspace`, but those seem to be the hard stress tests: pushing GC to the limits, or doing many threads, etc. > > Motivational test time improvements below. > > Before: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ...
> 00:24:53 ============================== > 00:24:53 Test summary > 00:24:53 ============================== > 00:24:53 TEST TOTAL PASS FAIL ERROR > 00:24:53 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:24:53 ============================== > 00:24:53 TEST SUCCESS > 00:24:53 > 00:24:53 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 24m53.389s > user 53m2.029s > sys 1m1.849s > > > After: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... > 00:04:04 ============================== > 00:04:04 Test summary > 00:04:04 ============================== > 00:04:04 TEST TOTAL PASS FAIL ERROR > 00:04:04 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:04:04 ============================== > 00:04:04 TEST SUCCESS > 00:04:04 > 00:04:04 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 4m4.574s > user 56m10.582s > sys 1m4.725s Stability testing passed with the change. Change looks good to me. ------------- Marked as reviewed by mseledtsov (Committer). PR: https://git.openjdk.java.net/jdk/pull/5391 From coleenp at openjdk.java.net Fri Sep 10 21:15:29 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 10 Sep 2021 21:15:29 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v2] In-Reply-To: References: Message-ID: <2yiIxm7CaiJf3_wiiYcuQjbRNFauGO0MJoJGUphSmR0=.29fe408e-8707-402f-8e3c-5366aff3a8cc@github.com> > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. > Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. 
Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix Shenandoah mismerge ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5467/files - new: https://git.openjdk.java.net/jdk/pull/5467/files/13355972..dcb03c3a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5467.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5467/head:pull/5467 PR: https://git.openjdk.java.net/jdk/pull/5467 From kim.barrett at oracle.com Sun Sep 12 19:44:40 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 12 Sep 2021 19:44:40 +0000 Subject: RFR: 8272807: Permit use of memory concurrent with pretouch In-Reply-To: References: <6RDTlrQqg36KCzXUAD_O5bQhrgYnTsBwLkIaaltpQZ4=.9d011747-8966-4991-9173-85a492172bbe@github.com> <9A1918F7-AD8E-4A54-9B7E-A8F29886B526@oracle.com> Message-ID: <8176744C-BBF3-401A-9A4E-FBB2C91F8F12@oracle.com> > On Sep 10, 2021, at 9:38 AM, Stefan Karlsson wrote: > > On 2021-09-09 00:19, Kim Barrett wrote: >>> On Sep 8, 2021, at 10:07 AM, Stefan Karlsson wrote: >>> >>> On Thu, 2 Sep 2021 18:33:56 GMT, Kim Barrett wrote: >>> >>>> Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215. >>>> >>>> Please review this change which adds os::touch_memory, which is similar to >>>> os::pretouch_memory but allows concurrent access to the memory while it is >>>> being touched. This is accomplished by using an atomic add of zero as the >>>> operation for touching the memory, ensuring the virtual location is backed >>>> by physical memory while not changing any values being read or written by >>>> other threads. >>>> >>> I think it would be prudent to separate this PR into two separate PRs. 
>> I don't see any benefit to that, but I can do it. > > The reason why I suggest that this gets split into two PRs is that there seem to be at least two different reasons for changing the code, and by mixing them into one PR it makes the review process unfocused. By mixing cleanups, fixes, and features in one PR, I as a reviewer must also spend more time trying to figure out what changes are necessary for what reason. I'm personally more likely to review smaller PRs that have a clear focus. I expected that if I only proposed adding the serial version that one of the first things any reviewer would ask is why I wasn't also adding the parallel version. (But maybe you wouldn't have asked that.) I could have separated the restructuring of the pretouch loop as a preliminary change. Similarly, I could have separated the restructuring of the parallel pretouch chunk claim as a preliminary change. I thought about doing that (and I'm guessing you would have preferred it), but neither seemed that useful as I'd be immediately refactoring the code anyway. I could have added concurrent touching support and some uses of it, but decided to first do a PR devoted to just the infrastructure, leaving usage to later, where the usage changes would be the focus. Out of the many possible ways to slice things up, I picked one that seemed to me to be coherent and of a reasonable (not too large) size. If you need a different slicing before reviewing this, please make a specific request. >>> 1) For the changes to the pretouch loop. >>> >>> * I'm not sure we need all the added safeguarding, which, as you say, complicates the code. I think a few asserts would be good enough. >> It's not; see below. >> >>> I don't think the function needs to: >>> - work if the caller passes in end < start. Sounds like an assert. >> There's already an assert for end < start. >> >> The test currently in the code is for start < end, i.e. is the range >> non-empty.
I originally tried asserting a non-empty range, but empty ranges >> seem to show up sometimes. I didn't try to figure out whether those could be >> eliminated, though sometimes empty ranges show up pretty naturally and >> range-based operations typically permit the range to be empty. >> >> I found the empty range check up front simplified the analysis of the >> protected code, since certain kinds of errors simply can't happen because of >> it. Unless empty ranges are forbidden, something needs to be done somewhere >> to prevent writing outside the range. > > Before continuing the discussion around these checks, I'd like to understand the motivation for some of the changes. I'm specifically looking at the alignment of sizeof(int). I would have expected all 'start' addresses being passed to the "touch" code to be at least aligned to os::vm_page_size(). Straying away from that seems to add complexity that I'm not sure is worth having. Do you have a specific use-case that needs this? Or could we simplify the code by adding an assert(is_aligned(start, os::vm_page_size()), "")? If not, could you point me to the code that doesn't conform to that? I *think* all existing dynamic uses ensure the start (and I think end too) are at least vm_page_size() aligned. There's at least one static use that definitely doesn't: BitMap::pretouch, of which there appear to be no callers anymore. The existing pretouch implementation doesn't require (as in assert or comments) any particular alignment, and will complete without error (subject to any overflow issues) regardless of the alignments of the arguments. It is documented as possibly not touching the last page if the arguments aren't (presumably page) aligned. So it appears that pretouching has always permitted arguments that aren't page aligned. Many of the pretouch callers perform explicit page alignment of the arguments from values that might not be so aligned.
Most of those occur in the context of some operation (like commit) that is already dealing with pages and page-aligned values. I'm guessing they are doing so to ensure they avoid the documented possible lack of touching the last page. That seems like the wrong API tradeoff to me, given the ease with which arbitrarily unaligned ranges can be handled by pretouch (as demonstrated by the new code). And it's potentially much worse for callers of the proposed new touch operation, which will be called in some entirely different context. For example, when ParallelGC expands the old generation, the higher level code where the new touch operation would be placed (PSOldGen::expand_for_allocate) calls an expansion function with a minimum size and has available the start and end of the added range. If I recall correctly, that start and end are going to be page aligned, but that information comes from digging through the call chain; there's no promise of such by the expansion function (and no reason for it to make such a promise, as it's really just an artifact of the lower level implementation). The touch call would either need to assume (and assert) that alignment or ensure it, neither of which is an improvement to the calling code. There are similar situations in G1 when touching the auxiliary memory associated with a region (card table, BOT, etc.). The size of the card table range associated with a 1M region is 2K, which is less than a Linux default x86 page, and much less than a Linux default aarch64 page. I thought about making the new concurrent touch operation require the range to be int-aligned. I didn't because I think it's important to have the two touch variants be API compatible, and there was no reason to add any alignment requirement to pretouch. Adding such a requirement to touch would just be allowing the underlying implementation to show through in an inconvenient way for callers. >>> - Safeguard that end + page doesn't overflow.
Do we ever hand out pages at the end of the virtual address range? There's a lot of HotSpot code that assumes that we never get that kind of memory, so is it worth protecting against this compared to the extra complexity it gives? Previously, I could take a glance at that function and understand it. Now I need to pause and stare at it a bit. >> I suspect overflow can't happen on at least some 64bit platforms, but I >> don't know of anything that would prevent it on a 32bit platform. And I had >> exactly the opposite reaction from you to the old code. I looked at it and >> immediately wondered what might happen on pointer overflow, and questioned >> whether I understood the code. >> >> And as noted in the PR description, the old code for PretouchTask checks for >> overflow, but does so incorrectly, such that the check could be optimized >> away (or lead to other problems) because it would involve UB to fail the check. > > I don't know of any platform that hands out pages at the top of the address range. (Not saying that I know the details of all platforms). However, maybe this argument is more appealing: If the program got hold of a memory address at the end of the address range, then I think C++'s pointer comparison would break down. There's a section that states: > "If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater." > > So, if we have a 4K byte array at 0xfffff000, then the start address would obviously be 0xfffff000, but "one past the last element" would be 0xfffff000 + 0x1000, which would overflow, likely resulting in the address 0x0 (barring UB issues). If that happens, then it seems like the statement above is at risk. It is indeed the case that C++ pointer comparison can break down. That quote is about arrays created by facilities provided by the C/C++ language and its libraries, like malloc. 
But we regularly step outside that boundary by using OS facilities like mmap. As an example, it's for this reason that we have the pointer_delta utility function. On a 32bit platform a "proper" C/C++ array can't be 2Gbytes or larger in size, because the difference between the start and end can't be represented by a 32bit ptrdiff_t. That doesn't prevent the C/C++ language implementation from using memory near the end of the address space; it just needs to take some care to avoid allocating an array whose end is at the end of the address space. Do we have potentially incorrect code because of this? Quite possibly. I didn't want to go on a speculative hunt for such, but tidying up code that I was touching (no pun intended) anyway seemed appropriate. >>> 2) For the new infrastructure. >>> >>> * I'm not sure we should add TouchTask until we have a concrete use-case (with prototype/PR) for it. Without that, it is hard to gauge the usability of this feature. How do other threads proceed concurrently with the worker threads? What is the impact of using TouchTask? One of the concerns that has been brought up by others is that using a WorkGang will block the rest of the JVM from safepointing while the workers are pre-touching. A use-case would clarify this, I think. >> Here is a description of a specific example from my experiments. When the G1 >> allocator runs out of regions that were pre-allocated for eden and can't GC >> because of the GCLocker, it tries to allocate a new region, first from the >> free region list, and then an actually new region (assuming that's possible >> within the heap size constraints). In the latter case, when AlwaysPreTouch >> is true, it currently does a *single-threaded* pretouch of the region and >> ancillary memory. (It's single-threaded because most of the code path involved >> is shared with during-GC code that is running under the work gang.
That code >> is a good candidate for os::touch_memory, so adding a gang argument through >> that call tree that is null to indicate don't pretouch doesn't seem like an >> improvement.) It could instead do a parallel concurrent touch after making the >> new region available for other mutator threads to use for allocation. Yes, it >> blocks safepoints, but that's already true for the existing code, and the >> existing code may block them for *much* longer due to the unparallelized >> pretouch. The downside of using a workgang here is the gang threads could be >> competing with mutator threads for cores. I think parallelizing here is likely >> beneficial though. I think there's a similar situation in ParallelGC, though I >> haven't chased through all of that code carefully yet. > > Thanks for the use-case. > > Maybe the WorkGang is an easy and good enough approach to start with, even with its use of a safepoint-blocking mechanism. Is this feature going to be turned off by default? The current code is already safepoint-blocking. The idea in the above is to allow other threads to make use of the newly allocated chunk sooner. I think the feature might be under some user control, though perhaps something different from AlwaysPreTouch. > I have the feeling that for small enough requests, it will be better to let the allocating thread "concurrent touch" the memory. Other Java threads that allocate small objects will probably get the paged in memory faster than if we spawn up worker threads. It would also be interesting to see an approach that touches memory in an async thread (though it needs to cooperate with the JVM so that the memory isn't uncommitted). Small requests, like at the Java small object level, wouldn't normally go through this mechanism. Pretouch is used for larger allocation chunks, and I think the new mechanism would be operating on the same or similar granularity. So MinHeapDeltaBytes, G1HeapRegionSize, and so on.
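For concreteness, the concurrent touch discussed in this thread (an atomic add of zero to one word per page, so the page gets backed without changing any value other threads may be reading or writing) could be sketched roughly as below. This is an illustrative stand-alone sketch, not the code in the PR: `touch_range` and its `page_size` parameter are made-up names, and it assumes the GCC/Clang `__atomic_fetch_add` builtin rather than HotSpot's own atomics. It accepts empty and non-page-aligned ranges and avoids computing any address past `end`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not HotSpot's os::touch_memory): touch one int per
// page in [start, end) with an atomic add of zero. Returns the number of
// words touched. page_size is assumed to be a power of two.
inline std::size_t touch_range(void* start, void* end, std::size_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(start);
  const std::uintptr_t hi = reinterpret_cast<std::uintptr_t>(end);
  if (lo >= hi) return 0;  // empty range: nothing to touch

  std::size_t touched = 0;
  // The first touched word is the int containing 'start' (aligned down).
  std::uintptr_t cur = lo & ~(static_cast<std::uintptr_t>(sizeof(int)) - 1);
  for (;;) {
    // Adding zero forces the page to be backed but never alters the value.
    __atomic_fetch_add(reinterpret_cast<int*>(cur), 0, __ATOMIC_RELAXED);
    ++touched;
    const std::uintptr_t page_start =
        cur & ~(static_cast<std::uintptr_t>(page_size) - 1);
    // Stop when the next page start would not lie below 'hi'. Comparing the
    // remaining distance avoids overflow at the top of the address space.
    if (hi - page_start <= page_size) break;
    cur = page_start + page_size;
  }
  return touched;
}
```

Because the loop compares distances instead of forming `end + page_size`, it does not rely on pointer arithmetic beyond the range, which is the overflow concern raised earlier in the thread.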
I've done some experimenting with cooperative touching at smaller granularities, but haven't come up with an abstraction I was happy with. (Not that I think the existing PretouchTask style is ideal; it doesn't support parallel touching of a G1 region's auxiliary data very well.) The idea is that as each thread carves (say) a new TLAB out of a region, it can (if needed) advance the touched boundary for the region. >>> * Is the usage of template specialization really needed? Wouldn't it be sufficient to pass down a bool parameter instead? I doubt we would see any performance difference by doing that. >> The use of a template here seemed pretty straight-forward when I wrote it, >> but I agree a boolean argument would work just as well and is more obvious >> to read. I'll change that. I expect the same generated code either way. >> >>> * It's a little bit confusing that in the context of TouchTask "touch" means "concurrent touch", while in the touch_memory_* it means either concurrent or non-concurrent. Maybe rename TouchTask to something to make this distinction a bit clearer. >> I'm open to alternative naming. I thought ConcurrentTouchTask was rather >> long. My experiments repo currently uses PretouchTask::concurrent_touch, but >> I don't like that naming either. >> >> The rationale for "touch" vs "pretouch" is "touching" is the primary generic >> concept, and touch_memory can be used anywhere. Meanwhile "pretouching" is now >> a restricted variant that might have better performance under those limitations. >> Similarly for TouchTask vs PretouchTask. The static functions are just shared >> helpers for the API functions; "touch_memory" is the "primary" thing. > > I think ConcurrentTouchTask is a good and descriptive name that will aid the readability. I think ConcurrentTouchTask has a defect in that it suggests it's for use during the concurrent part of a GC cycle. But it can and should be used in some during-GC-pause or other safepoint situations.
That is, "concurrent" is overloaded in our world. That's why I suggest thinking about it differently, with "touch" being generic and "pretouch" being restricted and perhaps gaining an efficiency benefit from the restriction. From dholmes at openjdk.java.net Sun Sep 12 22:36:48 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 12 Sep 2021 22:36:48 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v2] In-Reply-To: <2yiIxm7CaiJf3_wiiYcuQjbRNFauGO0MJoJGUphSmR0=.29fe408e-8707-402f-8e3c-5366aff3a8cc@github.com> References: <2yiIxm7CaiJf3_wiiYcuQjbRNFauGO0MJoJGUphSmR0=.29fe408e-8707-402f-8e3c-5366aff3a8cc@github.com> Message-ID: <0fzRARQKdoAbSLHq_SaKs698DuGgRWMkCUI4omDmurk=.acb1126f-5b41-46ad-a022-74eabffdb71f@github.com> On Fri, 10 Sep 2021 21:15:29 GMT, Coleen Phillimore wrote: >> This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. >> Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. >> This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. >> Tests run tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix Shenandoah mismerge Hi Coleen, The core change seems clean and simple, but the fanout of the ranking changes is far less clear and some of this looks like it could/should be separated out - see comments below. If I see a lock with a rank service-1, then should I infer that lock can be acquired while the service (or notification) lock is held? And that if lock A is service-1 and lock B is service-2, then B can be acquired while holding A? 
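To make the question concrete, here is a toy model of rank checking. It assumes the rule that a thread may only acquire a lock whose rank is strictly lower than the rank of every lock it already holds; `RankChecker` and the rank constants are hypothetical names with made-up values, not HotSpot's `Mutex` implementation. Under that assumption, both inferences above would hold.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative rank values only; the real ranks are defined by HotSpot's Mutex.
enum ToyRank : int {
  kServiceMinus2 = 8,
  kServiceMinus1 = 9,
  kService       = 10
};

// Toy per-thread tracker of held lock ranks (hypothetical, for illustration).
class RankChecker {
 public:
  // A lock may be taken only if its rank is strictly below every held rank.
  bool can_acquire(int rank) const {
    return std::all_of(held_.begin(), held_.end(),
                       [rank](int held) { return rank < held; });
  }

  // Records the lock as held if the ordering rule allows it.
  bool acquire(int rank) {
    if (!can_acquire(rank)) return false;
    held_.push_back(rank);
    return true;
  }

  // Forgets one held lock of the given rank, if any.
  void release(int rank) {
    auto it = std::find(held_.begin(), held_.end(), rank);
    if (it != held_.end()) held_.erase(it);
  }

 private:
  std::vector<int> held_;
};
```

In this model a thread holding the service lock can take a service-1 lock and then a service-2 lock, but not the reverse order.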
Thanks, David src/hotspot/share/memory/universe.cpp line 125: > 123: LatestMethodCache* Universe::_do_stack_walk_cache = NULL; > 124: > 125: bool Universe::_verify_in_progress = false; This cleanup seems completely unrelated to your mutex change and is best left to a separate cleanup RFE. src/hotspot/share/memory/universe.cpp line 1116: > 1114: } > 1115: if (should_verify_subset(Verify_CodeCache)) { > 1116: MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); Is this needed to allow the new rankings to work? And is this enabled by the _verify_in_progress change? If so I'd rather see all of that related stuff changed first in a separate RFE that can easily be independently backported. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From volker.simonis at gmail.com Mon Sep 13 08:08:40 2021 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 13 Sep 2021 10:08:40 +0200 Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: Hi, may I kindly ask somebody to please take a look at this PR? Thank you and best regards, Volker On Tue, Sep 7, 2021 at 5:42 PM Volker Simonis wrote: > > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>
> For the attached JTreg test, we get the following exception in interpreter mode:
>
> java.lang.NullPointerException: Cannot read the array length because "" is null
>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>     at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>
> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>
> java.lang.NullPointerException
>
> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled stand-alone, we will get:
>
> java.lang.NullPointerException
>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>
> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>
> java.lang.NullPointerException
>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>
> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>
> ## Implementation details
>
> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`.
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
> > ------------- > > Commit messages: > - 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow > > Changes: https://git.openjdk.java.net/jdk/pull/5392/files > Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=00 > Issue: https://bugs.openjdk.java.net/browse/JDK-8273392 > Stats: 538 lines in 12 files changed: 417 ins; 6 del; 115 mod > Patch: https://git.openjdk.java.net/jdk/pull/5392.diff > Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 > > PR: https://git.openjdk.java.net/jdk/pull/5392 From github.com+71546117+tobiasholenstein at openjdk.java.net Mon Sep 13 08:11:18 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Mon, 13 Sep 2021 08:11:18 GMT Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform Message-ID: removed frame::pd_ps() which is not implemented on any platform. Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1. Thanks! ------------- Commit messages: - JDK-8272771: frame::pd_ps() is not implemented on any platform Changes: https://git.openjdk.java.net/jdk/pull/5487/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5487&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8272771 Stats: 10 lines in 8 files changed: 0 ins; 9 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5487.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5487/head:pull/5487 PR: https://git.openjdk.java.net/jdk/pull/5487 From ngasson at openjdk.java.net Mon Sep 13 08:18:49 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Mon, 13 Sep 2021 08:18:49 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 14:00:18 GMT, Andrew Haley wrote: >> An interleaved version of AES/GCM. 
>>
>> Performance, now and then:
>>
>>
>> Apple M1, 3.2 GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   3108.881 ±  119.675  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   3109.685 ±    4.206  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   3122.144 ±  113.379  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   3119.568 ±  192.877  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  89123.942 ±  111.977  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  91034.697 ±  161.469  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  89732.397 ±  106.370  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  89382.300 ±  139.300  ns/op
>>
>> Neoverse N1, 2.5GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   6296.575 ±   37.995  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   7380.326 ±   10.987  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   6293.090 ±   52.972  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   6357.536 ±   42.925  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  48745.085 ±  125.612  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  45062.599 ± 1548.950  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  42230.857 ±  520.562  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  45124.171 ± 1417.927  ns/op
>>
>>
>>
>> A note about the implementation for the reviewers:
>>
>> Unrolled and hand-scheduled intrinsics are often written in a way that
>> I don't find satisfactory. Often they are a conglomeration of
>> copy-and-paste programming and C macros, which makes them hard to
>> understand and hard to maintain. I won't name any names, but there are
>> many examples to be found in free software across the Internet,
>>
>> I spent a while thinking about a structured way to develop and
>> implement them, and I think I've got something better.
The idea is >> that you transform a pre-existing implementation into a generator for >> the interleaved version. The transformation shouldn't be too hard to >> do, but more importantly it should be possible for a reader to verify >> that the interleaved and unrolled version performs the same function. >> >> A generator takes the form of a subclass of `KernelGenerator`. The >> core idea is that the programmer defines the base case of the >> intrinsic and a method to generate a clone of it, shifted to a >> different set of registers. `KernelGenerator` will then generate >> several interleaved copies of the function, with each one using a >> different set of registers. >> >> The subclass must implement three methods: `length()`, which is the >> number of instruction bundles in the intrinsic, `generate(int n)` >> which emits the nth instruction bundle in the intrinsic, and `next()` >> which takes an instance of the generator and returns a version of it, >> shifted to a new set of registers. >> >> As an example, here's the inner loop of AES encryption: >> >> (Some details elided for clarity.) >> >> >> BIND(L_aes_loop); >> ld1(v0, T16B, post(from, 16)); >> >> cmpw(keylen, 44); >> br(Assembler::CC, L_rounds_44); >> br(Assembler::EQ, L_rounds_52); >> >> aes_round(v0, v17); >> aes_round(v0, v18); >> BIND(L_rounds_52); >> aes_round(v0, v19); >> aes_round(v0, v20); >> BIND(L_rounds_44); >> ... 
>> >> >> The generator for the unrolled version looks like: >> >> >> virtual void generate(int index) { >> switch (index) { >> case 0: >> ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input >> break; >> case 1: >> if (_once) { >> cmpw(_keylen, 52); >> br(Assembler::LO, _rounds_44); >> br(Assembler::EQ, _rounds_52); >> } >> break; >> case 2: aes_round(_data, _subkeys + 0); break; >> case 3: aes_round(_data, _subkeys + 1); break; >> case 4: >> if (_once) bind(_rounds_52); >> break; >> case 5: aes_round(_data, _subkeys + 2); break; >> case 6: aes_round(_data, _subkeys + 3); break; >> case 7: >> if (_once) bind(_rounds_44); >> break; >> ... >> >> >> The job of converting a single inline intrinsic is, as you can see, >> not much more than adding a switch statement. Some instructions should >> only be emitted once, rather than several times, such as the labels >> and branches. (You can use a list of C++ lambdas rather than a switch >> statement to do the same thing, very LISP, but that seems a bit of a >> sledgehammer. YMMV.) >> >> I believe that this approach will be more maintainable and easier to >> understand than other approaches we've seen. Also, the number of >> unrolls is just a number that can be tweaked as required. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Whitespace Looks good. I tested on several different machines and got speed-ups between 5x and 17x (dataSize=16384). ------------- Marked as reviewed by ngasson (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5390 From shade at openjdk.java.net Mon Sep 13 08:23:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 13 Sep 2021 08:23:54 GMT Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 08:01:26 GMT, Tobias Holenstein wrote: > removed frame::pd_ps() which is not implemented on any platform. 
Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1. > > Thanks! I am confused a bit. I don't see any platform where `pd_ps` is not empty (I see that it was implemented for now-removed SPARC port), so the compatible thing would be to remove the call to `pd_ps` completely? From a brief inspection, I suspect that `p->trace_stack_from` handles the printing here. ------------- PR: https://git.openjdk.java.net/jdk/pull/5487 From shade at openjdk.java.net Mon Sep 13 08:48:53 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 13 Sep 2021 08:48:53 GMT Subject: RFR: 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests In-Reply-To: References: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> Message-ID: On Thu, 9 Sep 2021 19:54:47 GMT, Mikhailo Seledtsov wrote: >> Current `vmTestbase/metaspace/stressHierarchy` tests (part of vmTestbase_vm_metaspace suite) contains about 15 tests, each running exclusively. There seem to be no reason to run them exclusively, though: they complete in reasonable time, are single-threaded, and consume the usual amount of memory. There is no evidence in JBS that they ever timed out without a reason, and their history unfortunately predates OpenJDK to see why they were not concurrent from day one. >> >> We should consider enabling parallelism for `vmTestbase/metaspace/stressHierarchy` and get improved test performance. Currently it is blocked by `TEST.properties` with `exclusiveAccess.dirs` directives in them. >> >> Note there are other exclusive tests in `vmTestbase_vm_metaspace`, but those seem to be the hard stress tests: pushing GC to the limits, or doing many threads, etc. >> >> Motivational test time improvements below. >> >> Before: >> >> >> $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s >> ... 
>> 00:24:53 ============================== >> 00:24:53 Test summary >> 00:24:53 ============================== >> 00:24:53 TEST TOTAL PASS FAIL ERROR >> 00:24:53 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 >> 00:24:53 ============================== >> 00:24:53 TEST SUCCESS >> 00:24:53 >> 00:24:53 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' >> >> real 24m53.389s >> user 53m2.029s >> sys 1m1.849s >> >> >> After: >> >> >> $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s >> ... >> 00:04:04 ============================== >> 00:04:04 Test summary >> 00:04:04 ============================== >> 00:04:04 TEST TOTAL PASS FAIL ERROR >> 00:04:04 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 >> 00:04:04 ============================== >> 00:04:04 TEST SUCCESS >> 00:04:04 >> 00:04:04 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' >> >> real 4m4.574s >> user 56m10.582s >> sys 1m4.725s > > This looks like a good change to me. Please allow me some time to run multiple stress testing of these tests with exclusiveAccess removed. I should have the results tonight PST, or tomorrow. Thank you @mseledts. I guess I need a second reviewer for this. @iignatev, @dholmes-ora? ------------- PR: https://git.openjdk.java.net/jdk/pull/5391 From simonis at openjdk.java.net Mon Sep 13 10:19:49 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 13 Sep 2021 10:19:49 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow Message-ID: Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. 
If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):

    public static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

### Solution

Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:

    -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
    Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
    ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
    ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
    ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
    ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op

    -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
    Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
    ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
    ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
    ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
    ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op

### Implementation details

- The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
- In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
- Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. 
-------------

Commit messages:
 - 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow

Changes: https://git.openjdk.java.net/jdk/pull/5488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273563
  Stats: 198 lines in 10 files changed: 190 ins; 0 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488

PR: https://git.openjdk.java.net/jdk/pull/5488

From pliden at openjdk.java.net  Mon Sep 13 12:07:17 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 13 Sep 2021 12:07:17 GMT
Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2]
In-Reply-To:
References:
Message-ID:

> `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. These two classes have no super/sub-class relation. This is confusing and potentially dangerous.
>
> It would be reasonable to think that code like this would be correct:
>
>
> if (thread->is_ConcurrentGC_thread()) {
>   conc_thread = static_cast<ConcurrentGCThread*>(thread);
>   ...
> }
>
>
> but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation.
>
> I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`.
Per Liden has updated the pull request incrementally with two additional commits since the last revision:

 - Fix constructor call
 - Add ConcurrentGCThread::cast()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5463/files
  - new: https://git.openjdk.java.net/jdk/pull/5463/files/91e70702..fd645dd3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5463&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5463&range=00-01

  Stats: 9 lines in 3 files changed: 6 ins; 1 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5463.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5463/head:pull/5463

PR: https://git.openjdk.java.net/jdk/pull/5463

From pliden at openjdk.java.net  Mon Sep 13 12:13:49 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 13 Sep 2021 12:13:49 GMT
Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Sep 2021 14:30:41 GMT, Coleen Phillimore wrote:

>> src/hotspot/share/code/nmethod.cpp line 1563:
>>
>>> 1561:   DEBUG_ONLY(bool called_by_gc = Universe::heap()->is_gc_active() ||
>>> 1562:                                  Thread::current()->is_ConcurrentGC_thread() ||
>>> 1563:                                  Thread::current()->is_Worker_thread();)
>>
>> Three places use the same condition. Did you consider creating a helper function?
>
> I wonder if adding a helper function would encourage people to think they're the same again.

The three places don't quite have identical conditions today. We might want to clean this up; for example, I'd argue that at least one of the asserts could be removed. However, I'm not sure I want to do such a cleanup of a fairly unrelated thing as part of this PR.
-------------

PR: https://git.openjdk.java.net/jdk/pull/5463

From pliden at openjdk.java.net  Mon Sep 13 12:13:48 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 13 Sep 2021 12:13:48 GMT
Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 13 Sep 2021 12:07:17 GMT, Per Liden wrote:

>> `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. These two classes have no super/sub-class relation. This is confusing and potentially dangerous.
>>
>> It would be reasonable to think that code like this would be correct:
>>
>>
>> if (thread->is_ConcurrentGC_thread()) {
>>   conc_thread = static_cast<ConcurrentGCThread*>(thread);
>>   ...
>> }
>>
>>
>> but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation.
>>
>> I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`.
>
> Per Liden has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Fix constructor call
>  - Add ConcurrentGCThread::cast()

Updated PR with requested changes.
-------------

PR: https://git.openjdk.java.net/jdk/pull/5463

From pliden at openjdk.java.net  Mon Sep 13 12:13:51 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 13 Sep 2021 12:13:51 GMT
Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2]
In-Reply-To: <-e2FC1G1xRjJP9pJtKgQV3AhyIfdrtG72Isk3K-Gnng=.956fb2cc-17be-4292-ba20-6a66881be603@github.com>
References: <-e2FC1G1xRjJP9pJtKgQV3AhyIfdrtG72Isk3K-Gnng=.956fb2cc-17be-4292-ba20-6a66881be603@github.com>
Message-ID:

On Fri, 10 Sep 2021 14:34:05 GMT, Coleen Phillimore wrote:

>> Per Liden has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Fix constructor call
>>  - Add ConcurrentGCThread::cast()
>
> src/hotspot/share/gc/z/zCollectedHeap.cpp line 84:
>
>> 82:   virtual void do_thread(Thread* thread) {
>> 83:     if (thread->is_ConcurrentGC_thread()) {
>> 84:       static_cast<ConcurrentGCThread*>(thread)->stop();
>
> Should you have a cast function like JavaThread does?
>   static JavaThread* cast(Thread* t) {
>     assert(t->is_Java_thread(), "incorrect cast to JavaThread");
>     return static_cast<JavaThread*>(t);
>   }
>
> At one point @dholmes replaced all the thread->as_Java_thread() with JavaThread::cast() so this would be consistent with that and nice.

Sounds like a good idea. Added `ConcurrentGCThread::cast()`.
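
[Editorial sketch] The checked-cast idiom discussed in this thread can be illustrated with a small stand-alone example. The classes below are simplified stand-ins that only borrow the HotSpot names (`Thread`, `ConcurrentGCThread`, `GangWorker`); none of this is the actual HotSpot code, the helper `stop_if_concurrent_gc_thread` is hypothetical, and the real `ConcurrentGCThread::cast()` added in the PR may well look different.

```cpp
#include <cassert>

// Toy stand-ins for the HotSpot classes named in this thread (assumption:
// heavily simplified, only the type query and the cast are modelled).
class Thread {
 public:
  virtual ~Thread() = default;
  virtual bool is_ConcurrentGC_thread() const { return false; }
  virtual bool is_Worker_thread() const { return false; }
};

class ConcurrentGCThread : public Thread {
 public:
  bool is_ConcurrentGC_thread() const override { return true; }

  // Checked downcast, following the JavaThread::cast() shape quoted above.
  static ConcurrentGCThread* cast(Thread* t) {
    assert(t->is_ConcurrentGC_thread() && "incorrect cast to ConcurrentGCThread");
    return static_cast<ConcurrentGCThread*>(t);
  }

  void stop() { _stopped = true; }
  bool stopped() const { return _stopped; }

 private:
  bool _stopped = false;
};

// Sibling class with no super/sub-class relation to ConcurrentGCThread.
// Because is_ConcurrentGC_thread() stays false here, it is never downcast.
class GangWorker : public Thread {
 public:
  bool is_Worker_thread() const override { return true; }
};

// Hypothetical helper: stops the thread only if it really is a
// ConcurrentGCThread, and reports whether it did so.
bool stop_if_concurrent_gc_thread(Thread* thread) {
  if (thread->is_ConcurrentGC_thread()) {
    ConcurrentGCThread::cast(thread)->stop();
    return true;
  }
  return false;
}
```

The point of the design is that once the type query is true only for the matching sub-class, the `static_cast` inside `cast()` is always valid, and the assert documents that contract at the single place where the cast happens.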
-------------

PR: https://git.openjdk.java.net/jdk/pull/5463

From pliden at openjdk.java.net  Mon Sep 13 12:13:50 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Mon, 13 Sep 2021 12:13:50 GMT
Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2]
In-Reply-To:
References:
Message-ID: <48uPaCg9rbZx6L-mZeasYujge4GExezphi6b3adNBfA=.64b0f346-b620-4d1e-b389-f73598e8db23@github.com>

On Fri, 10 Sep 2021 13:43:37 GMT, Stefan Karlsson wrote:

>> Per Liden has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Fix constructor call
>>  - Add ConcurrentGCThread::cast()
>
> src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 122:
>
>> 120:     _old_pool(NULL),
>> 121:     _workers("GC Thread",
>> 122:              ParallelGCThreads) { }
>
> Consider moving this up to the line above, like the G1 code.

Fixed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5463

From stefank at openjdk.java.net  Mon Sep 13 12:24:57 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Mon, 13 Sep 2021 12:24:57 GMT
Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2]
In-Reply-To:
References:
Message-ID: <5w9LT70Xgw0fu3bme0Tx4ko3zShJrqT2OQU613E7Ff8=.4b4211fc-7d8b-4b91-b6ed-e24a8e28947f@github.com>

On Mon, 13 Sep 2021 12:07:17 GMT, Per Liden wrote:

>> `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. These two classes have no super/sub-class relation. This is confusing and potentially dangerous.
>>
>> It would be reasonable to think that code like this would be correct:
>>
>>
>> if (thread->is_ConcurrentGC_thread()) {
>>   conc_thread = static_cast<ConcurrentGCThread*>(thread);
>>   ...
>> }
>>
>>
>> but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`.
>> And again, these two classes have no super/sub-class relation.
>>
>> I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`.
>
> Per Liden has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Fix constructor call
>  - Add ConcurrentGCThread::cast()

Marked as reviewed by stefank (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5463

From stefank at openjdk.java.net  Mon Sep 13 12:24:57 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Mon, 13 Sep 2021 12:24:57 GMT
Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2]
In-Reply-To:
References:
Message-ID: <5HaunoyVGOx5xjDk1pW5q0SYXGn3cLo7M0QwflA8AiY=.870a709a-181a-4627-91b2-91a75fe3d5c0@github.com>

On Mon, 13 Sep 2021 12:09:15 GMT, Per Liden wrote:

>> I wonder if adding a helper function would encourage people to think they're the same again.
>
> The three places don't quite have identical conditions today. We might want to clean this up; for example, I'd argue that at least one of the asserts could be removed. However, I'm not sure I want to do such a cleanup of a fairly unrelated thing as part of this PR.

I agree. I missed that one of them checks is_at_safepoint and not is_gc_active.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5463

From adinn at openjdk.java.net  Mon Sep 13 12:35:57 2021
From: adinn at openjdk.java.net (Andrew Dinn)
Date: Mon, 13 Sep 2021 12:35:57 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Sep 2021 14:00:18 GMT, Andrew Haley wrote:

>> An interleaved version of AES/GCM.
>>
>> Performance, now and then:
>>
>>
>> Apple M1, 3.2 GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   3108.881 ±  119.675  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   3109.685 ±    4.206  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   3122.144 ±  113.379  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   3119.568 ±  192.877  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  89123.942 ±  111.977  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  91034.697 ±  161.469  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  89732.397 ±  106.370  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  89382.300 ±  139.300  ns/op
>>
>> Neoverse N1, 2.5GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   6296.575 ±   37.995  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   7380.326 ±   10.987  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   6293.090 ±   52.972  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   6357.536 ±   42.925  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  48745.085 ±  125.612  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  45062.599 ± 1548.950  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  42230.857 ±  520.562  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  45124.171 ± 1417.927  ns/op
>>
>>
>>
>> A note about the implementation for the reviewers:
>>
>> Unrolled and hand-scheduled intrinsics are often written in a way that
>> I don't find satisfactory. Often they are a conglomeration of
>> copy-and-paste programming and C macros, which makes them hard to
>> understand and hard to maintain. I won't name any names, but there are
>> many examples to be found in free software across the Internet,
>>
>> I spent a while thinking about a structured way to develop and
>> implement them, and I think I've got something better.
The idea is >> that you transform a pre-existing implementation into a generator for >> the interleaved version. The transformation shouldn't be too hard to >> do, but more importantly it should be possible for a reader to verify >> that the interleaved and unrolled version performs the same function. >> >> A generator takes the form of a subclass of `KernelGenerator`. The >> core idea is that the programmer defines the base case of the >> intrinsic and a method to generate a clone of it, shifted to a >> different set of registers. `KernelGenerator` will then generate >> several interleaved copies of the function, with each one using a >> different set of registers. >> >> The subclass must implement three methods: `length()`, which is the >> number of instruction bundles in the intrinsic, `generate(int n)` >> which emits the nth instruction bundle in the intrinsic, and `next()` >> which takes an instance of the generator and returns a version of it, >> shifted to a new set of registers. >> >> As an example, here's the inner loop of AES encryption: >> >> (Some details elided for clarity.) >> >> >> BIND(L_aes_loop); >> ld1(v0, T16B, post(from, 16)); >> >> cmpw(keylen, 44); >> br(Assembler::CC, L_rounds_44); >> br(Assembler::EQ, L_rounds_52); >> >> aes_round(v0, v17); >> aes_round(v0, v18); >> BIND(L_rounds_52); >> aes_round(v0, v19); >> aes_round(v0, v20); >> BIND(L_rounds_44); >> ... 
>> >> >> The generator for the unrolled version looks like: >> >> >> virtual void generate(int index) { >> switch (index) { >> case 0: >> ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input >> break; >> case 1: >> if (_once) { >> cmpw(_keylen, 52); >> br(Assembler::LO, _rounds_44); >> br(Assembler::EQ, _rounds_52); >> } >> break; >> case 2: aes_round(_data, _subkeys + 0); break; >> case 3: aes_round(_data, _subkeys + 1); break; >> case 4: >> if (_once) bind(_rounds_52); >> break; >> case 5: aes_round(_data, _subkeys + 2); break; >> case 6: aes_round(_data, _subkeys + 3); break; >> case 7: >> if (_once) bind(_rounds_44); >> break; >> ... >> >> >> The job of converting a single inline intrinsic is, as you can see, >> not much more than adding a switch statement. Some instructions should >> only be emitted once, rather than several times, such as the labels >> and branches. (You can use a list of C++ lambdas rather than a switch >> statement to do the same thing, very LISP, but that seems a bit of a >> sledgehammer. YMMV.) >> >> I believe that this approach will be more maintainable and easier to >> understand than other approaches we've seen. Also, the number of >> unrolls is just a number that can be tweaked as required. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Whitespace This looks great. In particular, use of the kernel generator model to manage unrolling is something that should be used in all generated code that relies on unrolling. It is highly readable, which is rarely the case with hand-crafted code, because the generator methods clearly signal the structure of the interleaved code. It should also be far easier to update if the code ever needs revising. I suspect it would be hard to produce hand-crafted code that does significantly better when it comes to performance. Marked as reviewed by adinn (Reviewer).
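[Editorial sketch, not part of the original message.] The interleaving scheme described in the quoted text (`length()`, `generate(int n)`, `next()`) can be modelled in a few lines. The following is a toy Java model, assuming that unrolling means "emit bundle 0 of every copy, then bundle 1 of every copy, and so on". It appends strings to a list instead of emitting AArch64 instructions, and every name in it (`ToyKernelGenerator`, `ToyAesKernel`, `unroll`, the register numbers) is hypothetical rather than taken from the HotSpot sources.

```java
import java.util.ArrayList;
import java.util.List;

class KernelGeneratorDemo {
    // Toy stand-in for the C++ KernelGenerator: it appends strings to a list
    // instead of emitting AArch64 instructions.
    static abstract class ToyKernelGenerator {
        protected final List<String> out;   // shared "instruction stream"

        ToyKernelGenerator(List<String> out) { this.out = out; }

        abstract int length();               // number of instruction bundles
        abstract void generate(int index);   // emit bundle number `index`
        abstract ToyKernelGenerator next();  // same kernel on the next register set

        // Emit bundle 0 of every copy, then bundle 1 of every copy, and so on.
        void unroll(int unrolls) {
            List<ToyKernelGenerator> copies = new ArrayList<>();
            copies.add(this);
            for (int i = 1; i < unrolls; i++) {
                copies.add(copies.get(i - 1).next());
            }
            for (int bundle = 0; bundle < length(); bundle++) {
                for (ToyKernelGenerator copy : copies) {
                    copy.generate(bundle);
                }
            }
        }
    }

    // Hypothetical three-bundle kernel: load, compare (once only), one AES round.
    static final class ToyAesKernel extends ToyKernelGenerator {
        private final int dataReg;   // made-up register number holding the block
        private final boolean once;  // true only for the first copy

        ToyAesKernel(List<String> out, int dataReg, boolean once) {
            super(out);
            this.dataReg = dataReg;
            this.once = once;
        }

        @Override int length() { return 3; }

        @Override void generate(int index) {
            switch (index) {
                case 0: out.add("ld1 v" + dataReg); break;
                case 1: if (once) out.add("cmp keylen"); break;  // emitted by one copy only
                case 2: out.add("aes_round v" + dataReg); break;
                default: throw new IllegalArgumentException("no bundle " + index);
            }
        }

        @Override ToyKernelGenerator next() {
            // Shift to the next register; later copies never emit the "once" bundle.
            return new ToyAesKernel(out, dataReg + 1, false);
        }
    }

    static List<String> demo() {
        List<String> out = new ArrayList<>();
        new ToyAesKernel(out, 0, true).unroll(2);
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

With two unrolls the toy stream reads `ld1 v0`, `ld1 v1`, `cmp keylen`, `aes_round v0`, `aes_round v1`: the loads and rounds of the two copies are interleaved, while the compare appears once.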
src/hotspot/cpu/aarch64/macroAssembler_aarch64_aes.cpp line 604: > 602: // v4: high part of product > 603: // v5: low part ... > 604: // I'm not clear about this comment. The ghash generators have a stride of 7. Should this not mean the registers are replicated across v0 - v27 with v6, v13, v20 and v27 classified as unused registers? ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From github.com+10835776+stsypanov at openjdk.java.net Mon Sep 13 12:37:49 2021 From: github.com+10835776+stsypanov at openjdk.java.net (Сергей Цыпанов) Date: Mon, 13 Sep 2021 12:37:49 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules In-Reply-To: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Mon, 23 Aug 2021 21:08:05 GMT, Andrey Turbanov wrote: > Collections.sort is just a wrapper, so it is better to use an instance method directly. Marked as reviewed by stsypanov at github.com (no known OpenJDK username). ------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From coleenp at openjdk.java.net Mon Sep 13 12:40:51 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 12:40:51 GMT Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2] In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 12:07:17 GMT, Per Liden wrote: >> `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. These two classes have no super/sub-class relation. This is confusing and potentially dangerous.
>> >> It would be reasonable to think that code like this would be correct: >> >> >> if (thread->is_ConcurrentGC_thread()) { >> conc_thread = static_cast<ConcurrentGCThread*>(thread); >> ... >> } >> >> >> but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation. >> >> I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`. > > Per Liden has updated the pull request incrementally with two additional commits since the last revision: > > - Fix constructor call > - Add ConcurrentGCThread::cast() Looks good! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5463 From aph at openjdk.java.net Mon Sep 13 12:52:51 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 13 Sep 2021 12:52:51 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 22:31:30 GMT, Smita Kamath wrote: > Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. > > The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. src/hotspot/share/opto/library_call.cpp line 6796: > 6794: > 6795: Node* avx512_subkeyHtbl = new_array(klass_node, intcon(96), 0); > 6796: if (avx512_subkeyHtbl == NULL) return false; This looks very Intel-specific, but it's in generic code.
Please make this constant 96 a symbol and push it into a header file in the x86 back end. ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From aph at openjdk.java.net Mon Sep 13 12:52:52 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Mon, 13 Sep 2021 12:52:52 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 12:48:14 GMT, Andrew Haley wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. > > src/hotspot/share/opto/library_call.cpp line 6796: > >> 6794: >> 6795: Node* avx512_subkeyHtbl = new_array(klass_node, intcon(96), 0); >> 6796: if (avx512_subkeyHtbl == NULL) return false; > > This looks very Intel-specific, but it's in generic code. Please make this constant 96 a symbol and push it into a header file in the x86 back end. Likewise, the name prefix "avx512_" isn't appropriate for code that will certainly be used by other targets. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From iignatyev at openjdk.java.net Mon Sep 13 14:48:53 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Mon, 13 Sep 2021 14:48:53 GMT Subject: RFR: 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests In-Reply-To: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> References: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> Message-ID: On Tue, 7 Sep 2021 15:07:10 GMT, Aleksey Shipilev wrote: > The `vmTestbase/metaspace/stressHierarchy` directory (part of the vmTestbase_vm_metaspace suite) currently contains about 15 tests, each running exclusively. There seems to be no reason to run them exclusively, though: they complete in reasonable time, are single-threaded, and consume the usual amount of memory. There is no evidence in JBS that they ever timed out without a reason, and their history unfortunately predates OpenJDK, so we cannot see why they were not concurrent from day one. > > We should consider enabling parallelism for `vmTestbase/metaspace/stressHierarchy` and get improved test performance. Currently it is blocked by `TEST.properties` with `exclusiveAccess.dirs` directives in them. > > Note there are other exclusive tests in `vmTestbase_vm_metaspace`, but those seem to be the hard stress tests: pushing GC to the limits, or doing many threads, etc. > > Motivational test time improvements below. > > Before: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... 
> 00:24:53 ============================== > 00:24:53 Test summary > 00:24:53 ============================== > 00:24:53 TEST TOTAL PASS FAIL ERROR > 00:24:53 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:24:53 ============================== > 00:24:53 TEST SUCCESS > 00:24:53 > 00:24:53 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 24m53.389s > user 53m2.029s > sys 1m1.849s > > > After: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... > 00:04:04 ============================== > 00:04:04 Test summary > 00:04:04 ============================== > 00:04:04 TEST TOTAL PASS FAIL ERROR > 00:04:04 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:04:04 ============================== > 00:04:04 TEST SUCCESS > 00:04:04 > 00:04:04 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 4m4.574s > user 56m10.582s > sys 1m4.725s Marked as reviewed by iignatyev (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5391 From coleenp at openjdk.java.net Mon Sep 13 15:19:52 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 15:19:52 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v2] In-Reply-To: <0fzRARQKdoAbSLHq_SaKs698DuGgRWMkCUI4omDmurk=.acb1126f-5b41-46ad-a022-74eabffdb71f@github.com> References: <2yiIxm7CaiJf3_wiiYcuQjbRNFauGO0MJoJGUphSmR0=.29fe408e-8707-402f-8e3c-5366aff3a8cc@github.com> <0fzRARQKdoAbSLHq_SaKs698DuGgRWMkCUI4omDmurk=.acb1126f-5b41-46ad-a022-74eabffdb71f@github.com> Message-ID: On Sun, 12 Sep 2021 22:33:43 GMT, David Holmes wrote: > If I see a lock with a rank service-1, then should I infer that lock can be acquired while the service (or notification) lock is held? And that if lock A is service-1 and lock B is service-2, then B can be acquired while holding A? Yes. The Service_lock is held while all these other locks are taken. 
The subtraction comes from the highest ranked lock with a name, in this case 'service'. > src/hotspot/share/memory/universe.cpp line 125: > >> 123: LatestMethodCache* Universe::_do_stack_walk_cache = NULL; >> 124: >> 125: bool Universe::_verify_in_progress = false; > > This cleanup seems completely unrelated to your mutex change and is best left to a separate cleanup RFE. I could do it in a separate trivial RFE. > src/hotspot/share/memory/universe.cpp line 1116: > >> 1114: } >> 1115: if (should_verify_subset(Verify_CodeCache)) { >> 1116: MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); > > Is this needed to allow the new rankings to work? And is this enabled by the _verify_in_progress change? If so I'd rather see all of that related stuff changed first in a separate RFE that can easily be independently backported. Yes, this is needed. This verification is done during a safepoint, so we don't need this lock. The CodeCache_lock has a very low ranking and takes out VtableStubs_lock which is a higher ranking. With this change, we do not take out the CodeCache_lock, so it's needed for this change. I see no reason whatsoever to backport it though. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From shade at openjdk.java.net Mon Sep 13 16:50:52 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 13 Sep 2021 16:50:52 GMT Subject: RFR: 8273486: Zero: Handle DiagnoseSyncOnValueBasedClasses VM option In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 10:41:34 GMT, Aleksey Shipilev wrote: > JDK-8257027 added a diagnostic option to check for synchronization on value-based classes. 
Zero does not support it, so it would fail the relevant test: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java > > STDERR: > stdout: []; > stderr: [Exception in thread "main" java.lang.RuntimeException: synchronization on value based class did not fail > at SyncOnValueBasedClassTest$FatalTest.main(SyncOnValueBasedClassTest.java:128) > ] > exitValue = 1 > > java.lang.RuntimeException: 'fatal error: Synchronizing on object' missing from stdout/stderr > > > Template interpreters implement this check by going to the slow path that calls `InterpreterRuntime::monitorenter`. Zero already goes to that path when `UseHeavyMonitors` is enabled, so we might just enable it when lock diagnostics is requested. This would cost us zero (pun intended) when diagnostic option is disabled. > > Additional testing: > - [x] Linux x86_64 Zero, affected test now passes Any takers? :) ------------- PR: https://git.openjdk.java.net/jdk/pull/5412 From psandoz at openjdk.java.net Mon Sep 13 17:34:01 2021 From: psandoz at openjdk.java.net (Paul Sandoz) Date: Mon, 13 Sep 2021 17:34:01 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules In-Reply-To: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: <6BQFENuC9DGe-d_UeJrpBEWaoD5vYTRq6dWCmil18J0=.b9ad79ad-b52f-4483-9b25-d1d2faca19e8@github.com> On Mon, 23 Aug 2021 21:08:05 GMT, Andrey Turbanov wrote: > Collections.sort is just a wrapper, so it is better to use an instance method directly. This looks like a good change. Either as part of this PR or as a follow up I think it is worth reviewing the comparators passed to the sort method. In some cases there is repetition and in other cases it may be possible to use `Comparator.comparing` or the primitive specialization.
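[Editorial sketch, not part of the original message.] The comparator cleanup suggested above can be illustrated with a small self-contained example. It assumes a made-up `Item` class standing in for the JFR types being sorted; `namesSortedBy` and the sample data are likewise hypothetical. It shows a hand-written lambda next to `Comparator.comparing` and the primitive specialization `Comparator.comparingLong`, and notes that a comparator written with its arguments swapped (descending order) needs `.reversed()` to keep its meaning.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ComparatorDemo {
    // Hypothetical stand-in for the JFR types being sorted.
    static final class Item {
        final String name;
        final long count;
        Item(String name, long count) { this.name = name; this.count = count; }
        String getName() { return name; }
        long getCount() { return count; }
    }

    // Sorts a fixed sample list with the given comparator and returns the names.
    static List<String> namesSortedBy(Comparator<Item> order) {
        List<Item> items = new ArrayList<>(List.of(
            new Item("b", 3), new Item("a", 1), new Item("c", 2)));
        items.sort(order);
        List<String> names = new ArrayList<>();
        for (Item item : items) {
            names.add(item.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hand-written lambda and key-extractor form give the same order: [a, b, c]
        System.out.println(namesSortedBy((x, y) -> x.getName().compareTo(y.getName())));
        System.out.println(namesSortedBy(Comparator.comparing(Item::getName)));

        // Primitive specialization, ascending by count: [a, c, b]
        System.out.println(namesSortedBy(Comparator.comparingLong(Item::getCount)));

        // Swapped arguments mean descending order; the equivalent needs reversed(): [b, c, a]
        System.out.println(namesSortedBy((u, v) -> Long.compare(v.getCount(), u.getCount())));
        System.out.println(namesSortedBy(Comparator.comparingLong(Item::getCount).reversed()));
    }
}
```

A method reference is used with `reversed()` on purpose: chaining `.reversed()` onto `comparingLong(v -> v.count)` with an implicitly typed lambda does not give the compiler enough information to infer the element type.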
------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From github.com+741251+turbanoff at openjdk.java.net Mon Sep 13 17:46:52 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Mon, 13 Sep 2021 17:46:52 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules In-Reply-To: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: <1c_lAxJoIGyUfTGPpwog2v4gjl4RnKyIa-RUIevgVEw=.631b0296-27de-46e5-bc30-58d10a34556f@github.com> On Mon, 23 Aug 2021 21:08:05 GMT, Andrey Turbanov wrote: > Collections.sort is just a wrapper, so it is better to use an instance method directly. Yeah. I think it's better to leave it to another PR/issue. To make this easier to review/validate. ------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From forax at openjdk.java.net Mon Sep 13 18:02:56 2021 From: forax at openjdk.java.net (Rémi Forax) Date: Mon, 13 Sep 2021 18:02:56 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules In-Reply-To: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Mon, 23 Aug 2021 21:08:05 GMT, Andrey Turbanov wrote: > Collections.sort is just a wrapper, so it is better to use an instance method directly. 
src/jdk.jfr/share/classes/jdk/jfr/internal/MetadataReader.java line 86: > 84: if (Logger.shouldLog(LogTag.JFR_SYSTEM_PARSER, LogLevel.TRACE)) { > 85: List ts = new ArrayList<>(types.values()); > 86: ts.sort((x, y) -> x.getName().compareTo(y.getName())); you can use Comparator.comparing(Type::getName) src/jdk.jfr/share/classes/jdk/jfr/internal/SettingsManager.java line 142: > 140: } else { > 141: if (Logger.shouldLog(LogTag.JFR_SETTING, LogLevel.INFO)) { > 142: eventControls.sort((x, y) -> x.getEventType().getName().compareTo(y.getEventType().getName())); Comparator.comparing(x -> x.getEventType().getName()) src/jdk.jfr/share/classes/jdk/jfr/internal/TypeLibrary.java line 111: > 109: try { > 110: jvmTypes = MetadataLoader.createTypes(); > 111: jvmTypes.sort((a, b) -> Long.compare(a.getId(), b.getId())); Comparator.comparingLong(Type::getId) src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/RepositoryFiles.java line 215: > 213: pathLookup.remove(remove); > 214: } > 215: added.sort((p1, p2) -> p1.compareTo(p2)); 'added.sort(Path::compareTo)' src/jdk.jfr/share/classes/jdk/jfr/internal/dcmd/DCmdCheck.java line 137: > 135: List sorted = new ArrayList<>(); > 136: sorted.addAll(events); > 137: sorted.sort(new Comparator() { I wonder if there is a bootstrap issue here (why an anonymous class is used instead of a lambda?)
If a lambda can be used, it can be simplified to `sorted.sort(Comparator.comparing(EventType::getName))` src/jdk.jfr/share/classes/jdk/jfr/internal/tool/Summary.java line 145: > 143: println(" Duration: " + (totalDuration + 500_000_000) / 1_000_000_000 + " s"); > 144: List statsList = new ArrayList<>(stats.values()); > 145: statsList.sort((u, v) -> Long.compare(v.count, u.count)); `statsList.sort(Comparator.comparingLong(v -> v.count))` ------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From coleenp at openjdk.java.net Mon Sep 13 20:15:24 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 20:15:24 GMT Subject: RFR: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 Message-ID: This change reverts the rank ordering of ttyLock and StackWatermark_lock because the latter is held through a very large region and printing all of this to a buffer with xmlstream is non-trivial. With this change, if tty->print_cr() is done while holding the stackwatermark lock or lower (which is service ranking, etc), a lock inversion will happen with ttyLock. This doesn't happen now because all the code in GC and much of the rest of the runtime use UL and not tty->print(). Tested with tier1-6.
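[Editorial sketch, not part of the original message.] The kind of rank check behind the error in the subject line can be modelled with a toy example. It assumes the rule is "a thread may only acquire a lock whose rank is lower than the rank of every lock it already holds", which is inferred from the subject line and from the earlier exchange about `service-1` and `service-2`. It is not HotSpot's actual implementation, and `LockRankDemo`, `RankedLock` and `tryAcquire` are made-up names. The rank numbers are the ones printed in the subject line.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class LockRankDemo {
    // A lock with a name and a numeric rank, printed as name/rank.
    static final class RankedLock {
        final String name;
        final int rank;
        RankedLock(String name, int rank) { this.name = name; this.rank = rank; }
        @Override public String toString() { return name + "/" + rank; }
    }

    // Locks held by the (single) modelled thread, most recently acquired first.
    private final Deque<RankedLock> held = new ArrayDeque<>();

    // Returns null when the acquire is in rank order, otherwise a diagnostic message.
    // Assumed rule: the new lock must rank strictly below every lock already held.
    String tryAcquire(RankedLock lock) {
        for (RankedLock h : held) {
            if (lock.rank >= h.rank) {
                return "Attempting to acquire lock " + lock + " out of order with lock " + h;
            }
        }
        held.push(lock);
        return null;
    }

    void release(RankedLock lock) {
        held.remove(lock);
    }

    public static void main(String[] args) {
        RankedLock tty = new RankedLock("tty_lock", 3);
        RankedLock watermark = new RankedLock("StackWatermark_lock", 9);

        // Higher rank first, then lower rank: accepted by the toy checker.
        LockRankDemo inOrder = new LockRankDemo();
        System.out.println(inOrder.tryAcquire(watermark));
        System.out.println(inOrder.tryAcquire(tty));

        // Lower rank first, then higher rank: reported as out of order.
        LockRankDemo inverted = new LockRankDemo();
        inverted.tryAcquire(tty);
        System.out.println(inverted.tryAcquire(watermark));
    }
}
```

Under these assumptions the last call reproduces the wording of the subject line, which is the situation the rank swap in this change is meant to address.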
------------- Commit messages: - 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 Changes: https://git.openjdk.java.net/jdk/pull/5499/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5499&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273635 Stats: 5 lines in 2 files changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5499.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5499/head:pull/5499 PR: https://git.openjdk.java.net/jdk/pull/5499 From coleenp at openjdk.java.net Mon Sep 13 21:04:32 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 21:04:32 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v3] In-Reply-To: References: Message-ID: > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. > Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: - Merge branch 'master' into checkrank - Fix Shenandoah mismerge - 8273300: Check Mutex ranking during a safepoint ------------- Changes: https://git.openjdk.java.net/jdk/pull/5467/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=02 Stats: 31 lines in 12 files changed: 1 ins; 7 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/5467.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5467/head:pull/5467 PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Mon Sep 13 21:04:33 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 21:04:33 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v3] In-Reply-To: References: <2yiIxm7CaiJf3_wiiYcuQjbRNFauGO0MJoJGUphSmR0=.29fe408e-8707-402f-8e3c-5366aff3a8cc@github.com> <0fzRARQKdoAbSLHq_SaKs698DuGgRWMkCUI4omDmurk=.acb1126f-5b41-46ad-a022-74eabffdb71f@github.com> Message-ID: On Mon, 13 Sep 2021 15:12:51 GMT, Coleen Phillimore wrote: >> src/hotspot/share/memory/universe.cpp line 125: >> >>> 123: LatestMethodCache* Universe::_do_stack_walk_cache = NULL; >>> 124: >>> 125: long Universe::verify_flags = Universe::Verify_All; >> >> This cleanup seems completely unrelated to your mutex change and is best left to a separate cleanup RFE. > > I could do it in a trivial other RFE. Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From dholmes at openjdk.java.net Mon Sep 13 22:04:22 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 13 Sep 2021 22:04:22 GMT Subject: RFR: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 20:04:56 GMT, Coleen Phillimore wrote: > This change reverts the rank ordering of ttyLock and StackWatermark_lock because the latter is held through a very large region and printing all of this to a buffer with xmlstream is non-trivial. 
> With this change, if tty->print_cr() is done while holding the stackwatermark lock or lower (which is service ranking, etc), a lock inversion will happen with ttyLock. This doesn't happen now because all the code in GC and much of the rest of the runtime use UL and not tty->print(). > Tested with tier1-6. Hi Coleen, The ranking restoration seems fine. One possible typo below. Thanks, David test/hotspot/jtreg/compiler/uncommontrap/TestDeoptOOM.java line 39: > 37: * @test > 38: * @bug 8273456 > 39: * @summary Test that ttyLock is ranked about StackWatermark_lock s/about/above/ ? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5499 From github.com+7806504+liach at openjdk.java.net Mon Sep 13 22:14:18 2021 From: github.com+7806504+liach at openjdk.java.net (liach) Date: Mon, 13 Sep 2021 22:14:18 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules In-Reply-To: References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Mon, 13 Sep 2021 17:56:14 GMT, Rémi Forax wrote: >> Collections.sort is just a wrapper, so it is better to use an instance method directly. > > src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/RepositoryFiles.java line 215: > >> 213: pathLookup.remove(remove); >> 214: } >> 215: added.sort((p1, p2) -> p1.compareTo(p2)); > > 'added.sort(Path::compareTo)' Can't we just use natural ordering `null` here?
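[Editorial sketch, not part of the original message.] The natural-ordering question above can be checked with a short example. `List.sort` documents that a `null` comparator means the elements' natural ordering, so for a list of `Path` the four forms below should sort identically. The file names and the `sortedNames` helper are made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaturalOrderDemo {
    // Sorts a fixed sample list of paths with the given comparator (may be null).
    static List<String> sortedNames(Comparator<Path> order) {
        List<Path> added = new ArrayList<>();
        added.add(Paths.get("c.jfr"));
        added.add(Paths.get("a.jfr"));
        added.add(Paths.get("b.jfr"));
        added.sort(order);  // a null comparator requests natural ordering
        List<String> names = new ArrayList<>();
        for (Path p : added) {
            names.add(p.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames((p1, p2) -> p1.compareTo(p2)));  // explicit lambda
        System.out.println(sortedNames(Path::compareTo));               // method reference
        System.out.println(sortedNames(Comparator.naturalOrder()));     // named natural order
        System.out.println(sortedNames(null));                          // null = natural order
    }
}
```

Whether `null` or `Comparator.naturalOrder()` reads better at the call site is a style choice; the latter states the intent explicitly.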
------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From coleenp at openjdk.java.net Mon Sep 13 22:24:50 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 22:24:50 GMT Subject: RFR: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 [v2] In-Reply-To: References: Message-ID: > This change reverts the rank ordering of ttyLock and StackWatermark_lock because the latter is held through a very large region and printing all of this to a buffer with xmlstream is non-trivial. > With this change, if tty->print_cr() is done while holding the stackwatermark lock or lower (which is service ranking, etc), a lock inversion will happen with ttyLock. This doesn't happen now because all the code in GC and much of the rest of the runtime use UL and not tty->print(). > Tested with tier1-6. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: fix typo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5499/files - new: https://git.openjdk.java.net/jdk/pull/5499/files/53dd04d2..557f2bc4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5499&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5499&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5499.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5499/head:pull/5499 PR: https://git.openjdk.java.net/jdk/pull/5499 From coleenp at openjdk.java.net Mon Sep 13 22:24:51 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 22:24:51 GMT Subject: RFR: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 20:04:56 GMT, Coleen Phillimore wrote: > This change reverts the rank ordering of ttyLock and StackWatermark_lock because the latter is held through a very large region 
and printing all of this to a buffer with xmlstream is non-trivial. > With this change, if tty->print_cr() is done while holding the stackwatermark lock or lower (which is service ranking, etc), a lock inversion will happen with ttyLock. This doesn't happen now because all the code in GC and much of the rest of the runtime use UL and not tty->print(). > Tested with tier1-6. Thanks for the review, David! ------------- PR: https://git.openjdk.java.net/jdk/pull/5499 From coleenp at openjdk.java.net Mon Sep 13 22:24:56 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 13 Sep 2021 22:24:56 GMT Subject: RFR: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 [v2] In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 21:53:35 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typo > > test/hotspot/jtreg/compiler/uncommontrap/TestDeoptOOM.java line 39: > >> 37: * @test >> 38: * @bug 8273456 >> 39: * @summary Test that ttyLock is ranked about StackWatermark_lock > > s/about/above/ ? fixed, thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/5499 From dholmes at openjdk.java.net Tue Sep 14 01:56:07 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 14 Sep 2021 01:56:07 GMT Subject: RFR: 8273486: Zero: Handle DiagnoseSyncOnValueBasedClasses VM option In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 10:41:34 GMT, Aleksey Shipilev wrote: > JDK-8257027 added a diagnostic option to check for synchronization on value-based classes. 
Zero does not support it, so it would fail the relevant test: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java > > STDERR: > stdout: []; > stderr: [Exception in thread "main" java.lang.RuntimeException: synchronization on value based class did not fail > at SyncOnValueBasedClassTest$FatalTest.main(SyncOnValueBasedClassTest.java:128) > ] > exitValue = 1 > > java.lang.RuntimeException: 'fatal error: Synchronizing on object' missing from stdout/stderr > > > Template interpreters implement this check by going to the slow path that calls `InterpreterRuntime::monitorenter`. Zero already goes to that path when `UseHeavyMonitors` is enabled, so we might just enable it when lock diagnostics is requested. This would cost us zero (pun intended) when diagnostic option is disabled. > > Additional testing: > - [x] Linux x86_64 Zero, affected test now passes Hi Aleksey, Change seems fine. I'm a little surprised this is all you need. Cheers, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5412 From xliu at openjdk.java.net Tue Sep 14 02:18:14 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 14 Sep 2021 02:18:14 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: <0xIpvHd1sguU9d85xrA0ArSlpi3Ed033AHQvZN3HXVY=.eb09a0d7-9eb3-4935-b041-047391134940@github.com> On Fri, 10 Sep 2021 14:00:18 GMT, Andrew Haley wrote: >> An interleaved version of AES/GCM. >> >> Performance, now and then: >> >> >> Apple M1, 3.2 GHz: >> >> Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ±
192.877 ns/op >> >> AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op >> >> Neoverse N1, 2.5GHz: >> >> Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op >> >> AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op >> >> >> >> A note about the implementation for the reviewers: >> >> Unrolled and hand-scheduled intrinsics are often written in a way that >> I don't find satisfactory. Often they are a conglomeration of >> copy-and-paste programming and C macros, which makes them hard to >> understand and hard to maintain. I won't name any names, but there are >> many examples to be found in free software across the Internet. >> >> I spent a while thinking about a structured way to develop and >> implement them, and I think I've got something better. The idea is >> that you transform a pre-existing implementation into a generator for >> the interleaved version. The transformation shouldn't be too hard to >> do, but more importantly it should be possible for a reader to verify >> that the interleaved and unrolled version performs the same function. >> >> A generator takes the form of a subclass of `KernelGenerator`. 
The >> core idea is that the programmer defines the base case of the >> intrinsic and a method to generate a clone of it, shifted to a >> different set of registers. `KernelGenerator` will then generate >> several interleaved copies of the function, with each one using a >> different set of registers. >> >> The subclass must implement three methods: `length()`, which is the >> number of instruction bundles in the intrinsic, `generate(int n)` >> which emits the nth instruction bundle in the intrinsic, and `next()` >> which takes an instance of the generator and returns a version of it, >> shifted to a new set of registers. >> >> As an example, here's the inner loop of AES encryption: >> >> (Some details elided for clarity.) >> >> >> BIND(L_aes_loop); >> ld1(v0, T16B, post(from, 16)); >> >> cmpw(keylen, 44); >> br(Assembler::CC, L_rounds_44); >> br(Assembler::EQ, L_rounds_52); >> >> aes_round(v0, v17); >> aes_round(v0, v18); >> BIND(L_rounds_52); >> aes_round(v0, v19); >> aes_round(v0, v20); >> BIND(L_rounds_44); >> ... >> >> >> The generator for the unrolled version looks like: >> >> >> virtual void generate(int index) { >> switch (index) { >> case 0: >> ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input >> break; >> case 1: >> if (_once) { >> cmpw(_keylen, 52); >> br(Assembler::LO, _rounds_44); >> br(Assembler::EQ, _rounds_52); >> } >> break; >> case 2: aes_round(_data, _subkeys + 0); break; >> case 3: aes_round(_data, _subkeys + 1); break; >> case 4: >> if (_once) bind(_rounds_52); >> break; >> case 5: aes_round(_data, _subkeys + 2); break; >> case 6: aes_round(_data, _subkeys + 3); break; >> case 7: >> if (_once) bind(_rounds_44); >> break; >> ... >> >> >> The job of converting a single inline intrinsic is, as you can see, >> not much more than adding a switch statement. Some instructions should >> only be emitted once, rather than several times, such as the labels >> and branches. 
>> (You can use a list of C++ lambdas rather than a switch
>> statement to do the same thing, very LISP, but that seems a bit of a
>> sledgehammer. YMMV.)
>>
>> I believe that this approach will be more maintainable and easier to
>> understand than other approaches we've seen. Also, the number of
>> unrolls is just a number that can be tweaked as required.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Whitespace

src/hotspot/os_cpu/bsd_aarch64/vm_version_bsd_aarch64.cpp line 93:

> 91:
> 92:   // All Apple-darwin Arm processors have AES.
> 93:   _features |= CPU_AES;

This line could go with the other `_features` statements, just for tidiness.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5390

From dholmes at openjdk.java.net Tue Sep 14 02:34:14 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 14 Sep 2021 02:34:14 GMT
Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform
In-Reply-To:
References:
Message-ID:

On Mon, 13 Sep 2021 08:01:26 GMT, Tobias Holenstein wrote:

> removed frame::pd_ps() which is not implemented on any platform. Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1.
>
> Thanks!

Hi Tobias,

Removal looks good, but one change requested.

Thanks,
David

src/hotspot/share/utilities/debug.cpp line 514:

> 512:   tty->print("(guessing starting frame id=" PTR_FORMAT " based on current fp)\n", p2i(f.id()));
> 513:   p->trace_stack_from(vframe::new_vframe(&f, &reg_map, p));
> 514:   f.print_on(tty);

I agree with @shipilev - as pd_ps() was a no-op on all platforms it should simply be deleted from this code, not replaced by `f.print_on`.

-------------

Changes requested by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5487

From github.com+39413832+weixlu at openjdk.java.net Tue Sep 14 03:42:11 2021
From: github.com+39413832+weixlu at openjdk.java.net (Xiaowei Lu)
Date: Tue, 14 Sep 2021 03:42:11 GMT
Subject: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v8]
In-Reply-To:
References: <_IqJ7u4Vk7jF8E--2RzWfdnxYXDQr86TIsxA7sh_3WI=.4d2c4cd9-63c8-4921-b5a1-e77d66c10325@github.com>
Message-ID:

On Fri, 10 Sep 2021 07:41:32 GMT, Aleksey Shipilev wrote:

>>> @shipilev Hi, I have tested this pull request as well as this pull request + `OrderAccess::release();` on specjbb 2015 on AArch64 (Kunpeng 920). Maybe there is a slight improvement on critical-jOPS? Here is the result.
>>
>> Thanks for testing. So explicit barrier does seem to result in a slight bump in critical-jOPS.
>>
>> I assume "base" results are this PR? If so, do you have performance results for the current master? In other words, it would be interesting to see three results: baseline (current master), this PR, and this PR + `OrderAccess::release()`.
>
>> @shipilev Yes, "base" means this PR in my previous comment. Here is the result of the current master (i.e. revert all commits in this PR). It seems master performs better, so the cost of "acquire" may be really high as you have said.
>
> (sighs) Thanks for testing. Do you have spare cycles to verify that "acquire" is indeed the culprit for this? It would be simple to check: replace all `mark_acquire()` to just `mark()` in this PR. I am somewhat sure that would not break things very much for the test runs.

@shipilev The results are quite confusing. I replaced `mark_acquire()` with `mark()` in `get_forwardee_raw()` and `get_forwardee_mutator()` and ran SPECjbb, only to see a slight decrease in critical-jOPS compared with master. However, the implementation of LSE instructions is not very efficient on the current server (Kunpeng 920), which may hurt CAS instructions with memory ordering, so I ran the tests again on an Ampere processor. As before, critical-jOPS decreases by about 3% even though the acquire in the forwardee access has been replaced. Compared with the current PR, relaxing `mark_acquire()` to `mark()` does give a performance boost, but I'm confused about why it is still below master, since we adopt `release` or even `relaxed` in self-heal and forwardee update.

On Kunpeng 920:

relax_acquire_1: RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 34282, max-jOPS = 30017, critical-jOPS = 22581
relax_acquire_2: RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 34282, max-jOPS = 30017, critical-jOPS = 22581
relax_acquire_3: RUN RESULT: hbIR (max attempted) = 34282, hbIR (settled) = 32419, max-jOPS = 29825, critical-jOPS = 21492

On Ampere:

master_1: RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107127, max-jOPS = 101742, critical-jOPS = 37649
master_2: RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 107689, max-jOPS = 100516, critical-jOPS = 38331
master_3: RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 103341, max-jOPS = 99291, critical-jOPS = 37898
relax_acquire_1: RUN RESULT: hbIR (max attempted) = 108894, hbIR (settled) = 104937, max-jOPS = 99094, critical-jOPS = 34048
relax_acquire_2: RUN RESULT: hbIR (max attempted) = 122581, hbIR (settled) = 106745, max-jOPS = 101742, critical-jOPS = 38273
relax_acquire_3: RUN RESULT: hbIR (max attempted) = 108894, hbIR (settled) = 104937, max-jOPS = 101271, critical-jOPS = 37701

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496

From xliu at openjdk.java.net Tue Sep 14 04:37:10 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 14 Sep 2021 04:37:10 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Sep 2021 14:00:18 GMT, Andrew Haley wrote:

>> An interleaved version of AES/GCM.
>>
>> Performance, now and then:
>>
>>
>> Apple M1, 3.2 GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   3108.881 ±  119.675  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   3109.685 ±    4.206  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   3122.144 ±  113.379  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   3119.568 ±  192.877  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  89123.942 ±  111.977  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  91034.697 ±  161.469  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  89732.397 ±  106.370  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  89382.300 ±  139.300  ns/op
>>
>> Neoverse N1, 2.5GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   6296.575 ±   37.995  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   7380.326 ±   10.987  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   6293.090 ±   52.972  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   6357.536 ±   42.925  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  48745.085 ±  125.612  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  45062.599 ± 1548.950  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  42230.857 ±  520.562  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  45124.171 ± 1417.927  ns/op
>>
>>
>>
>> A note about the implementation for the reviewers:
>>
>> Unrolled and hand-scheduled intrinsics are often written in a way that
>> I don't find satisfactory. Often they are a conglomeration of
>> copy-and-paste programming and C macros, which makes them hard to
>> understand and hard to maintain. I won't name any names, but there are
>> many examples to be found in free software across the Internet,
>>
>> I spent a while thinking about a structured way to develop and
>> implement them, and I think I've got something better.
>> The idea is that you transform a pre-existing implementation into a
>> generator for the interleaved version. The transformation shouldn't be
>> too hard to do, but more importantly it should be possible for a reader
>> to verify that the interleaved and unrolled version performs the same
>> function.
>>
>> A generator takes the form of a subclass of `KernelGenerator`. The
>> core idea is that the programmer defines the base case of the
>> intrinsic and a method to generate a clone of it, shifted to a
>> different set of registers. `KernelGenerator` will then generate
>> several interleaved copies of the function, with each one using a
>> different set of registers.
>>
>> The subclass must implement three methods: `length()`, which is the
>> number of instruction bundles in the intrinsic, `generate(int n)`
>> which emits the nth instruction bundle in the intrinsic, and `next()`
>> which takes an instance of the generator and returns a version of it,
>> shifted to a new set of registers.
>>
>> As an example, here's the inner loop of AES encryption:
>>
>> (Some details elided for clarity.)
>>
>>
>>       BIND(L_aes_loop);
>>       ld1(v0, T16B, post(from, 16));
>>
>>       cmpw(keylen, 44);
>>       br(Assembler::CC, L_rounds_44);
>>       br(Assembler::EQ, L_rounds_52);
>>
>>       aes_round(v0, v17);
>>       aes_round(v0, v18);
>>     BIND(L_rounds_52);
>>       aes_round(v0, v19);
>>       aes_round(v0, v20);
>>     BIND(L_rounds_44);
>>       ...
>>
>>
>> The generator for the unrolled version looks like:
>>
>>
>>   virtual void generate(int index) {
>>     switch (index) {
>>     case 0:
>>       ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input
>>       break;
>>     case 1:
>>       if (_once) {
>>         cmpw(_keylen, 52);
>>         br(Assembler::LO, _rounds_44);
>>         br(Assembler::EQ, _rounds_52);
>>       }
>>       break;
>>     case 2:  aes_round(_data, _subkeys + 0);  break;
>>     case 3:  aes_round(_data, _subkeys + 1);  break;
>>     case 4:
>>       if (_once)  bind(_rounds_52);
>>       break;
>>     case 5:  aes_round(_data, _subkeys + 2);  break;
>>     case 6:  aes_round(_data, _subkeys + 3);  break;
>>     case 7:
>>       if (_once)  bind(_rounds_44);
>>       break;
>>     ...
>>
>>
>> The job of converting a single inline intrinsic is, as you can see,
>> not much more than adding a switch statement. Some instructions should
>> only be emitted once, rather than several times, such as the labels
>> and branches. (You can use a list of C++ lambdas rather than a switch
>> statement to do the same thing, very LISP, but that seems a bit of a
>> sledgehammer. YMMV.)
>>
>> I believe that this approach will be more maintainable and easier to
>> understand than other approaches we've seen. Also, the number of
>> unrolls is just a number that can be tweaked as required.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Whitespace

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3001:

> 2999:   assert(bulk_width == 4 || bulk_width == 8, "must be");
> 3000:
> 3001:   if (bulk_width == 8) {

`bulk_width` is defined as a constant 4. why do you also check bulk_width == 8? is this parameter tunable? same as "const int unroll = 4" below.
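
As a rough, standalone illustration of the scheme quoted above (this is not the HotSpot `KernelGenerator` itself), the toy model below emits strings instead of AArch64 instructions. `ToyGenerator`, `ToyKernel` and the free function `unroll` are hypothetical names made up for this sketch; only the three method names `length()`, `generate(int n)` and `next()` are taken from the description. It also shows how the number of unrolls can be treated as a plain parameter, assuming the kernel has enough registers to shift into:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for the generator interface described in the quoted text.
// Everything here is illustrative: it emits strings, not machine code.
class ToyGenerator {
public:
  virtual ~ToyGenerator() = default;
  // Number of instruction bundles in one copy of the kernel.
  virtual int length() const = 0;
  // Emit the n-th bundle of this copy into `out`.
  virtual void generate(int n, std::vector<std::string>& out) const = 0;
  // Return a clone of this generator, shifted to a new register.
  virtual std::unique_ptr<ToyGenerator> next() const = 0;
};

// A three-bundle kernel (load, round, store) working on one register.
class ToyKernel : public ToyGenerator {
  int _reg;  // register number used by this copy
public:
  explicit ToyKernel(int reg) : _reg(reg) {}
  int length() const override { return 3; }
  void generate(int n, std::vector<std::string>& out) const override {
    const std::string r = "v" + std::to_string(_reg);
    switch (n) {
      case 0: out.push_back("load " + r);  break;
      case 1: out.push_back("round " + r); break;
      case 2: out.push_back("store " + r); break;
    }
  }
  std::unique_ptr<ToyGenerator> next() const override {
    return std::unique_ptr<ToyGenerator>(new ToyKernel(_reg + 1));
  }
};

// Interleave `unrolls` copies: bundle 0 of every copy, then bundle 1, etc.
std::vector<std::string> unroll(const ToyGenerator& first, int unrolls) {
  std::vector<std::unique_ptr<ToyGenerator>> clones;  // owns the shifted copies
  std::vector<const ToyGenerator*> gens{&first};
  for (int i = 1; i < unrolls; i++) {
    clones.push_back(gens.back()->next());
    gens.push_back(clones.back().get());
  }
  std::vector<std::string> out;
  for (int n = 0; n < first.length(); n++) {
    for (const ToyGenerator* g : gens) {
      g->generate(n, out);
    }
  }
  return out;
}
```

With `unroll(ToyKernel(0), 2)` the emitted sequence is `load v0`, `load v1`, `round v0`, `round v1`, `store v0`, `store v1`; changing the second argument only changes how many shifted copies are interleaved.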

-------------

PR: https://git.openjdk.java.net/jdk/pull/5390

From xliu at openjdk.java.net Tue Sep 14 05:17:06 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 14 Sep 2021 05:17:06 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Sep 2021 14:00:18 GMT, Andrew Haley wrote:

>> An interleaved version of AES/GCM.
>>
>> [... remainder of the quoted PR description, identical to the copy above ...]
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Whitespace

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2863:

> 2861:   //
> 2862:   // int result = len;
> 2863:   // while (len-- > 0) {

I see that algorithm code comes from CounterMode.implCrypt, but while (len-- > 0) seems not to be exactly same as algorithm here. I think it should be `while (len > 0)`

`blockSize()` at line 2865 should be `blockSize`

-------------

PR: https://git.openjdk.java.net/jdk/pull/5390

From xliu at openjdk.java.net Tue Sep 14 05:28:07 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Tue, 14 Sep 2021 05:28:07 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Sep 2021 14:00:18 GMT, Andrew Haley wrote:

>> An interleaved version of AES/GCM.
>>
>> [... remainder of the quoted PR description, identical to the copy above ...]
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Whitespace

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5605:

> 5603:     address small = generate_ghash_processBlocks();
> 5604:
> 5605:     StubCodeMark mark(this, "StubRoutines", "ghash_processBlocks");

ghash_processBlocks_wide? otherwise, there will be two stubs with a same name.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5390

From sakatakui at oss.nttdata.com Tue Sep 14 06:44:41 2021
From: sakatakui at oss.nttdata.com (Koichi Sakata)
Date: Tue, 14 Sep 2021 15:44:41 +0900
Subject: Regarding options of error and dump file paths
In-Reply-To:
References: <92708e25-331f-f832-144b-eb00e2b0a4ac@oss.nttdata.com> <8f51414c-86c1-49de-7b5f-4af0fae556aa@oracle.com>
Message-ID: <8282936293d12664a02b44dc6c169fc0@oss.nttdata.com>

Hi all,

I believe that the option would help us, especially people on support teams, because it makes it easy to collect the files required for troubleshooting. It is also useful in container environments: if we point the option at a persistent volume, those files are kept even if the container is deleted.

So I have been thinking about how the option should work. First of all, it should cover the following files:

- GC (heap dumps)
- JIT (replay files)
- hs_err files
- JFR (a number of files)

It should exclude the following:

- jcmd/dcmd dumps
- Unified logging

Let's look at concrete usage examples. Suppose we name the option ReportDir.

Case 1: Set no options

The JVM writes each file to its default location when no options are set.

- GC: ./java_pid%p.hprof
- JIT: ./replay_pid%p.log
- hs_err files: ./hs_err_pid%p.log
- JFR: ./hs_err_pid%p.jfr, ./hs_oom_pid%p.jfr, ./hs_soe_pid%p.jfr

Case 2: Set the option only

If we just run `java -XX:ReportDir=/foo/bar/ ...`, those files are put in the /foo/bar/ directory.

- GC: /foo/bar/java_pid%p.hprof
- JIT: /foo/bar/replay_pid%p.log
- hs_err files: /foo/bar/hs_err_pid%p.log
- JFR: /foo/bar/hs_err_pid%p.jfr, /foo/bar/hs_oom_pid%p.jfr, /foo/bar/hs_soe_pid%p.jfr

Case 3: Set the option with a relative path

Suppose the working directory is /home/duke and we run `java -XX:ReportDir=./foo/bar/ ...`. The JVM resolves the output directory from the working directory and the relative path.

- GC: /home/duke/foo/bar/java_pid%p.hprof
- JIT: /home/duke/foo/bar/replay_pid%p.log
- hs_err files: /home/duke/foo/bar/hs_err_pid%p.log
- JFR: /home/duke/foo/bar/hs_err_pid%p.jfr, /home/duke/foo/bar/hs_oom_pid%p.jfr, /home/duke/foo/bar/hs_soe_pid%p.jfr

Case 4: Set the option together with an existing path option

Run `java -XX:ReportDir=/foo/bar/ -XX:ErrorFile=/home/duke/hs_err_pid%p.log ...`. The path given by ErrorFile overrides the value of ReportDir.

- GC: /foo/bar/java_pid%p.hprof
- JIT: /foo/bar/replay_pid%p.log
- hs_err files: /home/duke/hs_err_pid%p.log  <- differs from the others
- JFR: /foo/bar/hs_err_pid%p.jfr, /foo/bar/hs_oom_pid%p.jfr, /foo/bar/hs_soe_pid%p.jfr

Case 5: Set the option together with an existing path option that has a relative path

Suppose the working directory is /home/duke and we run `java -XX:ReportDir=./foo/bar/ -XX:HeapDumpPath=./baz/ -XX:+HeapDumpOnOutOfMemoryError ...`.

- GC: /home/duke/foo/bar/baz/java_pid%p.hprof  <- differs from the others
- JIT: /home/duke/foo/bar/replay_pid%p.log
- hs_err files: /home/duke/foo/bar/hs_err_pid%p.log
- JFR: /home/duke/foo/bar/hs_err_pid%p.jfr, /home/duke/foo/bar/hs_oom_pid%p.jfr, /home/duke/foo/bar/hs_soe_pid%p.jfr

The example above derives the heap dump path by combining the working directory, the relative path of ReportDir, and the relative path of HeapDumpPath. As an alternative, we could ignore the relative path of ReportDir when HeapDumpPath has a relative path. In that case, the heap dump path would be:

- GC: /home/duke/baz/java_pid%p.hprof

In either case, I recognize that using relative paths will be slightly complicated...

Last but not least, I would be pleased if we could go ahead with this topic.

Regards,
Koichi

On 03-09-2021 05:41 PM, Koichi Sakata wrote:
> Hi David,
>
> I'm sorry for the late reply. Thank you for your great advice.
>
>> Having an explicit option override the default directory option is a
>> good idea, but I'm not sure it is that clear cut.
>> If you can specify a relative directory and file name for a given
>> dump file, might you not want that to be relative to the specified
>> default path, rather than relative to the pwd?
>
> I occasionally want to use a relative path from the specified default
> path. This usage might confuse the path where files are outputted and
> complicate to fix, so we probably should prohibit relative paths when
> we use the default path. We can choose the specification after we find
> detailed expectations.
>
>> And we actually have quite a lot of potential output files from:
>>   - GC (heap dumps)
>>   - JIT (replay files)
>>   - hs_err files
>>   - JFR (a number of files)
>>   - jcmd/dcmd dumps?
>>   - Unified logging?
>>
>> I think figuring out the exact details of how this should work, and
>> interact with all the different files involved may be more involved
>> than just prepending a path component.
>
> I completely agree with you. To enable the new option needs a lot of
> our work, but that will improve convenience for users, I believe.
> Enabling easily to gathering error related files in one place helps us
> to troubleshoot. Not so many users set all these path options. If they
> use the new option, all they have to do will be sending files in the
> directory to their support personnel. In addition, they will get
> easier to keep files even on container environments.
>
>> I also think I would need to hear much greater demand, with detailed
>> usage expectations, before supporting this.
>
> I think so, too. I'd like to hear various people's point of view.
>
> Regards,
> Koichi
>
>
> On 2021/08/26 15:23, David Holmes wrote:
>> Hi Koichi,
>>
>> On 23/08/2021 1:29 pm, Koichi Sakata wrote:
>>> Hi all,
>>>
>>> I'm writing to get feedback on my idea about options for error and
>>> dump file paths.
>>>
>>> First of all, we can specify several options related to error and
>>> dump files. For example, the HeapDumpPath option sets the heap dump
>>> file and the ErrorFile option sets the hs_error file.
>>>
>>> I've felt inconvenience about that because we need to write all path
>>> options to put those files in a specific directory. I also recognize
>>> that they are outputted in the working directory when I run an
>>> application with no options. But I'd like to keep them in any
>>> directory. So the new option that sets the directory where those
>>> files are outputted would be useful, I think.
>>>
>>> The new option helps us especially to run applications on containers
>>> like Docker, Kubernetes etc. If we run them without those existing
>>> options on containers, files will be put in the local directory of
>>> each container. We lose files after we operate the container such as
>>> deleting it. The option enables us to keep certainly all error and
>>> dump files if we just specify the path of the persistent volume for
>>> the new option.
>>>
>>> As a concrete example, when we specify -XX:ErrorAndDumpPath=/foo/bar/
>>> (This option name is tentative), -XX:+HeapDumpOnOutOfMemoryError and
>>> -XX:StartFlightRecording etc., files are generated in the /foo/bar
>>> directory. From my point of view, the option will deal with the
>>> following files:
>>> - heap dump file (java_pid%p.hprof)
>>> - error log file (hs_err_pid%p.log)
>>> - JFR emergency dumps (hs_err_pid%p.jfr, hs_oom_pid%p.jfr,
>>> hs_soe_pid%p.jfr)
>>> - replay file (replay_pid%p.log)
>>>
>>> The existing path options should override the new option. If I set
>>> -XX:ErrorAndDumpPath=/foo/bar/ and -XX:HeapDumpPath=/foo/baz/, a heap
>>> dump file will be in the /foo/baz directory and other files will be
>>> created in the /foo/bar.
>>>
>>> I would like to hear your point of view. If some people agree to this
>>> idea, I will write a patch.
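
The override rule in the proposal quoted above can be sketched as follows. This is only an illustration under simplifying assumptions, not HotSpot code: `output_dir` and `output_path` are made-up helper names, the per-file option is treated as a directory only (no file-name component, no relative paths), and the `%p` placeholder is left unexpanded.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the proposed rule.
// `specific_dir` stands for a per-file option such as -XX:HeapDumpPath given
// as a directory; `common_dir` stands for the tentative -XX:ErrorAndDumpPath.
// An empty string means "option not set".
std::string output_dir(const std::string& specific_dir,
                       const std::string& common_dir) {
  if (!specific_dir.empty()) return specific_dir;  // existing option wins
  if (!common_dir.empty()) return common_dir;      // new common directory
  return "./";                                     // default: working directory
}

// Join a directory and a file name with exactly one '/' between them.
std::string output_path(const std::string& dir, const std::string& file) {
  if (dir.empty()) return file;
  if (dir.back() == '/') return dir + file;
  return dir + "/" + file;
}
```

For instance, `output_path(output_dir("/foo/baz/", "/foo/bar/"), "java_pid%p.hprof")` yields `/foo/baz/java_pid%p.hprof`, while a file without its own option, such as `hs_err_pid%p.log`, ends up under `/foo/bar/`.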
>>
>> My initial reaction was that this seemed something better handled in a
>> launch script because I figured if you had complex needs in relation
>> to where these files were being placed, then you'd use a launch script
>> to help manage that anyway.
>>
>> But I can see there would be some convenience to controlling the
>> output directory without also having to restate the default file
>> names.
>>
>> Having an explicit option override the default directory option is a
>> good idea, but I'm not sure it is that clear cut. If you can specify a
>> relative directory and file name for a given dump file, might you not
>> want that to be relative to the specified default path, rather than
>> relative to the pwd?
>>
>> And we actually have quite a lot of potential output files from:
>>  - GC (heap dumps)
>>  - JIT (replay files)
>>  - hs_err files
>>  - JFR (a number of files)
>>  - jcmd/dcmd dumps?
>>  - Unified logging?
>>
>> I think figuring out the exact details of how this should work, and
>> interact with all the different files involved may be more involved
>> than just prepending a path component.
>>
>> I also think I would need to hear much greater demand, with detailed
>> usage expectations, before supporting this.
>>
>> Just my 2c.
>>
>> Cheers,
>> David
>> -----
>>
>>> Regards,
>>> Koichi

From github.com+71546117+tobiasholenstein at openjdk.java.net Tue Sep 14 07:07:34 2021
From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein)
Date: Tue, 14 Sep 2021 07:07:34 GMT
Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform [v2]
In-Reply-To:
References:
Message-ID: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com>

> removed frame::pd_ps() which is not implemented on any platform. Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1.
>
> Thanks!

Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8272771: removed call to print_on() in debug::ps()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5487/files
  - new: https://git.openjdk.java.net/jdk/pull/5487/files/17adf063..f6ef51aa

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5487&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5487&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5487.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5487/head:pull/5487

PR: https://git.openjdk.java.net/jdk/pull/5487

From github.com+71546117+tobiasholenstein at openjdk.java.net Tue Sep 14 07:10:13 2021
From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein)
Date: Tue, 14 Sep 2021 07:10:13 GMT
Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 14 Sep 2021 02:30:42 GMT, David Holmes wrote:

>> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   JDK-8272771: removed call to print_on() in debug::ps()
>
> src/hotspot/share/utilities/debug.cpp line 514:
>
>> 512:   tty->print("(guessing starting frame id=" PTR_FORMAT " based on current fp)\n", p2i(f.id()));
>> 513:   p->trace_stack_from(vframe::new_vframe(&f, &reg_map, p));
>> 514:   f.print_on(tty);
>
> I agree with @shipilev - as pd_ps() was a no-op on all platforms it should simply be deleted from this code, not replaced by `f.print_on`.

you are right - I removed the call to `f.print_on(tty)` now. Thanks!
------------- PR: https://git.openjdk.java.net/jdk/pull/5487 From forax at univ-mlv.fr Tue Sep 14 07:11:37 2021 From: forax at univ-mlv.fr (Remi Forax) Date: Tue, 14 Sep 2021 09:11:37 +0200 (CEST) Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: <1955977649.703493.1631603497495.JavaMail.zimbra@u-pem.fr> (not a reviewer so this message will not be really helpful ...) Hi Volker, for me it's not an enhancement but a bug fix: in production, an exception with no stacktrace is useless and results in hours lost trying to figure out the issue (see for example [1] on Stack Overflow). This is not a new issue; this bug has popped up from time to time since OmitStackTraceInFastThrow was introduced (in 1.4.x, I believe). Thanks for taking the time to fix that. regards, Rémi [1] https://stackoverflow.com/questions/2411487/nullpointerexception-in-java-with-no-stacktrace ----- Original Message ----- > From: "Volker Simonis" > To: "Volker Simonis" > Cc: "hotspot-dev" , "hotspot compiler" > Sent: Monday, 13 September 2021 10:08:40 > Subject: Re: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow > Hi, > > may I kindly ask somebody to please take a look at this PR? > > Thank you and best regards, > Volker > > On Tue, Sep 7, 2021 at 5:42 PM Volker Simonis wrote: >> >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will >> optimize certain "hot" implicit exceptions (e.g. AIOOBE, >> NullPointerExceptions, ...) and replace them by a static, pre-allocated exception >> without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated >> exception object for all methods we can let the compiler allocate specific >> exceptions for each compilation unit (i.e. nmethod) and fill them with at least >> one stack frame with the method/line-number information of the currently >> compiled method.
If the method in question is being inlined (which often >> happens), we can add stackframes for all callers up to the inlining depth of >> the method in question. >> >> For the attached JTreg test, we get the following exception in interpreter mode: >> >> java.lang.NullPointerException: Cannot read the array length because >> "" is null >> at >> compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at >> compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at >> compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> at >> compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) >> >> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same >> exception will look as follows: >> >> java.lang.NullPointerException >> >> After this change, if `StackFrameInFastThrow.throwImplicitException()` is >> compiled standalone, we will get: >> >> java.lang.NullPointerException >> at >> compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> >> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into >> `level2()` and `level2()` into `level1()` we will get the following exception >> (although we're still running with `-XX:+OmitStackTraceInFastThrow`): >> >> java.lang.NullPointerException >> at >> compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at >> compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at >> compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> >> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched >> on by default (I'll create a CSR for the new option once reviewers are >> comfortable with the change). Notice that the optimization comes at no run-time >> cost because all the extra work will be done at compile time.
>> >> ## Implementation details >> >> - The current implementation of `-XX:+OmitStackTraceInFastThrow` already >> potentially lazy-allocates the empty singleton exceptions like AIOOBE in >> `ciEnv::ArrayStoreException_instance()`. With this change, if running with >> `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and >> populate them with the stack frames which are statically available at compile >> time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI >> handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles >> once an nmethod gets unloaded. In order to achieve this, I've added a new >> section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats >> (`print_nmethod_stats()`) and printing routines (`nmethod::print()`) I realized >> that a previous change ([JDK-8254231: Implementation of Foreign Linker API >> (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already >> introduced a new nmethod section ("native invokers") but failed to add it to >> the corresponding stats and printing routines so I've added that section as >> well. >> - The `#ifdef COMPILER2` guards are only required to not break the >> `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit >> exceptions as "hot". This makes the test simpler and at the same time provokes >> the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the >> corresponding nmethods have been flushed.
>> - Manual "stress" test with a very small heap and continuous recompilation of >> methods with explicit exceptions to provoke GCs during compilation didn't >> reveal any issues. >> >> ------------- >> >> Commit messages: >> - 8273392: Improve usability of stack-less exceptions due to >> -XX:+OmitStackTraceInFastThrow >> >> Changes: https://git.openjdk.java.net/jdk/pull/5392/files >> Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=00 >> Issue: https://bugs.openjdk.java.net/browse/JDK-8273392 >> Stats: 538 lines in 12 files changed: 417 ins; 6 del; 115 mod >> Patch: https://git.openjdk.java.net/jdk/pull/5392.diff >> Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 >> > > PR: https://git.openjdk.java.net/jdk/pull/5392 From thartmann at openjdk.java.net Tue Sep 14 07:31:12 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 14 Sep 2021 07:31:12 GMT Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform [v2] In-Reply-To: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com> References: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com> Message-ID: On Tue, 14 Sep 2021 07:07:34 GMT, Tobias Holenstein wrote: >> removed frame::pd_ps() which is not implemented on any platform. Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1. >> >> Thanks! > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8272771: removed call to print_on() in debug::ps() Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5487 From shade at openjdk.java.net Tue Sep 14 07:36:20 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 14 Sep 2021 07:36:20 GMT Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform [v2] In-Reply-To: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com> References: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com> Message-ID: On Tue, 14 Sep 2021 07:07:34 GMT, Tobias Holenstein wrote: >> removed frame::pd_ps() which is not implemented on any platform. Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1. >> >> Thanks! > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8272771: removed call to print_on() in debug::ps() Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5487 From github.com+741251+turbanoff at openjdk.java.net Tue Sep 14 07:46:13 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Tue, 14 Sep 2021 07:46:13 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v2] In-Reply-To: References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Mon, 13 Sep 2021 22:11:24 GMT, liach wrote: >> src/jdk.jfr/share/classes/jdk/jfr/internal/consumer/RepositoryFiles.java line 215: >> >>> 213: pathLookup.remove(remove); >>> 214: } >>> 215: added.sort((p1, p2) -> p1.compareTo(p2)); >> >> 'added.sort(Path::compareTo)' > > Can't we just use natural ordering `null` here? Replaced with `Collections.sort` without comparator argument. I think it's a bit easier to read than with `null`. 
https://github.com/openjdk/jdk/pull/5229#discussion_r695525255 ------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From github.com+741251+turbanoff at openjdk.java.net Tue Sep 14 07:46:12 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Tue, 14 Sep 2021 07:46:12 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v2] In-Reply-To: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: > Collections.sort is just a wrapper, so it is better to use an instance method directly. Andrey Turbanov has updated the pull request incrementally with one additional commit since the last revision: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5230/files - new: https://git.openjdk.java.net/jdk/pull/5230/files/beec68e5..fcf53eda Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5230&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5230&range=00-01 Stats: 26 lines in 8 files changed: 5 ins; 13 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/5230.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5230/head:pull/5230 PR: https://git.openjdk.java.net/jdk/pull/5230 From github.com+741251+turbanoff at openjdk.java.net Tue Sep 14 07:46:14 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Tue, 14 Sep 2021 07:46:14 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v2] In-Reply-To: References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: <4m7SMEEFOq68hcYaJVnA3U2hbTZs-ahh6vczqmYjfhY=.f015ebd0-3162-423e-a6eb-5fb12b860f0b@github.com> On Mon, 13 Sep 2021 17:58:02 
GMT, Rémi Forax wrote: >> Andrey Turbanov has updated the pull request incrementally with one additional commit since the last revision: >> >> 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules > > src/jdk.jfr/share/classes/jdk/jfr/internal/dcmd/DCmdCheck.java line 137: > >> 135: List sorted = new ArrayList<>(); >> 136: sorted.addAll(events); >> 137: sorted.sort(new Comparator() { > > I wonder if there is a bootstrap issue here (why an anonymous class is used instead of a lambda?) > If a lambda can be used, it can be simplified to > `sorted.sort(Comparator.comparing(EventType::getName))` As I can see lambdas are used in other places in this module. Replaced > src/jdk.jfr/share/classes/jdk/jfr/internal/tool/Summary.java line 145: > >> 143: println(" Duration: " + (totalDuration + 500_000_000) / 1_000_000_000 + " s"); >> 144: List statsList = new ArrayList<>(stats.values()); >> 145: statsList.sort((u, v) -> Long.compare(v.count, u.count)); > > `statsList.sort(Comparator.comparingLong(v -> v.count))` replaced ------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From xliu at openjdk.java.net Tue Sep 14 07:51:12 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 14 Sep 2021 07:51:12 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 14:00:18 GMT, Andrew Haley wrote: >> An interleaved version of AES/GCM. >> >> Performance, now and then: >> >> >> Apple M1, 3.2 GHz: >> >> Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ± 192.877 ns/op >> >> AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ±
111.977 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op >> >> Neoverse N1, 2.5GHz: >> >> Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op >> >> AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op >> >> >> >> A note about the implementation for the reviewers: >> >> Unrolled and hand-scheduled intrinsics are often written in a way that >> I don't find satisfactory. Often they are a conglomeration of >> copy-and-paste programming and C macros, which makes them hard to >> understand and hard to maintain. I won't name any names, but there are >> many examples to be found in free software across the Internet. >> >> I spent a while thinking about a structured way to develop and >> implement them, and I think I've got something better. The idea is >> that you transform a pre-existing implementation into a generator for >> the interleaved version. The transformation shouldn't be too hard to >> do, but more importantly it should be possible for a reader to verify >> that the interleaved and unrolled version performs the same function. >> >> A generator takes the form of a subclass of `KernelGenerator`. The >> core idea is that the programmer defines the base case of the >> intrinsic and a method to generate a clone of it, shifted to a >> different set of registers.
`KernelGenerator` will then generate >> several interleaved copies of the function, with each one using a >> different set of registers. >> >> The subclass must implement three methods: `length()`, which is the >> number of instruction bundles in the intrinsic, `generate(int n)` >> which emits the nth instruction bundle in the intrinsic, and `next()` >> which takes an instance of the generator and returns a version of it, >> shifted to a new set of registers. >> >> As an example, here's the inner loop of AES encryption: >> >> (Some details elided for clarity.) >> >> >> BIND(L_aes_loop); >> ld1(v0, T16B, post(from, 16)); >> >> cmpw(keylen, 44); >> br(Assembler::CC, L_rounds_44); >> br(Assembler::EQ, L_rounds_52); >> >> aes_round(v0, v17); >> aes_round(v0, v18); >> BIND(L_rounds_52); >> aes_round(v0, v19); >> aes_round(v0, v20); >> BIND(L_rounds_44); >> ... >> >> >> The generator for the unrolled version looks like: >> >> >> virtual void generate(int index) { >> switch (index) { >> case 0: >> ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input >> break; >> case 1: >> if (_once) { >> cmpw(_keylen, 52); >> br(Assembler::LO, _rounds_44); >> br(Assembler::EQ, _rounds_52); >> } >> break; >> case 2: aes_round(_data, _subkeys + 0); break; >> case 3: aes_round(_data, _subkeys + 1); break; >> case 4: >> if (_once) bind(_rounds_52); >> break; >> case 5: aes_round(_data, _subkeys + 2); break; >> case 6: aes_round(_data, _subkeys + 3); break; >> case 7: >> if (_once) bind(_rounds_44); >> break; >> ... >> >> >> The job of converting a single inline intrinsic is, as you can see, >> not much more than adding a switch statement. Some instructions should >> only be emitted once, rather than several times, such as the labels >> and branches. (You can use a list of C++ lambdas rather than a switch >> statement to do the same thing, very LISP, but that seems a bit of a >> sledgehammer. YMMV.) 
>> >> I believe that this approach will be more maintainable and easier to >> understand than other approaches we've seen. Also, the number of >> unrolls is just a number that can be tweaked as required. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Whitespace src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1300: > 1298: Register tmp3); > 1299: > 1300: void ghash_modmul_wide (int index, FloatRegister result, Is there a definition of, or a reference to, this? ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From shade at openjdk.java.net Tue Sep 14 08:18:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 14 Sep 2021 08:18:06 GMT Subject: RFR: 8273486: Zero: Handle DiagnoseSyncOnValueBasedClasses VM option In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 01:53:05 GMT, David Holmes wrote: > Change seems fine. I'm a little surprised this is all you need. Thanks. Yes, the magic of reusing the existing `UseHeavyMonitors` paths. It would be a minor problem if we ever decide to ditch that option, but that's an issue for another day. ------------- PR: https://git.openjdk.java.net/jdk/pull/5412 From shade at openjdk.java.net Tue Sep 14 08:18:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 14 Sep 2021 08:18:07 GMT Subject: Integrated: 8273486: Zero: Handle DiagnoseSyncOnValueBasedClasses VM option In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 10:41:34 GMT, Aleksey Shipilev wrote: > JDK-8257027 added a diagnostic option to check for synchronization on value-based classes.
Zero does not support it, so it would fail the relevant test: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java > > STDERR: > stdout: []; > stderr: [Exception in thread "main" java.lang.RuntimeException: synchronization on value based class did not fail > at SyncOnValueBasedClassTest$FatalTest.main(SyncOnValueBasedClassTest.java:128) > ] > exitValue = 1 > > java.lang.RuntimeException: 'fatal error: Synchronizing on object' missing from stdout/stderr > > > Template interpreters implement this check by going to the slow path that calls `InterpreterRuntime::monitorenter`. Zero already goes to that path when `UseHeavyMonitors` is enabled, so we might just enable it when lock diagnostics are requested. This would cost us zero (pun intended) when the diagnostic option is disabled. > > Additional testing: > - [x] Linux x86_64 Zero, affected test now passes This pull request has now been integrated. Changeset: 86a8e552 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/86a8e5524ddb5e25dab54b4f56cc1b9c27d0a4a6 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8273486: Zero: Handle DiagnoseSyncOnValueBasedClasses VM option Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/5412 From shade at openjdk.java.net Tue Sep 14 08:21:12 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 14 Sep 2021 08:21:12 GMT Subject: RFR: 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests In-Reply-To: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> References: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> Message-ID: On Tue, 7 Sep 2021 15:07:10 GMT, Aleksey Shipilev wrote: > Current `vmTestbase/metaspace/stressHierarchy` tests (part of vmTestbase_vm_metaspace suite) contains about 15 tests, each running exclusively.
There seem to be no reason to run them exclusively, though: they complete in reasonable time, are single-threaded, and consume the usual amount of memory. There is no evidence in JBS that they ever timed out without a reason, and their history unfortunately predates OpenJDK to see why they were not concurrent from day one. > > We should consider enabling parallelism for `vmTestbase/metaspace/stressHierarchy` and get improved test performance. Currently it is blocked by `TEST.properties` with `exclusiveAccess.dirs` directives in them. > > Note there are other exclusive tests in `vmTestbase_vm_metaspace`, but those seem to be the hard stress tests: pushing GC to the limits, or doing many threads, etc. > > Motivational test time improvements below. > > Before: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... > 00:24:53 ============================== > 00:24:53 Test summary > 00:24:53 ============================== > 00:24:53 TEST TOTAL PASS FAIL ERROR > 00:24:53 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:24:53 ============================== > 00:24:53 TEST SUCCESS > 00:24:53 > 00:24:53 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 24m53.389s > user 53m2.029s > sys 1m1.849s > > > After: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... > 00:04:04 ============================== > 00:04:04 Test summary > 00:04:04 ============================== > 00:04:04 TEST TOTAL PASS FAIL ERROR > 00:04:04 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:04:04 ============================== > 00:04:04 TEST SUCCESS > 00:04:04 > 00:04:04 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 4m4.574s > user 56m10.582s > sys 1m4.725s All right, thank you! I'll integrate and see what happens next. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5391 From shade at openjdk.java.net Tue Sep 14 08:21:13 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 14 Sep 2021 08:21:13 GMT Subject: Integrated: 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests In-Reply-To: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> References: <6UGVOWy8QGpYDMbNFkT6qIERHESLMdZpvz8ihmm_obg=.cadc1f2e-3325-4bdc-a0c7-e4579f72663f@github.com> Message-ID: On Tue, 7 Sep 2021 15:07:10 GMT, Aleksey Shipilev wrote: > Current `vmTestbase/metaspace/stressHierarchy` tests (part of vmTestbase_vm_metaspace suite) contains about 15 tests, each running exclusively. There seem to be no reason to run them exclusively, though: they complete in reasonable time, are single-threaded, and consume the usual amount of memory. There is no evidence in JBS that they ever timed out without a reason, and their history unfortunately predates OpenJDK to see why they were not concurrent from day one. > > We should consider enabling parallelism for `vmTestbase/metaspace/stressHierarchy` and get improved test performance. Currently it is blocked by `TEST.properties` with `exclusiveAccess.dirs` directives in them. > > Note there are other exclusive tests in `vmTestbase_vm_metaspace`, but those seem to be the hard stress tests: pushing GC to the limits, or doing many threads, etc. > > Motivational test time improvements below. > > Before: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... 
> 00:24:53 ============================== > 00:24:53 Test summary > 00:24:53 ============================== > 00:24:53 TEST TOTAL PASS FAIL ERROR > 00:24:53 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:24:53 ============================== > 00:24:53 TEST SUCCESS > 00:24:53 > 00:24:53 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 24m53.389s > user 53m2.029s > sys 1m1.849s > > > After: > > > $ time CONF=linux-x86_64-server-fastdebug make run-test TEST=vmTestbase_vm_metaspace | ts -s > ... > 00:04:04 ============================== > 00:04:04 Test summary > 00:04:04 ============================== > 00:04:04 TEST TOTAL PASS FAIL ERROR > 00:04:04 jtreg:test/hotspot/jtreg:vmTestbase_vm_metaspace 25 25 0 0 > 00:04:04 ============================== > 00:04:04 TEST SUCCESS > 00:04:04 > 00:04:04 Finished building target 'run-test' in configuration 'linux-x86_64-server-fastdebug' > > real 4m4.574s > user 56m10.582s > sys 1m4.725s This pull request has now been integrated. Changeset: a1433728 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/a143372818ffab635b0e97208be53569b159a98b Stats: 15 lines in 15 files changed: 0 ins; 15 del; 0 mod 8273438: Enable parallelism in vmTestbase/metaspace/stressHierarchy tests Reviewed-by: mseledtsov, iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/5391 From pliden at openjdk.java.net Tue Sep 14 10:32:11 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 14 Sep 2021 10:32:11 GMT Subject: Integrated: 8273597: Rectify Thread::is_ConcurrentGC_thread() In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 12:39:14 GMT, Per Liden wrote: > `Thread::is_ConcurrentGC_thread()` behaves differently to all other `Thread::is_xxx_thread()` functions, in the sense that it doesn't directly map to a distinct `Thread` sub-class. Instead, `is_ConcurrentGC_thread()` can today return true for both `ConcurrentGCThread` and `GangWorker`. 
These two classes have no super/sub-class relation. This is confusing and potentially dangerous. > > It would be reasonable to think that code like this would be correct: > > > if (thread->is_ConcurrentGC_thread()) { > conc_thread = static_cast<ConcurrentGCThread*>(thread); > ... > } > > > but it's not, since we might try to cast a `GangWorker` to a `ConcurrentGCThread`. And again, these two classes have no super/sub-class relation. > > I propose that we clean this up, so that `is_ConcurrentGC_thread()` only returns true for threads inheriting from `ConcurrentGCThread`. The main side-effect is that a handful of asserts need to be adjusted. In return, the code example above would become legal, and we can also remove some cruft from `WorkGang`/`GangWorker`. This pull request has now been integrated. Changeset: 38845805 Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/3884580591e932536a078f4f138920dcc8139c1a Stats: 61 lines in 18 files changed: 12 ins; 27 del; 22 mod 8273597: Rectify Thread::is_ConcurrentGC_thread() Reviewed-by: stefank, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/5463 From pliden at openjdk.java.net Tue Sep 14 10:32:10 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 14 Sep 2021 10:32:10 GMT Subject: RFR: 8273597: Rectify Thread::is_ConcurrentGC_thread() [v2] In-Reply-To: <5w9LT70Xgw0fu3bme0Tx4ko3zShJrqT2OQU613E7Ff8=.4b4211fc-7d8b-4b91-b6ed-e24a8e28947f@github.com> References: <5w9LT70Xgw0fu3bme0Tx4ko3zShJrqT2OQU613E7Ff8=.4b4211fc-7d8b-4b91-b6ed-e24a8e28947f@github.com> Message-ID: <6UjyClC5-2-INjTE4um_TGLBFHI9iP3FNlYWkJ9j0R4=.ae087677-6e30-4bf6-99cd-42228b6f7db6@github.com> On Mon, 13 Sep 2021 12:22:03 GMT, Stefan Karlsson wrote: >> Per Liden has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix constructor call >> - Add ConcurrentGCThread::cast() > > Marked as reviewed by stefank (Reviewer). Thanks for reviewing @stefank and @coleenp!
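For readers of the archive, the cast hazard described in the 8273597 description can be illustrated with a minimal standalone sketch. This is not HotSpot code: the three classes below are simplified stand-ins that only borrow the names `Thread`, `ConcurrentGCThread` and `GangWorker` from the thread, and `checked_cast` is a hypothetical helper whose behavior (returning `nullptr` on mismatch) is an assumption, not a description of the `ConcurrentGCThread::cast()` mentioned in the commit list.

```cpp
#include <cassert>

// Simplified stand-ins for the classes named in the thread; these are not
// the real HotSpot declarations.
class Thread {
public:
  virtual ~Thread() = default;
  // Proposed behavior: true only for ConcurrentGCThread and its subclasses.
  virtual bool is_ConcurrentGC_thread() const { return false; }
};

class ConcurrentGCThread : public Thread {
public:
  bool is_ConcurrentGC_thread() const override { return true; }
};

// Shares only the Thread base with ConcurrentGCThread.
class GangWorker : public Thread {};

// Hypothetical helper (not the HotSpot API): downcast only when the
// predicate guarantees the dynamic type, otherwise return nullptr.
inline ConcurrentGCThread* checked_cast(Thread* thread) {
  assert(thread != nullptr);
  if (thread->is_ConcurrentGC_thread()) {
    return static_cast<ConcurrentGCThread*>(thread);
  }
  return nullptr;
}
```

With the predicate tied to the class hierarchy, passing a `GangWorker` to `checked_cast` yields `nullptr` instead of an invalid pointer. If `GangWorker` also answered true, as the description says it could before the change, the `static_cast` would have undefined behavior.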
------------- PR: https://git.openjdk.java.net/jdk/pull/5463 From stuefe at openjdk.java.net Tue Sep 14 12:15:05 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 14 Sep 2021 12:15:05 GMT Subject: RFR: 8273486: Zero: Handle DiagnoseSyncOnValueBasedClasses VM option In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 10:41:34 GMT, Aleksey Shipilev wrote: > JDK-8257027 added a diagnostic option to check for synchronization on value-based classes. Zero does not support it, so it would fail the relevant test: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=runtime/Monitor/SyncOnValueBasedClassTest.java > > STDERR: > stdout: []; > stderr: [Exception in thread "main" java.lang.RuntimeException: synchronization on value based class did not fail > at SyncOnValueBasedClassTest$FatalTest.main(SyncOnValueBasedClassTest.java:128) > ] > exitValue = 1 > > java.lang.RuntimeException: 'fatal error: Synchronizing on object' missing from stdout/stderr > > > Template interpreters implement this check by going to the slow path that calls `InterpreterRuntime::monitorenter`. Zero already goes to that path when `UseHeavyMonitors` is enabled, so we might just enable it when lock diagnostics are requested. This would cost us zero (pun intended) when the diagnostic option is disabled.
> > Additional testing: > - [x] Linux x86_64 Zero, affected test now passes LGTM ------------- PR: https://git.openjdk.java.net/jdk/pull/5412 From eosterlund at openjdk.java.net Tue Sep 14 13:03:12 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 14 Sep 2021 13:03:12 GMT Subject: RFR: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 [v2] In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 22:24:50 GMT, Coleen Phillimore wrote: >> This change reverts the rank ordering of ttyLock and StackWatermark_lock because the latter is held through a very large region and printing all of this to a buffer with xmlstream is non-trivial. >> With this change, if tty->print_cr() is done while holding the stackwatermark lock or lower (which is service ranking, etc), a lock inversion will happen with ttyLock. This doesn't happen now because all the code in GC and much of the rest of the runtime use UL and not tty->print(). >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > fix typo The rank reversal looks good. Too bad there was more printing code. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5499 From coleenp at openjdk.java.net Tue Sep 14 13:13:08 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Sep 2021 13:13:08 GMT Subject: RFR: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 [v2] In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 22:24:50 GMT, Coleen Phillimore wrote: >> This change reverts the rank ordering of ttyLock and StackWatermark_lock because the latter is held through a very large region and printing all of this to a buffer with xmlstream is non-trivial. 
>> With this change, if tty->print_cr() is done while holding the stackwatermark lock or lower (which is service ranking, etc), a lock inversion will happen with ttyLock. This doesn't happen now because all the code in GC and much of the rest of the runtime use UL and not tty->print(). >> Tested with tier1-6. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > fix typo Thanks Erik! ------------- PR: https://git.openjdk.java.net/jdk/pull/5499 From coleenp at openjdk.java.net Tue Sep 14 13:13:09 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Sep 2021 13:13:09 GMT Subject: Integrated: 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 20:04:56 GMT, Coleen Phillimore wrote: > This change reverts the rank ordering of ttyLock and StackWatermark_lock because the latter is held through a very large region and printing all of this to a buffer with xmlstream is non-trivial. > With this change, if tty->print_cr() is done while holding the stackwatermark lock or lower (which is service ranking, etc), a lock inversion will happen with ttyLock. This doesn't happen now because all the code in GC and much of the rest of the runtime use UL and not tty->print(). > Tested with tier1-6. This pull request has now been integrated. 
Changeset: 1d3eb147 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/1d3eb147ee7dd9b237d3cf633a5792544f8cac30 Stats: 5 lines in 2 files changed: 1 ins; 1 del; 3 mod 8273635: Attempting to acquire lock StackWatermark_lock/9 out of order with lock tty_lock/3 Reviewed-by: dholmes, eosterlund ------------- PR: https://git.openjdk.java.net/jdk/pull/5499 From phh at openjdk.java.net Tue Sep 14 13:21:09 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 14 Sep 2021 13:21:09 GMT Subject: RFR: 8273239: Standardize Ticks APIs return type [v2] In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 17:27:00 GMT, Albert Mingkun Yang wrote: >> Simple change on return types of Ticks API. >> >> The call of `milliseconds()` in `spinYield.cpp` seems a bug to me, because the unit in the message is `usecs`. Therefore, I changed it to `microseconds()`. >> >> Test: tier1 > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > template Looking at the Rdtsc implementation, It could cause precision loss in any case because Rdtsc::frequency() is the cpu clock rate, which is typically larger than NANOSECS_PER_SEC. One would have to go to picoseconds as the base rate to fix that, but picosecond precision in 64 bits (a uint64_t) is roughly the same as nanosecond precision in 53 bits (a double), so we wouldn't gain anything by doing that. Double it is. ------------- Marked as reviewed by phh (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5332 From aph at openjdk.java.net Tue Sep 14 13:34:03 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 13:34:03 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 22:31:30 GMT, Smita Kamath wrote: > Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. 
Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. > > The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. It seems to me there's a serious problem here. When you execute the galoisCounterMode_AESCrypt() intrinsic, I don't think there's a limit on the number of blocks to be encrypted. With the older intrinsic things are not so very bad because the incoming data is split into 6 segments. But if we use this intrinsic, there is no safepoint check in the inner loop, which can lead to a long time to safepoint, and this causes stalls on the other threads. If you split the incoming data into blocks of about a megabyte you'd lose no measurable performance but you'd dramatically improve the performance of everything else, especially with a concurrent GC. ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From luhenry at openjdk.java.net Tue Sep 14 14:17:24 2021 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Tue, 14 Sep 2021 14:17:24 GMT Subject: Withdrawn: 8178287: AsyncGetCallTrace fails to traverse valid Java stacks In-Reply-To: <9qfnLj_-jz8MocK7UIIs5-NYZsVPJ7J20ZLiORqpUlM=.cb712662-0eb9-4d17-a67d-42451423f470@github.com> References: <9qfnLj_-jz8MocK7UIIs5-NYZsVPJ7J20ZLiORqpUlM=.cb712662-0eb9-4d17-a67d-42451423f470@github.com> Message-ID: On Wed, 9 Jun 2021 17:16:23 GMT, Ludovic Henry wrote: > When the signal sent for AsyncGetCallTrace or JFR would land on a runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled method, it wouldn't be able to detect the sender (caller) frame for multiple reasons. This patch fixes these cases through adding CodeBlob-specific frame parser which are in the best position to know how a frame is setup. 
> > The following examples have been profiled with honest-profiler which uses `AsyncGetCallTrace`. > > # `Prof1` > > public class Prof1 { > > public static void main(String[] args) { > StringBuilder sb = new StringBuilder(); > for (int i = 0; i < 1000000; i++) { > sb.append("ab"); > sb.delete(0, 1); > } > System.out.println(sb.length()); > } > } > > > - Baseline: > > Flat Profile (by method): > (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] > (t 0.5,s 0.2) Prof1::main > (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift > (t 0.0,s 0.0) java.lang.String::getBytes > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > (t 0.0,s 0.0) java.lang.StringBuilder::delete > (t 0.2,s 0.0) java.lang.StringBuilder::append > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > > - With `StubRoutinesBlob::FrameParser`: > > Flat Profile (by method): > (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal > (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete > (t 99.8,s 0.2) Prof1::main > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] > (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append > (t 98.8,s 0.0) java.lang.StringBuilder::append > (t 0.9,s 0.0) java.lang.StringBuilder::delete > > > # `Prof2` > > import java.util.function.Supplier; > > public class Prof2 { > > public static void main(String[] args) { > var rand = new java.util.Random(0); > Supplier[] suppliers = { > () -> 0, > () -> 1, > () -> 2, > () -> 3, > }; > > long sum = 0; > for (int i = 0; i >= 0; i++) { > sum += (int)suppliers[i % suppliers.length].get(); > } > } > } > > > - Baseline: > > Flat Profile (by method): > (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] > (t 39.2,s 35.2) Prof2::main > (t 1.4,s 1.4) 
Prof2::lambda$main$3 > (t 1.0,s 1.0) Prof2::lambda$main$2 > (t 0.9,s 0.9) Prof2::lambda$main$1 > (t 0.7,s 0.7) Prof2::lambda$main$0 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) java.lang.Thread::exit > (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get > (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get > (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get > (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get > > > - With `VtableBlob::FrameParser` and `nmethod::FrameParser`: > > Flat Profile (by method): > (t 74.1,s 70.3) Prof2::main > (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get > (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get > (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] > (t 1.2,s 1.2) Prof2::lambda$main$2 > (t 0.9,s 0.9) Prof2::lambda$main$3 > (t 0.9,s 0.9) Prof2::lambda$main$1 > (t 0.7,s 0.7) Prof2::lambda$main$0 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/4436 From luhenry at openjdk.java.net Tue Sep 14 14:17:23 2021 From: luhenry at openjdk.java.net (Ludovic Henry) Date: Tue, 14 Sep 2021 14:17:23 GMT Subject: RFR: 8178287: AsyncGetCallTrace fails to traverse valid Java stacks [v4] In-Reply-To: References: <9qfnLj_-jz8MocK7UIIs5-NYZsVPJ7J20ZLiORqpUlM=.cb712662-0eb9-4d17-a67d-42451423f470@github.com> Message-ID: On Mon, 19 Jul 2021 09:25:59 GMT, Ludovic Henry wrote: >> When the signal sent for AsyncGetCallTrace or JFR would land on a runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled method, it wouldn't be able to detect the sender (caller) frame for multiple reasons. This patch fixes these cases through adding CodeBlob-specific frame parser which are in the best position to know how a frame is setup. 
>> >> The following examples have been profiled with honest-profiler which uses `AsyncGetCallTrace`. >> >> # `Prof1` >> >> public class Prof1 { >> >> public static void main(String[] args) { >> StringBuilder sb = new StringBuilder(); >> for (int i = 0; i < 1000000; i++) { >> sb.append("ab"); >> sb.delete(0, 1); >> } >> System.out.println(sb.length()); >> } >> } >> >> >> - Baseline: >> >> Flat Profile (by method): >> (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] >> (t 0.5,s 0.2) Prof1::main >> (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift >> (t 0.0,s 0.0) java.lang.String::getBytes >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt >> (t 0.0,s 0.0) java.lang.StringBuilder::delete >> (t 0.2,s 0.0) java.lang.StringBuilder::append >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt >> >> - With `StubRoutinesBlob::FrameParser`: >> >> Flat Profile (by method): >> (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal >> (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete >> (t 99.8,s 0.2) Prof1::main >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] >> (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append >> (t 98.8,s 0.0) java.lang.StringBuilder::append >> (t 0.9,s 0.0) java.lang.StringBuilder::delete >> >> >> # `Prof2` >> >> import java.util.function.Supplier; >> >> public class Prof2 { >> >> public static void main(String[] args) { >> var rand = new java.util.Random(0); >> Supplier[] suppliers = { >> () -> 0, >> () -> 1, >> () -> 2, >> () -> 3, >> }; >> >> long sum = 0; >> for (int i = 0; i >= 0; i++) { >> sum += (int)suppliers[i % suppliers.length].get(); >> } >> } >> } >> >> >> - Baseline: >> >> Flat Profile (by method): >> (t 60.7,s 60.7) 
AGCT::Unknown Java[ERR=-5] >> (t 39.2,s 35.2) Prof2::main >> (t 1.4,s 1.4) Prof2::lambda$main$3 >> (t 1.0,s 1.0) Prof2::lambda$main$2 >> (t 0.9,s 0.9) Prof2::lambda$main$1 >> (t 0.7,s 0.7) Prof2::lambda$main$0 >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> (t 0.0,s 0.0) java.lang.Thread::exit >> (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get >> (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get >> (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get >> (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get >> >> >> - With `VtableBlob::FrameParser` and `nmethod::FrameParser`: >> >> Flat Profile (by method): >> (t 74.1,s 70.3) Prof2::main >> (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get >> (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get >> (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get >> (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get >> (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] >> (t 1.2,s 1.2) Prof2::lambda$main$2 >> (t 0.9,s 0.9) Prof2::lambda$main$3 >> (t 0.9,s 0.9) Prof2::lambda$main$1 >> (t 0.7,s 0.7) Prof2::lambda$main$0 >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Remove FrameParser and the required allocation > > The need for the allocation would be it non async-safe. However, > AsyncGetCallTrace is async-safe and thus can't allow for allocations. 
> - Merge branch 'master' of https://github.com/openjdk/jdk into fix-8178287 > - Fix comments > - Disable checks in FrameParser when known to be safe > - Allow AsyncGetCallTrace and JFR to unwind stack from vtable stub > > The program is the following: > > ``` > import java.util.function.Supplier; > > public class Prof2 { > > public static void main(String[] args) { > var rand = new java.util.Random(0); > Supplier[] suppliers = { > () -> 0, > () -> 1, > () -> 2, > () -> 3, > }; > > long sum = 0; > for (int i = 0; i >= 0; i++) { > sum += (int)suppliers[i % suppliers.length].get(); > } > } > } > ``` > > The results are as follows: > > - Baseline (from previous commit): > > Flat Profile (by method): > (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] > (t 40.3,s 36.1) Prof2::main > (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get > (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get > (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > (t 1.1,s 1.1) Prof2::lambda$main$2 > (t 1.1,s 1.1) Prof2::lambda$main$0 > (t 1.0,s 1.0) Prof2::lambda$main$1 > (t 0.9,s 0.9) Prof2::lambda$main$3 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > - With unwind from vtable stub > > Flat Profile (by method): > (t 74.1,s 70.3) Prof2::main > (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get > (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get > (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] > (t 1.2,s 1.2) Prof2::lambda$main$2 > (t 0.9,s 0.9) Prof2::lambda$main$3 > (t 0.9,s 0.9) Prof2::lambda$main$1 > (t 0.7,s 0.7) Prof2::lambda$main$0 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > We attribute the vtable stub to the caller and not the callee, which is > already an improvement from the existing case. 
> - Allow AsyncGetCallTrace and JFR to unwind stack from nmethod's prolog > > When sampling hits the prolog of a method, Hotspot assumes it's unable > to parse the frame. This change allows to parse such frame on x86 by > specializing which instruction it's hitting in the prolog. > > The results are as follows: > > - Baseline: > > Flat Profile (by method): > (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] > (t 39.2,s 35.2) Prof2::main > (t 1.4,s 1.4) Prof2::lambda$main$3 > (t 1.0,s 1.0) Prof2::lambda$main$2 > (t 0.9,s 0.9) Prof2::lambda$main$1 > (t 0.7,s 0.7) Prof2::lambda$main$0 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) java.lang.Thread::exit > (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get > (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get > (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get > (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get > > - With incomplete frame parsing: > > Flat Profile (by method): > (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] > (t 40.3,s 36.1) Prof2::main > (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get > (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get > (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > (t 1.1,s 1.1) Prof2::lambda$main$2 > (t 1.1,s 1.1) Prof2::lambda$main$0 > (t 1.0,s 1.0) Prof2::lambda$main$1 > (t 0.9,s 0.9) Prof2::lambda$main$3 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) java.util.Locale::getInstance > (t 0.0,s 0.0) AGCT::Not walkable Java[ERR=-6] > (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClassOrNull > (t 0.0,s 0.0) java.lang.ClassLoader::loadClass > (t 0.0,s 0.0) sun.net.util.URLUtil::urlNoFragString > (t 0.0,s 0.0) java.lang.Class::forName0 > (t 0.0,s 0.0) java.util.Locale::initDefault > (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClass > (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getLoader > (t 0.0,s 0.0) 
jdk.internal.loader.URLClassPath::getResource > (t 0.0,s 0.0) java.lang.String::toLowerCase > (t 0.0,s 0.0) sun.launcher.LauncherHelper::loadMainClass > (t 0.0,s 0.0) sun.launcher.LauncherHelper::checkAndLoadMain > (t 0.0,s 0.0) java.util.Locale:: > (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::findClassOnClassPathOrNull > (t 0.0,s 0.0) jdk.internal.loader.ClassLoaders$AppClassLoader::loadClass > (t 0.0,s 0.0) java.lang.Class::forName > > The program is as follows: > > ``` > import java.util.function.Supplier; > > public class Prof2 { > > public static void main(String[] args) { > var rand = new java.util.Random(0); > Supplier[] suppliers = { > () -> 0, > () -> 1, > () -> 2, > () -> 3, > }; > > long sum = 0; > for (int i = 0; i >= 0; i++) { > sum += (int)suppliers[i % suppliers.length].get(); > } > } > } > ``` > > We see that the results are particularely useful in this case as the > methods are very short (it only returns an integer), and the probability > of hitting the prolog is then very high. > - Allow AsyncGetCallTrace and JFR to walk a stub frame > > When the signal sent for AsyncGetCallTrace or JFR would land on a stub > (like arraycopy), it wouldn't be able to detect the sender (caller) > frame because `_cb->frame_size() == 0`. > > Because we fully control how the prolog and epilog of stub code is > generated, we know there are two cases: > 1. A stack frame is allocated via macroAssembler->enter(), and consists > in `push rbp; mov rsp, rbp;`. > 2. No stack frames are allocated and rbp is left unchanged and rsp is > decremented with the `call` instruction that push the return `pc` on the > stack. > > For case 1., we can easily know the sender frame by simply looking at > rbp, especially since we know that all stubs preserve the frame pointer > (on x86 at least). > > For case 2., we end up returning the sender's sender, but that already > gives us more information than what we have today. 
> > The results are as follows: > > - Baseline: > > Flat Profile (by method): > (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] > (t 0.5,s 0.2) Prof1::main > (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift > (t 0.0,s 0.0) java.lang.String::getBytes > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > (t 0.0,s 0.0) java.lang.StringBuilder::delete > (t 0.2,s 0.0) java.lang.StringBuilder::append > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > > - With StubRoutinesBlob::FrameParser > > Flat Profile (by method): > (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal > (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete > (t 99.8,s 0.2) Prof1::main > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] > (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append > (t 98.8,s 0.0) java.lang.StringBuilder::append > (t 0.9,s 0.0) java.lang.StringBuilder::delete > > The program is as follows: > > ``` > public class Prof1 { > > public static void main(String[] args) { > StringBuilder sb = new StringBuilder(); > for (int i = 0; i < 1000000; i++) { > sb.append("ab"); > sb.delete(0, 1); > } > System.out.println(sb.length()); > } > } > ``` > > We now account for the arraycopy stub which is called by > AbstractStringBuilder::ensureCapacityInternal. It was previously ignored > because it would not know how to parse the frame for the arraycopy stub > and would fall in the AGCT::Unknown Java[ERR=-5] section. > > However, it still isn't perfect since it doesn't point to the arraycopy stub > directly. > - Extract sender frame parsing to CodeBlock::FrameParser > > Whether and how a frame is setup is controlled by the code generator > for the specific CodeBlock. 
The CodeBlock is then in the best place to know how > to parse the sender's frame from the current frame in the given CodeBlock. > > This refactoring proposes to extract this parsing out of `frame` and into a > `CodeBlock::FrameParser`. This FrameParser is then specialized in the relevant > inherited children of CodeBlock. > > This change is to largely facilitate adding new supported cases for JDK-8252417 > like runtime stubs. Closing it for now until we figure out all the raised points. ------------- PR: https://git.openjdk.java.net/jdk/pull/4436 From aph at openjdk.java.net Tue Sep 14 14:37:06 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 14:37:06 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 12:28:47 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace > > src/hotspot/cpu/aarch64/macroAssembler_aarch64_aes.cpp line 604: > >> 602: // v4: high part of product >> 603: // v5: low part ... >> 604: // > > I'm not clear about this comment. The ghash generators have a stride of 7. Should this not mean the registers are replicated across v0 - v27 with v6, v13, v20 and v27 classified as unused registers. Well spotted. ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From coleenp at openjdk.java.net Tue Sep 14 15:21:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Sep 2021 15:21:19 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4] In-Reply-To: References: Message-ID: > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. > Because of checking during a safepoint, some locks had to get lower ranks. 
The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Fix Shenandoah mismerge - Merge branch 'master' into checkrank - Fix Shenandoah mismerge - 8273300: Check Mutex ranking during a safepoint ------------- Changes: https://git.openjdk.java.net/jdk/pull/5467/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=03 Stats: 33 lines in 14 files changed: 1 ins; 7 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/5467.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5467/head:pull/5467 PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Tue Sep 14 15:21:20 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 14 Sep 2021 15:21:20 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v3] In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 21:04:32 GMT, Coleen Phillimore wrote: >> This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. >> Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. >> This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. >> Tests run tier1-8. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains three commits: > > - Merge branch 'master' into checkrank > - Fix Shenandoah mismerge > - 8273300: Check Mutex ranking during a safepoint I just remerged from master and a couple of copyright updates leaked in from my script (which I shouldn't have used). ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From aph at openjdk.java.net Tue Sep 14 15:39:45 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 15:39:45 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v9] In-Reply-To: References: Message-ID: > An interleaved version of AES/GCM. > > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ± 192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ±
1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many examples to be found in free software across the Internet. > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version. The transformation shouldn't be too hard to > do, but more importantly it should be possible for a reader to verify > that the interleaved and unrolled version performs the same function. > > A generator takes the form of a subclass of `KernelGenerator`. The > core idea is that the programmer defines the base case of the > intrinsic and a method to generate a clone of it, shifted to a > different set of registers. `KernelGenerator` will then generate > several interleaved copies of the function, with each one using a > different set of registers. > > The subclass must implement three methods: `length()`, which is the > number of instruction bundles in the intrinsic, `generate(int n)` > which emits the nth instruction bundle in the intrinsic, and `next()` > which takes an instance of the generator and returns a version of it, > shifted to a new set of registers. > > As an example, here's the inner loop of AES encryption: > > (Some details elided for clarity.) > > > BIND(L_aes_loop); > ld1(v0, T16B, post(from, 16)); > > cmpw(keylen, 44); > br(Assembler::CC, L_rounds_44); > br(Assembler::EQ, L_rounds_52); > > aes_round(v0, v17); > aes_round(v0, v18); > BIND(L_rounds_52); > aes_round(v0, v19); > aes_round(v0, v20); > BIND(L_rounds_44); > ...
> > > The generator for the unrolled version looks like: > > > virtual void generate(int index) { > switch (index) { > case 0: > ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input > break; > case 1: > if (_once) { > cmpw(_keylen, 52); > br(Assembler::LO, _rounds_44); > br(Assembler::EQ, _rounds_52); > } > break; > case 2: aes_round(_data, _subkeys + 0); break; > case 3: aes_round(_data, _subkeys + 1); break; > case 4: > if (_once) bind(_rounds_52); > break; > case 5: aes_round(_data, _subkeys + 2); break; > case 6: aes_round(_data, _subkeys + 3); break; > case 7: > if (_once) bind(_rounds_44); > break; > ... > > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. (You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. Also, the number of > unrolls is just a number that can be tweaked as required. 
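The generator scheme described above can be illustrated with a small standalone sketch. This is not HotSpot's `KernelGenerator`: `ToyKernelGenerator`, `ToyAesRounds`, `Emitter` and `demo()` are hypothetical names, and the "assembler" only records instruction text. It assumes, as the description says, that a subclass supplies `length()`, `generate(int)` and `next()`, and that instructions guarded by `_once` are emitted for the first copy only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an assembler: it just records instruction text.
struct Emitter {
  std::vector<std::string> out;
  void emit(const std::string& s) { out.push_back(s); }
};

// Toy version of the generator idea; NOT HotSpot's actual KernelGenerator.
class ToyKernelGenerator {
 public:
  virtual ~ToyKernelGenerator() = default;
  virtual int length() const = 0;        // number of instruction bundles
  virtual void generate(int index) = 0;  // emit the index-th bundle
  // Return a copy of this generator shifted to a new set of registers.
  virtual std::unique_ptr<ToyKernelGenerator> next() const = 0;

  // Emit `unrolls` copies of the kernel, interleaved bundle by bundle.
  void unroll(int unrolls) {
    assert(unrolls >= 1);
    std::vector<std::unique_ptr<ToyKernelGenerator>> clones;  // owns the copies
    std::vector<ToyKernelGenerator*> gens{this};
    for (int i = 1; i < unrolls; i++) {
      clones.push_back(gens.back()->next());
      gens.push_back(clones.back().get());
    }
    for (int bundle = 0; bundle < length(); bundle++) {
      for (ToyKernelGenerator* g : gens) g->generate(bundle);
    }
  }
};

// A three-bundle toy kernel: load, (once-only) compare, one AES round.
class ToyAesRounds : public ToyKernelGenerator {
  Emitter& _e;
  int _data;   // number of the vector register holding the block
  bool _once;  // true only for the first copy
 public:
  ToyAesRounds(Emitter& e, int data, bool once = true)
      : _e(e), _data(data), _once(once) {}
  int length() const override { return 3; }
  void generate(int index) override {
    const std::string v = "v" + std::to_string(_data);
    switch (index) {
      case 0: _e.emit("ld1 " + v); break;
      case 1: if (_once) _e.emit("cmpw keylen"); break;  // emitted once only
      case 2: _e.emit("aes_round " + v); break;
    }
  }
  std::unique_ptr<ToyKernelGenerator> next() const override {
    return std::make_unique<ToyAesRounds>(_e, _data + 1, false);
  }
};

// Usage: two interleaved copies, the second shifted from v0 to v1.
std::vector<std::string> demo() {
  Emitter e;
  ToyAesRounds(e, 0).unroll(2);
  return e.out;
}
```

Calling `demo()` records the two loads back to back, then the single compare, then the two AES rounds, which is the interleaving the description is after. Changing the argument of `unroll` is the "just a number" knob mentioned in the text.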
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Clarifications and cleanups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/9ce21890..c0af76bd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=07-08 Stats: 52 lines in 3 files changed: 14 ins; 3 del; 35 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Tue Sep 14 16:01:14 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 16:01:14 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 04:34:06 GMT, Xin Liu wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3001: > >> 2999: assert(bulk_width == 4 || bulk_width == 8, "must be"); >> 3000: >> 3001: if (bulk_width == 8) { > > `bulk_width` is defined as a constant 4. why do you also check bulk_width == 8? > is this parameter tunable? same as "const int unroll = 4" below. Comment added. ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Tue Sep 14 16:07:45 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 16:07:45 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v10] In-Reply-To: References: Message-ID: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com> > An interleaved version of AES/GCM. 
> > Performance, now and then: > > > Apple M1, 3.2 GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ± 119.675 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ± 4.206 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ± 113.379 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ± 192.877 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ± 111.977 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ± 161.469 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ± 106.370 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ± 139.300 ns/op > > Neoverse N1, 2.5GHz: > > Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units > AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ± 37.995 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ± 10.987 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ± 52.972 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ± 42.925 ns/op > > AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ± 125.612 ns/op > AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ± 1548.950 ns/op > AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ± 520.562 ns/op > AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ± 1417.927 ns/op > > > > A note about the implementation for the reviewers: > > Unrolled and hand-scheduled intrinsics are often written in a way that > I don't find satisfactory. Often they are a conglomeration of > copy-and-paste programming and C macros, which makes them hard to > understand and hard to maintain. I won't name any names, but there are > many examples to be found in free software across the Internet. > > I spent a while thinking about a structured way to develop and > implement them, and I think I've got something better. The idea is > that you transform a pre-existing implementation into a generator for > the interleaved version.
> The transformation shouldn't be too hard to
> do, but more importantly it should be possible for a reader to verify
> that the interleaved and unrolled version performs the same function.
>
> A generator takes the form of a subclass of `KernelGenerator`. The
> core idea is that the programmer defines the base case of the
> intrinsic and a method to generate a clone of it, shifted to a
> different set of registers. `KernelGenerator` will then generate
> several interleaved copies of the function, with each one using a
> different set of registers.
>
> The subclass must implement three methods: `length()`, which is the
> number of instruction bundles in the intrinsic, `generate(int n)`
> which emits the nth instruction bundle in the intrinsic, and `next()`
> which takes an instance of the generator and returns a version of it,
> shifted to a new set of registers.
>
> As an example, here's the inner loop of AES encryption:
>
> (Some details elided for clarity.)
>
>     BIND(L_aes_loop);
>     ld1(v0, T16B, post(from, 16));
>
>     cmpw(keylen, 52);
>     br(Assembler::CC, L_rounds_44);
>     br(Assembler::EQ, L_rounds_52);
>
>     aes_round(v0, v17);
>     aes_round(v0, v18);
>     BIND(L_rounds_52);
>     aes_round(v0, v19);
>     aes_round(v0, v20);
>     BIND(L_rounds_44);
>     ...
>
> The generator for the unrolled version looks like:
>
>     virtual void generate(int index) {
>       switch (index) {
>       case 0:
>         ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input
>         break;
>       case 1:
>         if (_once) {
>           cmpw(_keylen, 52);
>           br(Assembler::LO, _rounds_44);
>           br(Assembler::EQ, _rounds_52);
>         }
>         break;
>       case 2: aes_round(_data, _subkeys + 0); break;
>       case 3: aes_round(_data, _subkeys + 1); break;
>       case 4:
>         if (_once) bind(_rounds_52);
>         break;
>       case 5: aes_round(_data, _subkeys + 2); break;
>       case 6: aes_round(_data, _subkeys + 3); break;
>       case 7:
>         if (_once) bind(_rounds_44);
>         break;
>       ...
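For readers outside HotSpot, the `length()` / `generate(int)` / `next()` contract described above can be modelled in plain C++. What follows is a toy sketch under stated assumptions, not the code in the pull request: `ToyKernelGenerator`, `ToyAesRounds`, `unroll` and `interleaved_demo` are hypothetical names, "registers" are plain integers, and "instructions" are recorded as strings rather than emitted by an assembler. It only illustrates the interleaving order (bundle 0 of every copy, then bundle 1 of every copy, and so on) and the idea that some bundles are emitted once.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the generator base class (hypothetical, not HotSpot's).
// It appends "instructions" to a shared list of strings.
class ToyKernelGenerator {
public:
  explicit ToyKernelGenerator(std::vector<std::string>& out) : _out(out) {}
  virtual ~ToyKernelGenerator() = default;

  virtual int length() const = 0;        // number of instruction bundles
  virtual void generate(int index) = 0;  // emit the index-th bundle
  // Return a copy of this generator shifted to a new set of registers.
  virtual std::unique_ptr<ToyKernelGenerator> next() const = 0;

protected:
  void emit(const std::string& insn) { _out.push_back(insn); }
  std::vector<std::string>& _out;
};

// Build `unrolls` shifted copies, then emit bundle 0 of every copy,
// bundle 1 of every copy, and so on.
void unroll(std::unique_ptr<ToyKernelGenerator> first, int unrolls) {
  assert(unrolls >= 1);
  std::vector<std::unique_ptr<ToyKernelGenerator>> copies;
  copies.push_back(std::move(first));
  for (int i = 1; i < unrolls; i++) {
    copies.push_back(copies.back()->next());
  }
  const int len = copies.front()->length();
  for (int bundle = 0; bundle < len; bundle++) {
    for (auto& g : copies) {
      g->generate(bundle);
    }
  }
}

// A three-bundle toy kernel: load, compare (emitted once), one AES round.
class ToyAesRounds : public ToyKernelGenerator {
public:
  ToyAesRounds(std::vector<std::string>& out, int data_reg, bool once)
    : ToyKernelGenerator(out), _data(data_reg), _once(once) {}

  int length() const override { return 3; }

  void generate(int index) override {
    const std::string v = "v" + std::to_string(_data);
    switch (index) {
      case 0: emit("ld1 " + v); break;
      case 1: if (_once) emit("cmpw keylen, 52"); break;  // only the first copy
      case 2: emit("aes_round " + v); break;
      default: break;
    }
  }

  std::unique_ptr<ToyKernelGenerator> next() const override {
    // The clone uses the next register and never repeats the "once" bundles.
    return std::make_unique<ToyAesRounds>(_out, _data + 1, false);
  }

private:
  int _data;
  bool _once;
};

// Convenience wrapper: returns the interleaved instruction list.
std::vector<std::string> interleaved_demo(int unrolls) {
  std::vector<std::string> code;
  unroll(std::make_unique<ToyAesRounds>(code, 0, true), unrolls);
  return code;
}
```

Calling `interleaved_demo(2)` yields the two loads first, then the single compare, then the two rounds, which is the interleaving the description is after. Changing the argument is the toy equivalent of tweaking the number of unrolls.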
> > > The job of converting a single inline intrinsic is, as you can see, > not much more than adding a switch statement. Some instructions should > only be emitted once, rather than several times, such as the labels > and branches. (You can use a list of C++ lambdas rather than a switch > statement to do the same thing, very LISP, but that seems a bit of a > sledgehammer. YMMV.) > > I believe that this approach will be more maintainable and easier to > understand than other approaches we've seen. Also, the number of > unrolls is just a number that can be tweaked as required. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Cleanup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5390/files - new: https://git.openjdk.java.net/jdk/pull/5390/files/c0af76bd..5d67d4bb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=09 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5390&range=08-09 Stats: 5 lines in 2 files changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5390.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5390/head:pull/5390 PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Tue Sep 14 16:07:49 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 16:07:49 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 05:25:07 GMT, Xin Liu wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5605: > >> 5603: address small = generate_ghash_processBlocks(); >> 5604: >> 5605: StubCodeMark mark(this, "StubRoutines", "ghash_processBlocks"); > > ghash_processBlocks_wide? otherwise, there will be two stubs with a same name. 
Good point; fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Tue Sep 14 16:15:11 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 16:15:11 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 07:48:03 GMT, Xin Liu wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1300: > >> 1298: Register tmp3); >> 1299: >> 1300: void ghash_modmul_wide (int index, FloatRegister result, > > Is there definition and reference of this? Removed. > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2863: > >> 2861: // >> 2862: // int result = len; >> 2863: // while (len-- > 0) { > > I see that algorithm code comes from CounterMode.implCrypt, but while (len-- > 0) seems not to be exactly same as algorithm here. I think it should be `while (len > 0)` > > `blockSize()` at line 2865 should be `blockSize` Done. ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From aph at openjdk.java.net Tue Sep 14 16:15:11 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 14 Sep 2021 16:15:11 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v8] In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 14:34:09 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64_aes.cpp line 604: >> >>> 602: // v4: high part of product >>> 603: // v5: low part ... >>> 604: // >> >> I'm not clear about this comment. The ghash generators have a stride of 7. Should this not mean the registers are replicated across v0 - v27 with v6, v13, v20 and v27 classified as unused registers. > > Well spotted. 
> I suspect it would be hard to produce hand-crafted code that does significantly better when it comes to performance. Probably not, especially because the design of `KernelGenerator` allows you to do pretty much anything. (In particular, the clones don't even have to compute the same function!) I hope we'd prefer maintainability to shaving off every clock cycle once we'd made encryption no longer the bottleneck. ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From xliu at openjdk.java.net Tue Sep 14 16:29:03 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 14 Sep 2021 16:29:03 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v10] In-Reply-To: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com> References: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com> Message-ID: <-LF9Rd0o4g7dDKkcM-sVXr5XTu6tcHHDIclJrRZ91kM=.4dea870d-fccb-4fa5-b8a9-13579c3f6ca4@github.com> On Tue, 14 Sep 2021 16:07:45 GMT, Andrew Haley wrote: > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup Marked as reviewed by xliu (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From cjplummer at openjdk.java.net Tue Sep 14 19:05:09 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Tue, 14 Sep 2021 19:05:09 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v2] In-Reply-To: References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Tue, 14 Sep 2021 07:46:12 GMT, Andrey Turbanov wrote: >> Collections.sort is just a wrapper, so it is better to use an instance method directly. > > Andrey Turbanov has updated the pull request incrementally with one additional commit since the last revision: > > 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules The SA changes look good. Make sure you run the tests in `test/hotspot/jtreg/serviceability/sa/` and `test/jdk/sun/tools/jhsdb/`. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5230 From iklam at openjdk.java.net Tue Sep 14 22:31:19 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 14 Sep 2021 22:31:19 GMT Subject: RFR: 8271073: Improve testing with VM option VerifyArchivedFields Message-ID: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> - Changed the definition of `VerifyArchivedFields` from a whacky use of `bool` to an `int` and properly defined its three levels: - 0: No verification - 1: Basic verification with VM_Verify (no side effects) - 2: Detailed verification by forcing a GC (with side effects) - Changed the default value to 0. The functionality checked by this flag has been very stable so there's no need to verify it in every single test case. - Enabled `-XX:VerifyArchivedFields=1` for all CDS test cases.
- Added a new test case for `-XX:VerifyArchivedFields=2`. - Also added comments about what this flag is supposed to check for. ------------- Commit messages: - 8271073: Improve testing with VM option VerifyArchivedFields Changes: https://git.openjdk.java.net/jdk/pull/5514/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5514&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8271073 Stats: 85 lines in 5 files changed: 72 ins; 0 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/5514.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5514/head:pull/5514 PR: https://git.openjdk.java.net/jdk/pull/5514 From dholmes at openjdk.java.net Wed Sep 15 01:33:52 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Sep 2021 01:33:52 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4] In-Reply-To: References: <2yiIxm7CaiJf3_wiiYcuQjbRNFauGO0MJoJGUphSmR0=.29fe408e-8707-402f-8e3c-5366aff3a8cc@github.com> <0fzRARQKdoAbSLHq_SaKs698DuGgRWMkCUI4omDmurk=.acb1126f-5b41-46ad-a022-74eabffdb71f@github.com> Message-ID: On Mon, 13 Sep 2021 15:15:11 GMT, Coleen Phillimore wrote: >> src/hotspot/share/memory/universe.cpp line 1109: >> >>> 1107: } >>> 1108: if (should_verify_subset(Verify_CodeCache)) { >>> 1109: MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); >> >> Is this needed to allow the new rankings to work? And is this enabled by the _verify_in_progress change? If so I'd rather see all of that related stuff changed first in a separate RFE that can easily be independently backported. > > Yes, this is needed. This verification is done during a safepoint, so we don't need this lock. The CodeCache_lock has a very low ranking and takes out VtableStubs_lock which is a higher ranking. With this change, we do not take out the CodeCache_lock, so it's needed for this change. I see no reason whatsoever to backport it though. This isn't connected to the _verify_in_progress change, so that is fine.
------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From dholmes at openjdk.java.net Wed Sep 15 01:33:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Sep 2021 01:33:51 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4] In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 15:21:19 GMT, Coleen Phillimore wrote: >> This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. >> Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. >> This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. >> Tests run tier1-8. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Fix Shenandoah mismerge > - Merge branch 'master' into checkrank > - Fix Shenandoah mismerge > - 8273300: Check Mutex ranking during a safepoint Hi Coleen, It wasn't clear if you thought you had already fixed the incorrect copyright notices so I flagged the two I saw. I still have some queries about this - see comment below - but nothing that would stop it from being pushed as-is. Thanks, David src/hotspot/cpu/zero/vm_version_zero.cpp line 2: > 1: /* > 2: * Copyright (c) 1997, 2021, Oracle and/or its affiliates. All rights reserved. Unwanted copyright date change. src/hotspot/share/runtime/mutex.hpp line 53: > 51: special = tty + 3, > 52: oopstorage = special + 3, > 53: leaf = oopstorage + 10, Why do we need such a big gap here? Is there any reason we can't just use the same gap between all named rankings? As it is there seems to be no rationale for the "+ N" value used. 
src/hotspot/share/utilities/growableArray.hpp line 2: > 1: /* > 2: * Copyright (c) 1997, 2021, Oracle and/or its affiliates. All rights reserved. Unwanted copyright change. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5467 From dholmes at openjdk.java.net Wed Sep 15 01:45:47 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Sep 2021 01:45:47 GMT Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform [v2] In-Reply-To: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com> References: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com> Message-ID: On Tue, 14 Sep 2021 07:07:34 GMT, Tobias Holenstein wrote: >> removed frame::pd_ps() which is not implemented on any platform. Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1. >> >> Thanks! > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8272771: removed call to print_on() in debug::ps() LGTM. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5487 From kevin.walls at oracle.com Wed Sep 15 09:12:27 2021 From: kevin.walls at oracle.com (Kevin Walls) Date: Wed, 15 Sep 2021 09:12:27 +0000 Subject: Regarding options of error and dump file paths In-Reply-To: <8282936293d12664a02b44dc6c169fc0@oss.nttdata.com> References: <92708e25-331f-f832-144b-eb00e2b0a4ac@oss.nttdata.com> <8f51414c-86c1-49de-7b5f-4af0fae556aa@oracle.com> <8282936293d12664a02b44dc6c169fc0@oss.nttdata.com> Message-ID: Hi Koichi, Yes, just wanted to (a little late) acknowledge that a few others were thinking about this kind of thing. 8-) I was thinking from a container point of view, had not heard the demand for this from support teams, but I can see the point somewhat. 
Running in a container, where you have some volume/location available for logs to persist, we would ideally have one VM option to set a base/root location for various output files that may currently default to the current directory, or somewhere else. We really want to take applications as they are, without changing their startup scripts etc, but adding one VM option seems reasonable. Recently I logged as a placeholder for exactly this kind of option... 8270552: Container convenience option. https://bugs.openjdk.java.net/browse/JDK-8270552 ...although I didn't progress it much so far, and have not suggested an option name. There are some complications I'm sure. e.g. Would the new option provide a root, and other settings e.g. ErrorFile or HeapDumpPath ALWAYS have the new root prepended? Or do we let absolute paths "escape" from the new root? (which might be more work for the users, as you may have several VM options to change, to make use of the new option). I think the new option is always the new root, for the affected paths. Also, in a container, we want to explore if this new location can be used for the attach API. There is currently much scanning of many /proc dirs on Linux. That is more involved, but could make use of the same option (the goal is to use fewer options). But this does not necessarily have to be implemented at the same time (as long as the new option is named appropriately). More to discuss... Thanks! Kevin -----Original Message----- From: hotspot-dev On Behalf Of Koichi Sakata Sent: 14 September 2021 07:45 To: hotspot-dev at openjdk.java.net Subject: Re: Regarding options of error and dump file paths Hi all, I believe that the option helps us, especially people who belong to a support team, because it lets us easily get the files required to troubleshoot. It's also useful in container environments. We keep those files when we set the option's path to a persistent volume, even if containers are deleted.
So I'm thinking about how the option works. First of all, it should deal with the following files.

- GC (heap dumps)
- JIT (replay files)
- hs_err files
- JFR (a number of files)

Whereas it should exclude files as follows.

- jcmd/dcmd dumps
- Unified logging

Let's see concrete usage examples of the option. Suppose we name the option ReportDir.

Case 1: Set no options

The JVM outputs files in each default directory when we set no options.

- GC: ./java_pid%p.hprof
- JIT: ./replay_pid%p.log
- hs_err files: ./hs_err_pid%p.log
- JFR: ./hs_err_pid%p.jfr, ./hs_oom_pid%p.jfr, ./hs_soe_pid%p.jfr

Case 2: Set the option only

We just run `java -XX:ReportDir=/foo/bar/ ...`, then those files are put in the /foo/bar/ directory.

- GC: /foo/bar/java_pid%p.hprof
- JIT: /foo/bar/replay_pid%p.log
- hs_err files: /foo/bar/hs_err_pid%p.log
- JFR: /foo/bar/hs_err_pid%p.jfr, /foo/bar/hs_oom_pid%p.jfr, /foo/bar/hs_soe_pid%p.jfr

Case 3: Set the option with a relative path

Suppose the working directory is /home/duke and we run `java -XX:ReportDir=./foo/bar/ ...`. The JVM finds the output directory from the working directory and the relative path.

- GC: /home/duke/foo/bar/java_pid%p.hprof
- JIT: /home/duke/foo/bar/replay_pid%p.log
- hs_err files: /home/duke/foo/bar/hs_err_pid%p.log
- JFR: /home/duke/foo/bar/hs_err_pid%p.jfr, /home/duke/foo/bar/hs_oom_pid%p.jfr, /home/duke/foo/bar/hs_soe_pid%p.jfr

Case 4: Set the option with the existing path option

Run `java -XX:ReportDir=/foo/bar/ -XX:ErrorFile=/home/duke/hs_err_pid%p.log ...`. The path of ErrorFile overrides the value of ReportDir.
- GC: /foo/bar/java_pid%p.hprof
- JIT: /foo/bar/replay_pid%p.log
- hs_err files: /home/duke/hs_err_pid%p.log <- It differs from the others
- JFR: /foo/bar/hs_err_pid%p.jfr, /foo/bar/hs_oom_pid%p.jfr, /foo/bar/hs_soe_pid%p.jfr

Case 5: Set the option with the existing path option which has a relative path

Suppose the working directory is /home/duke, run `java -XX:ReportDir=./foo/bar/ -XX:HeapDumpPath=./baz/ -XX:+HeapDumpOnOutOfMemoryError ...`.

- GC: /home/duke/foo/bar/baz/java_pid%p.hprof <- It differs from the others
- JIT: /home/duke/foo/bar/replay_pid%p.log
- hs_err files: /home/duke/foo/bar/hs_err_pid%p.log
- JFR: /home/duke/foo/bar/hs_err_pid%p.jfr, /home/duke/foo/bar/hs_oom_pid%p.jfr, /home/duke/foo/bar/hs_soe_pid%p.jfr

The above example finds the heap dump path by the combination of the working directory, the relative path of ReportDir and the relative path of HeapDumpPath. As an alternative idea, we can ignore the relative path of ReportDir when HeapDumpPath has a relative path. In that case, the heap dump path is as follows.

- GC: /home/duke/baz/java_pid%p.hprof

In either case, I recognize that using relative paths will be slightly complicated...

Last but not least, I should be pleased if we could go ahead with this topic.

Regards,
Koichi

On 03-09-2021 05:41 PM, Koichi Sakata wrote:
> Hi David,
>
> I'm sorry for the late reply. Thank you for your great advice.
>
>> Having an explicit option override the default directory option is a
>> good idea, but I'm not sure it is that clear cut. If you can specify
>> a relative directory and file name for a given dump file, might you
>> not want that to be relative to the specified default path, rather
>> than relative to the pwd?
>
> I occasionally want to use a relative path from the specified default
> path. This usage might confuse the path where files are outputted and
> complicate to fix, so we probably should prohibit relative paths when
> we use the default path.
> We can choose the specification after we find
> detailed expectations.
>
>> And we actually have quite a lot of potential output files from:
>>    - GC (heap dumps)
>>    - JIT (replay files)
>>    - hs_err files
>>    - JFR (a number of files)
>>    - jcmd/dcmd dumps?
>>    - Unified logging?
>>
>> I think figuring out the exact details of how this should work, and
>> interact with all the different files involved may be more involved
>> than just prepending a path component.
>
> I completely agree with you. To enable the new option needs a lot of
> our work, but that will improve convenience for users, I believe.
> Enabling easily to gathering error related files in one place helps us
> to troubleshoot. Not so many users set all these path options. If they
> use the new option, all they have to do will be sending files in the
> directory to their support personnel. In addition, they will get
> easier to keep files even on container environments.
>
>> I also think I would need to hear much greater demand, with detailed
>> usage expectations, before supporting this.
>
> I think so, too. I'd like to hear various people's point of view.
>
> Regards,
> Koichi
>
>
> On 2021/08/26 15:23, David Holmes wrote:
>> Hi Koichi,
>>
>> On 23/08/2021 1:29 pm, Koichi Sakata wrote:
>>> Hi all,
>>>
>>> I'm writing to get feedback on my idea about options for error and
>>> dump file paths.
>>>
>>> First of all, we can specify several options related to error and
>>> dump files. For example, the HeapDumpPath option sets the heap dump
>>> file and the ErrorFile option sets the hs_error file.
>>>
>>> I've felt inconvenience about that because we need to write all path
>>> options to put those files in a specific directory. I also recognize
>>> that they are outputted in the working directory when I run an
>>> application with no options. But I'd like to keep them in any
>>> directory. So the new option that sets the directory where those
>>> files are outputted would be useful, I think.
>>> >>> The new option helps us especially to run applications on containers >>> like Docker, Kubernetes etc. If we run them without those existing >>> options on containers, files will be put in the local directory of >>> each container. We lose files after we operate the container such as >>> deleting it. The option enables us to keep certainly all error and >>> dump files if we just specify the path of the persistent volume for >>> the new option. >>> >>> As a concrete example, when we specify >>> -XX:ErrorAndDumpPath=/foo/bar/ (This option name is tentative), >>> -XX:+HeapDumpOnOutOfMemoryError and -XX:StartFlightRecording etc., >>> files are generated in the /foo/bar directory. From my point of >>> view, the option will deal with the following files: >>> - heap dump file (java_pid%p.hprof) >>> - error log file (hs_err_pid%p.log) >>> - JFR emergency dumps (hs_err_pid%p.jfr, hs_oom_pid%p.jfr, >>> hs_soe_pid%p.jfr) >>> - replay file (replay_pid%p.log) >>> >>> The existing path options should override the new option. If I set >>> -XX:ErrorAndDumpPath=/foo/bar/ and -XX:HeapDumpPath=/foo/baz/, a >>> heap dump file will be in the /foo/baz directory and other files >>> will be created in the /foo/bar. >>> >>> I would like to hear your point of view. If some people agree to >>> this idea, I will write a patch. >> >> My initial reaction was that this seemed something better handled in >> a launch script because I figured if you had complex needs in >> relation to where these files were being placed, then you'd use a >> launch script to help manage that anyway. >> >> But I can see there would be some convenience to controlling the >> output directory without also having to restate the default file >> names. >> >> Having an explicit option override the default directory option is a >> good idea, but I'm not sure it is that clear cut. 
>> If you can specify
>> a relative directory and file name for a given dump file, might you
>> not want that to be relative to the specified default path, rather
>> than relative to the pwd?
>>
>> And we actually have quite a lot of potential output files from:
>>  - GC (heap dumps)
>>  - JIT (replay files)
>>  - hs_err files
>>  - JFR (a number of files)
>>  - jcmd/dcmd dumps?
>>  - Unified logging?
>>
>> I think figuring out the exact details of how this should work, and
>> interact with all the different files involved may be more involved
>> than just prepending a path component.
>>
>> I also think I would need to hear much greater demand, with detailed
>> usage expectations, before supporting this.
>>
>> Just my 2c.
>>
>> Cheers,
>> David
>> -----
>>
>>> Regards,
>>> Koichi

From aph-open at littlepinkcloud.com Wed Sep 15 09:29:12 2021
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Wed, 15 Sep 2021 10:29:12 +0100
Subject: Intrinsic methods and time to safepoint
Message-ID:

I've been looking at long-running intrinsics, and how long they block safepoints. At the moment, for example, encrypting a large array might lead to seconds of safepoint delay: even if you can encrypt a gigabyte per second, as we do now, a second is a long time for a computer.

The most recent incarnations of concurrent GCs have pushed GC-caused pauses down into the millisecond range, a superb achievement. However, the non-GC pauses remain.

I believe we should have a policy to cover how long an intrinsic can delay without responding to a safepoint, and that it should be in the millisecond range. It would make almost no difference to the performance of encryption if chunks handled by a fast intrinsic were, say, about a megabyte. The difference in performance is so small as to be immeasurable, and the improvement in the performance of other threads is vast.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
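The chunking idea in the message above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not how HotSpot implements intrinsics or safepoint polls: `toy_encrypt_range` is a hypothetical stand-in for a fast intrinsic, and the `poll` callback is a hypothetical stand-in for the point at which a thread could respond to a safepoint request. With the figures quoted above (about a gigabyte per second, chunks of about a megabyte), each uninterrupted stretch would last roughly a millisecond.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a fast intrinsic: XORs a key byte into a range.
// While it runs, nothing else (such as a safepoint request) is serviced.
void toy_encrypt_range(std::uint8_t* data, std::size_t len, std::uint8_t key) {
  for (std::size_t i = 0; i < len; i++) {
    data[i] ^= key;
  }
}

// Process the buffer in chunks of at most `chunk_bytes`, calling `poll`
// after each chunk. Returns the number of chunks processed.
std::size_t encrypt_chunked(std::vector<std::uint8_t>& data,
                            std::uint8_t key,
                            std::size_t chunk_bytes,
                            const std::function<void()>& poll) {
  assert(chunk_bytes > 0);
  std::size_t chunks = 0;
  for (std::size_t off = 0; off < data.size(); off += chunk_bytes) {
    const std::size_t n = std::min(chunk_bytes, data.size() - off);
    toy_encrypt_range(data.data() + off, n, key);
    chunks++;
    poll();  // assumed hook: a pending request can be honoured here
  }
  return chunks;
}
```

The chunk size is the single tuning knob: it bounds how long the "intrinsic" runs between polls, while the per-chunk loop overhead stays small relative to the work done in each chunk.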
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From coleenp at openjdk.java.net Wed Sep 15 11:56:21 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Sep 2021 11:56:21 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4] In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 01:28:34 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Fix Shenandoah mismerge >> - Merge branch 'master' into checkrank >> - Fix Shenandoah mismerge >> - 8273300: Check Mutex ranking during a safepoint > > src/hotspot/share/runtime/mutex.hpp line 53: > >> 51: special = tty + 3, >> 52: oopstorage = special + 3, >> 53: leaf = oopstorage + 10, > > Why do we need such a big gap here? Is there any reason we can't just use the same gap between all named rankings? As it is there seems to be no rationale for the "+ N" value used. The gap is pretty much arbitrary, but there's already some leaf-2 and having overlapping rankings is something we don't really want, so I changed it to 10 for now. > src/hotspot/share/utilities/growableArray.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 1997, 2021, Oracle and/or its affiliates. All rights reserved. > > Unwanted copyright change. I think I finally reverted them. My commit script fixed them back again. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Wed Sep 15 11:56:14 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Sep 2021 11:56:14 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v5] In-Reply-To: References: Message-ID: > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. > Because of checking during a safepoint, some locks had to get lower ranks. 
The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: - Revert copyright changes again - Revert "Revert unintended copyright changes." This reverts commit 712af5df87a5cefc16a9844867c3be1ae663b00d. - Revert unintended copyright changes. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5467/files - new: https://git.openjdk.java.net/jdk/pull/5467/files/9920c227..eec9b842 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5467.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5467/head:pull/5467 PR: https://git.openjdk.java.net/jdk/pull/5467 From dholmes at openjdk.java.net Wed Sep 15 13:03:47 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 15 Sep 2021 13:03:47 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4] In-Reply-To: References: Message-ID: <9JRLvqeKqRe4yC21ZgRCNGGrFjJqykgTSQ4Q-9Ea2ss=.15756d59-e67e-4852-9413-4ae69049d9ae@github.com> On Wed, 15 Sep 2021 11:51:49 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/mutex.hpp line 53: >> >>> 51: special = tty + 3, >>> 52: oopstorage = special + 3, >>> 53: leaf = oopstorage + 10, >> >> Why do we need such a big gap here? Is there any reason we can't just use the same gap between all named rankings? As it is there seems to be no rationale for the "+ N" value used. 
> > The gap is pretty much arbitrary, but there's already some leaf-2 and having overlapping rankings is something we don't really want, so I changed it to 10 for now. Okay but is there a reason not to use the same gap? ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From github.com+741251+turbanoff at openjdk.java.net Wed Sep 15 13:36:52 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Wed, 15 Sep 2021 13:36:52 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v2] In-Reply-To: References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Tue, 14 Sep 2021 19:02:10 GMT, Chris Plummer wrote: >Make sure you run the tests in test/hotspot/jtreg/serviceability/sa/ and test/jdk/sun/tools/jhsdb/ Checked. All fine: ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg/serviceability/sa 56 56 0 0 jtreg:test/jdk/sun/tools/jhsdb 7 7 0 0 ============================== TEST SUCCESS Finished building target 'test' in configuration 'windows-x86_64-server-release' ------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From github.com+71546117+tobiasholenstein at openjdk.java.net Wed Sep 15 14:02:52 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Wed, 15 Sep 2021 14:02:52 GMT Subject: RFR: JDK-8272771: frame::pd_ps() is not implemented on any platform [v2] In-Reply-To: References: <-ijcyfXrSJxaJqJyhRIf8WOm7CuScV5wM8JDr0dZEag=.4d2e2a83-a2ff-47d3-8e98-29c656feb35e@github.com> Message-ID: <6kbOIJ7dDtmNX-jL2I7hEKR64E7mu9KaGpDTk1X4xPM=.5300b1c0-8319-47fe-99bf-244b49155a80@github.com> On Tue, 14 Sep 2021 07:27:50 GMT, Tobias Hartmann wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8272771: removed call to print_on() in 
debug::ps() > > Looks good to me. Thanks @TobiHartmann @shipilev and @dholmes-ora for the reviews! ------------- PR: https://git.openjdk.java.net/jdk/pull/5487 From github.com+71546117+tobiasholenstein at openjdk.java.net Wed Sep 15 14:02:53 2021 From: github.com+71546117+tobiasholenstein at openjdk.java.net (Tobias Holenstein) Date: Wed, 15 Sep 2021 14:02:53 GMT Subject: Integrated: JDK-8272771: frame::pd_ps() is not implemented on any platform In-Reply-To: References: Message-ID: <4oWgP0vRzpdy60DPj2RMPHcEypgNvclQg63nHQYO9yQ=.c07c185e-076e-49b9-99ad-37d8550ca48b@github.com> On Mon, 13 Sep 2021 08:01:26 GMT, Tobias Holenstein wrote: > removed frame::pd_ps() which is not implemented on any platform. Replaced the only usage of frame::pd_ps() in the debug function `ps()` with `frame::print_on`. Tested on Tier1. > > Thanks! This pull request has now been integrated. Changeset: 82904246 Author: Tobias Holenstein Committer: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/82904246cd5af69eda96b0382b471d339bd9e204 Stats: 10 lines in 8 files changed: 0 ins; 10 del; 0 mod 8272771: frame::pd_ps() is not implemented on any platform Reviewed-by: shade, dholmes, thartmann ------------- PR: https://git.openjdk.java.net/jdk/pull/5487 From shade at openjdk.java.net Wed Sep 15 14:25:07 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 15 Sep 2021 14:25:07 GMT Subject: RFR: 8273314: Add tier4 test groups [v3] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: <54l9_jkU4qhnz_ULvtN7sQXJ46LIdUhcdvxAkhCmlw4=.4a7adb15-e01c-4a38-bb99-3b577e9e05ca@github.com> On Mon, 6 Sep 2021 13:22:03 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. I have excluded `vmTestbase` and `hotspot:tier4,` because they take 10+ hours on my highly parallel machine. 
I have also excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress).
>>
>> Sample run:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>> >> jtreg:test/hotspot/jtreg:tier4                      426   425     1     0 <<
>> >> jtreg:test/jdk:tier4                               2891  2885     4     2 <<
>>    jtreg:test/langtools:tier4                            0     0     0     0
>>    jtreg:test/jaxp:tier4                                 0     0     0     0
>> ==============================
>>
>> real 64m13.994s
>> user 1462m1.213s
>> sys 39m38.032s
>>
>>
>> There are interesting test failures on my machine, which I would address separately.
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
> Drop applications and fix the comment

Progress:
 a) `hotspot:tier4` still runs cleanly, and a bit faster due to recent `vmTestbase` parallelism improvements.
 b) `jdk:tier4` has a lot of failures in headful mode, probably because the tests do not like to run in parallel, see for example #5533. It would take a while to resolve for all GUI tests.

If we are in agreement that the current `tier4` definition is good, maybe it would be proper to integrate this PR, and then make `tier4` clean for headful mode?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5357

From coleenp at openjdk.java.net Wed Sep 15 15:33:56 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Wed, 15 Sep 2021 15:33:56 GMT
Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4]
In-Reply-To: <9JRLvqeKqRe4yC21ZgRCNGGrFjJqykgTSQ4Q-9Ea2ss=.15756d59-e67e-4852-9413-4ae69049d9ae@github.com>
References: <9JRLvqeKqRe4yC21ZgRCNGGrFjJqykgTSQ4Q-9Ea2ss=.15756d59-e67e-4852-9413-4ae69049d9ae@github.com>
Message-ID:

On Wed, 15 Sep 2021 13:00:34 GMT, David Holmes wrote:

>> The gap is pretty much arbitrary, but there's already some leaf-2 and having overlapping rankings is something we don't really want, so I changed it to 10 for now.
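As an illustration of the overlap concern in the quoted reply, here is a minimal sketch. It is not the HotSpot source: `Ranks`, `make_ranks`, `leaf_minus_overlaps` and `check_gaps` are hypothetical names, and the base value chosen for `tty` is an assumption. Only the `+ 3`, `+ 3` and `+ gap` steps follow the quoted `mutex.hpp` diff.

```cpp
#include <cassert>

// Illustrative stand-in for the rank values quoted from mutex.hpp.
// The base value of `tty` is an assumption; only the "+ N" steps
// (special = tty + 3, oopstorage = special + 3, leaf = oopstorage + gap)
// follow the quoted diff.
struct Ranks {
  int tty;
  int special;
  int oopstorage;
  int leaf;
};

constexpr Ranks make_ranks(int tty_base, int leaf_gap) {
  return Ranks{
      tty_base,                    // tty
      tty_base + 3,                // special = tty + 3
      tty_base + 3 + 3,            // oopstorage = special + 3
      tty_base + 3 + 3 + leaf_gap  // leaf = oopstorage + leaf_gap
  };
}

// True if the rank "leaf - offset" lands on or below oopstorage,
// i.e. the two named rankings would overlap.
constexpr bool leaf_minus_overlaps(const Ranks& r, int offset) {
  return r.leaf - offset <= r.oopstorage;
}

// Usage: with a gap of 2, leaf-2 collides with oopstorage; with 10 it does not.
inline void check_gaps() {
  assert(leaf_minus_overlaps(make_ranks(0, 2), 2));
  assert(!leaf_minus_overlaps(make_ranks(0, 10), 2));
}
```

Under these assumptions any gap larger than the largest `leaf-N` offset in use avoids the collision, which is consistent with the remark that the exact value is fairly arbitrary.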
> > Okay but is there a reason not to use the same gap? If the gap is 2, then leaf-2 will == oopstorage? I could make it 3 but it doesn't actually matter. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From tschatzl at openjdk.java.net Wed Sep 15 15:38:18 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 15 Sep 2021 15:38:18 GMT Subject: RFR: 8273823: Problemlist gc/stringdedup tests timing out on ZGC Message-ID: Hi all, can I have reviews for problemlist the gc/stringdedup tests for ZGC? They time out a lot in CI currently, and generate a lot of noise. Testing: CI run with this change does not run the #id4 tests any more Thanks, Thomas ------------- Commit messages: - Problemlist gc/stringdedup for zgc because of many failures Changes: https://git.openjdk.java.net/jdk/pull/5534/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5534&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273823 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5534.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5534/head:pull/5534 PR: https://git.openjdk.java.net/jdk/pull/5534 From zgu at openjdk.java.net Wed Sep 15 15:44:02 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 15 Sep 2021 15:44:02 GMT Subject: RFR: 8273823: Problemlist gc/stringdedup tests timing out on ZGC In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 15:29:55 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for problemlist the gc/stringdedup tests for ZGC? They time out a lot in CI currently, and generate a lot of noise. > > Testing: CI run with this change does not run the #id4 tests any more > > Thanks, > Thomas Looks good and trivial ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5534 From lkorinth at openjdk.java.net Wed Sep 15 15:49:57 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 15 Sep 2021 15:49:57 GMT Subject: RFR: 8273823: Problemlist gc/stringdedup tests timing out on ZGC In-Reply-To: References: Message-ID: <-iogOwhtjx7CVX3qEakIU6Ome2xzeR7UaPZLT81UiOU=.06f724ab-2c41-4f19-97c9-36b9681574ad@github.com> On Wed, 15 Sep 2021 15:29:55 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for problemlist the gc/stringdedup tests for ZGC? They time out a lot in CI currently, and generate a lot of noise. > > Testing: CI run with this change does not run the #id4 tests any more > > Thanks, > Thomas Looks good and trivial. Thanks for problem listing these tests! ------------- Marked as reviewed by lkorinth (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5534 From tschatzl at openjdk.java.net Wed Sep 15 15:49:57 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 15 Sep 2021 15:49:57 GMT Subject: RFR: 8273823: Problemlist gc/stringdedup tests timing out on ZGC In-Reply-To: <-iogOwhtjx7CVX3qEakIU6Ome2xzeR7UaPZLT81UiOU=.06f724ab-2c41-4f19-97c9-36b9681574ad@github.com> References: <-iogOwhtjx7CVX3qEakIU6Ome2xzeR7UaPZLT81UiOU=.06f724ab-2c41-4f19-97c9-36b9681574ad@github.com> Message-ID: On Wed, 15 Sep 2021 15:45:51 GMT, Leo Korinth wrote: >> Hi all, >> >> can I have reviews for problemlist the gc/stringdedup tests for ZGC? They time out a lot in CI currently, and generate a lot of noise. >> >> Testing: CI run with this change does not run the #id4 tests any more >> >> Thanks, >> Thomas > > Looks good and trivial. Thanks for problem listing these tests! Thanks @lkorinth @zhengyu123 for your reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5534 From tschatzl at openjdk.java.net Wed Sep 15 15:52:53 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 15 Sep 2021 15:52:53 GMT Subject: Integrated: 8273823: Problemlist gc/stringdedup tests timing out on ZGC In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 15:29:55 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for problemlist the gc/stringdedup tests for ZGC? They time out a lot in CI currently, and generate a lot of noise. > > Testing: CI run with this change does not run the #id4 tests any more > > Thanks, > Thomas This pull request has now been integrated. Changeset: 7b2beb6b Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/7b2beb6ba6df868fa8e44701f906c40bb7c407bb Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8273823: Problemlist gc/stringdedup tests timing out on ZGC Reviewed-by: zgu, lkorinth ------------- PR: https://git.openjdk.java.net/jdk/pull/5534 From pchilanomate at openjdk.java.net Wed Sep 15 16:05:48 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Wed, 15 Sep 2021 16:05:48 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v5] In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 11:56:14 GMT, Coleen Phillimore wrote: >> This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. >> Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. >> This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. >> Tests run tier1-8. 
> > Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: > > - Revert copyright changes again > - Revert "Revert unintended copyright changes." > > This reverts commit 712af5df87a5cefc16a9844867c3be1ae663b00d. > - Revert unintended copyright changes. Marked as reviewed by pchilanomate (Committer). src/hotspot/share/runtime/mutex.cpp line 375: > 373: Mutex* locks_owned = thread->owned_locks(); > 374: > 375: if (!SafepointSynchronize::is_at_safepoint()) { I think removing this conditional was okay in your previous change. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Wed Sep 15 16:32:24 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Sep 2021 16:32:24 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v6] In-Reply-To: References: Message-ID: On Wed, 15 Sep 2021 16:02:18 GMT, Patricio Chilano Mateo wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove safepoint conditional from other assert. The locks should now be in decreasing order. > > src/hotspot/share/runtime/mutex.cpp line 375: > >> 373: Mutex* locks_owned = thread->owned_locks(); >> 374: >> 375: if (!SafepointSynchronize::is_at_safepoint()) { > > I think removing this conditional was okay in your previous change. Thanks Patricio for noticing this omission. I've removed it and retesting. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Wed Sep 15 16:32:22 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Wed, 15 Sep 2021 16:32:22 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v6] In-Reply-To: References: Message-ID: > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. 
> Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Remove safepoint conditional from other assert. The locks should now be in decreasing order. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5467/files - new: https://git.openjdk.java.net/jdk/pull/5467/files/eec9b842..ee2ba0fb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5467&range=04-05 Stats: 8 lines in 1 file changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5467.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5467/head:pull/5467 PR: https://git.openjdk.java.net/jdk/pull/5467 From iklam at openjdk.java.net Wed Sep 15 17:02:06 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Sep 2021 17:02:06 GMT Subject: RFR: 8271073: Improve testing with VM option VerifyArchivedFields [v2] In-Reply-To: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> References: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> Message-ID: > - Changed the definition of `VerifyArchivedFields` from a whacky use of `bool` to an `int` and properly define its three levels: > - 0: No verification > - 1: Basic verification with VM_Verify (no side effects) > - 2: Detailed verification by forcing a GC (with side effects) > - Changed the default value to 0. The functionality checked by this flag has been very stable so there's no need to verify it in every single test case. 
> - Enabled `-XX:VerifyArchivedFields=1` for all CDS test cases. > - Added a new test case for `-XX:VerifyArchivedFields=2` . > - Also added comments about that this flag is suppose to check for. Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: the test should require vm.cds.write.archived.java.heap ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5514/files - new: https://git.openjdk.java.net/jdk/pull/5514/files/b8e22df2..d7cd3a5b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5514&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5514&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5514.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5514/head:pull/5514 PR: https://git.openjdk.java.net/jdk/pull/5514 From ccheung at openjdk.java.net Wed Sep 15 17:06:50 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Wed, 15 Sep 2021 17:06:50 GMT Subject: RFR: 8271073: Improve testing with VM option VerifyArchivedFields [v2] In-Reply-To: References: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> Message-ID: On Wed, 15 Sep 2021 17:02:06 GMT, Ioi Lam wrote: >> - Changed the definition of `VerifyArchivedFields` from a whacky use of `bool` to an `int` and properly define its three levels: >> - 0: No verification >> - 1: Basic verification with VM_Verify (no side effects) >> - 2: Detailed verification by forcing a GC (with side effects) >> - Changed the default value to 0. The functionality checked by this flag has been very stable so there's no need to verify it in every single test case. >> - Enabled `-XX:VerifyArchivedFields=1` for all CDS test cases. >> - Added a new test case for `-XX:VerifyArchivedFields=2` . >> - Also added comments about that this flag is suppose to check for. 
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > the test should require vm.cds.write.archived.java.heap Looks good. ------------- Marked as reviewed by ccheung (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5514 From iklam at openjdk.java.net Wed Sep 15 21:18:37 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 15 Sep 2021 21:18:37 GMT Subject: RFR: 8271073: Improve testing with VM option VerifyArchivedFields [v3] In-Reply-To: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> References: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> Message-ID: > - Changed the definition of `VerifyArchivedFields` from a whacky use of `bool` to an `int` and properly define its three levels: > - 0: No verification > - 1: Basic verification with VM_Verify (no side effects) > - 2: Detailed verification by forcing a GC (with side effects) > - Changed the default value to 0. The functionality checked by this flag has been very stable so there's no need to verify it in every single test case. > - Enabled `-XX:VerifyArchivedFields=1` for all CDS test cases. > - Added a new test case for `-XX:VerifyArchivedFields=2` . > - Also added comments about that this flag is suppose to check for. 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: added range(0,2) for VerifyArchivedFields ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5514/files - new: https://git.openjdk.java.net/jdk/pull/5514/files/d7cd3a5b..b99f8db2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5514&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5514&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5514.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5514/head:pull/5514 PR: https://git.openjdk.java.net/jdk/pull/5514 From ccheung at openjdk.java.net Wed Sep 15 21:36:54 2021 From: ccheung at openjdk.java.net (Calvin Cheung) Date: Wed, 15 Sep 2021 21:36:54 GMT Subject: RFR: 8271073: Improve testing with VM option VerifyArchivedFields [v3] In-Reply-To: References: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> Message-ID: On Wed, 15 Sep 2021 21:18:37 GMT, Ioi Lam wrote: >> - Changed the definition of `VerifyArchivedFields` from a whacky use of `bool` to an `int` and properly define its three levels: >> - 0: No verification >> - 1: Basic verification with VM_Verify (no side effects) >> - 2: Detailed verification by forcing a GC (with side effects) >> - Changed the default value to 0. The functionality checked by this flag has been very stable so there's no need to verify it in every single test case. >> - Enabled `-XX:VerifyArchivedFields=1` for all CDS test cases. >> - Added a new test case for `-XX:VerifyArchivedFields=2` . >> - Also added comments about that this flag is suppose to check for. > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > added range(0,2) for VerifyArchivedFields Marked as reviewed by ccheung (Reviewer). 
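The three `VerifyArchivedFields` levels and the `range(0,2)` constraint mentioned in this thread can be modeled roughly as follows. This is only a sketch: `VerifyLevel`, `is_valid_verify_level`, `has_side_effects` and `check_levels` are hypothetical names, not HotSpot identifiers, and the real flag is declared through the VM's flag macros rather than an enum.

```cpp
#include <cassert>

// Hypothetical model of the three verification levels described in the PR text.
enum class VerifyLevel : int {
  None = 0,      // no verification
  Basic = 1,     // basic verification, no side effects
  Detailed = 2   // detailed verification that forces a GC (side effects)
};

// Mirrors the idea of range(0,2): only 0, 1 and 2 are accepted.
constexpr bool is_valid_verify_level(int value) {
  return value >= 0 && value <= 2;
}

// Only the detailed level is described as having side effects.
constexpr bool has_side_effects(VerifyLevel level) {
  return level == VerifyLevel::Detailed;
}

// Usage: values outside [0, 2] are rejected; level 1 is side-effect free.
inline void check_levels() {
  assert(is_valid_verify_level(0));
  assert(is_valid_verify_level(2));
  assert(!is_valid_verify_level(3));
  assert(!has_side_effects(VerifyLevel::Basic));
}
```

An integer with a declared range makes the intermediate level expressible, which a `bool` could not do.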
-------------

PR: https://git.openjdk.java.net/jdk/pull/5514

From dholmes at openjdk.java.net Wed Sep 15 21:56:57 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Wed, 15 Sep 2021 21:56:57 GMT
Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4]
In-Reply-To:
References: <9JRLvqeKqRe4yC21ZgRCNGGrFjJqykgTSQ4Q-9Ea2ss=.15756d59-e67e-4852-9413-4ae69049d9ae@github.com>
Message-ID:

On Wed, 15 Sep 2021 15:30:39 GMT, Coleen Phillimore wrote:

>> Okay but is there a reason not to use the same gap?
>
> If the gap is 2, then leaf-2 will == oopstorage? I could make it 3 but it doesn't actually matter.

Obviously the gap needs to be big enough to avoid the overlap, but the presence of +1, +3, +6 and +10 just raises questions as to why different values are used. Why not make them all +10 if you need 10 slots in some areas?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5467

From pliden at openjdk.java.net Thu Sep 16 08:09:02 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Thu, 16 Sep 2021 08:09:02 GMT
Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages
Message-ID:

ZGC requires large pages to be 2M. However, ZGC doesn't explicitly ask for this page size and instead relies on the default large page size for the system to be 2M. On systems where this is not true, ZGC will fail with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fail with an error message if it's set to something other than 2M.
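The check described in the last sentence can be sketched in isolation as below. This is a standalone approximation under stated assumptions: `check_large_page_size`, `LargePageCheck` and `check_examples` are hypothetical names, `flag_is_default` stands in for HotSpot's `FLAG_IS_DEFAULT(LargePageSizeInBytes)`, and the sketch returns a result instead of exiting the VM. The condition itself follows the `zArguments.cpp` lines quoted later in this thread.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t M = 1024 * 1024;

// Outcome of the (hypothetical) argument check.
enum class LargePageCheck { Ok, InvalidExplicitSize };

// `flag_is_default` models "the user did not pass -XX:LargePageSizeInBytes".
// Only an explicitly specified size other than 2M is rejected.
constexpr LargePageCheck check_large_page_size(bool flag_is_default,
                                               std::size_t size_in_bytes) {
  return (!flag_is_default && size_in_bytes != 2 * M)
             ? LargePageCheck::InvalidExplicitSize
             : LargePageCheck::Ok;
}

// Usage: an explicit 1G request is rejected, an explicit 2M request passes,
// and an unspecified flag passes regardless of its current value.
inline void check_examples() {
  assert(check_large_page_size(false, 1024 * M) == LargePageCheck::InvalidExplicitSize);
  assert(check_large_page_size(false, 2 * M) == LargePageCheck::Ok);
  assert(check_large_page_size(true, 0) == LargePageCheck::Ok);
}
```

Note that this form accepts any value when the flag is left at its default, so a non-2M system default is not detected by this condition alone.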
-------------

Commit messages:
 - 8273872: ZGC: Explicitly use 2M large pages

Changes: https://git.openjdk.java.net/jdk/pull/5541/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5541&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8273872
Stats: 11 lines in 2 files changed: 9 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/5541.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5541/head:pull/5541

PR: https://git.openjdk.java.net/jdk/pull/5541

From eosterlund at openjdk.java.net Thu Sep 16 08:09:02 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 16 Sep 2021 08:09:02 GMT
Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages
In-Reply-To:
References:
Message-ID:

On Thu, 16 Sep 2021 07:59:32 GMT, Per Liden wrote:

> ZGC requires large pages to be 2M. However, ZGC doesn't explicitly ask for this page size and instead relies on the default large page size for the system to be 2M. On systems where this is not true, ZGC will fail with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fail with an error message if it's set to something other than 2M.

Looks good.

-------------

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5541

From mdoerr at openjdk.java.net Thu Sep 16 08:17:51 2021
From: mdoerr at openjdk.java.net (Martin Doerr)
Date: Thu, 16 Sep 2021 08:17:51 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow
In-Reply-To:
References:
Message-ID:

On Mon, 13 Sep 2021 10:05:16 GMT, Volker Simonis wrote:

> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code.
This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>
> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information, which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.

This looks like a great idea. I have a few minor remarks / suggestions.

src/hotspot/share/ci/ciEnv.cpp line 375:

> 373: // ------------------------------------------------------------------
> 374: // helper for -XX:+OptimizeImplicitExceptions
> 375: ciInstanceKlass* ciEnv::exception_instanceKlass_for_reason(Deoptimization::DeoptReason reason, bool aastore) {

Better `is_aastore` or pass Bytecode?
src/hotspot/share/opto/graphKit.cpp line 631:

> 629: Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true);
> 630: set_argument(0, ex_node);
> 631: ciMethod* init = ex_ciInstKlass->find_method(ciSymbol::make("<init>"), ciSymbol::make("()V"));

Extra whitespace.

src/hotspot/share/runtime/globals.hpp line 645:

> 643: "Omit backtraces for some 'hot' exceptions in optimized code") \
> 644: \
> 645: product(bool, OptimizeImplicitExceptions, true, \

Should it be a diagnostic flag? Regular product flags require a CSR.

src/hotspot/share/runtime/sharedRuntime.cpp line 1096:

> 1094: bc = bytecode.invoke_code();
> 1095: }
> 1096: else {

Coding style: newline before `else`

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From eosterlund at openjdk.java.net Thu Sep 16 08:19:46 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 16 Sep 2021 08:19:46 GMT
Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages
In-Reply-To:
References:
Message-ID:

On Thu, 16 Sep 2021 07:59:32 GMT, Per Liden wrote:

> ZGC requires large pages to be 2M. However, ZGC doesn't explicitly ask for this page size and instead relies on the default large page size for the system to be 2M. On systems where this is not true, ZGC will fail with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fail with an error message if it's set to something other than 2M.

Actually it looks like LargePageSizeInBytes is wrong if not set explicitly to 2M, so it ends up bailing out even though the user didn't specify any particular size.

-------------

Changes requested by eosterlund (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5541

From tschatzl at openjdk.java.net Thu Sep 16 08:19:47 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Thu, 16 Sep 2021 08:19:47 GMT
Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages
In-Reply-To:
References:
Message-ID:

On Thu, 16 Sep 2021 07:59:32 GMT, Per Liden wrote:

> ZGC requires large pages to be 2M. However, ZGC doesn't explicitly ask for this page size and instead relies on the default large page size for the system to be 2M. On systems where this is not true, ZGC will fail with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fail with an error message if it's set to something other than 2M.

Looks good apart from the minor typos.

src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp line 210:

> 208:
> 209: // Create file
> 210: const int extra_flags = ZLargePages::is_explicit() ? (MFD_HUGETLB | MFD_HUGE_2MB) : 0;

Potentially the use of the constant `MFD_HUGE_2MB` could be generalized a little and calculated from a required page size like we do in `Linux::commit_memory_special` via a helper function like `os::Linux::hugetlbfs_page_size_flag`; at least that extra flag just looks like it is actually generated the same way. Looking through `memfd.h` it is *exactly* the same as for the corresponding `HUGETLB_*` flags.

But since this is really ZGC specific code, up to you.

src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp line 216:

> 214: log_debug_p(gc, init)("Failed to create memfd file (%s)",
> 215: (ZLargePages::is_explicit() && (err == EINVAL || err == ENODEV)) ?
> 216: "Hugepages (2M) not supported" : err.to_string());

Maybe this should be something like:

Suggestion:

    "Hugepages (2M) not available" : err.to_string());

As in `ZArguments` the code already checks that a 2M page size is requested (if any).
src/hotspot/share/gc/z/zArguments.cpp line 75: > 73: } > 74: > 75: // Only 2M large pages is supported Suggestion: // Only 2M large pages are supported. src/hotspot/share/gc/z/zArguments.cpp line 77: > 75: // Only 2M large pages is supported > 76: if (!FLAG_IS_DEFAULT(LargePageSizeInBytes) && LargePageSizeInBytes != 2 * M) { > 77: vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages is supported)"); Suggestion: vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages are supported)"); ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5541 From sjohanss at openjdk.java.net Thu Sep 16 08:34:51 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 16 Sep 2021 08:34:51 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 07:59:32 GMT, Per Liden wrote: > ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. 
src/hotspot/share/gc/z/zArguments.cpp line 78: > 76: if (!FLAG_IS_DEFAULT(LargePageSizeInBytes) && LargePageSizeInBytes != 2 * M) { > 77: vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages is supported)"); > 78: } To better handle the case where the default large page size is not supported I suggest we add something like: if (LargePageSizeInBytes != 2 * M) { if (FLAG_IS_DEFAULT(LargePageSizeInBytes)) { vm_exit_during_initialization("Default large page size is not supported (only 2M large pages are supported)"); } else { vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages are supported)"); } } Probably good to include the default size in the above print. ------------- PR: https://git.openjdk.java.net/jdk/pull/5541 From pliden at openjdk.java.net Thu Sep 16 09:41:24 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 16 Sep 2021 09:41:24 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v2] In-Reply-To: References: Message-ID: > ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. 
Per Liden has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5541/files - new: https://git.openjdk.java.net/jdk/pull/5541/files/910ef308..d1a89d2b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5541&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5541&range=00-01 Stats: 10 lines in 2 files changed: 0 ins; 6 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5541.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5541/head:pull/5541 PR: https://git.openjdk.java.net/jdk/pull/5541 From pliden at openjdk.java.net Thu Sep 16 09:41:31 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 16 Sep 2021 09:41:31 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 08:15:08 GMT, Thomas Schatzl wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp line 210: > >> 208: >> 209: // Create file >> 210: const int extra_flags = ZLargePages::is_explicit() ? (MFD_HUGETLB | MFD_HUGE_2MB) : 0; > > Potentially the use of the constant `MFD_HUGE_2MB` could be generalized a little and calculated from a required page size like we do in `Linux::commit_memory_special` via a helper function like `os::Linux::hugetlbfs_page_size_flag`; at least that extra flag just looks like it is actually generated the same way. Looking through `memfd.h` it is *exactly* the same as for the corresponding `HUGETLB_*` flags. > But since this is really ZGC specific code, up to you. I don't think we should be passing `MAP_*` flags to `memfd_create()` as there's no guarantee that `MAP_HUGE_2MB` and `MFD_HUGE_2MB` are the same values. It's more of a lucky/convenient implementation detail that they happen to be the same. 
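For reference, the question in this exchange is whether the `MFD_HUGE_*` and `MAP_HUGE_*` constants share one encoding. A minimal sketch of that encoding, assuming the scheme in the Linux UAPI header `linux/hugetlb_encode.h` (log2 of the page size shifted left by 26): `kHugeEncodeShift`, `log2_exact` and `encode_huge_page_size` are hypothetical names local to this sketch, not kernel or HotSpot identifiers, and the shift value is redefined here so the code builds without Linux headers.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mirrors HUGETLB_FLAG_ENCODE_SHIFT (26) from the Linux UAPI
// headers; redefined locally so the sketch does not depend on them.
constexpr unsigned kHugeEncodeShift = 26;

// Hypothetical helper: log2 of a power-of-two page size.
constexpr unsigned log2_exact(std::uint64_t page_size) {
  unsigned bits = 0;
  while ((page_size >> bits) > 1) {
    ++bits;
  }
  return bits;
}

// Hypothetical helper: the flag bits that select a given huge page size.
// Under the stated assumption, the same value would serve as MFD_HUGE_*
// (for memfd_create) and MAP_HUGE_* (for mmap).
constexpr std::uint32_t encode_huge_page_size(std::uint64_t page_size) {
  return static_cast<std::uint32_t>(log2_exact(page_size)) << kHugeEncodeShift;
}
```

With this model, `encode_huge_page_size(2 * 1024 * 1024)` yields `21 << 26`. Whether relying on that equality is acceptable in `memfd_create()` calls is the design question under discussion here; the sketch only shows the arithmetic.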
> src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp line 216: > >> 214: log_debug_p(gc, init)("Failed to create memfd file (%s)", >> 215: (ZLargePages::is_explicit() && (err == EINVAL || err == ENODEV)) ? >> 216: "Hugepages (2M) not supported" : err.to_string()); > > Maybe this should be something like: > > Suggestion: > > "Hugepages (2M) not available" : err.to_string()); > > > As in `ZArguments` the code already checks that 2M page size are requested (if any). Fixed > src/hotspot/share/gc/z/zArguments.cpp line 75: > >> 73: } >> 74: >> 75: // Only 2M large pages is supported > > Suggestion: > > // Only 2M large pages are supported. Fixed > src/hotspot/share/gc/z/zArguments.cpp line 77: > >> 75: // Only 2M large pages is supported >> 76: if (!FLAG_IS_DEFAULT(LargePageSizeInBytes) && LargePageSizeInBytes != 2 * M) { >> 77: vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages is supported)"); > > Suggestion: > > vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages are supported)"); Fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/5541 From pliden at openjdk.java.net Thu Sep 16 09:41:34 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 16 Sep 2021 09:41:34 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v2] In-Reply-To: References: Message-ID: <9tTFm3PJgCrdzRex2j1JRkaUwkemCIlBl3EXh7vchR8=.ec31b552-4133-4631-9344-be9e8532ecb1@github.com> On Thu, 16 Sep 2021 08:32:04 GMT, Stefan Johansson wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/share/gc/z/zArguments.cpp line 78: > >> 76: if (!FLAG_IS_DEFAULT(LargePageSizeInBytes) && LargePageSizeInBytes != 2 * M) { >> 77: vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages is supported)"); >> 78: } > > To better handle the case where the default large page size is not supported I 
suggest we add something like:
>
>     if (LargePageSizeInBytes != 2 * M) {
>       if (FLAG_IS_DEFAULT(LargePageSizeInBytes)) {
>         vm_exit_during_initialization("Default large page size is not supported (only 2M large pages are supported)");
>       } else {
>         vm_exit_during_initialization("Invalid -XX:LargePageSizeInBytes (only 2M large pages are supported)");
>       }
>     }
>
> Probably good to include the default size in the above print.

I talked to Stefan offline and we agreed to not do this change, since it's not quite the behavior we want.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5541

From shade at openjdk.java.net Thu Sep 16 09:49:14 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 16 Sep 2021 09:49:14 GMT
Subject: RFR: 8273880: Zero: Print warnings when unsupported intrinsics are enabled
Message-ID:

At least one test is currently failing:

    $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java
    ...
    # java.lang.AssertionError: Expected message not found: 'warning: AES instructions are not available on this CPU'.

Zero should print warnings when unsupported (all) intrinsics are enabled.
------------- Commit messages: - Fix Changes: https://git.openjdk.java.net/jdk/pull/5545/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5545&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273880 Stats: 65 lines in 1 file changed: 65 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5545.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5545/head:pull/5545 PR: https://git.openjdk.java.net/jdk/pull/5545 From coleenp at openjdk.java.net Thu Sep 16 11:53:46 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 16 Sep 2021 11:53:46 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v4] In-Reply-To: References: <9JRLvqeKqRe4yC21ZgRCNGGrFjJqykgTSQ4Q-9Ea2ss=.15756d59-e67e-4852-9413-4ae69049d9ae@github.com> Message-ID: On Wed, 15 Sep 2021 21:53:17 GMT, David Holmes wrote: >> If the gap is 2, then leaf-2 will == oopstorage? I could make it 3 but it doesn't actually matter. > > Obviously the gap needs to be big enough to avoid the overlap, but the presence of +1, +3, +6 and +10 just raise questions as to why different values are used. Why not make them all +10 if you need 10 slots in some areas? I'm going to address this with a future change because we don't have range checking for overlap yet, and they should all be some consistent and not arbitrary range. If it was less arbitrary, I'd pick another 6 because runtime/mutexLocker.cpp: def(Metaspace_lock , PaddedMutex , leaf-3, true, _safepoint_check_never); The rank 'service' had 6 because there are at least 4 locks below that. Again, I should add range checking and maybe making them all 10 makes a lot of sense. Locks don't nest that deeply. I'll add this as a comment in: JDK-8176393. 
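
The uniform-gap idea and the missing overlap check mentioned above could look roughly like this. It is a hypothetical model, not HotSpot's actual `Mutex` rank enum: the rank names, the gap of 10, and both helpers are assumptions invented for illustration.

```cpp
#include <cassert>

// Hypothetical layout: every named rank sits kGap above the previous one,
// leaving kGap - 1 free slots for "rank - n" style definitions such as leaf-3.
constexpr int kGap = 10;

enum Rank : int {
  kRankLow  = 0,
  kRankMid  = kRankLow + kGap,
  kRankHigh = kRankMid + kGap,
};

// Hypothetical range check: an offset below a named rank is valid only if it
// stays strictly above the next lower named rank.
constexpr bool offset_in_range(int offset) {
  return offset >= 0 && offset < kGap;
}

// Resolves a "named rank minus offset" definition to its numeric rank.
constexpr int rank_below(int named_rank, int offset) {
  return named_rank - offset;
}

// Compile-time overlap check for a definition like "kRankHigh - 3".
static_assert(offset_in_range(3) && rank_below(kRankHigh, 3) > kRankMid,
              "offset rank must not collide with the next lower named rank");
```

With a single gap for all ranks, one check such as `offset_in_range` covers every definition, which is the appeal of replacing the mixed +1, +3, +6 and +10 gaps with one value.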
------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Thu Sep 16 12:06:02 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 16 Sep 2021 12:06:02 GMT Subject: RFR: 8273300: Check Mutex ranking during a safepoint [v6] In-Reply-To: References: Message-ID: <4ildDdKOZMCmlTSX7hBnWPsO9UJwPdkcgoFD_pM036I=.f3fb3d0b-dba6-479f-84c1-a292675e8164@github.com> On Wed, 15 Sep 2021 16:32:22 GMT, Coleen Phillimore wrote: >> This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. >> Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. >> This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. >> Tests run tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove safepoint conditional from other assert. The locks should now be in decreasing order. Thanks Patricio for the code review. Thanks David for all the comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From coleenp at openjdk.java.net Thu Sep 16 12:06:03 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 16 Sep 2021 12:06:03 GMT Subject: Integrated: 8273300: Check Mutex ranking during a safepoint In-Reply-To: References: Message-ID: On Fri, 10 Sep 2021 15:23:49 GMT, Coleen Phillimore wrote: > This change checks lock ranking during a safepoint. For some reason, safepoint checking was excluded, probably from the days where Safepoint_lock and Threads_lock were used. > Because of checking during a safepoint, some locks had to get lower ranks. The CR has the details of which locks these were. 
The Service_lock complicates things because it's held during oops_do, which may take out other G1 locks. > This was built and tested with Shenandoah. Thanks to @zhengyu123 for the changes in Shenandoah. > Tests run tier1-8. This pull request has now been integrated. Changeset: 5e4d09c2 Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/5e4d09c22921f2980f84f849eb600d2e524d0733 Stats: 39 lines in 12 files changed: 1 ins; 9 del; 29 mod 8273300: Check Mutex ranking during a safepoint Reviewed-by: eosterlund, dholmes, pchilanomate ------------- PR: https://git.openjdk.java.net/jdk/pull/5467 From rwestrel at redhat.com Thu Sep 16 12:28:17 2021 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 16 Sep 2021 14:28:17 +0200 Subject: Intrinsic methods and time to safepoint In-Reply-To: References: Message-ID: <87czp8g0wu.fsf@redhat.com> > I believe we should have a policy to cover how long an intrinsic can > delay without responding to a safepoint, and that it should be in the > millisecond range. It would make almost no difference to the > performance of encryption if chunks handles by a fast intrinsic were, > say, about a megabyte. The difference in performance is so small as to > be immeasurable, and the improvement in the performance of other threads > is vast. I agree with you (seems like a no brainer) but I have a couple comments about implementation details. Those intrinsics usually call some stub. It's not possible AFAICT, to have the safepoint in the stub itself. So we need some loop that repeatedly calls the stub. That loop can either be added 1) by the JIT as IR when the intrinsic is expanded 2) in java code, that is java library code needs to be refactored. 2) would seem much easier to implement and would work for both c1 and c2 (if some of these intrinsics end up implemented by c1). Also a note of caution about loop strip mining: it doesn't have a model for what the loop body costs. 
So it blindly assumes all loop bodies can be run for 1000 iterations (by default) between safepoints. Unless I'm missing something, with a stub running for 1ms, delays between safepoint could still be 1s. Roland. From fweimer at redhat.com Thu Sep 16 12:30:21 2021 From: fweimer at redhat.com (Florian Weimer) Date: Thu, 16 Sep 2021 14:30:21 +0200 Subject: Intrinsic methods and time to safepoint In-Reply-To: <87czp8g0wu.fsf@redhat.com> (Roland Westrelin's message of "Thu, 16 Sep 2021 14:28:17 +0200") References: <87czp8g0wu.fsf@redhat.com> Message-ID: <87ee9oln36.fsf@oldenburg.str.redhat.com> * Roland Westrelin: > 2) would seem much easier to implement and would work for both c1 and c2 > (if some of these intrinsics end up implemented by c1). I think this has been done for some critical native JNI functions in the past (the CRC code?). Thanks, Florian From pliden at openjdk.java.net Thu Sep 16 12:31:27 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 16 Sep 2021 12:31:27 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v3] In-Reply-To: References: Message-ID: > ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. 
Per Liden has updated the pull request incrementally with two additional commits since the last revision: - Shorten line - Additional review comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5541/files - new: https://git.openjdk.java.net/jdk/pull/5541/files/d1a89d2b..1fd609ca Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5541&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5541&range=01-02 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5541.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5541/head:pull/5541 PR: https://git.openjdk.java.net/jdk/pull/5541 From eosterlund at openjdk.java.net Thu Sep 16 12:31:29 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 16 Sep 2021 12:31:29 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v3] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 12:28:13 GMT, Per Liden wrote: >> ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. > > Per Liden has updated the pull request incrementally with two additional commits since the last revision: > > - Shorten line > - Additional review comments Marked as reviewed by eosterlund (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/5541 From pliden at openjdk.java.net Thu Sep 16 12:41:25 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 16 Sep 2021 12:41:25 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v4] In-Reply-To: References: Message-ID: > ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. Per Liden has updated the pull request incrementally with one additional commit since the last revision: Fix comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5541/files - new: https://git.openjdk.java.net/jdk/pull/5541/files/1fd609ca..a76baace Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5541&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5541&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5541.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5541/head:pull/5541 PR: https://git.openjdk.java.net/jdk/pull/5541 From stefank at openjdk.java.net Thu Sep 16 12:41:26 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 16 Sep 2021 12:41:26 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v4] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 12:37:39 GMT, Per Liden wrote: >> ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. 
To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. > > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment Looks good ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5541 From smonteith at openjdk.java.net Thu Sep 16 13:16:50 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Thu, 16 Sep 2021 13:16:50 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v4] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 12:41:25 GMT, Per Liden wrote: >> ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. > > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment Make sense, looks good to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/5541 From tschatzl at openjdk.java.net Thu Sep 16 13:51:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 16 Sep 2021 13:51:46 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v4] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 09:35:22 GMT, Per Liden wrote: >> src/hotspot/os/linux/gc/z/zPhysicalMemoryBacking_linux.cpp line 210: >> >>> 208: >>> 209: // Create file >>> 210: const int extra_flags = ZLargePages::is_explicit() ? 
(MFD_HUGETLB | MFD_HUGE_2MB) : 0;
>>
>> Potentially the use of the constant `MFD_HUGE_2MB` could be generalized a little and calculated from a required page size like we do in `Linux::commit_memory_special` via a helper function like `os::Linux::hugetlbfs_page_size_flag`; at least that extra flag just looks like it is actually generated the same way. Looking through `memfd.h` it is *exactly* the same as for the corresponding `HUGETLB_*` flags.
>> But since this is really ZGC specific code, up to you.
>
> I don't think we should be passing `MAP_*` flags to `memfd_create()` as there's no guarantee that `MAP_HUGE_2MB` and `MFD_HUGE_2MB` are the same values. It's more of a lucky/convenient implementation detail that they happen to be the same.

The manpages explicitly mention that the calculation is the same:

> MFD_HUGE_2MB, MFD_HUGE_1GB, ...
>        Used in conjunction with MFD_HUGETLB to select alternative
>        hugetlb page sizes (respectively, 2 MB, 1 GB, ...) on
>        systems that support multiple hugetlb page sizes.
>        Definitions for known huge page sizes are included in the
>        header file .
>
>        For details on encoding huge page sizes not included in
>        the header file, see the discussion of the similarly named
>        constants in mmap(2).

I.e. this is specified that way.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5541

From simonis at openjdk.java.net Thu Sep 16 16:19:02 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 16 Sep 2021 16:19:02 GMT
Subject: RFR: JDK-8273902: Memory leak in OopStorage due to bug in OopHandle::release()
Message-ID:

Currently, `OopHandle::release()` is implemented as follows:

    inline void OopHandle::release(OopStorage* storage) {
      if (peek() != NULL) {
        // Clear the OopHandle first
        NativeAccess<>::oop_store(_obj, (oop)NULL);
        storage->release(_obj);
      }
    }

However, peek() returns NULL not only if the oop* `_obj` is NULL, but also when `_obj` points to a zero oop.
In the latter case, the oop* `_obj` will not be released from the corresponding OopStorage and the slot it occupies will remain alive forever. This behavior can be easily triggered with the `LeakTestMinimal.java` test which is attached to the [JBS issue](https://bugs.openjdk.java.net/browse/JDK-8273902)(thanks to Oli Gillespie from the Amazon Profiler team for detecting the issue and providing a reproducer). This fix should probably also be downported to jdk17 as quickly as possible. ------------- Commit messages: - JDK-8273902: Memory leak in OopStorage due to bug in OopHandle::release() Changes: https://git.openjdk.java.net/jdk/pull/5549/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5549&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273902 Stats: 4 lines in 2 files changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/5549.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5549/head:pull/5549 PR: https://git.openjdk.java.net/jdk/pull/5549 From simonis at openjdk.java.net Thu Sep 16 16:49:46 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 16:49:46 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 07:59:08 GMT, Martin Doerr wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > src/hotspot/share/ci/ciEnv.cpp line 375: > >> 373: // ------------------------------------------------------------------ >> 374: // helper for -XX:+OptimizeImplicitExceptions >> 375: ciInstanceKlass* ciEnv::exception_instanceKlass_for_reason(Deoptimization::DeoptReason reason, bool aastore) { > > Better `is_aastore` or pass Bytecode? I didn't wanted to unnecessarily include `interpreter/bytecodes.hpp` into `ciEnv.hpp`. But `is_aastore` is a good suggestion. Changed as suggested. > src/hotspot/share/opto/graphKit.cpp line 631: > >> 629: Node* ex_node = new_instance(makecon(ex_type), NULL, NULL, true); >> 630: set_argument(0, ex_node); >> 631: ciMethod* init = ex_ciInstKlass->find_method(ciSymbol::make(""), ciSymbol::make("()V")); > > Extra whitespace. Fixed. 
> src/hotspot/share/runtime/globals.hpp line 645: > >> 643: "Omit backtraces for some 'hot' exceptions in optimized code") \ >> 644: \ >> 645: product(bool, OptimizeImplicitExceptions, true, \ > > Should it be a diagnostic flag? Regular product flags require a CSR. Good point. Changed as suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Sep 16 16:53:47 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 16:53:47 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: <60TYPMSYj8Dpx7vT30IbkRxsiImj8eXKCYG0IEIgH-4=.5b3b97e2-0d60-456f-b4b1-a90f1efd91d3@github.com> On Thu, 16 Sep 2021 08:06:28 GMT, Martin Doerr wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a ten-times performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one used for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > src/hotspot/share/runtime/sharedRuntime.cpp line 1096: > >> 1094: bc = bytecode.invoke_code(); >> 1095: } >> 1096: else { > > Coding style: newline before `else` Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Sep 16 17:00:20 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 17:00:20 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): > > public static boolean isAlpha(int c) { > try { > return IS_ALPHA[c]; > } catch (ArrayIndexOutOfBoundsException ex) { > return false; > } > } > > > ### Solution > > Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Minor updates as requested by @TheRealMDoerr ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5488/files - new: https://git.openjdk.java.net/jdk/pull/5488/files/0558c3e1..f14338a7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5488&range=00-01 Stats: 7 lines in 5 files changed: 0 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5488.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5488/head:pull/5488 PR: https://git.openjdk.java.net/jdk/pull/5488 From simonis at openjdk.java.net Thu Sep 16 17:00:22 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 17:00:22 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: <_dei2-IYvM72RVRyk6GfMR8fdwviFvWnQJMzhP3IRtI=.be093912-5b40-4ac2-801e-416529033f50@github.com> On Mon, 13 Sep 2021 10:05:16 GMT, Volker Simonis wrote: > Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. > > If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>
>     public static boolean isAlpha(int c) {
>         try {
>             return IS_ALPHA[c];
>         } catch (ArrayIndexOutOfBoundsException ex) {
>             return false;
>         }
>     }
>
>
> ### Solution
>
> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>
>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>
>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>
>
> ### Implementation details
>
> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. > - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. Hi Martin, thanks a lot for looking at my change. I've applied all your suggestions to my PR. With best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From sspitsyn at openjdk.java.net Thu Sep 16 18:53:45 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 16 Sep 2021 18:53:45 GMT Subject: RFR: JDK-8273902: Memory leak in OopStorage due to bug in OopHandle::release() In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 16:08:39 GMT, Volker Simonis wrote: > Currently, `OopHandle::release()` is implemented as follows: > > inline void OopHandle::release(OopStorage* storage) { > if (peek() != NULL) { > // Clear the OopHandle first > NativeAccess<>::oop_store(_obj, (oop)NULL); > storage->release(_obj); > } > } > > However, peek() returns NULL not only if the oop* `_obj` is NULL, but also when `_obj` points to a zero oop. In the latter case, the oop* `_obj` will not be released from the corresponding OopStorage and the slot it occupies will remain alive forever. 
> > This behavior can be easily triggered with the `LeakTestMinimal.java` test which is attached to the [JBS issue](https://bugs.openjdk.java.net/browse/JDK-8273902)(thanks to Oli Gillespie from the Amazon Profiler team for detecting the issue and providing a reproducer). > > This fix should probably also be downported to jdk17 as quickly as possible. Hi Volker, Nice discovery! LGTM Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5549 From coleenp at openjdk.java.net Thu Sep 16 19:30:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 16 Sep 2021 19:30:44 GMT Subject: RFR: JDK-8273902: Memory leak in OopStorage due to bug in OopHandle::release() In-Reply-To: References: Message-ID: <2koZAIX-NnDXzx-X4QBixrCPsePTl5KmvRVu8ogiAYI=.c7894f02-da01-4ce8-9db2-159542f96a9b@github.com> On Thu, 16 Sep 2021 16:08:39 GMT, Volker Simonis wrote: > Currently, `OopHandle::release()` is implemented as follows: > > inline void OopHandle::release(OopStorage* storage) { > if (peek() != NULL) { > // Clear the OopHandle first > NativeAccess<>::oop_store(_obj, (oop)NULL); > storage->release(_obj); > } > } > > However, peek() returns NULL not only if the oop* `_obj` is NULL, but also when `_obj` points to a zero oop. In the latter case, the oop* `_obj` will not be released from the corresponding OopStorage and the slot it occupies will remain alive forever. > > This behavior can be easily triggered with the `LeakTestMinimal.java` test which is attached to the [JBS issue](https://bugs.openjdk.java.net/browse/JDK-8273902)(thanks to Oli Gillespie from the Amazon Profiler team for detecting the issue and providing a reproducer). > > This fix should probably also be downported to jdk17 as quickly as possible. Yes, please backport. Thank you for fixing this. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5549 From simonis at openjdk.java.net Thu Sep 16 19:59:49 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 19:59:49 GMT Subject: RFR: JDK-8273902: Memory leak in OopStorage due to bug in OopHandle::release() In-Reply-To: <2koZAIX-NnDXzx-X4QBixrCPsePTl5KmvRVu8ogiAYI=.c7894f02-da01-4ce8-9db2-159542f96a9b@github.com> References: <2koZAIX-NnDXzx-X4QBixrCPsePTl5KmvRVu8ogiAYI=.c7894f02-da01-4ce8-9db2-159542f96a9b@github.com> Message-ID: On Thu, 16 Sep 2021 19:27:40 GMT, Coleen Phillimore wrote: >> Currently, `OopHandle::release()` is implemented as follows: >> >> inline void OopHandle::release(OopStorage* storage) { >> if (peek() != NULL) { >> // Clear the OopHandle first >> NativeAccess<>::oop_store(_obj, (oop)NULL); >> storage->release(_obj); >> } >> } >> >> However, peek() returns NULL not only if the oop* `_obj` is NULL, but also when `_obj` points to a zero oop. In the latter case, the oop* `_obj` will not be released from the corresponding OopStorage and the slot it occupies will remain alive forever. >> >> This behavior can be easily triggered with the `LeakTestMinimal.java` test which is attached to the [JBS issue](https://bugs.openjdk.java.net/browse/JDK-8273902)(thanks to Oli Gillespie from the Amazon Profiler team for detecting the issue and providing a reproducer). >> >> This fix should probably also be downported to jdk17 as quickly as possible. > > Yes, please backport. Thank you for fixing this. @coleenp, @sspitsyn thanks for the quick review! 
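
The failure mode described in the quoted report can be modelled outside HotSpot. The following Java sketch is a toy model only: `SlotStorage`, `Handle`, `releaseBuggy`, `releaseFixed` and `slotsLeftAfterRelease` are hypothetical names invented for illustration, not HotSpot APIs. The "fixed" variant merely assumes that the repair is to test the slot pointer itself rather than the value stored in the slot; the integrated changeset is the authoritative version.

```java
import java.util.ArrayList;
import java.util.List;

class OopHandleLeakModel {

    /** Toy stand-in for OopStorage: tracks which slots are currently allocated. */
    static final class SlotStorage {
        private final List<Object[]> live = new ArrayList<>();

        Object[] allocate() {
            Object[] slot = new Object[1];
            live.add(slot);
            return slot;
        }

        void release(Object[] slot) {
            live.removeIf(s -> s == slot); // identity comparison, like a pointer
        }

        int liveSlots() {
            return live.size();
        }
    }

    /** Toy stand-in for OopHandle: `slot` plays the role of the oop* `_obj`. */
    static final class Handle {
        private final Object[] slot;

        /** An empty handle that owns no slot at all. */
        Handle() {
            this.slot = null;
        }

        /** A handle that owns a slot; the stored value may itself be null. */
        Handle(SlotStorage storage, Object value) {
            this.slot = storage.allocate();
            this.slot[0] = value;
        }

        /** Like peek() in the report: null if there is no slot OR the slot holds null. */
        Object peek() {
            return slot == null ? null : slot[0];
        }

        /** Release as quoted in the report: the guard tests the stored value. */
        void releaseBuggy(SlotStorage storage) {
            if (peek() != null) {
                slot[0] = null;
                storage.release(slot);
            }
        }

        /** Assumed repair (hypothetical): the guard tests the slot pointer itself. */
        void releaseFixed(SlotStorage storage) {
            if (slot != null) {
                slot[0] = null;
                storage.release(slot);
            }
        }
    }

    /** Creates one handle whose slot holds null, releases it, and counts leftover slots. */
    static int slotsLeftAfterRelease(boolean useFixed) {
        SlotStorage storage = new SlotStorage();
        Handle h = new Handle(storage, null);
        if (useFixed) {
            h.releaseFixed(storage);
        } else {
            h.releaseBuggy(storage);
        }
        return storage.liveSlots();
    }

    public static void main(String[] args) {
        System.out.println("buggy release leaves " + slotsLeftAfterRelease(false) + " slot(s)");
        System.out.println("fixed release leaves " + slotsLeftAfterRelease(true) + " slot(s)");
    }
}
```

In this model a handle whose slot holds null is skipped by the value-based guard, so its slot stays registered in the storage, which corresponds to the leak the report describes.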
------------- PR: https://git.openjdk.java.net/jdk/pull/5549 From simonis at openjdk.java.net Thu Sep 16 19:59:50 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 16 Sep 2021 19:59:50 GMT Subject: Integrated: JDK-8273902: Memory leak in OopStorage due to bug in OopHandle::release() In-Reply-To: References: Message-ID: <3UjVsGwEAnekY5DvMO0_Ue2hWKQ_eRnwyUYZ4TwxSQs=.10eea3ee-13de-461b-9fdf-14b0d9cd48de@github.com> On Thu, 16 Sep 2021 16:08:39 GMT, Volker Simonis wrote: > Currently, `OopHandle::release()` is implemented as follows: > > inline void OopHandle::release(OopStorage* storage) { > if (peek() != NULL) { > // Clear the OopHandle first > NativeAccess<>::oop_store(_obj, (oop)NULL); > storage->release(_obj); > } > } > > However, peek() returns NULL not only if the oop* `_obj` is NULL, but also when `_obj` points to a zero oop. In the latter case, the oop* `_obj` will not be released from the corresponding OopStorage and the slot it occupies will remain alive forever. > > This behavior can be easily triggered with the `LeakTestMinimal.java` test which is attached to the [JBS issue](https://bugs.openjdk.java.net/browse/JDK-8273902)(thanks to Oli Gillespie from the Amazon Profiler team for detecting the issue and providing a reproducer). > > This fix should probably also be downported to jdk17 as quickly as possible. This pull request has now been integrated. 
Changeset: bc48a0ac Author: Volker Simonis URL: https://git.openjdk.java.net/jdk/commit/bc48a0ac297b99a997482dcb59f85acc1cdb0c47 Stats: 4 lines in 2 files changed: 0 ins; 2 del; 2 mod 8273902: Memory leak in OopStorage due to bug in OopHandle::release() Reviewed-by: sspitsyn, coleenp ------------- PR: https://git.openjdk.java.net/jdk/pull/5549 From svkamath at openjdk.java.net Thu Sep 16 20:46:49 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Thu, 16 Sep 2021 20:46:49 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 In-Reply-To: References: Message-ID: On Mon, 13 Sep 2021 12:50:12 GMT, Andrew Haley wrote: >> src/hotspot/share/opto/library_call.cpp line 6796: >> >>> 6794: >>> 6795: Node* avx512_subkeyHtbl = new_array(klass_node, intcon(96), 0); >>> 6796: if (avx512_subkeyHtbl == NULL) return false; >> >> This looks very Intel-specific, but it's in generic code. Please make this constant 96 a symbol and push it into a header file in the x86 back end. > > Likewise, the name prefix "avx512_" isn't appropriate for code that will certainly be used by other targets. I'll modify the code. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From svkamath at openjdk.java.net Thu Sep 16 20:55:48 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Thu, 16 Sep 2021 20:55:48 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 13:31:19 GMT, Andrew Haley wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. 
This issue also affects non-Intel CPUs.
>
> It seems to me there's a serious problem here. When you execute the galoisCounterMode_AESCrypt() intrinsic, I don't think there's a limit on the number of blocks to be encrypted. With the older intrinsic things are not so very bad because the incoming data is split into 6 segments. But if we use this intrinsic, there is no safepoint check in the inner loop, which can lead to a long time to safepoint, and this causes stalls on the other threads.
> If you split the incoming data into blocks of about a megabyte you'd lose no measurable performance but you'd dramatically improve the performance of everything else, especially with a concurrent GC.

@theRealAph Thank you for the comment above. I will look into this issue.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402

From minqi at openjdk.java.net Thu Sep 16 22:44:45 2021
From: minqi at openjdk.java.net (Yumin Qi)
Date: Thu, 16 Sep 2021 22:44:45 GMT
Subject: RFR: 8271073: Improve testing with VM option VerifyArchivedFields [v3]
In-Reply-To:
References: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com>
Message-ID:

On Wed, 15 Sep 2021 21:18:37 GMT, Ioi Lam wrote:

>> - Changed the definition of `VerifyArchivedFields` from a whacky use of `bool` to an `int` and properly define its three levels:
>>   - 0: No verification
>>   - 1: Basic verification with VM_Verify (no side effects)
>>   - 2: Detailed verification by forcing a GC (with side effects)
>> - Changed the default value to 0. The functionality checked by this flag has been very stable so there's no need to verify it in every single test case.
>> - Enabled `-XX:VerifyArchivedFields=1` for all CDS test cases.
>> - Added a new test case for `-XX:VerifyArchivedFields=2`.
>> - Also added comments about what this flag is supposed to check for.
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > added range(0,2) for VerifyArchivedFields LGTM. ------------- Marked as reviewed by minqi (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5514 From iklam at openjdk.java.net Thu Sep 16 23:30:51 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 16 Sep 2021 23:30:51 GMT Subject: RFR: 8271073: Improve testing with VM option VerifyArchivedFields [v3] In-Reply-To: References: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> Message-ID: On Wed, 15 Sep 2021 21:33:46 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> added range(0,2) for VerifyArchivedFields > > Marked as reviewed by ccheung (Reviewer). Thanks @calvinccheung and @yminqi for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5514 From iklam at openjdk.java.net Thu Sep 16 23:30:53 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 16 Sep 2021 23:30:53 GMT Subject: Integrated: 8271073: Improve testing with VM option VerifyArchivedFields In-Reply-To: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> References: <2UUL97l_3iFjZBfTVxhvWeFZIPMZFqrD568mK-036VE=.90f17c84-eb88-4f8a-99ad-ddf520259638@github.com> Message-ID: On Tue, 14 Sep 2021 22:16:56 GMT, Ioi Lam wrote: > - Changed the definition of `VerifyArchivedFields` from a whacky use of `bool` to an `int` and properly define its three levels: > - 0: No verification > - 1: Basic verification with VM_Verify (no side effects) > - 2: Detailed verification by forcing a GC (with side effects) > - Changed the default value to 0. The functionality checked by this flag has been very stable so there's no need to verify it in every single test case. > - Enabled `-XX:VerifyArchivedFields=1` for all CDS test cases. 
> - Added a new test case for `-XX:VerifyArchivedFields=2`.
> - Also added comments about what this flag is supposed to check for.

This pull request has now been integrated.

Changeset: b9829044
Author: Ioi Lam
URL: https://git.openjdk.java.net/jdk/commit/b98290444a4594d0164d6f313c506287282d1929
Stats: 86 lines in 5 files changed: 73 ins; 0 del; 13 mod

8271073: Improve testing with VM option VerifyArchivedFields

Reviewed-by: ccheung, minqi

-------------

PR: https://git.openjdk.java.net/jdk/pull/5514

From david.holmes at oracle.com Thu Sep 16 23:38:49 2021
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 17 Sep 2021 09:38:49 +1000
Subject: RFR: JDK-8273902: Memory leak in OopStorage due to bug in OopHandle::release()
In-Reply-To:
References: <2koZAIX-NnDXzx-X4QBixrCPsePTl5KmvRVu8ogiAYI=.c7894f02-da01-4ce8-9db2-159542f96a9b@github.com>
Message-ID: <1647b55a-8acd-15fe-1df9-372e1ffe6d7a@oracle.com>

Hi Volker,

Please note that non-trivial fixes should wait ~24hrs before integration to ensure a range of folk have an opportunity to comment.

Thanks,
David

On 17/09/2021 5:59 am, Volker Simonis wrote:
> On Thu, 16 Sep 2021 19:27:40 GMT, Coleen Phillimore wrote:
>
>>> Currently, `OopHandle::release()` is implemented as follows:
>>>
>>>     inline void OopHandle::release(OopStorage* storage) {
>>>       if (peek() != NULL) {
>>>         // Clear the OopHandle first
>>>         NativeAccess<>::oop_store(_obj, (oop)NULL);
>>>         storage->release(_obj);
>>>       }
>>>     }
>>>
>>> However, peek() returns NULL not only if the oop* `_obj` is NULL, but also when `_obj` points to a zero oop. In the latter case, the oop* `_obj` will not be released from the corresponding OopStorage and the slot it occupies will remain alive forever.
>>>
>>> This behavior can be easily triggered with the `LeakTestMinimal.java` test which is attached to the [JBS issue](https://bugs.openjdk.java.net/browse/JDK-8273902)(thanks to Oli Gillespie from the Amazon Profiler team for detecting the issue and providing a reproducer).
>>> >>> This fix should probably also be downported to jdk17 as quickly as possible. >> >> Yes, please backport. Thank you for fixing this. > > @coleenp, @sspitsyn thanks for the quick review! > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5549 > From dholmes at openjdk.java.net Fri Sep 17 01:18:42 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Fri, 17 Sep 2021 01:18:42 GMT Subject: RFR: 8273880: Zero: Print warnings when unsupported intrinsics are enabled In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 09:41:34 GMT, Aleksey Shipilev wrote: > At least one test is currently failing: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java > ... > # java.lang.AssertionError: Expected message not found: 'warning: AES instructions are not available on this CPU'. > > > Zero should print warnings when unsupported (all) intrinsics are enabled. Seems fine. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5545 From shade at openjdk.java.net Fri Sep 17 06:47:46 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 17 Sep 2021 06:47:46 GMT Subject: RFR: 8273880: Zero: Print warnings when unsupported intrinsics are enabled In-Reply-To: References: Message-ID: <1DDa57jBAPSTnwHpOgdWWClsBNJlXnp8e-3XfkKSb6s=.0c980908-2423-4f58-af1a-60a814394950@github.com> On Thu, 16 Sep 2021 09:41:34 GMT, Aleksey Shipilev wrote: > At least one test is currently failing: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java > ... > # java.lang.AssertionError: Expected message not found: 'warning: AES instructions are not available on this CPU'. > > > Zero should print warnings when unsupported (all) intrinsics are enabled. Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5545 From shade at openjdk.java.net Fri Sep 17 06:47:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 17 Sep 2021 06:47:47 GMT Subject: Integrated: 8273880: Zero: Print warnings when unsupported intrinsics are enabled In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 09:41:34 GMT, Aleksey Shipilev wrote: > At least one test is currently failing: > > > $ CONF=linux-x86_64-zero-fastdebug make exploded-test TEST=compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java > ... > # java.lang.AssertionError: Expected message not found: 'warning: AES instructions are not available on this CPU'. > > > Zero should print warnings when unsupported (all) intrinsics are enabled. This pull request has now been integrated. Changeset: 54b45676 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/54b456764bedb53adb7ae3c25f64d44dd8322ada Stats: 65 lines in 1 file changed: 65 ins; 0 del; 0 mod 8273880: Zero: Print warnings when unsupported intrinsics are enabled Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/5545 From njian at openjdk.java.net Fri Sep 17 06:53:06 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 17 Sep 2021 06:53:06 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. 
> > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge with master - Merge with master - More comments from Andrew. - Add missing part - Address Andrew's comments - 8267356: AArch64: Vector API SVE codegen support This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: 1. Code generation for Vector API c2 IR nodes with SVE. 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. 3. Some more SVE assemblers (and tests) used by the codegen part. Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. 
------------- Changes: https://git.openjdk.java.net/jdk/pull/4122/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=4122&range=06 Stats: 5761 lines in 13 files changed: 4576 ins; 195 del; 990 mod Patch: https://git.openjdk.java.net/jdk/pull/4122.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4122/head:pull/4122 PR: https://git.openjdk.java.net/jdk/pull/4122 From shade at openjdk.java.net Fri Sep 17 06:59:09 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 17 Sep 2021 06:59:09 GMT Subject: RFR: 8273314: Add tier4 test groups [v4] In-Reply-To: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: > During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. > > Caveats: > - I excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). > - `jdk:tier4` only runs well with `JTREG_KEYWORDS=!headful` or reduced concurrency with `TEST_JOBS=1`, because headful tests cannot run in parallel > > Sample run with `JTREG_KEYWORDS=!headful`: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier4 3585 3584 0 1 << >>> jtreg:test/jdk:tier4 2893 2887 5 1 << > jtreg:test/langtools:tier4 0 0 0 0 > jtreg:test/jaxp:tier4 0 0 0 0 > ============================== > > real 699m39.462s > user 6626m8.448s > sys 1110m43.704s > > > There are interesting test failures on my machine, which I would address separately. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8273314-tier4 - Merge branch 'master' into JDK-8273314-tier4 - Drop applications and fix the comment - Drop exceptions - Add tier4 test groups ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5357/files - new: https://git.openjdk.java.net/jdk/pull/5357/files/160c13c7..a5115a8d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5357&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5357&range=02-03 Stats: 14580 lines in 682 files changed: 9082 ins; 3191 del; 2307 mod Patch: https://git.openjdk.java.net/jdk/pull/5357.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5357/head:pull/5357 PR: https://git.openjdk.java.net/jdk/pull/5357 From shade at openjdk.java.net Fri Sep 17 06:59:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 17 Sep 2021 06:59:10 GMT Subject: RFR: 8273314: Add tier4 test groups [v3] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Mon, 6 Sep 2021 13:22:03 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. >> >> Caveats: >> - I excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
>> - `jdk:tier4` only runs well with `JTREG_KEYWORDS=!headful` or reduced concurrency with `TEST_JOBS=1`, because headful tests cannot run in parallel >> >> Sample run with `JTREG_KEYWORDS=!headful`: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 3585 3584 0 1 << >>>> jtreg:test/jdk:tier4 2893 2887 5 1 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 699m39.462s >> user 6626m8.448s >> sys 1110m43.704s >> >> >> There are interesting test failures on my machine, which I would address separately. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Drop applications and fix the comment All right, I am convinced that current patch is as good as it gets. GUI tests still do not run well with default parallelism, but I see no reason to block this integration before that is resolved. Either run `tier4` in headless mode, or limit the parallelism. @iignatev, @mrserb, @dholmes-ora -- are you good with this? ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From serb at openjdk.java.net Fri Sep 17 07:17:48 2021 From: serb at openjdk.java.net (Sergey Bylokhov) Date: Fri, 17 Sep 2021 07:17:48 GMT Subject: RFR: 8273314: Add tier4 test groups [v4] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: <5Lc2i9bPCfmNRnco-B7Ru5VhREqKDbBrnjTS0YcVo8o=.950bb2c9-76a1-4278-855e-7d506686715c@github.com> On Fri, 17 Sep 2021 06:59:09 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. >> >> Caveats: >> - I excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
>> - `jdk:tier4` only runs well with `JTREG_KEYWORDS=!headful` or reduced concurrency with `TEST_JOBS=1`, because headful tests cannot run in parallel >> >> Sample run with `JTREG_KEYWORDS=!headful`: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 3585 3584 0 1 << >>>> jtreg:test/jdk:tier4 2893 2887 5 1 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 699m39.462s >> user 6626m8.448s >> sys 1110m43.704s >> >> >> There are interesting test failures on my machine, which I would address separately. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8273314-tier4 > - Merge branch 'master' into JDK-8273314-tier4 > - Drop applications and fix the comment > - Drop exceptions > - Add tier4 test groups It is fine to run headful and headless tests separately. ------------- Marked as reviewed by serb (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5357 From njian at openjdk.java.net Fri Sep 17 07:28:52 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Fri, 17 Sep 2021 07:28:52 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Fri, 17 Sep 2021 06:53:06 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. 
using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge with master > - Merge with master > - More comments from Andrew. > - Add missing part > - Address Andrew's comments > - 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. Merged with master and tested. Thanks to Andrew for the review! Can I get one more view? This is part of https://bugs.openjdk.java.net/browse/JDK-8271515, but can be integrated separately once the JEP has been targeted. 
------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From pliden at openjdk.java.net Fri Sep 17 07:50:47 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 17 Sep 2021 07:50:47 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v4] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 13:48:44 GMT, Thomas Schatzl wrote: >> I don't think we should be passing `MAP_*` flags to `memfd_create()` as there's no guarantee that `MAP_HUGE_2MB` and `MFD_HUGE_2MB` are the same values. It's more of a lucky/convenient implementation detail that they happen to be the same. > > The manpages explicitly mention that the calculation is the same: >> MFD_HUGE_2MB, MFD_HUGE_1GB, ... >> Used in conjunction with MFD_HUGETLB to select alternative >> hugetlb page sizes (respectively, 2 MB, 1 GB, ...) on >> systems that support multiple hugetlb page sizes. >> Definitions for known huge page sizes are included in the >> header file . >> >> For details on encoding huge page sizes not included in >> the header file, see the discussion of the similarly named >> constants in mmap(2). > > I.e. this is specified that way. Ok. I think I'll stick with an explicit `MFD_HUGE_2MB` since that's always what we want to use. ------------- PR: https://git.openjdk.java.net/jdk/pull/5541 From pliden at openjdk.java.net Fri Sep 17 07:54:48 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 17 Sep 2021 07:54:48 GMT Subject: RFR: 8273872: ZGC: Explicitly use 2M large pages [v4] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 12:41:25 GMT, Per Liden wrote: >> ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. 
ZGC should fails with an error message if it's specified to something other than 2M. > > Per Liden has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment Thanks all for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/5541 From pliden at openjdk.java.net Fri Sep 17 07:54:49 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 17 Sep 2021 07:54:49 GMT Subject: Integrated: 8273872: ZGC: Explicitly use 2M large pages In-Reply-To: References: Message-ID: <56VDglf--MHuoyem0L52cUI_IxebUw9CU09eg5NoHw8=.c3c2d842-6721-4274-b1ec-d022435568d7@github.com> On Thu, 16 Sep 2021 07:59:32 GMT, Per Liden wrote: > ZGC requires large pages to be 2M. However, ZGC doesn't explicitly asks for this page size and instead relies on the default large pages size for the system to be 2M. On systems where this is not true, ZGC will fails with an error message. To avoid this, ZGC should explicitly ask for 2M large pages and not rely on the system default. Furthermore, ZGC currently ignores `-XX:LargePageSizeInBytes`. ZGC should fails with an error message if it's specified to something other than 2M. This pull request has now been integrated. Changeset: 1890d85c Author: Per Liden URL: https://git.openjdk.java.net/jdk/commit/1890d85c0e647d3f890e3c7152f8cd2e60dfd826 Stats: 22 lines in 2 files changed: 13 ins; 6 del; 3 mod 8273872: ZGC: Explicitly use 2M large pages Reviewed-by: eosterlund, tschatzl, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/5541 From stefank at openjdk.java.net Fri Sep 17 08:38:06 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 17 Sep 2021 08:38:06 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests Message-ID: Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. 
For example:

  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0
  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1
  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2
  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3

The path to the test plus the id can be used to problem list the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain:

  gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3

The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped.

I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this:

  * @test id=Z

An additional benefit of doing this is that it will be easier to see what was actually run:

  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1
  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel
  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial
  Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z

I've gone through the tests in the HotSpot problem lists + some that affect ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with.
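
As a rough sketch of the explicit ids proposed above, a test file with two run sections might look like the following. This is not taken from the patch: the class name `NamedRunIdExample` and the helper `problemListKey` are hypothetical and only illustrate how the part after `#` is formed, and the `@requires` and `@run` lines are assumed to be typical tags for a GC test.

```java
/*
 * @test id=G1
 * @requires vm.gc.G1
 * @run main/othervm -XX:+UseG1GC NamedRunIdExample
 */

/*
 * @test id=Z
 * @requires vm.gc.Z
 * @run main/othervm -XX:+UseZGC NamedRunIdExample
 */

// Hypothetical example class; jtreg reads the comment blocks above,
// while plain javac treats them as ordinary comments.
final class NamedRunIdExample {

    // Hypothetical helper: builds the name used to refer to one run
    // section, i.e. the test path followed by '#' and the run id.
    static String problemListKey(String testPath, String runId) {
        return testPath + "#" + runId;
    }

    public static void main(String[] args) {
        String path = "gc/stringdedup/TestStringDeduplicationAgeThreshold.java";
        // With explicit ids the key stays stable when run sections are
        // added or reordered; an automatic id such as "id3" would not.
        System.out.println(problemListKey(path, "Z"));
    }
}
```

With named ids, inserting a new `@test` block ahead of the ZGC one leaves the `#Z` key untouched, which is the property the proposal relies on.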
------------- Commit messages: - 8273928: Use named run ids when problem listing tests Changes: https://git.openjdk.java.net/jdk/pull/5557/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5557&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273928 Stats: 102 lines in 17 files changed: 29 ins; 1 del; 72 mod Patch: https://git.openjdk.java.net/jdk/pull/5557.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5557/head:pull/5557 PR: https://git.openjdk.java.net/jdk/pull/5557 From stefank at openjdk.java.net Fri Sep 17 08:43:09 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 17 Sep 2021 08:43:09 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v2] In-Reply-To: References: Message-ID: > Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. For example: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The path to the test plus the id can be used to problem lists the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: > gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. > > I propose that we refrain from using the automatically generated ids when problem listing tests. 
Instead we add explicit ids like this: > * @test id=Z > > An additional benefit of doing this is that it will be easier to see what was actually run: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z > > I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Remove temporary ProblemList testing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5557/files - new: https://git.openjdk.java.net/jdk/pull/5557/files/91f638af..f82927db Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5557&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5557&range=00-01 Stats: 16 lines in 1 file changed: 0 ins; 16 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5557.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5557/head:pull/5557 PR: https://git.openjdk.java.net/jdk/pull/5557 From aph-open at littlepinkcloud.com Fri Sep 17 08:52:13 2021 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Fri, 17 Sep 2021 09:52:13 +0100 Subject: Intrinsic methods and time to safepoint In-Reply-To: <87czp8g0wu.fsf@redhat.com> References: <87czp8g0wu.fsf@redhat.com> Message-ID: On 9/16/21 1:28 PM, Roland Westrelin wrote: > >> I believe we should have a policy to cover how long an intrinsic can >> delay without responding to a safepoint, and that it should be in the >> millisecond range. It would make almost no difference to the >> performance of encryption if chunks handles by a fast intrinsic were, >> say, about a megabyte. 
The difference in performance is so small as to >> be immeasurable, and the improvement in the performance of other threads >> is vast. > > I agree with you (seems like a no brainer) but I have a couple comments > about implementation details. > > Those intrinsics usually call some stub. It's not possible AFAICT, to > have the safepoint in the stub itself. So we need some loop that > repeatedly calls the stub. That loop can either be added 1) by the JIT > as IR when the intrinsic is expanded 2) in java code, that is java > library code needs to be refactored. > > 2) would seem much easier to implement and would work for both c1 and c2 > (if some of these intrinsics end up implemented by c1). OK. I guess the problem is that the call to the stub doesn't have an oop map. The tricky cases seem to be in the crypto code, which is already rather fiddly. It's usually simple enough for intrinsics to return the amount of work left to do, I guess. > Also a note of caution about loop strip mining: it doesn't have a model > for what the loop body costs. So it blindly assumes all loop bodies can > be run for 1000 iterations (by default) between safepoints. Unless I'm > missing something, with a stub running for 1ms, delays between safepoint > could still be 1s. It'd be interesting to do the experiment. I might try that, just for grins. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From stefank at openjdk.java.net Fri Sep 17 08:54:16 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Fri, 17 Sep 2021 08:54:16 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v3] In-Reply-To: References: Message-ID: > Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. 
For example: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The path to the test plus the id can be used to problem lists the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: > gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. > > I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this: > * @test id=Z > > An additional benefit of doing this is that it will be easier to see what was actually run: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z > > I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Fix TestReferenceClearDuringReferenceProcessing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5557/files - new: https://git.openjdk.java.net/jdk/pull/5557/files/f82927db..5129ed8e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5557&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5557&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5557.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5557/head:pull/5557 PR: https://git.openjdk.java.net/jdk/pull/5557 From rwestrel at redhat.com Fri Sep 17 09:13:22 2021 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 17 Sep 2021 11:13:22 +0200 Subject: Intrinsic methods and time to safepoint In-Reply-To: References: <87czp8g0wu.fsf@redhat.com> Message-ID: <87a6kbftu5.fsf@redhat.com> > OK. I guess the problem is that the call to the stub doesn't have an oop > map. Assuming you can get the oop map in stub, deoptimization could still be delayed until the stub is exited. Is that considered a problem? Actually I'm not even sure the loop synthesized from IR nodes is feasible: what would the JVM state at the safepoint be given that there's no actual java code for it? What would you deoptimize to if you were to deoptimize? Roland. From pliden at openjdk.java.net Fri Sep 17 09:19:43 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 17 Sep 2021 09:19:43 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v3] In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 08:54:16 GMT, Stefan Karlsson wrote: >> Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. 
For example: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The path to the test plus the id can be used to problem lists the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: >> gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. >> >> I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this: >> * @test id=Z >> >> An additional benefit of doing this is that it will be easier to see what was actually run: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z >> >> I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestReferenceClearDuringReferenceProcessing Nice cleanup! Looks good to me. ------------- Marked as reviewed by pliden (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5557 From rkennke at openjdk.java.net Fri Sep 17 09:32:43 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 17 Sep 2021 09:32:43 GMT Subject: RFR: 8272723: Don't use Access API to access primitive fields [v3] In-Reply-To: References: Message-ID: On Fri, 20 Aug 2021 08:51:54 GMT, Roman Kennke wrote: >> For earlier incarnations of Shenandoah, we needed to put barriers before accessing primitive fields. This is no longer necessary nor implemented/used by any GC, and we should simplify the code to do plain access instead. >> >> (We may want to remove remaining primitive access machinery in the Access API soon) >> >> Testing: >> - [x] build x86_32 and x86_64 >> - [x] tier1 >> - [x] tier2 >> - [x] hotspot_gc > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8272723 > - Remove redundant asserts > - Revert to use RawAccess for volatile accesses > - Fix alignment > - Consolidate (obj)field_addr() variants > - Remove remaining primitive Access API uses > - 8272723: Don't use Access API to access primitive fields Ping? ------------- PR: https://git.openjdk.java.net/jdk/pull/5187 From github.com+42899633+eastig at openjdk.java.net Fri Sep 17 11:36:11 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Fri, 17 Sep 2021 11:36:11 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 Message-ID: This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). It adds the option `UsePauseImpl=value`, where `value` can be: - `none`: no implementation for spin pauses. This is the default value. 
- `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
- `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
- `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.

The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`.

Testing:
- `make test TEST="gtest"`: Passed
- `make run-test TEST="tier1"`: Passed
- `make run-test TEST="tier2"`: Passed
- `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

-------------

Commit messages:
 - Add missing header file
 - 8186670: Implement _onSpinWait() intrinsic for AArch64

Changes: https://git.openjdk.java.net/jdk/pull/5562/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8186670
Stats: 460 lines in 9 files changed: 458 ins; 0 del; 2 mod
Patch: https://git.openjdk.java.net/jdk/pull/5562.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562

PR: https://git.openjdk.java.net/jdk/pull/5562

From coleenp at openjdk.java.net Fri Sep 17 11:49:57 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Fri, 17 Sep 2021 11:49:57 GMT
Subject: RFR: 8273915: Create 'nosafepoint' rank
Message-ID:

Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block.
This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking.

Tested with tier1-6 and built and run tier1 tests with shenandoah locally.

-------------

Commit messages:
 - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block.
Changes: https://git.openjdk.java.net/jdk/pull/5550/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5550&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273915 Stats: 85 lines in 25 files changed: 12 ins; 0 del; 73 mod Patch: https://git.openjdk.java.net/jdk/pull/5550.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5550/head:pull/5550 PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Fri Sep 17 11:57:59 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Fri, 17 Sep 2021 11:57:59 GMT Subject: RFR: 8273916: Remove 'special' ranking Message-ID: This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. This passes tier1-6 also. ------------- Commit messages: - Remove "special" rank. - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. 
Changes: https://git.openjdk.java.net/jdk/pull/5563/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5563&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273916 Stats: 129 lines in 26 files changed: 12 ins; 4 del; 113 mod Patch: https://git.openjdk.java.net/jdk/pull/5563.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5563/head:pull/5563 PR: https://git.openjdk.java.net/jdk/pull/5563 From chagedorn at openjdk.java.net Fri Sep 17 12:25:53 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Fri, 17 Sep 2021 12:25:53 GMT Subject: RFR: 8267265: Use new IR Test Framework to create tests for C2 IGV transformations [v4] In-Reply-To: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> References: <8Ce6bZtHwGEw8_wXZz4ak3obprd1YmZDi4cItcXB4bA=.a7162709-7aad-4709-a585-d2391392f49b@github.com> Message-ID: On Wed, 1 Sep 2021 00:23:11 GMT, John Tortugo wrote: >> Hi, can I please get some reviews for this Pull Request? Here is a summary of the changes: >> >> - Add tests, using the new IR-based test framework, for several of the Ideal transformations on Add, Sub, Mul, Div, Loop nodes and some simple Scalar Replacement transformations. >> - Add more default IR regex's to IR-based test framework. >> - Changes to Sub, Div and Add Ideal nodes to that transformations on Int and Long types are the whenever possible same. >> - Changes to Sub*Node, Div*Node and Add*Node Ideal methods to fix some bugs and include new transformations. >> - New JTREG "ir_transformations" test group under test/hotspot/jtreg. > > John Tortugo has updated the pull request incrementally with 146 additional commits since the last revision: > > - Fix merge mistake. > - Merge branch 'jdk-8267265' of https://github.com/JohnTortugo/jdk into jdk-8267265 > - Addressing PR feedback: move tests to other directory, add custom tests, add tests for other optimizations, rename some tests. 
> - 8273197: ProblemList 2 jtools tests due to JDK-8273187 > 8273198: ProblemList java/lang/instrument/BootClassPath/BootClassPathTest.sh due to JDK-8273188 > > Reviewed-by: naoto > - 8262186: Call X509KeyManager.chooseClientAlias once for all key types > > Reviewed-by: xuelei > - 8273186: Remove leftover comment about sparse remembered set in G1 HeapRegionRemSet > > Reviewed-by: ayang > - 8273169: java/util/regex/NegativeArraySize.java failed after JDK-8271302 > > Reviewed-by: jiefu, serb > - 8273092: Sort classlist in JDK image > > Reviewed-by: redestad, ihse, dfuchs > - 8273144: Remove unused top level "Sample Collection Set Candidates" logging > > Reviewed-by: iwalulya, ayang > - 8262095: NPE in Flow$FlowAnalyzer.visitApply: Cannot invoke getThrownTypes because tree.meth.type is null > > Co-authored-by: Jan Lahoda > Co-authored-by: Vicente Romero > Reviewed-by: jlahoda > - ... and 136 more: https://git.openjdk.java.net/jdk/compare/ac430bf7...463102e2 Thanks for your effort to write tests for all these different kinds of transformations! Generally, they look good and are worth to have! You should add `@bug 8267265` to all files. test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 34: > 32: * @run driver compiler.c2.irTests.AddINodeIdealizationTests > 33: */ > 34: public class AddINodeIdealizationTests { General comments, also applies to the other test files: It might be good to sanity check the output results of all these transformations (even though they are simple). Since the tests only use simple randomized ints, you could use a single `@Run` method instead of one for each test. This could look something like [this](https://gist.github.com/chhagedorn/b16aba260a8fcf27c082beccf2cec0a3). 
test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 40:

> 38:
> 39: @Test
> 40: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB})

In this test and all the following ones (including the other files), I think you can remove unrelated `failOn` regexes on operations that are not part of the test. For example, in this test you can safely remove `IRNode.MUL, DIV, and SUB`.

test/hotspot/jtreg/compiler/c2/irTests/AddINodeIdealizationTests.java line 91:

> 89: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB})
> 90: @IR(counts = {IRNode.ADD, "2"})
> 91: // Checks (x + c1) + y => (x + y) + c1

Unfortunately, it is a limitation of the framework that we cannot check the correct inputs of IR nodes.

test/hotspot/jtreg/compiler/c2/irTests/AddLNodeIdealizationTests.java line 151:

> 149: return (a - b) + (c - a);
> 150: }
> 151:

Compared to the `AddI` tests, you've missed the case `(a - b) + (b - c) => (a - c)` here.

test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java line 44:

> 42: // Checks x / x => 1
> 43: public int constant(int x) {
> 44: return x / x;

This fails with an `ArithmeticException` when `x` is zero. I suggest converting this into a custom run test that catches this case, and maybe also testing zero as a separate case to see if an exception is thrown with compiled code.

test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java line 68:

> 66: // Checks x / (y / y) => x
> 67: public int identityThird(int x, int y) {
> 68: return x / (y / y);

Same problem as above with `y = 0`.

test/hotspot/jtreg/compiler/c2/irTests/DivINodeIdealizationTests.java line 79:

> 77: // Hotspot should keep the division because it may cause a division by zero trap
> 78: public int retainDenominator(int x, int y) {
> 79: return (x * y) / y;

Same problem as above with `y = 0`.
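
To make the division-by-zero concern in the `DivI` comments above concrete, here is a minimal plain-Java sketch. It deliberately does not use the IR test framework; the class `DivByZeroCheck` and the helper `throwsArithmeticException` are hypothetical names chosen for illustration, and only `constant` mirrors the quoted test method.

```java
// Hypothetical stand-alone check, not part of the pull request.
final class DivByZeroCheck {

    // Same shape as the quoted test method: x / x is 1 for non-zero x.
    static int constant(int x) {
        return x / x;
    }

    // Returns true if evaluating constant(x) throws ArithmeticException.
    static boolean throwsArithmeticException(int x) {
        try {
            constant(x);
            return false;
        } catch (ArithmeticException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Non-zero inputs must produce 1 and must not throw.
        int[] samples = {1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (constant(x) != 1) {
                throw new AssertionError("x / x != 1 for x = " + x);
            }
        }
        // Zero is the separate case: integer division by zero must throw.
        if (!throwsArithmeticException(0)) {
            throw new AssertionError("expected ArithmeticException for x = 0");
        }
        System.out.println("all checks passed");
    }
}
```

A custom run method in the real test would presumably follow the same split: feed non-zero values for the result check, and exercise zero separately so the exception is also observed from compiled code.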
test/hotspot/jtreg/compiler/c2/irTests/DivLNodeIdealizationTests.java line 34:

> 32: * @run driver compiler.c2.irTests.DivLNodeIdealizationTests
> 33: */
> 34: public class DivLNodeIdealizationTests {

Same div by zero problems as with `DivI`. Should be adjusted analogously.

test/hotspot/jtreg/compiler/c2/irTests/MulINodeIdealizationTests.java line 45:

> 43: //Checks Max(a,b) * min(a,b) => a*b
> 44: public int excludeMaxMin(int x, int y){
> 45: return Math.max(x, y) * Math.min(x, y);

`Math.min/max()` is intrinsified and HotSpot generates `CMove` nodes (see `LibraryCallKit::generate_min_max()`) for them. But it looks like `MulNode::Ideal` misses this check for `CMove` nodes. That could be done in a separate RFE (and then this test could be improved to check if the `CMove` node was removed). Anyways, min/max nodes are mainly used for loop limit computations, so it's harder to test this transformation in an easy way.

test/hotspot/jtreg/compiler/c2/irTests/MulLNodeIdealizationTests.java line 43:

> 41: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.DIV, IRNode.CALL})
> 42: @IR(counts = {IRNode.MUL, "1"})
> 43: //Checks Max(a,b) * min(a,b) => a*b

See comments for `MulI`.

test/hotspot/jtreg/compiler/c2/irTests/SubINodeIdealizationTests.java line 164:

> 162: @Arguments(Argument.RANDOM_EACH)
> 163: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB, IRNode.ADD})
> 164: // Checks 0 - (a >> 31) => a >> 31

Comment should be adjusted to differentiate between signed and unsigned shifts. And a rule should be added to check that the `RShiftI` node was converted into an `URShiftI` node.

test/hotspot/jtreg/compiler/c2/irTests/SubLNodeIdealizationTests.java line 155:

> 153: @Arguments(Argument.RANDOM_EACH)
> 154: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.SUB, IRNode.ADD})
> 155: // Checks 0 - (a >> 63) => a >>> 63

Same as for `SubI` above, a rule should be added for the shift nodes.
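
The signed versus unsigned distinction in the `SubI` and `SubL` comments above can be checked at the Java level with a small sketch. It only verifies the arithmetic identity `0 - (a >> 31) == a >>> 31` (and the 63-bit analogue for `long`); it says nothing about which IR nodes C2 emits. The class and method names are hypothetical.

```java
// Hypothetical stand-alone check, not part of the pull request.
final class ShiftNegationCheck {

    // Arithmetic shift yields 0 or -1, so the negation is 0 or 1.
    static int negatedSignedShift(int a) {
        return 0 - (a >> 31);
    }

    // Logical shift moves the sign bit into bit 0, giving 0 or 1 directly.
    static int unsignedShift(int a) {
        return a >>> 31;
    }

    static long negatedSignedShift(long a) {
        return 0L - (a >> 63);
    }

    static long unsignedShift(long a) {
        return a >>> 63;
    }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 12345, -12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : ints) {
            if (negatedSignedShift(a) != unsignedShift(a)) {
                throw new AssertionError("int identity fails for " + a);
            }
        }
        long[] longs = {0L, 1L, -1L, 1L << 40, -(1L << 40), Long.MAX_VALUE, Long.MIN_VALUE};
        for (long a : longs) {
            if (negatedSignedShift(a) != unsignedShift(a)) {
                throw new AssertionError("long identity fails for " + a);
            }
        }
        System.out.println("identities hold for all samples");
    }
}
```

Note that the right-hand side has to be the unsigned shift `>>>`: `a >> 31` alone gives `-1` for negative inputs, which is why the quoted `SubI` comment needs adjusting.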
test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 43: > 41: @Test > 42: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.ADD, IRNode.SUB, IRNode.LOOP, IRNode.COUNTEDLOOP, IRNode.COUNTEDLOOP_MAIN, IRNode.CALL}) > 43: //Checks that a for loop with 0 iterations is removed Missing space after `//` and also for other comments below. test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 44: > 42: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.ADD, IRNode.SUB, IRNode.LOOP, IRNode.COUNTEDLOOP, IRNode.COUNTEDLOOP_MAIN, IRNode.CALL}) > 43: //Checks that a for loop with 0 iterations is removed > 44: public void zeroIterForLoop(){ Missing space between `)` and `{` and also on other lines below. test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 52: > 50: @Test > 51: @IR(failOn = {IRNode.LOAD, IRNode.STORE, IRNode.MUL, IRNode.DIV, IRNode.ADD, IRNode.SUB, IRNode.LOOP, IRNode.COUNTEDLOOP, IRNode.COUNTEDLOOP_MAIN, IRNode.CALL}) > 52: //Checks that a for loop with 0 iterations is removed Actually there is 1 iteration but we break it immediately (i.e. the loop is entered). test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 89: > 87: if (i == 0){ > 88: break; > 89: }else{ Spaces around `else`. test/hotspot/jtreg/compiler/c2/irTests/loopOpts/LoopIdealizationTests.java line 141: > 139: //Checks that a while loop with 1 iteration is simplified to straight code > 140: public void oneIterDoWhileLoop(){ > 141: do{ Spacing test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/ScalarReplacementTests.java line 29: > 27: /* > 28: * @test > 29: * @summary Tests that Escape Analisys and Scalar Replacement is able to handle some simple cases. 
Typo: Analisys -> Analysis test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/ScalarReplacementTests.java line 33: > 31: * @run driver compiler.c2.irTests.scalarReplacement.ScalarReplacementTests > 32: */ > 33: public class ScalarReplacementTests { You should also add some rules to check if there is an allocation or not. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 162: > 160: public static final String SCOPE_OBJECT = "(.*# ScObj.*" + END; > 161: public static final String MEMBAR = START + "MemBar" + MID + END; > 162: I suggest to move all newly added regex together here. ------------- Changes requested by chagedorn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5135 From iignatyev at openjdk.java.net Fri Sep 17 13:32:48 2021 From: iignatyev at openjdk.java.net (Igor Ignatyev) Date: Fri, 17 Sep 2021 13:32:48 GMT Subject: RFR: 8273314: Add tier4 test groups [v4] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Fri, 17 Sep 2021 06:59:09 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. >> >> Caveats: >> - I excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
>> - `jdk:tier4` only runs well with `JTREG_KEYWORDS=!headful` or reduced concurrency with `TEST_JOBS=1`, because headful tests cannot run in parallel
>>
>> Sample run with `JTREG_KEYWORDS=!headful`:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>   TEST                              TOTAL  PASS  FAIL ERROR
>>>> jtreg:test/hotspot/jtreg:tier4     3585  3584     0     1 <<
>>>> jtreg:test/jdk:tier4               2893  2887     5     1 <<
>>   jtreg:test/langtools:tier4            0     0     0     0
>>   jtreg:test/jaxp:tier4                 0     0     0     0
>> ==============================
>>
>> real 699m39.462s
>> user 6626m8.448s
>> sys 1110m43.704s
>>
>>
>> There are interesting test failures on my machine, which I would address separately.
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Merge branch 'master' into JDK-8273314-tier4
> - Merge branch 'master' into JDK-8273314-tier4
> - Drop applications and fix the comment
> - Drop exceptions
> - Add tier4 test groups

LGTM

-------------

Marked as reviewed by iignatyev (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5357

From aph-open at littlepinkcloud.com Fri Sep 17 14:09:00 2021
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Fri, 17 Sep 2021 15:09:00 +0100
Subject: Intrinsic methods and time to safepoint
In-Reply-To: <87a6kbftu5.fsf@redhat.com>
References: <87czp8g0wu.fsf@redhat.com> <87a6kbftu5.fsf@redhat.com>
Message-ID: <9fc9a3ac-1108-64a2-f7ef-9be76f0e7a95@littlepinkcloud.com>

On 9/17/21 10:13 AM, Roland Westrelin wrote:
>
>> OK. I guess the problem is that the call to the stub doesn't have an oop
>> map.
>
> Assuming you can get the oop map in stub, deoptimization could still be
> delayed until the stub is exited. Is that considered a problem?
> > Actually I'm not even sure the loop synthesized from IR nodes is > feasible: what would the JVM state at the safepoint be given that > there's no actual java code for it? What would you deoptimize to if you > were to deoptimize? Ah. Perhaps you have a good point. :-) -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From github.com+42899633+eastig at openjdk.java.net Fri Sep 17 20:01:40 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Fri, 17 Sep 2021 20:01:40 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 16:44:27 GMT, Stuart Monteith wrote: >> This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds the option `UsePauseImpl=value`, where `value` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. >> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. >> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`. 
>> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2989: > >> 2987: switch (VM_Version::pause_impl_desc().inst()) { >> 2988: case NOP: >> 2989: for (unsigned int i = 1; i < VM_Version::pause_impl_desc().inst_count(); ++i) { > > Shouldn't these loops be indexed from 0? Good catch. It is a copy-paste error. Is there any method to test C1 generated assembly code? ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Fri Sep 17 20:16:45 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Fri, 17 Sep 2021 20:16:45 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: <8rRN5taGRtxStVgvMus03HHLSTuz3lvXPcIA_ZBG_6c=.cba8f88b-c13f-49de-8d0d-0387c5821691@github.com> On Fri, 17 Sep 2021 17:44:07 GMT, Paul Hohensee wrote: > Do you intend to make isb the default for N1? Yes, I do. I'll rewrite https://github.com/openjdk/jdk/pull/5112 to use different implementations. After that, I'd like to enable it for N1. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From smonteith at openjdk.java.net Fri Sep 17 17:38:49 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Fri, 17 Sep 2021 17:38:49 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 11:26:03 GMT, Evgeny Astigeevich wrote: > This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds the option `UsePauseImpl=value`, where `value` can be: > > - `none`: no implementation for spin pauses. This is the default value. 
> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. > - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. > - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. > > The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`. > > Testing: > > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed Looks Ok to me, this is the most future proof option. Will you be adding code to set the default depending on model, or is that something for your fork? src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2989: > 2987: switch (VM_Version::pause_impl_desc().inst()) { > 2988: case NOP: > 2989: for (unsigned int i = 1; i < VM_Version::pause_impl_desc().inst_count(); ++i) { Shouldn't these loops be indexed from 0? ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From phh at openjdk.java.net Fri Sep 17 17:46:44 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Fri, 17 Sep 2021 17:46:44 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 11:26:03 GMT, Evgeny Astigeevich wrote: > This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds the option `UsePauseImpl=value`, where `value` can be: > > - `none`: no implementation for spin pauses. This is the default value. > - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. > - `Nisb`: use `isb` instruction for spin pauses. 
Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>
> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`.
>
> Testing:
>
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Do you intend to make isb the default for N1?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From qingfeng.yy at alibaba-inc.com Sat Sep 18 07:34:26 2021
From: qingfeng.yy at alibaba-inc.com (Yi Yang)
Date: Sat, 18 Sep 2021 15:34:26 +0800
Subject: Question about JIT Peephole optimizations
Message-ID:

Hello Community,

I see that both C1 and C2 introduced peephole optimization, which is a classic compiler optimization phase. However, it seems that they have barely been implemented, used, or changed since they were first open-sourced. C1's peephole (`LIR_Assembler::peephole`) does nothing, and its implementation on most platforms is empty. As for C2's peephole, I noticed that currently arm/aarch64 have no peephole rules, and x86/s390/ppc have 2-3 peephole rules. PhasePeephole is disabled by default on almost all platforms; the only exception is x86, where it is enabled by default.

I want to know why we do not add more rules to allow merging more instructions by using peephole (like llvm/lib/CodeGen/PeepholeOptimizer.cpp). I also noticed that many rules have been commented out. Is there any reason for that? Is it because XXNode::Ideal does most of the work? Or has profiling shown that peepholes are not profitable, given the trade-off between compilation time and their outcome? Or is it difficult to do peepholes with a rule-based approach? Is it worth continuing to work on it?

I know nothing about the prehistoric era of HotSpot JITs, so any input is appreciated!
Thanks.

From aph at openjdk.java.net Sat Sep 18 09:40:44 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Sat, 18 Sep 2021 09:40:44 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64
In-Reply-To:
References:
Message-ID:

On Fri, 17 Sep 2021 11:26:03 GMT, Evgeny Astigeevich wrote:

> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>
> It adds the option `UsePauseImpl=value`, where `value` can be:
>
> - `none`: no implementation for spin pauses. This is the default value.
> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>
> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`.
>
> Testing:
>
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

src/hotspot/cpu/aarch64/aarch64.ad line 14368:

> 14366:       for (unsigned int i = 1; i < VM_Version::pause_impl_desc().inst_count(); ++i) {
> 14367:         $$emit$$"\tisb\n"
> 14368:       }

The code to generate n copies of a pause_impl instruction would be much happier in the MacroAssembler.

src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2988:

> 2986: void LIR_Assembler::on_spin_wait() {
> 2987:   switch (VM_Version::pause_impl_desc().inst()) {
> 2988:     case NOP:

Again, please push this into a macro, called from c1 and c2, that does the right thing for the current machine.
------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Sat Sep 18 09:40:45 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 18 Sep 2021 09:40:45 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Sat, 18 Sep 2021 09:33:33 GMT, Andrew Haley wrote: >> This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds the option `UsePauseImpl=value`, where `value` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. >> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. >> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`. >> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > src/hotspot/cpu/aarch64/aarch64.ad line 14368: > >> 14366: for (unsigned int i = 1; i < VM_Version::pause_impl_desc().inst_count(); ++i) { >> 14367: $$emit$$"\tisb\n" >> 14368: } > > The code to generate n copies of a pause_impl instruction would be much happier in the MacroAssembler. So, lose all the code here in C2 and push it down into a single macro. 
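
To make the "single macro" idea concrete, here is a rough model in plain Java rather than HotSpot C++. Everything in it is hypothetical (`PauseImplSketch`, `expand`); it is not the PR's code. It assumes that a value without a leading digit means exactly one instruction, which the option description implies but does not state. One shared helper parses a `UsePauseImpl`-style value and produces the instruction sequence, with the loop indexed from 0 so that `N` instructions are produced rather than `N - 1`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the idea; every name here is hypothetical, not HotSpot code.
final class PauseImplSketch {
    // Expands "none", "nop", "3isb", "9yield", ... into the instructions to emit.
    static List<String> expand(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("empty pause implementation");
        }
        if (value.equals("none")) {
            return new ArrayList<>();
        }
        // Assumption: no leading digit means a single instruction.
        int count = 1;
        String inst = value;
        char first = value.charAt(0);
        if (first >= '2' && first <= '9') {
            count = first - '0';
            inst = value.substring(1);
        }
        if (!inst.equals("nop") && !inst.equals("isb") && !inst.equals("yield")) {
            throw new IllegalArgumentException("unsupported pause implementation: " + value);
        }
        List<String> emitted = new ArrayList<>();
        // Indexed from 0 so that exactly `count` instructions are emitted.
        for (int i = 0; i < count; i++) {
            emitted.add(inst);
        }
        return emitted;
    }
}
```

With a single helper like this, both callers (C1 and C2 in the real code) would share the same loop, so an off-by-one such as the `i = 1` start quoted above could only exist in one place.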
------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Sat Sep 18 09:40:45 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 18 Sep 2021 09:40:45 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 19:58:43 GMT, Evgeny Astigeevich wrote: >> src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2989: >> >>> 2987: switch (VM_Version::pause_impl_desc().inst()) { >>> 2988: case NOP: >>> 2989: for (unsigned int i = 1; i < VM_Version::pause_impl_desc().inst_count(); ++i) { >> >> Shouldn't these loops be indexed from 0? > > Good catch. It is a copy-paste error. > Is there any method to test C1 generated assembly code? You could do it the same way as hotspot/jtreg/compiler/c2/aarch64/TestVolatiles.java, i.e. spawn a subtask and parse the output dump. It's very fiddly, though. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From kbarrett at openjdk.java.net Sat Sep 18 19:10:49 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 18 Sep 2021 19:10:49 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v3] In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 08:54:16 GMT, Stefan Karlsson wrote: >> Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. For example: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The path to the test plus the id can be used to problem lists the test. 
Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: >> gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. >> >> I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this: >> * @test id=Z >> >> An additional benefit of doing this is that it will be easier to see what was actually run: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z >> >> I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestReferenceClearDuringReferenceProcessing Changes requested by kbarrett (Reviewer). test/hotspot/jtreg/ProblemList.txt line 104: > 102: runtime/InvocationTests/invokevirtualTests.java#old-int 8271125 generic-all > 103: runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64 > 104: runtime/os/TestTracePageSizes.java#no-options 8267460 linux-aarch64 I don't think the whitespace changes to align fields should be made. That's not the style of this file. And I think it shouldn't be the style, since maintaining the alignment will frequently introduce otherwise uninteresting diffs. 
test/hotspot/jtreg/serviceability/dcmd/gc/HeapDumpCompressedTest.java line 86: > 84: > 85: /* > 86: * @test id=SHenandoah Lowercase the first "H". ------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From dholmes at openjdk.java.net Sun Sep 19 13:10:52 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 19 Sep 2021 13:10:52 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 11:26:03 GMT, Evgeny Astigeevich wrote: > This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds the option `UsePauseImpl=value`, where `value` can be: > > - `none`: no implementation for spin pauses. This is the default value. > - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. > - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. > - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. > > The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`. > > Testing: > > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed If you are adding a new product flag then a CSR request is needed. 
David ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From dholmes at openjdk.java.net Sun Sep 19 13:15:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 19 Sep 2021 13:15:51 GMT Subject: RFR: 8273314: Add tier4 test groups [v4] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Fri, 17 Sep 2021 06:59:09 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. >> >> Caveats: >> - I excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). >> - `jdk:tier4` only runs well with `JTREG_KEYWORDS=!headful` or reduced concurrency with `TEST_JOBS=1`, because headful tests cannot run in parallel >> >> Sample run with `JTREG_KEYWORDS=!headful`: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 3585 3584 0 1 << >>>> jtreg:test/jdk:tier4 2893 2887 5 1 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 699m39.462s >> user 6626m8.448s >> sys 1110m43.704s >> >> >> There are interesting test failures on my machine, which I would address separately. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8273314-tier4 > - Merge branch 'master' into JDK-8273314-tier4 > - Drop applications and fix the comment > - Drop exceptions > - Add tier4 test groups I abstain - you have your reviews. 
Cheers, David ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From dholmes at openjdk.java.net Sun Sep 19 23:23:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 19 Sep 2021 23:23:50 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v3] In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 08:54:16 GMT, Stefan Karlsson wrote: >> Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. For example: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The path to the test plus the id can be used to problem lists the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: >> gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. >> >> I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this: >> * @test id=Z >> >> An additional benefit of doing this is that it will be easier to see what was actually run: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z >> >> I've gone through the tests in the HotSpot problem lists + some that affects ZGC. 
There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix TestReferenceClearDuringReferenceProcessing Overall seems fine to have a consistent naming of subtests based around the GC name. Thanks, David test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java line 25: > 23: > 24: /* > 25: * @test id=Z Are these id's actually case sensitive? Just curious. ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5557 From dholmes at openjdk.java.net Sun Sep 19 23:23:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sun, 19 Sep 2021 23:23:51 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v3] In-Reply-To: References: Message-ID: <3v6uKKMlS2beLVNYOY7q3p9DDmY36FaryKq2oQ6qyKs=.7b4c0796-a1df-4f5f-96e6-018e6177c690@github.com> On Sat, 18 Sep 2021 19:02:11 GMT, Kim Barrett wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix TestReferenceClearDuringReferenceProcessing > > test/hotspot/jtreg/ProblemList.txt line 104: > >> 102: runtime/InvocationTests/invokevirtualTests.java#old-int 8271125 generic-all >> 103: runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64 >> 104: runtime/os/TestTracePageSizes.java#no-options 8267460 linux-aarch64 > > I don't think the whitespace changes to align fields should be made. That's not the style of this file. And I think it shouldn't be the style, since maintaining the alignment will frequently introduce otherwise uninteresting diffs. I agree with Kim. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From dholmes at openjdk.java.net Mon Sep 20 00:14:48 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 20 Sep 2021 00:14:48 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:11:30 GMT, Coleen Phillimore wrote: > Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > > This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. > > Tested with tier1-6 and built and run tier1 tests with shenandoah locally. Hi Coleen, Mostly the remapping seems okay but a few queries below. Thanks, David src/hotspot/share/gc/parallel/psCompactionManager.cpp line 95: > 93: _shadow_region_array = new (ResourceObj::C_HEAP, mtGC) GrowableArray(10, mtGC); > 94: > 95: _shadow_region_monitor = new Monitor(Mutex::nosafepoint, "CompactionManager_lock", Not clear why this one needed to change?? src/hotspot/share/runtime/mutex.hpp line 55: > 53: nosafepoint = oopstorage + 6, > 54: leaf = nosafepoint + 6, > 55: safepoint = leaf + 10, It is somewhat confusing to have safepoint as an explicit rank now that all ranks above nosafepoint imply safepoint-ing. src/hotspot/share/runtime/mutexLocker.cpp line 253: > 251: def(ClassInitError_lock , PaddedMonitor, leaf+1, true, _safepoint_check_always); > 252: def(Module_lock , PaddedMutex , leaf+2, false, _safepoint_check_always); > 253: def(InlineCacheBuffer_lock , PaddedMutex , nosafepoint-1, true, _safepoint_check_never); Why -1 ? 
------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From dholmes at openjdk.java.net Mon Sep 20 00:38:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 20 Sep 2021 00:38:51 GMT Subject: RFR: 8273916: Remove 'special' ranking In-Reply-To: References: Message-ID: <8dW8OAemsXYBncE3GYKv5VzHscvOzBqvxk3xwqO7PdA=.8abcc090-3a90-487a-9b35-27d753a70d61@github.com> On Fri, 17 Sep 2021 11:50:22 GMT, Coleen Phillimore wrote: > This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. > This passes tier1-6 also. Sorry Coleen but I'm not understanding the mapping process here. I expected to see all special changed to the same thing, eg. nosafepoint, , and all special-N changed to nosafepoint-N, but you have not done that. ??? David ------------- PR: https://git.openjdk.java.net/jdk/pull/5563 From svkamath at openjdk.java.net Mon Sep 20 05:16:16 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Mon, 20 Sep 2021 05:16:16 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2] In-Reply-To: References: Message-ID: > Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. > > The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. 
Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5402/files - new: https://git.openjdk.java.net/jdk/pull/5402/files/4628dc3a..7ea464ae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=00-01 Stats: 42 lines in 5 files changed: 28 ins; 1 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/5402.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5402/head:pull/5402 PR: https://git.openjdk.java.net/jdk/pull/5402 From shade at openjdk.java.net Mon Sep 20 07:40:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 20 Sep 2021 07:40:59 GMT Subject: RFR: 8273314: Add tier4 test groups [v4] In-Reply-To: References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: On Fri, 17 Sep 2021 06:59:09 GMT, Aleksey Shipilev wrote: >> During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. >> >> Caveats: >> - I excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
>> - `jdk:tier4` only runs well with `JTREG_KEYWORDS=!headful` or reduced concurrency with `TEST_JOBS=1`, because headful tests cannot run in parallel >> >> Sample run with `JTREG_KEYWORDS=!headful`: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >>>> jtreg:test/hotspot/jtreg:tier4 3585 3584 0 1 << >>>> jtreg:test/jdk:tier4 2893 2887 5 1 << >> jtreg:test/langtools:tier4 0 0 0 0 >> jtreg:test/jaxp:tier4 0 0 0 0 >> ============================== >> >> real 699m39.462s >> user 6626m8.448s >> sys 1110m43.704s >> >> >> There are interesting test failures on my machine, which I would address separately. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8273314-tier4 > - Merge branch 'master' into JDK-8273314-tier4 > - Drop applications and fix the comment > - Drop exceptions > - Add tier4 test groups All right, there goes. ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From rrich at openjdk.java.net Mon Sep 20 08:17:57 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 20 Sep 2021 08:17:57 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. 
nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. > > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). 
Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. 
> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.

Hi Volker, Rémi,

I haven't yet looked into the details of the change but @TheRealMDoerr kindly explained it to me. As I understand it, you are using global JNI references to hold the preallocated exceptions with a partial backtrace. Backtraces seem to hold references to the java mirrors of the holders of the methods in the backtrace [1]. This will keep their classloaders alive and prevent class unloading. The owning nmethod cannot be unloaded either, for the same reason. It shouldn't be too difficult to write a test that leaks classes because of this.

> _Mailing list message from [Remi Forax](mailto:forax at univ-mlv.fr) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_
>
> (not a reviewer so this message will not be really helpful ...)
>
> Hi Volker,
> for me it's not an enhancement, but a bug fix, in production an exception with no stacktrace is useless and result in hours lost trying to figure out the issue

I'd agree. Even in development, exceptions should have a stacktrace and therefore, IMHO, OmitStackTraceInFastThrow should be off by default. In my eyes exceptions are a means to handle unforeseen application states on a best-effort basis. Often they will be caused by bugs, and the attached stacktrace is valuable information for finding them. Your enhancement limits the stacktrace to potentially just the top frame, which in many cases will not be enough and may also confuse developers. Also, I don't think that we should optimize applications that have run into a bug.
In the related [JDK-8273563](https://bugs.openjdk.java.net/browse/JDK-8273563) you gave the example of [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):

    public static boolean isAlpha(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

There the backtrace is completely useless and can be omitted, but IMHO (as stated above) this is a misuse of exceptions in the first place and should be fixed. Such idioms may occur in libraries that are out of maintenance (which should not be used for security reasons) or whose maintainer is not willing to accept the fix. Therefore we could limit OmitStackTraceInFastThrow to only those idioms where the thrown exception is caught in a local or inlined handler that ignores it.

Maybe this is not even too difficult. If C2 compiles `isAlpha` with OmitStackTraceInFastThrow enabled then practically everything related to exception handling gets eliminated. It might be possible to recognize that the IR node representing the preallocated exception has no uses; only if it actually does have uses would it be replaced with an uncommon trap (which is likely the harder part).

Cheers, Richard.
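A minimal sketch of how the `isAlpha` idiom above could be rewritten so that it does not rely on an exception for control flow. The class name `AlphaTable`, the table size and the table contents are assumptions made for illustration; this is not Tomcat's code.

```java
// Hypothetical stand-in for the lookup table in the quoted idiom (not Tomcat's class).
final class AlphaTable {
    // Assumed table size: one entry per 7-bit ASCII code point.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Explicit range check instead of catching ArrayIndexOutOfBoundsException.
    static boolean isAlpha(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }
}
```

With the explicit check, an out-of-range argument simply yields `false`, so this method never throws an implicit exception and is unaffected by how `OmitStackTraceInFastThrow` is set.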
[1] Exception backtrace references java mirrors: https://github.com/openjdk/jdk/blob/7c9868c0b3c9bd3d305e71f91596190813cdccce/src/hotspot/share/classfile/javaClasses.cpp#L2178-L2182 ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From rrich at openjdk.java.net Mon Sep 20 08:22:53 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 20 Sep 2021 08:22:53 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
> > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. 
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. src/hotspot/share/ci/ciEnv.cpp line 410: > 408: // nmethods are no strong roots so we have to create a global JNI handle > 409: // for the created exception in order to keep it alive accross GCs. 
> 410: objh = JNIHandles::make_global(handle); The backtrace references the java mirrors corresponding to the methods in the backtrace[1]. Thereby the global JNI handle will keep their classloaders alive and prevent classunloading and also unloading of the nmethod being compiled. [1] https://github.com/openjdk/jdk/blob/7c9868c0b3c9bd3d305e71f91596190813cdccce/src/hotspot/share/classfile/javaClasses.cpp#L2178-L2182 ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From volker.simonis at gmail.com Mon Sep 20 09:03:12 2021 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 20 Sep 2021 11:03:12 +0200 Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: Hi Richard, thanks a lot for looking into this change. Nmethod unloading does still work with this change, just take a look at the associated JTreg test which compiles and then unloads a method with a generated implicit exception. Once the nmethod has been unloaded, the global JNI handle will be released and the class can be unloaded as well. But I agree that it might be too late and class unloading shouldn't depend on unloading of all nmethods which reference that class. I'll have a look if I can fix that somehow. Best regards, Volker On Mon, Sep 20, 2021 at 10:53 AM Richard Reingruber wrote: > > On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > > > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. 
nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. > > > > For the attached JTreg test, we get the following exception in interpreter mode: > > > > java.lang.NullPointerException: Cannot read the array length because "" is null > > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > > > java.lang.NullPointerException > > > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > > > java.lang.NullPointerException > > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > > > java.lang.NullPointerException > > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). 
Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > > > ## Implementation details > > > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. 
> > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. > > src/hotspot/share/ci/ciEnv.cpp line 410: > > > 408: // nmethods are no strong roots so we have to create a global JNI handle > > 409: // for the created exception in order to keep it alive accross GCs. > > 410: objh = JNIHandles::make_global(handle); > > The backtrace references the java mirrors corresponding to the methods in the backtrace[1]. Thereby the global JNI handle will keep their classloaders alive and prevent classunloading and also unloading of the nmethod being compiled. > > [1] https://github.com/openjdk/jdk/blob/7c9868c0b3c9bd3d305e71f91596190813cdccce/src/hotspot/share/classfile/javaClasses.cpp#L2178-L2182 > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5392 From rrich at openjdk.java.net Mon Sep 20 09:51:58 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Mon, 20 Sep 2021 09:51:58 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 15:25:46 GMT, Volker Simonis wrote: > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
> > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. 
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. Hi Volker, > > _Mailing list message from [Volker Simonis](mailto:volker.simonis at gmail.com) on [hotspot-dev](mailto:hotspot-dev at mail.openjdk.java.net):_ > > Hi Richard, > > thanks a lot for looking into this change. 
>
> Nmethod unloading does still work with this change, just take a look
> at the associated JTreg test which compiles and then unloads a method
> with a generated implicit exception.

Yes it works in your test because you explicitly make the compiled method not entrant. Think of another test where a nmethod would be unloaded because the corresponding classloader isn't reachable anymore. The change prevents this because the loader will be kept alive by the preallocated exception if one exists.

A test with a class leak would repeatedly create a loader, c2 compile a method with preallocated exception that was loaded by the loader and then drop the reference to the classloader. All the loaders would be kept alive by the preallocated exceptions.

> Once the nmethod has been
> unloaded, the global JNI handle will be released and the class can be
> unloaded as well. But I agree that it might be too late and class
> unloading shouldn't depend on unloading of all nmethods which
> reference that class. I'll have a look if I can fix that somehow.
>
> Best regards,
> Volker
>
> On Mon, Sep 20, 2021 at 10:53 AM Richard Reingruber
> wrote:

Cheers, Richard.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5392

From aph at openjdk.java.net Mon Sep 20 10:42:51 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Mon, 20 Sep 2021 10:42:51 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v10]
In-Reply-To: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com>
References: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com>
Message-ID:

On Tue, 14 Sep 2021 16:07:45 GMT, Andrew Haley wrote:

>> An interleaved version of AES/GCM.
>>
>> Performance, now and then:
>>
>>
>> Apple M1, 3.2 GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   3108.881 ±  119.675  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   3109.685 ±    4.206  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   3122.144 ±  113.379  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   3119.568 ±  192.877  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  89123.942 ±  111.977  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  91034.697 ±  161.469  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  89732.397 ±  106.370  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  89382.300 ±  139.300  ns/op
>>
>> Neoverse N1, 2.5GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   6296.575 ±   37.995  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   7380.326 ±   10.987  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   6293.090 ±   52.972  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   6357.536 ±   42.925  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  48745.085 ±  125.612  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  45062.599 ± 1548.950  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  42230.857 ±  520.562  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  45124.171 ± 1417.927  ns/op
>>
>>
>>
>> A note about the implementation for the reviewers:
>>
>> Unrolled and hand-scheduled intrinsics are often written in a way that
>> I don't find satisfactory. Often they are a conglomeration of
>> copy-and-paste programming and C macros, which makes them hard to
>> understand and hard to maintain. I won't name any names, but there are
>> many examples to be found in free software across the Internet,
>>
>> I spent a while thinking about a structured way to develop and
>> implement them, and I think I've got something better. The idea is
>> that you transform a pre-existing implementation into a generator for
>> the interleaved version. The transformation shouldn't be too hard to
>> do, but more importantly it should be possible for a reader to verify
>> that the interleaved and unrolled version performs the same function.
>>
>> A generator takes the form of a subclass of `KernelGenerator`. The
>> core idea is that the programmer defines the base case of the
>> intrinsic and a method to generate a clone of it, shifted to a
>> different set of registers. `KernelGenerator` will then generate
>> several interleaved copies of the function, with each one using a
>> different set of registers.
>>
>> The subclass must implement three methods: `length()`, which is the
>> number of instruction bundles in the intrinsic, `generate(int n)`
>> which emits the nth instruction bundle in the intrinsic, and `next()`
>> which takes an instance of the generator and returns a version of it,
>> shifted to a new set of registers.
>>
>> As an example, here's the inner loop of AES encryption:
>>
>> (Some details elided for clarity.)
>>
>>
>>   BIND(L_aes_loop);
>>     ld1(v0, T16B, post(from, 16));
>>
>>     cmpw(keylen, 44);
>>     br(Assembler::CC, L_rounds_44);
>>     br(Assembler::EQ, L_rounds_52);
>>
>>     aes_round(v0, v17);
>>     aes_round(v0, v18);
>>   BIND(L_rounds_52);
>>     aes_round(v0, v19);
>>     aes_round(v0, v20);
>>   BIND(L_rounds_44);
>>     ...
>>
>>
>> The generator for the unrolled version looks like:
>>
>>
>>   virtual void generate(int index) {
>>     switch (index) {
>>     case 0:
>>       ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input
>>       break;
>>     case 1:
>>       if (_once) {
>>         cmpw(_keylen, 52);
>>         br(Assembler::LO, _rounds_44);
>>         br(Assembler::EQ, _rounds_52);
>>       }
>>       break;
>>     case 2: aes_round(_data, _subkeys + 0); break;
>>     case 3: aes_round(_data, _subkeys + 1); break;
>>     case 4:
>>       if (_once) bind(_rounds_52);
>>       break;
>>     case 5: aes_round(_data, _subkeys + 2); break;
>>     case 6: aes_round(_data, _subkeys + 3); break;
>>     case 7:
>>       if (_once) bind(_rounds_44);
>>       break;
>>     ...
>>
>>
>> The job of converting a single inline intrinsic is, as you can see,
>> not much more than adding a switch statement. Some instructions should
>> only be emitted once, rather than several times, such as the labels
>> and branches. (You can use a list of C++ lambdas rather than a switch
>> statement to do the same thing, very LISP, but that seems a bit of a
>> sledgehammer. YMMV.)
>>
>> I believe that this approach will be more maintainable and easier to
>> understand than other approaches we've seen. Also, the number of
>> unrolls is just a number that can be tweaked as required.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
> Cleanup

In case anyone is wondering why this one hasn't been committed yet.

There's a problem with prolonged time-to-safepoint when these intrinsics are executed with large-sized arguments. Intel are also looking at this intrinsic on x86, 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125.

I could commit this now, and fix its time-to-safepoint later. Thoughts?
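The `KernelGenerator` scheme quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions: the real class is C++ inside HotSpot and emits AArch64 instructions, whereas the names `ToyKernelGenerator`, `ToyAesRounds`, `emitInterleaved` and the string "instructions" below are hypothetical and exist purely to show the interleaving order and the emit-once guard.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not HotSpot code: an "instruction bundle" is just a string.
abstract class ToyKernelGenerator {
    protected final List<String> out;   // shared output "instruction stream"

    ToyKernelGenerator(List<String> out) { this.out = out; }

    abstract int length();                // number of instruction bundles
    abstract void generate(int index);    // emit the index-th bundle
    abstract ToyKernelGenerator next();   // same kernel, shifted to new registers

    // Emit bundle 0 of every copy, then bundle 1 of every copy, and so on.
    void emitInterleaved(int unrolls) {
        for (int i = 0; i < length(); i++) {
            ToyKernelGenerator g = this;
            for (int j = 0; j < unrolls; j++) {
                g.generate(i);
                g = g.next();
            }
        }
    }
}

final class ToyAesRounds extends ToyKernelGenerator {
    private final int dataReg;    // hypothetical vector register number
    private final boolean once;   // true only for the first copy

    ToyAesRounds(List<String> out, int dataReg, boolean once) {
        super(out);
        this.dataReg = dataReg;
        this.once = once;
    }

    @Override int length() { return 3; }

    @Override void generate(int index) {
        switch (index) {
            case 0: out.add("ld1 v" + dataReg); break;
            case 1: if (once) out.add("cmpw keylen"); break;  // emitted once only
            case 2: out.add("aes_round v" + dataReg); break;
            default: throw new IllegalArgumentException("no bundle " + index);
        }
    }

    @Override ToyKernelGenerator next() {
        return new ToyAesRounds(out, dataReg + 1, false);
    }

    // Two interleaved copies of the three-bundle kernel.
    static List<String> demo() {
        List<String> out = new ArrayList<>();
        new ToyAesRounds(out, 0, true).emitInterleaved(2);
        return out;
    }
}
```

Calling `ToyAesRounds.demo()` yields `ld1 v0`, `ld1 v1`, `cmpw keylen`, `aes_round v0`, `aes_round v1`: the loads of both copies come first, the comparison appears once, and the rounds follow. Changing the argument of `emitInterleaved` is the toy equivalent of tweaking the number of unrolls.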
------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From stefank at openjdk.java.net Mon Sep 20 11:15:47 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 20 Sep 2021 11:15:47 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v3] In-Reply-To: <3v6uKKMlS2beLVNYOY7q3p9DDmY36FaryKq2oQ6qyKs=.7b4c0796-a1df-4f5f-96e6-018e6177c690@github.com> References: <3v6uKKMlS2beLVNYOY7q3p9DDmY36FaryKq2oQ6qyKs=.7b4c0796-a1df-4f5f-96e6-018e6177c690@github.com> Message-ID: <8HgRD6iyUa1A8QypijkEVmcPYDPl2RukVsfRarF5bVc=.dba7458f-34a2-42d9-8992-86ccb61fa460@github.com> On Sun, 19 Sep 2021 23:12:23 GMT, David Holmes wrote: >> test/hotspot/jtreg/ProblemList.txt line 104: >> >>> 102: runtime/InvocationTests/invokevirtualTests.java#old-int 8271125 generic-all >>> 103: runtime/jni/terminatedThread/TestTerminatedThread.java 8219652 aix-ppc64 >>> 104: runtime/os/TestTracePageSizes.java#no-options 8267460 linux-aarch64 >> >> I don't think the whitespace changes to align fields should be made. That's not the style of this file. And I think it shouldn't be the style, since maintaining the alignment will frequently introduce otherwise uninteresting diffs. > > I agree with Kim. OK. I'll change this, though I think it makes it much harder to see what tests are excluded by which bug. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From stefank at openjdk.java.net Mon Sep 20 11:15:48 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 20 Sep 2021 11:15:48 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v3] In-Reply-To: References: Message-ID: <_bQrXhg-yg0o6uaggnSDelwwr08IwMBL36DFlu3TPBE=.2aac6fac-af2b-4b77-a8c1-abd39b891f99@github.com> On Sun, 19 Sep 2021 23:13:32 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix TestReferenceClearDuringReferenceProcessing > > test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java line 25: > >> 23: >> 24: /* >> 25: * @test id=Z > > Are these id's actually case sensitive? Just curious. It seems like they are case insensitive. Just tried to problem list with z and Z, and both worked. ------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From stefank at openjdk.java.net Mon Sep 20 11:22:40 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 20 Sep 2021 11:22:40 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v4] In-Reply-To: References: Message-ID: > Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. For example: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The path to the test plus the id can be used to problem lists the test. 
Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: > gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. > > I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this: > * @test id=Z > > An additional benefit of doing this is that it will be easier to see what was actually run: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z > > I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into 8273928_jtreg_ids - Review 1 - Fix TestReferenceClearDuringReferenceProcessing - Remove temporary ProblemList testing - 8273928: Use named run ids when problem listing tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5557/files - new: https://git.openjdk.java.net/jdk/pull/5557/files/5129ed8e..363e17fb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5557&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5557&range=02-03 Stats: 1780 lines in 64 files changed: 1193 ins; 445 del; 142 mod Patch: https://git.openjdk.java.net/jdk/pull/5557.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5557/head:pull/5557 PR: https://git.openjdk.java.net/jdk/pull/5557 From pliden at openjdk.java.net Mon Sep 20 12:10:56 2021 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 20 Sep 2021 12:10:56 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v4] In-Reply-To: References: Message-ID: <9fkyAYoHP_-8TxrnKKDbNyuGi8YYrBOQBZEEOmadAcg=.13c8884d-c6fd-40e8-a178-715f4772cd72@github.com> On Mon, 20 Sep 2021 11:22:40 GMT, Stefan Karlsson wrote: >> Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. For example: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The path to the test plus the id can be used to problem lists the test. 
Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: >> gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. >> >> I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this: >> * @test id=Z >> >> An additional benefit of doing this is that it will be easier to see what was actually run: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z >> >> I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into 8273928_jtreg_ids > - Review 1 > - Fix TestReferenceClearDuringReferenceProcessing > - Remove temporary ProblemList testing > - 8273928: Use named run ids when problem listing tests Still looks good. 
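The id-shifting problem described in the quoted proposal can be sketched with a small model. This is an illustration only, not jtreg code: `RunIds` and its methods are hypothetical names, and the run order (G1, Parallel, Serial, Z) is assumed so that `id3` lands on the ZGC run as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; this is not jtreg code. All names here are hypothetical.
final class RunIds {
    /** Positional ids in file order: "id0", "id1", ... as in the quoted jtreg output. */
    static List<String> positionalIds(List<String> runs) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            ids.add("id" + i);
        }
        return ids;
    }

    /** The run that a problem-list entry "#<id>" selects under positional numbering, or null. */
    static String selectedByPositionalId(List<String> runs, String id) {
        int idx = positionalIds(runs).indexOf(id);
        return idx < 0 ? null : runs.get(idx);
    }

    /** With explicit ids the name is the id, so the position of the run does not matter. */
    static String selectedByNamedId(List<String> runs, String id) {
        return runs.contains(id) ? id : null;
    }

    public static void main(String[] args) {
        List<String> before = List.of("G1", "Parallel", "Serial", "Z");
        // Someone adds a new run section at the top of the test file.
        List<String> after = List.of("Shenandoah", "G1", "Parallel", "Serial", "Z");

        System.out.println(selectedByPositionalId(before, "id3")); // Z
        System.out.println(selectedByPositionalId(after, "id3"));  // Serial
        System.out.println(selectedByNamedId(after, "Z"));         // Z
    }
}
```

In this model the problem-list entry `#id3` silently moves from the ZGC run to the Serial run once a section is added, while `#Z` keeps pointing at the same run.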
------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From simonis at openjdk.java.net Mon Sep 20 12:23:07 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Mon, 20 Sep 2021 12:23:07 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: <4aME71kyj1wnVLbosGZMtSpFNHTIOYPR_uIlYLoi5RM=.108a01cf-7072-40f6-b304-242414ea1f7c@github.com> > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
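The existing stack-less behaviour that this proposal improves on can usually be observed with a small loop. The following is a rough sketch, not the JTreg test attached to the PR: `FastThrowDemo` and its methods are hypothetical names, and whether a stack-less exception shows up at all depends on the JVM, its flags, and whether C2 compiles the hot method within the chosen iteration count.

```java
// Illustrative demo only; FastThrowDemo and its method names are hypothetical.
final class FastThrowDemo {
    /** Dereferences the array so that a null argument raises an implicit NullPointerException. */
    static int lengthOf(int[] a) {
        return a.length;
    }

    /**
     * Triggers the same implicit NPE up to maxIterations times. Returns the index of the
     * first exception that arrives without any stack frames, or -1 if every exception
     * carried a stack trace.
     */
    static int firstStacklessIteration(int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            try {
                lengthOf(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int i = firstStacklessIteration(1_000_000);
        System.out.println(i < 0
                ? "every exception carried a stack trace"
                : "first stack-less exception at iteration " + i);
    }
}
```

Running it once with default flags and once with `-XX:-OmitStackTraceInFastThrow` is one way to compare the two behaviours; with the flag disabled the method is expected to return -1.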
> > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`.
With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. > - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. Volker Simonis has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Move the '_implicit_exceptions' GrowableArray into the compiler arena and correctly initialize '_implicit_excepts_offset' for native wrappers - 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5392/files - new: https://git.openjdk.java.net/jdk/pull/5392/files/906fe7f2..07ebd638 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=00-01 Stats: 14444 lines in 647 files changed: 8880 ins; 3313 del; 2251 mod Patch: https://git.openjdk.java.net/jdk/pull/5392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 PR: https://git.openjdk.java.net/jdk/pull/5392 From github.com+42899633+eastig at openjdk.java.net Mon Sep 20 13:00:54 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 20 Sep 2021 13:00:54 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Sat, 18 Sep 2021 09:31:05 GMT, Andrew Haley wrote: >> Good catch. It is a copy-paste error. >> Is there any method to test C1 generated assembly code? > > You could do it the same way as hotspot/jtreg/compiler/c2/aarch64/TestVolatiles.java, i.e. spawn a subtask and parse the output dump. It's very fiddly, though. Yes, I used it as an example when I was writing tests for the PR. It works only for C2 because it relies on C2 `XX:+PrintOptoAssembly`. I haven't found anything similar for C1. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From coleenp at openjdk.java.net Mon Sep 20 13:29:47 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 13:29:47 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 00:08:11 GMT, David Holmes wrote: >> Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. >> >> This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. >> >> Tested with tier1-6 and built and run tier1 tests with shenandoah locally. > > src/hotspot/share/gc/parallel/psCompactionManager.cpp line 95: > >> 93: _shadow_region_array = new (ResourceObj::C_HEAP, mtGC) GrowableArray(10, mtGC); >> 94: >> 95: _shadow_region_monitor = new Monitor(Mutex::nosafepoint, "CompactionManager_lock", > > Not clear why this one needed to change?? This one changes because 'barrier' is above 'leaf' which checks for safepoint. nosafepoint is the top rank that doesn't check for safepoint, so this was made nosafepoint. > src/hotspot/share/runtime/mutex.hpp line 55: > >> 53: nosafepoint = oopstorage + 6, >> 54: leaf = nosafepoint + 6, >> 55: safepoint = leaf + 10, > > It is somewhat confusing to have safepoint as an explicit rank now that all ranks above nosafepoint imply safepoint-ing. I thought there was still a lock that used this rank but there isn't, so I'll remove it. > src/hotspot/share/runtime/mutexLocker.cpp line 253: > >> 251: def(ClassInitError_lock , PaddedMonitor, leaf+1, true, _safepoint_check_always); >> 252: def(Module_lock , PaddedMutex , leaf+2, false, _safepoint_check_always); >> 253: def(InlineCacheBuffer_lock , PaddedMutex , nosafepoint-1, true, _safepoint_check_never); > > Why -1 ?
It depends on CompiledIC_lock def(CompiledIC_lock , PaddedMutex , nosafepoint, _safepoint_check_never, true); ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Mon Sep 20 13:29:45 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 13:29:45 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:11:30 GMT, Coleen Phillimore wrote: > Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > > This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. > > Tested with tier1-6 and built and run tier1 tests with shenandoah locally. Thank you David for reviewing this. ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Mon Sep 20 13:35:29 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 13:35:29 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v2] In-Reply-To: References: Message-ID: <6WoJrQEzL6i3ZSGEEa7i38KSG2MPOi5B2bMdoyBBv9k=.611f2e26-342d-45f5-931d-1665b4152ab0@github.com> > Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > > This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. > > Tested with tier1-6 and built and run tier1 tests with shenandoah locally. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Remove 'safepoint' rank, now unused. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5550/files - new: https://git.openjdk.java.net/jdk/pull/5550/files/d14f8e17..1a927805 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5550&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5550&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5550.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5550/head:pull/5550 PR: https://git.openjdk.java.net/jdk/pull/5550 From shade at openjdk.java.net Mon Sep 20 14:07:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 20 Sep 2021 14:07:06 GMT Subject: Integrated: 8273314: Add tier4 test groups In-Reply-To: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> References: <6rDioD5KZREYHyzOXol72O0erNDqKHUqb-0k1SwhInA=.90a12492-9e2f-4f0c-bf09-8d67d1d6111f@github.com> Message-ID: <2EIwJmplZv9-2ffL0Ye4pf3tgPEsx7HPhu-oM54T8Hk=.129e646b-333d-4ba1-b65d-ddd9ca731d2a@github.com> On Fri, 3 Sep 2021 09:10:20 GMT, Aleksey Shipilev wrote: > During the review of JDK-8272914 that added hotspot:tier{2,3} groups, @iignatev suggested to create tier4 groups that capture all tests not in tiers{1,2,3}. > > Caveats: > - I excluded `applications` from `hotspot:tier4`, because they require test dependencies (e.g. jcstress). 
> - `jdk:tier4` only runs well with `JTREG_KEYWORDS=!headful` or reduced concurrency with `TEST_JOBS=1`, because headful tests cannot run in parallel > > Sample run with `JTREG_KEYWORDS=!headful`: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> jtreg:test/hotspot/jtreg:tier4 3585 3584 0 1 << >>> jtreg:test/jdk:tier4 2893 2887 5 1 << > jtreg:test/langtools:tier4 0 0 0 0 > jtreg:test/jaxp:tier4 0 0 0 0 > ============================== > > real 699m39.462s > user 6626m8.448s > sys 1110m43.704s > > > There are interesting test failures on my machine, which I would address separately. This pull request has now been integrated. Changeset: 1f8af524 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/1f8af524ffe2d2d1469d8f07887b1f61c6e4d7b8 Stats: 20 lines in 4 files changed: 20 ins; 0 del; 0 mod 8273314: Add tier4 test groups Reviewed-by: serb, iignatyev ------------- PR: https://git.openjdk.java.net/jdk/pull/5357 From github.com+42899633+eastig at openjdk.java.net Mon Sep 20 16:21:38 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 20 Sep 2021 16:21:38 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: Message-ID: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> > This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds the option `UsePauseImpl=value`, where `value` can be: > > - `none`: no implementation for spin pauses. This is the default value. > - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. > - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. > > The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`. > > Testing: > > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Replace 'for' loops with macros ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5562/files - new: https://git.openjdk.java.net/jdk/pull/5562/files/1e856dec..c6831a3b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=00-01 Stats: 42 lines in 4 files changed: 10 ins; 21 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5562.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562 PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Mon Sep 20 16:21:40 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 20 Sep 2021 16:21:40 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: Message-ID: <-_FlwpUr9maYPrYKvnWmNgxLGGRAESJs6JhI_F--4eM=.2ecc8e70-83b1-4836-84aa-44ec8b7c7baa@github.com> On Sat, 18 Sep 2021 09:36:25 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 14368: >> >>> 14366: for (unsigned int i = 1; i < VM_Version::pause_impl_desc().inst_count(); ++i) { >>> 14367: $$emit$$"\tisb\n" >>> 14368: } >> >> The code to generate n copies of a pause_impl instruction would be much happier in the MacroAssembler. > > So, lose all the code here in C2 and push it down into a single macro. 
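The `UsePauseImpl` values listed in the PR description, together with the review request to generate the N copies in one place, can be sketched as a small parser. This is a hedged illustration in Java, not the HotSpot implementation: `PauseImpl` and `parse` are hypothetical names, and rejecting a leading `0` or `1` is an assumption based only on the stated `2..9` range.

```java
import java.util.Collections;
import java.util.List;

// Illustrative parser only; not the HotSpot implementation. Names are hypothetical.
final class PauseImpl {
    /** Expands a UsePauseImpl value such as "none", "isb" or "3nop" into the instructions to emit. */
    static List<String> parse(String value) {
        if (value.equals("none")) {
            return List.of(); // no spin-pause instructions at all
        }
        int count = 1;        // a value without the optional N means a single instruction
        String inst = value;
        if (!value.isEmpty() && value.charAt(0) >= '0' && value.charAt(0) <= '9') {
            count = value.charAt(0) - '0';
            if (count < 2) {
                // Assumption: only the documented range 2..9 is accepted as a prefix.
                throw new IllegalArgumentException("N must be in 2..9: " + value);
            }
            inst = value.substring(1);
        }
        if (!inst.equals("nop") && !inst.equals("isb") && !inst.equals("yield")) {
            throw new IllegalArgumentException("unknown pause implementation: " + value);
        }
        // One place produces all N copies, instead of a loop at every use site.
        return Collections.nCopies(count, inst);
    }
}
```

For example, `PauseImpl.parse("3nop")` yields three `nop` entries, and `PauseImpl.parse("none")` yields an empty list.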
Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Mon Sep 20 16:21:42 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Mon, 20 Sep 2021 16:21:42 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 12:57:45 GMT, Evgeny Astigeevich wrote: >> You could do it the same way as hotspot/jtreg/compiler/c2/aarch64/TestVolatiles.java, i.e. spawn a subtask and parse the output dump. It's very fiddly, though. > > Yes, I used it as an example when I was writing tests for the PR. It works only for C2 because it relies on C2 `XX:+PrintOptoAssembly`. I haven't found anything similar for C1. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From ascarpino at openjdk.java.net Mon Sep 20 16:47:59 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Mon, 20 Sep 2021 16:47:59 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 05:16:16 GMT, Smita Kamath wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries I approve the jdk changes. 
You'll need a hotspot reviewer to approve the other changes ------------- Marked as reviewed by ascarpino (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5402 From kbarrett at openjdk.java.net Mon Sep 20 17:05:58 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 20 Sep 2021 17:05:58 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v4] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 11:22:40 GMT, Stefan Karlsson wrote: >> Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. For example: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The path to the test plus the id can be used to problem lists the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: >> gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. >> >> I propose that we refrain from using the automatically generated ids when problem listing tests. 
Instead we add explicit ids like this: >> * @test id=Z >> >> An additional benefit of doing this is that it will be easier to see what was actually run: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z >> >> I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into 8273928_jtreg_ids > - Review 1 > - Fix TestReferenceClearDuringReferenceProcessing > - Remove temporary ProblemList testing > - 8273928: Use named run ids when problem listing tests Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From svkamath at openjdk.java.net Mon Sep 20 17:42:59 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Mon, 20 Sep 2021 17:42:59 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 In-Reply-To: References: Message-ID: On Tue, 14 Sep 2021 13:31:19 GMT, Andrew Haley wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. 
This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. > > It seems to me there's a serious problem here. When you execute the galoisCounterMode_AESCrypt() intrinsic, I don't think there's a limit on the number of blocks to be encrypted. With the older intrinsic things are not so very bad because the incoming data is split into 6 segments. But if we use this intrinsic, there is no safepoint check in the inner loop, which can lead to a long time to safepoint, and this causes stalls on the other threads. > If you split the incoming data into blocks of about a megabyte you'd lose no measurable performance but you'd dramatically improve the performance of everything else, especially with a concurrent GC. @theRealAph I have implemented changes as per your suggestions. Could you review the changes and let me know your thoughts? ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From iklam at openjdk.java.net Mon Sep 20 18:31:56 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 20 Sep 2021 18:31:56 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v2] In-Reply-To: <6WoJrQEzL6i3ZSGEEa7i38KSG2MPOi5B2bMdoyBBv9k=.611f2e26-342d-45f5-931d-1665b4152ab0@github.com> References: <6WoJrQEzL6i3ZSGEEa7i38KSG2MPOi5B2bMdoyBBv9k=.611f2e26-342d-45f5-931d-1665b4152ab0@github.com> Message-ID: On Mon, 20 Sep 2021 13:35:29 GMT, Coleen Phillimore wrote: >> Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. >> >> This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. >> >> Tested with tier1-6 and built and run tier1 tests with shenandoah locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove 'safepoint' rank, now unused. 
src/hotspot/share/runtime/mutex.hpp line 53: > 51: special = tty + 3, > 52: oopstorage = special + 3, > 53: nosafepoint = oopstorage + 6, Maybe add a comment below `nosafepoint` like: // A thread is not allowed to safepoint while holding a mutex whose // rank is nosafepoint or lower. Also, how about renaming the generic name `lock_types` to something specific like `standard_lock_ranks`? BTW, should we (in separate RFE) add a new enum `_safepoint_check_default`, so that this parameter can be omitted depending on the rank value (unless in places where you need to override it)? For one thing, I never understood which _safepoint_check_xxx I should have used when adding a new lock. I just randomly changed it until the JVM stops crashing. ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From rkennke at openjdk.java.net Mon Sep 20 19:09:20 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 20 Sep 2021 19:09:20 GMT Subject: RFR: 8274024: Use regular accessors to internal fields of oopDesc Message-ID: Currently, we are using 'raw' accessors to initialize the mark, Klass*, (array-)length and klass_gap of oops. This is ugly and confusing and we should just use the regular accessors. 
Testing: - [ ] tier1 - [ ] tier2 - [ ] hotspot_gc ------------- Commit messages: - 8274024: Use regular accessors to internal fields of oopDesc Changes: https://git.openjdk.java.net/jdk/pull/5585/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5585&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274024 Stats: 37 lines in 5 files changed: 1 ins; 17 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/5585.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5585/head:pull/5585 PR: https://git.openjdk.java.net/jdk/pull/5585 From coleenp at openjdk.java.net Mon Sep 20 19:44:04 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 19:44:04 GMT Subject: RFR: 8273916: Remove 'special' ranking In-Reply-To: References: Message-ID: <1KYVDVyT-GbHXg8UPba3734ic0YS_cONQyT-agIFZDM=.2b8b422f-cf21-4ac9-b3a5-3907f1722fe6@github.com> On Fri, 17 Sep 2021 11:50:22 GMT, Coleen Phillimore wrote: > This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. > This passes tier1-6 also. Thanks for looking at this, David. I started to map special(-n) to nosafepoint(-n) but since special is nosafepoint - 9 (?) there were interactions with other nosafepoint locks so they needed rankings relative to other locks. CompiledMethod_lock(nosafepoint-4) -> CodeCache_lock(nosafepoint-3) -> VtableStubs_lock(nosafepoint-2) -> CompiledIC_lock(nosafepoint) CodeSweeper_lock(nosafepoint-5) -> CompiledMethod_lock(nosafepoint-4) ThreadsSMRDelete_lock(nosafepoint-2) -> (can't remember which one was nosafepoint-1 anymore!) The compiler locks have the deepest nestings. 
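The nesting chains above can be checked against a toy model of rank-ordered locking. This is a sketch under stated assumptions, not HotSpot code: `RankCheck` is a hypothetical class, the offsets are copied from the `mutex.hpp` hunk quoted earlier in this thread, `TTY = 0` is a placeholder base, and the rule that a newly acquired lock must rank strictly below every lock already held is an assumption about how the ranks are enforced. Under that rule the chains read right to left as acquisition order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of rank-ordered locking; not HotSpot code. All names are hypothetical.
final class RankCheck {
    // Offsets follow the quoted mutex.hpp hunk; TTY = 0 is only a placeholder base.
    static final int TTY = 0;
    static final int SPECIAL = TTY + 3;
    static final int OOPSTORAGE = SPECIAL + 3;
    static final int NOSAFEPOINT = OOPSTORAGE + 6; // so special == nosafepoint - 9
    static final int LEAF = NOSAFEPOINT + 6;

    // Ranks of the locks currently held; the most recently acquired is on top.
    private final Deque<Integer> held = new ArrayDeque<>();

    /** Assumed rule: a new lock must rank strictly below every lock already held. */
    boolean tryAcquire(int rank) {
        // Acquisition is strictly decreasing, so the top of the stack is the lowest held rank.
        if (!held.isEmpty() && rank >= held.peek()) {
            return false;
        }
        held.push(rank);
        return true;
    }

    /** Releases the most recently acquired lock. */
    void releaseLast() {
        held.pop();
    }

    /** Locks ranked nosafepoint or lower do not check for a safepoint. */
    static boolean checksForSafepoint(int rank) {
        return rank > NOSAFEPOINT;
    }
}
```

With these numbers, acquiring `nosafepoint`, `nosafepoint-2`, `nosafepoint-3` and `nosafepoint-4` in that order succeeds, which matches the CompiledIC_lock to CompiledMethod_lock chain read right to left, while trying `nosafepoint-1` afterwards is rejected.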
------------- PR: https://git.openjdk.java.net/jdk/pull/5563 From coleenp at openjdk.java.net Mon Sep 20 19:52:55 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 19:52:55 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v2] In-Reply-To: References: <6WoJrQEzL6i3ZSGEEa7i38KSG2MPOi5B2bMdoyBBv9k=.611f2e26-342d-45f5-931d-1665b4152ab0@github.com> Message-ID: On Mon, 20 Sep 2021 18:15:44 GMT, Ioi Lam wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove 'safepoint' rank, now unused. > > src/hotspot/share/runtime/mutex.hpp line 53: > >> 51: special = tty + 3, >> 52: oopstorage = special + 3, >> 53: nosafepoint = oopstorage + 6, > > Maybe add a comment below `nosafepoint` like: > > > // A thread is not allowed to safepoint while holding a mutex whose > // rank is nosafepoint or lower. > > > Also, how about renaming the generic name `lock_types` to something specific like `standard_lock_ranks`? > > > BTW, should we (in separate RFE) add a new enum `_safepoint_check_default`, so that this parameter can be omitted depending on the rank value (unless in places where you need to override it)? > > For one thing, I never understood which _safepoint_check_xxx I should have used when adding a new lock. I just randomly changed it until the JVM stops crashing. The comment is good, but I added it to above the enum because otherwise it messes up my nice alignment. How about changing lock_types to Rank, since the later plan is to make this an enum class that only allows subtraction operations? I think in a future RFE we should remove this safepoint_check_always or never parameter and just use the ranks, since after the next change there won't be any overrides. For the most part, new locks that JavaThreads or runtime JRT entries use, should be safepoint_check_always so that the safepoint protocol is respected. 
More locks are safepoint_check_never since these locks are shared with GC or compiler threads and many times used during a safepoint. ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Mon Sep 20 19:52:54 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 19:52:54 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v2] In-Reply-To: <6WoJrQEzL6i3ZSGEEa7i38KSG2MPOi5B2bMdoyBBv9k=.611f2e26-342d-45f5-931d-1665b4152ab0@github.com> References: <6WoJrQEzL6i3ZSGEEa7i38KSG2MPOi5B2bMdoyBBv9k=.611f2e26-342d-45f5-931d-1665b4152ab0@github.com> Message-ID: On Mon, 20 Sep 2021 13:35:29 GMT, Coleen Phillimore wrote: >> Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. >> >> This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. >> >> Tested with tier1-6 and built and run tier1 tests with shenandoah locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove 'safepoint' rank, now unused. Thanks for commenting, Ioi. ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Mon Sep 20 20:16:03 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 20:16:03 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v3] In-Reply-To: References: Message-ID: > Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > > This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. > > Tested with tier1-6 and built and run tier1 tests with shenandoah locally. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Add comment and change enum name to Rank. 
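The idea mentioned earlier in this thread, an enum class that only allows subtraction, might look roughly like the sketch below. This is an illustration under stated assumptions, not the code in mutex.hpp: the base value of `tty`, the `operator-` overload and the `value_of` helper are hypothetical; only the relative offsets (`tty + 3`, `special + 3`, `oopstorage + 6`) are taken from the quoted diff.

```cpp
#include <cassert>

// Hypothetical scoped rank type; the base value is illustrative only.
enum class Rank : int {
  tty         = 3,               // assumed base value
  special     = tty + 3,
  oopstorage  = special + 3,
  nosafepoint = oopstorage + 6,
};

// Only subtraction is provided. Callers are expected to pass a non-negative
// offset, so a derived rank stays at or below the named rank it starts from.
constexpr Rank operator-(Rank base, int offset) {
  return static_cast<Rank>(static_cast<int>(base) - offset);
}

// Helper to inspect the numeric value of a rank.
constexpr int value_of(Rank r) { return static_cast<int>(r); }

// With the offsets from the quoted diff, special sits nine below nosafepoint.
static_assert(value_of(Rank::nosafepoint) - value_of(Rank::special) == 9,
              "special is nosafepoint - 9");
```

Because no `operator+` is defined for the scoped enum, an expression such as `Rank::nosafepoint + 1` would not compile in this sketch, which is the point of restricting the type to subtraction.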
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5550/files - new: https://git.openjdk.java.net/jdk/pull/5550/files/1a927805..f85c647f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5550&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5550&range=01-02 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5550.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5550/head:pull/5550 PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Mon Sep 20 21:34:58 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 21:34:58 GMT Subject: RFR: 8274024: Use regular accessors to internal fields of oopDesc In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 18:59:34 GMT, Roman Kennke wrote: > Currently, we are using 'raw' accessors to initialize the mark, Klass*, (array-)length and klass_gap of oops. This is ugly and confusing and we should just use the regular accessors. > > Testing: > - [ ] tier1 > - [ ] tier2 > - [ ] hotspot_gc This looks good! ------------- Marked as reviewed by coleenp (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5585 From iklam at openjdk.java.net Mon Sep 20 21:36:08 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 20 Sep 2021 21:36:08 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v3] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 20:16:03 GMT, Coleen Phillimore wrote: >> Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. >> >> This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. >> >> Tested with tier1-6 and built and run tier1 tests with shenandoah locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and change enum name to Rank. 
LGTM ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Mon Sep 20 22:11:55 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 22:11:55 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v3] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 20:16:03 GMT, Coleen Phillimore wrote: >> Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. >> >> This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. >> >> Tested with tier1-6 and built and run tier1 tests with shenandoah locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and change enum name to Rank. Thank you, Ioi. ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From xliu at openjdk.java.net Mon Sep 20 22:12:09 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 20 Sep 2021 22:12:09 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself Message-ID: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> This patch allows the custom commands of OnError to attach to HotSpot itself. It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). This prevents cmds which require safepoint synchronization from deadlock. eg. OnError='jcmd %p Thread.print'. 
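The transition described above can be pictured as a scoped guard. The sketch below is a simplified model, assuming a thread marked as being in native code does not hold up safepoint synchronization. `FakeThread`, `ThreadState`, `ScopedToNative` and `run_command_in_native` are hypothetical names and are not the classes used in the patch.

```cpp
#include <cassert>

// Hypothetical, simplified thread model; not HotSpot's thread classes.
enum class ThreadState { in_vm, in_native };

struct FakeThread {
  ThreadState state = ThreadState::in_vm;
};

// Scoped guard: marks the thread as in_native on construction and restores
// in_vm on destruction. A null thread, or one not in the VM, is left alone.
class ScopedToNative {
 public:
  explicit ScopedToNative(FakeThread* t) : _thread(nullptr) {
    if (t != nullptr && t->state == ThreadState::in_vm) {
      _thread = t;
      _thread->state = ThreadState::in_native;
    }
  }
  ~ScopedToNative() {
    if (_thread != nullptr) {
      _thread->state = ThreadState::in_vm;
    }
  }
  ScopedToNative(const ScopedToNative&) = delete;
  ScopedToNative& operator=(const ScopedToNative&) = delete;

 private:
  FakeThread* _thread;
};

// Runs a callable, standing in for the external OnError command, while the
// guard is active.
template <typename F>
void run_command_in_native(FakeThread* t, F command) {
  ScopedToNative guard(t);
  command();
}
```

The guard restores the previous state even if the callable returns early, which is the usual reason to express such a transition as a scoped object rather than a pair of calls.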
-------------

Commit messages:
 - 8273608: Deadlock when jcmd of OnError attaches to itself

Changes: https://git.openjdk.java.net/jdk/pull/5590/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5590&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8273608
  Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5590.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5590/head:pull/5590

PR: https://git.openjdk.java.net/jdk/pull/5590

From dholmes at openjdk.java.net  Mon Sep 20 22:20:51 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 20 Sep 2021 22:20:51 GMT
Subject: RFR: 8273916: Remove 'special' ranking
In-Reply-To:
References:
Message-ID:

On Fri, 17 Sep 2021 11:50:22 GMT, Coleen Phillimore wrote:

> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915.
> This passes tier1-6 also.

Sorry I don't follow. Let's examine that first example. We currently have:

CompiledMethod_lock (special-1) -> CodeCache_lock (special) -> VtableStubs_lock (leaf-2 == special + 11) -> CompiledIC_lock(leaf+2 == special + 15)

If we change special to nosafepoint then we would have:

CompiledMethod_lock (nosafepoint-1) -> CodeCache_lock (nosafepoint) -> VtableStubs_lock (leaf-2 == nosafepoint + 4) -> CompiledIC_lock(leaf+2 == nosafepoint + 8 == safepoint - 8)

so I don't see why we would have CodeCache_lock instead be nosafepoint-3 ? If all you have done is change special to nosafepoint then all the existing relative rankings remain for things like leaf and leaf-2. The only adjustment you have to make is for a leaf+2 to instead be expressed as safepoint-8.
David ------------- PR: https://git.openjdk.java.net/jdk/pull/5563 From coleenp at openjdk.java.net Mon Sep 20 22:34:00 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 22:34:00 GMT Subject: RFR: 8273916: Remove 'special' ranking In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 11:50:22 GMT, Coleen Phillimore wrote: > This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. > This passes tier1-6 also. All the lock rankings are negative values from locks they hold. The nosafepoint ranking and ranking below that are safepoint_check_never locks. So nosafepoint is the top of the lock hierarchy for safepoint_check_never locks. That's how we start with CompiledIC_lock at 'nosafepoint' rank. The lock values will not be exactly the same value as they were when they were 'special'. They'll be ordered relative to their ranking in the nosafepoint to nosafepoint-n range. I did skip a value though (nosafepoint-1). ------------- PR: https://git.openjdk.java.net/jdk/pull/5563 From coleenp at openjdk.java.net Mon Sep 20 22:54:19 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Mon, 20 Sep 2021 22:54:19 GMT Subject: RFR: 8273916: Remove 'special' ranking [v2] In-Reply-To: References: Message-ID: > This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. > This passes tier1-6 also. 
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Add comment about ThreadSMRDelete_lock

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5563/files
  - new: https://git.openjdk.java.net/jdk/pull/5563/files/17988275..81c5feff

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5563&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5563&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5563.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5563/head:pull/5563

PR: https://git.openjdk.java.net/jdk/pull/5563

From dholmes at openjdk.java.net  Mon Sep 20 23:34:59 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Mon, 20 Sep 2021 23:34:59 GMT
Subject: RFR: 8273916: Remove 'special' ranking [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 20 Sep 2021 22:54:19 GMT, Coleen Phillimore wrote:

>> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915.
>> This passes tier1-6 also.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add comment about ThreadSMRDelete_lock

So there is more to this than just removing the "special" ranking - you've also changed some locks that are safepoint_never, that used to have ranks above what is now nosafepoint, so that they instead have ranks below nosafepoint - is that right? As long as all relative rankings of locks that can be taken together are maintained, then that is okay - but it is very hard to see that just by looking at the changes.
------------- PR: https://git.openjdk.java.net/jdk/pull/5563 From dholmes at openjdk.java.net Tue Sep 21 01:03:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 21 Sep 2021 01:03:50 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v3] In-Reply-To: References: Message-ID: <7pDjyODGfcI7z64XeKvTzXMb7is032dtr2J14d_-uzM=.a0135e3a-b9b1-4311-b6bf-41d910858824@github.com> On Mon, 20 Sep 2021 13:26:12 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/mutexLocker.cpp line 253: >> >>> 251: def(ClassInitError_lock , PaddedMonitor, leaf+1, true, _safepoint_check_always); >>> 252: def(Module_lock , PaddedMutex , leaf+2, false, _safepoint_check_always); >>> 253: def(InlineCacheBuffer_lock , PaddedMutex , nosafepoint-1, true, _safepoint_check_never); >> >> Why -1 ? > > It depends on CompiledIC_lock which is ranked nosafepoint. > def(CompiledIC_lock , PaddedMutex , nosafepoint, _safepoint_check_never, true); Okay ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From jjg at openjdk.java.net Tue Sep 21 01:27:58 2021 From: jjg at openjdk.java.net (Jonathan Gibbons) Date: Tue, 21 Sep 2021 01:27:58 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v2] In-Reply-To: References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Tue, 14 Sep 2021 07:46:12 GMT, Andrey Turbanov wrote: >> Collections.sort is just a wrapper, so it is better to use an instance method directly. > > Andrey Turbanov has updated the pull request incrementally with one additional commit since the last revision: > > 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules I've looked at the javadoc changes. In general, it would be better to split a review like this into separate ones for separate components, but in this case, I guess it's innocuous enough. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/5230

From dholmes at openjdk.java.net  Tue Sep 21 02:52:51 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 21 Sep 2021 02:52:51 GMT
Subject: RFR: 8273915: Create 'nosafepoint' rank [v3]
In-Reply-To:
References:
Message-ID: <1jGDPvUPDoSE_vabl7IHcXS01MmeqRDx2At8Th2Lcnk=.5db4ef3f-15f1-45de-bb89-db93b230c797@github.com>

On Mon, 20 Sep 2021 13:21:26 GMT, Coleen Phillimore wrote:

>> src/hotspot/share/gc/parallel/psCompactionManager.cpp line 95:
>>
>>> 93: _shadow_region_array = new (ResourceObj::C_HEAP, mtGC) GrowableArray(10, mtGC);
>>> 94:
>>> 95: _shadow_region_monitor = new Monitor(Mutex::nosafepoint, "CompactionManager_lock",
>>
>> Not clear why this one needed to change??
>
> This one changes because 'barrier' is above 'leaf' which checks for safepoint. nosafepoint is the top rank that doesn't check for safepoint, so this was made nosafepoint.

Okay

-------------

PR: https://git.openjdk.java.net/jdk/pull/5550

From dholmes at openjdk.java.net  Tue Sep 21 02:52:51 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 21 Sep 2021 02:52:51 GMT
Subject: RFR: 8273915: Create 'nosafepoint' rank [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 20 Sep 2021 20:16:03 GMT, Coleen Phillimore wrote:

>> Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block.
>>
>> This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking.
>>
>> Tested with tier1-6 and built and run tier1 tests with shenandoah locally.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add comment and change enum name to Rank.

This seems fine to me now. Thanks for the offlist discussions.

David

-------------

Marked as reviewed by dholmes (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5550

From dholmes at openjdk.java.net  Tue Sep 21 05:12:50 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 21 Sep 2021 05:12:50 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2]
In-Reply-To:
References:
Message-ID: <4VZM07gGacQUgxJj5QM6m7LqhHK1Ehw3AyxcSrSQF8U=.15a62f8a-ef46-4c65-b8ad-f6530b74fc33@github.com>

On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code:
>>
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr I think I must have a basic misunderstanding of the problem here, as the described problem seems to be the opposite of what the intent of OmitStackTraceInFastThrow actually is. From the comment in graphKit: // If this throw happens frequently, an uncommon trap might cause // a performance pothole. If there is a local exception handler, // and if this particular bytecode appears to be deoptimizing often, // let us handle the throw inline, with a preconstructed instance. so OmitStackTraceInFastThrow actually allows us to use an optimized fastpath because we can replace a heavyweight stack-full exception with a preallocated stackless one and so avoid the uncommon trap to create the exception. What am I missing? Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From david.holmes at oracle.com Tue Sep 21 05:37:55 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 21 Sep 2021 15:37:55 +1000 Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: <4VZM07gGacQUgxJj5QM6m7LqhHK1Ehw3AyxcSrSQF8U=.15a62f8a-ef46-4c65-b8ad-f6530b74fc33@github.com> References: <4VZM07gGacQUgxJj5QM6m7LqhHK1Ehw3AyxcSrSQF8U=.15a62f8a-ef46-4c65-b8ad-f6530b74fc33@github.com> Message-ID: Please ignore - I deleted this comment as I realized what I was missing (the '-' sign) as soon as I posted it. :( David On 21/09/2021 3:12 pm, David Holmes wrote: > On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: > >>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. 
>>>
>>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>>
>>>     public static boolean isAlpha(int c) {
>>>         try {
>>>             return IS_ALPHA[c];
>>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>>             return false;
>>>         }
>>>     }
>>>
>>>
>>> ### Solution
>>>
>>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code:
>>>
>>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>>
>>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>>
>>>
>>> ### Implementation details
>>>
>>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecodes which can cause an implicit exception.
>>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>>
>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Minor updates as requested by @TheRealMDoerr
>
> I think I must have a basic misunderstanding of the problem here, as the described problem seems to be the opposite of what the intent of OmitStackTraceInFastThrow actually is. From the comment in graphKit:
>
>     // If this throw happens frequently, an uncommon trap might cause
>     // a performance pothole. If there is a local exception handler,
>     // and if this particular bytecode appears to be deoptimizing often,
>     // let us handle the throw inline, with a preconstructed instance.
> > so OmitStackTraceInFastThrow actually allows us to use an optimized fastpath because we can replace a heavyweight stack-full exception with a preallocated stackless one and so avoid the uncommon trap to create the exception. > > What am I missing? > > Thanks, > David > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/5488 > From dholmes at openjdk.java.net Tue Sep 21 05:39:57 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 21 Sep 2021 05:39:57 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. > > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... > 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) Hi Xin, The basic idea is reasonable but I think some of the details need changing. Thanks, David src/hotspot/share/utilities/vmError.cpp line 1325: > 1323: > 1324: public: > 1325: VMErrorThreadToNativeFromVM(Thread* t) : _thread(nullptr) { If `t` must be the current thread then it should not be passed in as that gives the impression you can pass any thread. src/hotspot/share/utilities/vmError.cpp line 1333: > 1331: } > 1332: > 1333: if (_thread != nullptr) { No need to terminate the first if block. src/hotspot/share/utilities/vmError.cpp line 1334: > 1332: > 1333: if (_thread != nullptr) { > 1334: assert(!_thread->owns_locks(), "must release all locks when leaving VM"); This can't be an assertion as the thread is not knowingly leaving the VM without first releasing locks. If it does hold locks then that could lead to additional problems and strange errors when running the external command. 
We have to decide whether it is safest/best to simply not transition to native if holding locks, or whether it is okay to proceed knowing that there are risks of secondary crashes, or hangs, if we do.

src/hotspot/share/utilities/vmError.cpp line 1343:

> 1341: ThreadStateTransition::transition_from_native(_thread, _thread_in_vm);
> 1342: assert(!_thread->is_pending_jni_exception_check(), "Pending JNI Exception Check");
> 1343: // We don't need to clear_walkable because it will happen automagically when we return to java

We are not executing JNI code when we do the fork_and_exec so this does not seem necessary. The comment about `clear_walkable` also doesn't make sense here - we are crashing so we are not returning to Java at all.

src/hotspot/share/utilities/vmError.cpp line 1663:

> 1661: out.print_raw_cr("\" ...");
> 1662:
> 1663: VMErrorThreadToNativeFromVM ttnfv(JavaThread::current_or_null());

Surely the current thread need not be a JavaThread here.

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5590

From dholmes at openjdk.java.net  Tue Sep 21 05:55:50 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 21 Sep 2021 05:55:50 GMT
Subject: RFR: 8274024: Use regular accessors to internal fields of oopDesc
In-Reply-To:
References:
Message-ID:

On Mon, 20 Sep 2021 18:59:34 GMT, Roman Kennke wrote:

> Currently, we are using 'raw' accessors to initialize the mark, Klass*, (array-)length and klass_gap of oops. This is ugly and confusing and we should just use the regular accessors.
>
> Testing:
> - [ ] tier1
> - [ ] tier2
> - [ ] hotspot_gc

Hi Roman,

This seems semantically wrong to me. Until we have applied all these operations to the raw "mem" we don't have an Oop, we just have a chunk of memory that is being transformed into an oop. Should these operations instead be combined into a factory method that takes raw "mem" and returns an oop, rather than having setters for these internal fields?
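The factory-method alternative raised above could be sketched as follows. This is a toy model under stated assumptions and not HotSpot's `oopDesc`: `FakeObject`, `FakeKlass` and `init_from_raw` are hypothetical names, and the header is reduced to a mark word and a class pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical, simplified class descriptor; not HotSpot's Klass.
struct FakeKlass {
  int id;
};

// Hypothetical object header with no public setters for its fields.
class FakeObject {
 public:
  // Factory: turns a chunk of raw, suitably sized and aligned memory into an
  // initialized object in one step and returns the typed pointer.
  static FakeObject* init_from_raw(void* mem, std::uintptr_t mark,
                                   const FakeKlass* klass) {
    return ::new (mem) FakeObject(mark, klass);
  }

  std::uintptr_t mark() const { return _mark; }
  const FakeKlass* klass() const { return _klass; }

 private:
  FakeObject(std::uintptr_t mark, const FakeKlass* klass)
      : _mark(mark), _klass(klass) {}

  std::uintptr_t _mark;
  const FakeKlass* _klass;
};
```

In this shape a typed pointer only exists once all header fields have been written, which is the property the question above is asking about.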
Thanks, David ------------- PR: https://git.openjdk.java.net/jdk/pull/5585 From iklam at openjdk.java.net Tue Sep 21 06:24:44 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 21 Sep 2021 06:24:44 GMT Subject: RFR: 8273508: Support archived heap objects in SerialGC Message-ID: When `-XX:+UseSerialGC is enabled`, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap. This improves VM start-up time, mostly because the module graph can be loaded from the archive. $ perf stat -r 40 java -XX:+UseSerialGC -version Before: 0.042484507 seconds time elapsed ( +- 0.72% ) After: 0.031671000 seconds time elapsed ( +- 0.72% ) Changes in the gc subdirectories are contributed by @tschatzl ------------- Commit messages: - 8273508: Support archived heap objects in SerialGC Changes: https://git.openjdk.java.net/jdk/pull/5596/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5596&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8273508 Stats: 211 lines in 12 files changed: 179 ins; 1 del; 31 mod Patch: https://git.openjdk.java.net/jdk/pull/5596.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5596/head:pull/5596 PR: https://git.openjdk.java.net/jdk/pull/5596 From stefank at openjdk.java.net Tue Sep 21 07:46:00 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 21 Sep 2021 07:46:00 GMT Subject: RFR: 8273928: Use named run ids when problem listing tests [v4] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 11:22:40 GMT, Stefan Karlsson wrote: >> Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. 
For example: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The path to the test plus the id can be used to problem lists the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: >> gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 >> >> The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. >> >> I propose that we refrain from using the automatically generated ids when problem listing tests. Instead we add explicit ids like this: >> * @test id=Z >> >> An additional benefit of doing this is that it will be easier to see what was actually run: >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial >> Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z >> >> I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into 8273928_jtreg_ids > - Review 1 > - Fix TestReferenceClearDuringReferenceProcessing > - Remove temporary ProblemList testing > - 8273928: Use named run ids when problem listing tests Thanks for reviewing! ------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From stefank at openjdk.java.net Tue Sep 21 07:46:01 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 21 Sep 2021 07:46:01 GMT Subject: Integrated: 8273928: Use named run ids when problem listing tests In-Reply-To: References: Message-ID: On Fri, 17 Sep 2021 08:29:48 GMT, Stefan Karlsson wrote: > Today when you have multiple jtreg run sections in a test, each run gets an automated id that match the location in the file. For example: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id0 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id2 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The path to the test plus the id can be used to problem lists the test. Say that id3 matches a run done with ZGC, and we need to problem list this test with ZGC, then the problem list would contain: > gc/stringdedup/TestStringDeduplicationAgeThreshold.java#id3 > > The problem is when someone adds a new run section before that, then all the ids will be shifted and #id3 doesn't correspond to the ZGC run anymore. A similar problem occurs if two run sections are swapped. > > I propose that we refrain from using the automatically generated ids when problem listing tests. 
Instead we add explicit ids like this: > * @test id=Z > > An additional benefit of doing this is that it will be easier to see what was actually run: > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#G1 > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Parallel > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Serial > Passed: gc/stringdedup/TestStringDeduplicationAgeThreshold.java#Z > > I've gone through the tests in the HotSpot problem lists + some that affects ZGC. There are probably more tests that would benefit from getting explicit ids, but I started with a small set to begin with. This pull request has now been integrated. Changeset: c60bcd09 Author: Stefan Karlsson URL: https://git.openjdk.java.net/jdk/commit/c60bcd09b73f6ad176bbd73fe3c1a09545609353 Stats: 84 lines in 17 files changed: 13 ins; 1 del; 70 mod 8273928: Use named run ids when problem listing tests Reviewed-by: pliden, kbarrett, dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/5557 From aph at openjdk.java.net Tue Sep 21 08:09:56 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 21 Sep 2021 08:09:56 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> References: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> Message-ID: On Mon, 20 Sep 2021 16:21:38 GMT, Evgeny Astigeevich wrote: >> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds the option `UsePauseImpl=value`, where `value` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. >> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`. >> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Replace 'for' loops with macros src/hotspot/cpu/aarch64/aarch64.ad line 14374: > 14372: ShouldNotReachHere(); > 14373: } > 14374: #undef EMIT_N_ASM_STRINGS None of this is necessary. Printing "onspinwait" is enough. src/hotspot/cpu/aarch64/aarch64.ad line 14392: > 14390: ShouldNotReachHere(); > 14391: } > 14392: %} Please let the MacroAssembler do this. Just call `MacroAssembler::spin_wait()`. src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2999: > 2997: break; > 2998: default: > 2999: ShouldNotReachHere(); Same here. Please just call `MacroAssembler::spin_wait()`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Tue Sep 21 08:09:57 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 21 Sep 2021 08:09:57 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: Message-ID: <2mJybb2RLTON7AdFNCbru8cnNKiraG8zpNFccNCJCY8=.3414cf7e-444b-4160-95f7-38306650d4b9@github.com> On Mon, 20 Sep 2021 16:17:53 GMT, Evgeny Astigeevich wrote: >> Yes, I used it as an example when I was writing tests for the PR. It works only for C2 because it relies on C2 `XX:+PrintOptoAssembly`. I haven't found anything similar for C1. > > Fixed. 
`-XX:+PrintAssembly` ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Tue Sep 21 08:17:44 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 21 Sep 2021 08:17:44 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> References: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> Message-ID: <1y06ntv5fV4fwNgpCiHhcjhYvpwaYGs_e1xjOnbtCHo=.341e1a89-083b-49f3-badc-395c41adf997@github.com> On Mon, 20 Sep 2021 16:21:38 GMT, Evgeny Astigeevich wrote: >> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds the option `UsePauseImpl=value`, where `value` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. >> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. >> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`.
>> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Replace 'for' loops with macros src/hotspot/cpu/aarch64/globals_aarch64.hpp line 114: > 112: "Value -1 means off.") \ > 113: range(-1, 4096) \ > 114: product(ccstr, UsePauseImpl, "none", \ The name "UsePauseImpl" fails to make the connection with `onSpinWait`. If you called it something like `OnSpinWaitImpl` that would make the connection. src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115: > 113: range(-1, 4096) \ > 114: product(ccstr, UsePauseImpl, "none", \ > 115: "Use instructions to implement pauses." \ Suggestion: "Use instructions to implement java.lang.Thread.onSpinWait() ." \ ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From tschatzl at openjdk.java.net Tue Sep 21 08:20:52 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 21 Sep 2021 08:20:52 GMT Subject: RFR: 8273508: Support archived heap objects in SerialGC In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 06:11:58 GMT, Ioi Lam wrote: > When `-XX:+UseSerialGC is enabled`, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap. This improves VM start-up time, mostly because the module graph can be loaded from the archive. > > > $ perf stat -r 40 java -XX:+UseSerialGC -version > > Before: 0.042484507 seconds time elapsed ( +- 0.72% ) > After: 0.031671000 seconds time elapsed ( +- 0.72% ) > > > Changes in the gc subdirectories are contributed by @tschatzl Initial pass over GC code. src/hotspot/share/gc/serial/serialHeap.cpp line 32: > 30: #include "gc/shared/strongRootsScope.hpp" > 31: #include "gc/shared/suspendibleThreadSet.hpp" > 32: #include "logging/log.hpp" Suggestion: Debug code. 
src/hotspot/share/gc/serial/serialHeap.cpp line 121: > 119: MutexLocker ml(Heap_lock); > 120: HeapWord* result = old_gen()->allocate(word_size, /* is_tlab = */ false); > 121: return result; Suggestion: return old_gen()->allocate(word_size, false /* is_tlab */); Tighten the code...; GC code (see similar examples in the file) also adds the comment for bool parameters after that parameter. src/hotspot/share/gc/serial/tenuredGeneration.cpp line 225: > 223: TenuredSpace* space = (TenuredSpace*)_the_space; > 224: > 225: space->initialize_threshold(); Suggestion: // Create the BOT for the archive space. TenuredSpace* space = (TenuredSpace*)_the_space; space->initialize_threshold(); src/hotspot/share/gc/serial/tenuredGeneration.cpp line 228: > 226: HeapWord* start = archive_space.start(); > 227: while (start < archive_space.end()) { > 228: size_t word_size = _the_space->block_size(start); /// Crashes here when accessing the klass Please remove the debug comment :) Suggestion: size_t word_size = _the_space->block_size(start); ------------- PR: https://git.openjdk.java.net/jdk/pull/5596 From simonis at openjdk.java.net Tue Sep 21 10:09:11 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 21 Sep 2021 10:09:11 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: References: Message-ID: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> > If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. > > However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. 
nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. > > For the attached JTreg test, we get the following exception in interpreter mode: > > java.lang.NullPointerException: Cannot read the array length because "" is null > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) > > Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: > > java.lang.NullPointerException > > After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > > and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): > > java.lang.NullPointerException > at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) > at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) > at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) > > The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). 
Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. > > ## Implementation details > > - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). > - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. > - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. > - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. > - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. > - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. > - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. 
> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5392/files - new: https://git.openjdk.java.net/jdk/pull/5392/files/07ebd638..f4a205b1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5392&range=01-02 Stats: 34 lines in 3 files changed: 13 ins; 5 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/5392.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5392/head:pull/5392 PR: https://git.openjdk.java.net/jdk/pull/5392 From simonis at openjdk.java.net Tue Sep 21 10:17:46 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 21 Sep 2021 10:17:46 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. 
Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. >> >> For the attached JTreg test, we get the following exception in interpreter mode: >> >> java.lang.NullPointerException: Cannot read the array length because "" is null >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) >> >> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: >> >> java.lang.NullPointerException >> >> After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> >> and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> >> The new functionality is guarded by 
`-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. >> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. 
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace. Hi Richard, thanks one more time for pointing out the issue with the class mirrors in the backtrace which keep classes alive and potentially prevent them from being unloaded. Fortunately, I think the solution is pretty simple. I don't think we need the backtrace at all. In the end it is just an optimization to save some space and not construct the full StackTraceElement[] right at the creation time of an exception. But the implicit exceptions which we are creating here are "nmethod-singletons" and as such I don't think we lose much if we create the array of StackTraceElements right away instead of creating a backtrace (see my last push). The StackTraceElements only contain Strings and therefore don't keep any classes unnecessarily alive. What do you think?
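As a Java-level illustration of the point about `StackTraceElement[]` (a minimal sketch only, not the HotSpot code from the PR, which builds these objects in C++ inside the VM): an exception can carry a trace assembled purely from strings and line numbers, so the trace itself holds no references to `Class` objects. The class name `PrebuiltTraceSketch` and the helper `prebuilt()` are hypothetical; the frame names and line numbers are copied from the example trace in the PR description.

```java
public class PrebuiltTraceSketch {

    // Hypothetical helper: creates an exception once and attaches a trace
    // built from plain strings, similar in spirit to a per-nmethod singleton.
    static NullPointerException prebuilt() {
        String cls = "compiler.exceptions.StackFrameInFastThrow";
        String file = "StackFrameInFastThrow.java";
        StackTraceElement[] frames = {
            new StackTraceElement(cls, "throwImplicitException", file, 76),
            new StackTraceElement(cls, "level2", file, 95),
            new StackTraceElement(cls, "level1", file, 99),
        };
        NullPointerException npe = new NullPointerException();
        // Replaces the trace captured at construction time with the
        // statically known frames.
        npe.setStackTrace(frames);
        return npe;
    }

    public static void main(String[] args) {
        // Print each frame in the usual "at ..." form.
        for (StackTraceElement frame : prebuilt().getStackTrace()) {
            System.out.println("  at " + frame);
        }
    }
}
```

The sketch only shows the shape of the data. Whether the VM-side objects behave the same way with respect to class unloading is the question raised in the message itself.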
And once you're on it, would you mind reviewing the whole PR :) Thank you and best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From coleenp at openjdk.java.net Tue Sep 21 11:42:03 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Sep 2021 11:42:03 GMT Subject: RFR: 8273915: Create 'nosafepoint' rank [v3] In-Reply-To: References: Message-ID: <9FD9eRy_euAGI5zhAhAJmV3BQQlFLT-9_knH1n7d-yQ=.3faca5a4-753f-4d21-ad8d-05c569f5a3ea@github.com> On Mon, 20 Sep 2021 20:16:03 GMT, Coleen Phillimore wrote: >> Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. >> >> This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. >> >> Tested with tier1-6 and built and run tier1 tests with shenandoah locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add comment and change enum name to Rank. Thank you, David for the offline discussions and for the code review! ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Tue Sep 21 11:42:03 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Sep 2021 11:42:03 GMT Subject: Integrated: 8273915: Create 'nosafepoint' rank In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:11:30 GMT, Coleen Phillimore wrote: > Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > > This moves some leaf locks to 'nosafepoint' rank and corrects relative ranking. > > Tested with tier1-6 and built and run tier1 tests with shenandoah locally. This pull request has now been integrated. 
Changeset: 111d5e1a Author: Coleen Phillimore URL: https://git.openjdk.java.net/jdk/commit/111d5e1a9324cb5e8d98627f6329d17fcbc9c13d Stats: 90 lines in 25 files changed: 14 ins; 0 del; 76 mod 8273915: Create 'nosafepoint' rank Reviewed-by: dholmes, iklam ------------- PR: https://git.openjdk.java.net/jdk/pull/5550 From coleenp at openjdk.java.net Tue Sep 21 12:07:06 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Sep 2021 12:07:06 GMT Subject: RFR: 8273916: Remove 'special' ranking [v3] In-Reply-To: References: Message-ID: > This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. > This passes tier1-6 also. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Remove blank line. - Merge branch 'master' into remove-special - Add comment about ThreadSMRDelete_lock - Remove "special" rank. - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. 
------------- Changes: https://git.openjdk.java.net/jdk/pull/5563/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5563&range=02 Stats: 44 lines in 5 files changed: 0 ins; 5 del; 39 mod Patch: https://git.openjdk.java.net/jdk/pull/5563.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5563/head:pull/5563 PR: https://git.openjdk.java.net/jdk/pull/5563 From coleenp at openjdk.java.net Tue Sep 21 12:07:07 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Sep 2021 12:07:07 GMT Subject: RFR: 8273916: Remove 'special' ranking [v2] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 22:54:19 GMT, Coleen Phillimore wrote: >> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. >> This passes tier1-6 also. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about ThreadSMRDelete_lock I just merged with my previous commit and hopefully this change makes a lot more sense now. ------------- PR: https://git.openjdk.java.net/jdk/pull/5563 From github.com+42899633+eastig at openjdk.java.net Tue Sep 21 12:42:46 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 21 Sep 2021 12:42:46 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 In-Reply-To: References: Message-ID: On Sun, 19 Sep 2021 13:07:49 GMT, David Holmes wrote: > If you are adding a new product flag then a CSR request is needed. > > David Hi David, I'll create a CSR when the name of the option is finalized.
------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 21 12:42:46 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 21 Sep 2021 12:42:46 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: <2mJybb2RLTON7AdFNCbru8cnNKiraG8zpNFccNCJCY8=.3414cf7e-444b-4160-95f7-38306650d4b9@github.com> References: <2mJybb2RLTON7AdFNCbru8cnNKiraG8zpNFccNCJCY8=.3414cf7e-444b-4160-95f7-38306650d4b9@github.com> Message-ID: On Tue, 21 Sep 2021 08:06:32 GMT, Andrew Haley wrote: >> Fixed. > > `-XX:+PrintAssembly` To have assembly instructions in `-XX:+PrintAssembly` output `hsdis` needs to be provided: 0x0000ffff61ba2b5c: ; {metadata({method} {0x0000000800466ab8} 'isLatin1' '()Z' in 'java/lang/String')} 0x0000ffff61ba2b5c: 0857 8dd2 | c808 a0f2 | 0801 c0f2 | e807 00f9 ;; 0xFFFFFFFFFFFFFFFF 0x0000ffff61ba2b6c: 0800 8092 | e803 00f9 0x0000ffff61ba2b74: ; {runtime_call counter_overflow Runtime1 stub} However it can help to skip to the place where instructions are expected and to check instructions' hex code. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 21 13:17:31 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 21 Sep 2021 13:17:31 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> Message-ID: On Tue, 21 Sep 2021 08:03:13 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace 'for' loops with macros > > src/hotspot/cpu/aarch64/aarch64.ad line 14374: > >> 14372: ShouldNotReachHere(); >> 14373: } >> 14374: #undef EMIT_N_ASM_STRINGS > > None of this is necessary. Printing "onspinwait" is enough. 
This results no instructions implementing `onspinwait` in OptoAssembly output. Why do we want to hide the details? ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Tue Sep 21 14:51:06 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 21 Sep 2021 14:51:06 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: <2mJybb2RLTON7AdFNCbru8cnNKiraG8zpNFccNCJCY8=.3414cf7e-444b-4160-95f7-38306650d4b9@github.com> Message-ID: On Tue, 21 Sep 2021 12:38:50 GMT, Evgeny Astigeevich wrote: >> `-XX:+PrintAssembly` > > To have assembly instructions in `-XX:+PrintAssembly` output `hsdis` needs to be provided: > > 0x0000ffff61ba2b5c: ; {metadata({method} {0x0000000800466ab8} 'isLatin1' '()Z' in 'java/lang/String')} > 0x0000ffff61ba2b5c: 0857 8dd2 | c808 a0f2 | 0801 c0f2 | e807 00f9 > ;; 0xFFFFFFFFFFFFFFFF > 0x0000ffff61ba2b6c: 0800 8092 | e803 00f9 > > 0x0000ffff61ba2b74: ; {runtime_call counter_overflow Runtime1 stub} > > > However it can help to skip to the place where instructions are expected and to check instructions' hex code. True. There's no C1 equivalent. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Tue Sep 21 15:17:33 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 21 Sep 2021 15:17:33 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> Message-ID: On Tue, 21 Sep 2021 13:14:05 GMT, Evgeny Astigeevich wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 14374: >> >>> 14372: ShouldNotReachHere(); >>> 14373: } >>> 14374: #undef EMIT_N_ASM_STRINGS >> >> None of this is necessary. Printing "onspinwait" is enough. > > This results no instructions implementing `onspinwait` in OptoAssembly output. Why do we want to hide the details? 
It's not useful as a verification that the correct instructions were generated: you need a disassembly for that. OptoAssembly is usually a somewhat briefer format. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From sviswanathan at openjdk.java.net Tue Sep 21 16:42:35 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 21 Sep 2021 16:42:35 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 05:16:16 GMT, Smita Kamath wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. 
> > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 588: > 586: ctOfs+len, out, outOfs+len, gctr, ghash); > 587: len+= partlen; > 588: inLen-= len; This should be inLen -= partlen; ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From simonis at openjdk.java.net Tue Sep 21 17:11:32 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Tue, 21 Sep 2021 17:11:32 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 08:14:41 GMT, Martin Doerr wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor updates as requested by @TheRealMDoerr > > This looks like a great idea. I have a few minor remarks / suggestions. @TheRealMDoerr, are you fine with this change now? ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From iklam at openjdk.java.net Tue Sep 21 17:35:13 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 21 Sep 2021 17:35:13 GMT Subject: RFR: 8273508: Support archived heap objects in SerialGC [v2] In-Reply-To: References: Message-ID: > When `-XX:+UseSerialGC is enabled`, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap. This improves VM start-up time, mostly because the module graph can be loaded from the archive. 
> > > $ perf stat -r 40 java -XX:+UseSerialGC -version > > Before: 0.042484507 seconds time elapsed ( +- 0.72% ) > After: 0.031671000 seconds time elapsed ( +- 0.72% ) > > > Changes in the gc subdirectories are contributed by @tschatzl Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @tschatzl comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5596/files - new: https://git.openjdk.java.net/jdk/pull/5596/files/9027660e..7d841aae Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5596&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5596&range=00-01 Stats: 7 lines in 2 files changed: 1 ins; 3 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5596.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5596/head:pull/5596 PR: https://git.openjdk.java.net/jdk/pull/5596 From mdoerr at openjdk.java.net Tue Sep 21 18:12:34 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Tue, 21 Sep 2021 18:12:34 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`. >> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call. >> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr Thanks for the update! I haven't looked into every detail, yet, but it basically looks good to me. I think we should disable OmitStackTraceInFastThrow and run a substantial amount of tests. Otherwise, test coverage could be poor. Did you do that already? Running without OmitStackTraceInFastThrow is indeed relevant for us. Would be interesting to know what else benefits from it. Maybe startup performance (class loading may use many Exceptions). 
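
For readers following this thread, here is a minimal sketch of the coding pattern under discussion, assuming a simplified 128-entry lookup table rather than Tomcat's actual one. The class name `ImplicitExceptionSketch` and both method names are hypothetical. The first method relies on the implicit `ArrayIndexOutOfBoundsException` as normal control flow (the shape quoted above), while the second performs an explicit range check and so never reaches the implicit-exception path at all.

```java
final class ImplicitExceptionSketch {
    // Simplified lookup table (an assumption for this sketch): true for ASCII letters.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    // Same shape as the quoted method: an out-of-range index raises an
    // implicit ArrayIndexOutOfBoundsException that is caught right away.
    static boolean isAlphaImplicit(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Alternative with an explicit range check: no exception on the miss path.
    static boolean isAlphaExplicit(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '1', -1, 128, 1 << 20};
        for (int c : samples) {
            System.out.println(c + ": " + isAlphaImplicit(c) + " / " + isAlphaExplicit(c));
        }
    }
}
```

Both methods return the same answers; only the cost of the out-of-range case differs, which is the case the benchmark's `exceptionProbability` parameter varies.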
------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From svkamath at openjdk.java.net Tue Sep 21 18:31:12 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Sep 2021 18:31:12 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v3] In-Reply-To: References: Message-ID: > Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. > > The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Fixed length decrement issue ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5402/files - new: https://git.openjdk.java.net/jdk/pull/5402/files/7ea464ae..19b0d547 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5402.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5402/head:pull/5402 PR: https://git.openjdk.java.net/jdk/pull/5402 From svkamath at openjdk.java.net Tue Sep 21 18:31:14 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Sep 2021 18:31:14 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 16:37:49 GMT, Sandhya Viswanathan wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Added a wrapper around aes-gcm intrinsic, changed data size in 
TestAESMain and added a new constant for htbl entries > > src/java.base/share/classes/com/sun/crypto/provider/GaloisCounterMode.java line 588: > >> 586: ctOfs+len, out, outOfs+len, gctr, ghash); >> 587: len+= partlen; >> 588: inLen-= len; > > This should be inLen -= partlen; Done. Thank you for pointing this out. ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From sviswanathan at openjdk.java.net Tue Sep 21 19:10:34 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Tue, 21 Sep 2021 19:10:34 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v3] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 18:31:12 GMT, Smita Kamath wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Fixed length decrement issue Marked as reviewed by sviswanathan (Reviewer). The patch looks good to me. 
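
As background for the length-decrement fix in this revision, the following is a minimal sketch of the bookkeeping involved. It is not the `GaloisCounterMode` code: the class `ChunkLengthSketch` and the method `copyInChunks` are hypothetical, and a plain array copy stands in for the real work. It only reuses the variable names from the review comment (`len`, `partlen`, `inLen`) to show why the remaining length has to shrink by the chunk just handled rather than by the running total.

```java
final class ChunkLengthSketch {
    // Copies 'in' to 'out' in chunks of at most chunkSize bytes and
    // returns the number of bytes handled.
    static int copyInChunks(byte[] in, byte[] out, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int len = 0;            // bytes handled so far; also the current offset
        int inLen = in.length;  // bytes still to be handled
        while (inLen > 0) {
            int partlen = Math.min(inLen, chunkSize);
            System.arraycopy(in, len, out, len, partlen);
            len += partlen;
            inLen -= partlen;   // subtract only the chunk just handled
            // Writing "inLen -= len" here would subtract the running total:
            // for 10 input bytes and 4-byte chunks the loop would stop after
            // 8 bytes and silently drop the tail.
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] in = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] out = new byte[in.length];
        System.out.println(copyInChunks(in, out, 4)); // prints 10
    }
}
```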

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402

From svkamath at openjdk.java.net  Tue Sep 21 19:43:15 2021
From: svkamath at openjdk.java.net (Smita Kamath)
Date: Tue, 21 Sep 2021 19:43:15 GMT
Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 20 Sep 2021 16:44:58 GMT, Anthony Scarpino wrote:

>> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries
>
> I approve the jdk changes. You'll need a hotspot reviewer to approve the other changes

@ascarpino Could you please run tier 1-3 tests? We have two reviewers for the patch.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402

From anthony.scarpino at oracle.com  Tue Sep 21 20:12:18 2021
From: anthony.scarpino at oracle.com (Anthony Scarpino)
Date: Tue, 21 Sep 2021 13:12:18 -0700
Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2]
In-Reply-To:
References:
Message-ID: <9e21f013-f408-ffca-3c70-93216c0d4b80@oracle.com>

I'll run them. Did you see my comments? They are just code-style comments.

thanks

Tony

On 9/21/21 12:43 PM, Smita Kamath wrote:
> On Mon, 20 Sep 2021 16:44:58 GMT, Anthony Scarpino wrote:
>
>>> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>>>
>>>   Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries
>>
>> I approve the jdk changes. You'll need a hotspot reviewer to approve the other changes
>
> @ascarpino Could you please run tier 1-3 tests? We have two reviewers for the patch.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/5402
>

From svkamath at openjdk.java.net  Tue Sep 21 20:20:06 2021
From: svkamath at openjdk.java.net (Smita Kamath)
Date: Tue, 21 Sep 2021 20:20:06 GMT
Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 20 Sep 2021 16:44:58 GMT, Anthony Scarpino wrote:

>> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries
>
> I approve the jdk changes. You'll need a hotspot reviewer to approve the other changes

@ascarpino I don't see your comments on this PR. Could you please post them again?

-------------

PR: https://git.openjdk.java.net/jdk/pull/5402

From kim.barrett at oracle.com  Tue Sep 21 21:37:23 2021
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 21 Sep 2021 21:37:23 +0000
Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda
In-Reply-To:
References: <02C95815-9A3C-41AF-A73F-F7597A26DE90@oracle.com>
Message-ID:

> On Sep 9, 2021, at 8:10 AM, Andrew Haley wrote:
>
> On 9/9/21 12:25 AM, Kim Barrett wrote:
>> Because of restrictions we're imposing on lambda usage, and in particular
>> requiring only downward usage, it should be possible to create such a holder
>> that isn't too complicated either to implement or to use, and also avoids
>> memory allocation.
>
> OK, but for now I guess we can use Lambdas in some simple cases that make
> HotSpot clearer and easier to write.

I also think we don't need to solve the type-erased capture problem before we start using lambdas.

It would be good to get opinions from others. And just in general, this PR hasn't generated much discussion or approvals. I expected more of one or both. We're well past the originally suggested decision date, but I don't feel comfortable calling this done.
Maybe folks just forgot about it and this will serve as a reminder. From kbarrett at openjdk.java.net Tue Sep 21 22:05:09 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 21 Sep 2021 22:05:09 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2] In-Reply-To: References: Message-ID: > Please review this proposal to permit the use of lambda expressions in > HotSpot code, with some restrictions and suggestions for good usage within > HotSpot code. Lambda expressions were added in C++11, and provide a more > expressive syntax for local functions, with a number of use-cases where they > can improve readability by eliminating a lot of uninteresting boilerplate. > > Some example uses are included, but are not part of the proposed change. > They will be removed from the PR before it is pushed. (In particular, the > ScopeGuard utility uses move semantics, the use of which hasn't been > approved or even discussed.) They are given to show some of the benefits > that might accrue from permitting the use of lambdas. In particular, they > highlight some of the code reduction that is possible. Some of these code > changes might be proposed in the future, using the normal PR process. > > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will > not be made before Wednesday 1-Sep-2021 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review > process to approve (click on Review Changes > Approve), rather than sending > a "vote: yes" email reply that would be normal for a CFV. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into permit_lambda - terminology fix - add scope guard and some example uses - G1 SATB filter lambda - new local functions section ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5144/files - new: https://git.openjdk.java.net/jdk/pull/5144/files/cc08f8b4..1fd7efbc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5144&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5144&range=00-01 Stats: 40987 lines in 1478 files changed: 26921 ins; 8047 del; 6019 mod Patch: https://git.openjdk.java.net/jdk/pull/5144.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5144/head:pull/5144 PR: https://git.openjdk.java.net/jdk/pull/5144 From pchilanomate at openjdk.java.net Tue Sep 21 22:07:57 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Tue, 21 Sep 2021 22:07:57 GMT Subject: RFR: 8273916: Remove 'special' ranking [v3] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 12:07:06 GMT, Coleen Phillimore wrote: >> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. >> This passes tier1-6 also. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Remove blank line. > - Merge branch 'master' into remove-special > - Add comment about ThreadSMRDelete_lock > - Remove "special" rank. > - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. Hi Coleen, Changes look good to me. 
Not straightforward to verify by code inspection but seems you also already figured out dependent locks with all the testing. Thanks, Patricio src/hotspot/share/runtime/mutexLocker.cpp line 228: > 226: def(StringDedupIntern_lock , PaddedMutex , nosafepoint, true, _safepoint_check_never); > 227: def(ParGCRareEvent_lock , PaddedMutex , leaf, true, _safepoint_check_always); > 228: def(CodeCache_lock , PaddedMonitor, nosafepoint-3, true, _safepoint_check_never); nit: There is a comment in mutexLocker.hpp that rank of CodeCache_lock is special. ------------- Marked as reviewed by pchilanomate (Committer). PR: https://git.openjdk.java.net/jdk/pull/5563 From svkamath at openjdk.java.net Tue Sep 21 22:16:40 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Sep 2021 22:16:40 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 16:44:58 GMT, Anthony Scarpino wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries > > I approve the jdk changes. You'll need a hotspot reviewer to approve the other changes @ascarpino I've modified the code style. Thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From svkamath at openjdk.java.net Tue Sep 21 22:16:37 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Tue, 21 Sep 2021 22:16:37 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v4] In-Reply-To: References: Message-ID: > Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. 
> > The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Fixed code-style standard to have a space between plus, minus and combo operators ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5402/files - new: https://git.openjdk.java.net/jdk/pull/5402/files/19b0d547..8756d301 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=02-03 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/5402.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5402/head:pull/5402 PR: https://git.openjdk.java.net/jdk/pull/5402 From github.com+42899633+eastig at openjdk.java.net Tue Sep 21 22:22:50 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 21 Sep 2021 22:22:50 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3] In-Reply-To: References: Message-ID: > This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds the option `UsePauseImpl=value`, where `value` can be: > > - `none`: no implementation for spin pauses. This is the default value. > - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. > - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. > - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. 
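
To make the value format listed above concrete, here is a small sketch of a parser for strings such as `none`, `isb` or `3nop`. It is written in Java purely for illustration and is not the HotSpot option-parsing code: the class `SpinPauseOptionSketch` is hypothetical, and treating a missing `N` as a count of one and rejecting anything outside `2..9` are assumptions read off the description.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpinPauseOptionSketch {
    // Optional count 2..9 followed by one of the three instruction names.
    private static final Pattern FORMAT = Pattern.compile("([2-9])?(nop|isb|yield)");

    final String instruction; // "none", "nop", "isb" or "yield"
    final int count;          // 0 for "none", otherwise 1..9

    private SpinPauseOptionSketch(String instruction, int count) {
        this.instruction = instruction;
        this.count = count;
    }

    static SpinPauseOptionSketch parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        if (value.equals("none")) {
            return new SpinPauseOptionSketch("none", 0);
        }
        Matcher m = FORMAT.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized value: " + value);
        }
        // Assumption: no leading digit means a single instruction.
        int count = (m.group(1) == null) ? 1 : Integer.parseInt(m.group(1));
        return new SpinPauseOptionSketch(m.group(2), count);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"none", "isb", "3nop", "9yield"}) {
            SpinPauseOptionSketch p = parse(v);
            System.out.println(v + " -> " + p.count + " x " + p.instruction);
        }
    }
}
```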
> > The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `UsePauseImpl`. > > Testing: > > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Move emitting code to MacroAssembler::spin_wait Code emitting spin pauses is moved to MacroAssembler::spin_wait. As OptoAssembly output is changed, tests are updated to parse PrintAssembly. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5562/files - new: https://git.openjdk.java.net/jdk/pull/5562/files/c6831a3b..fc55a682 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=01-02 Stats: 241 lines in 8 files changed: 66 ins; 135 del; 40 mod Patch: https://git.openjdk.java.net/jdk/pull/5562.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562 PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 21 22:22:51 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 21 Sep 2021 22:22:51 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> Message-ID: On Tue, 21 Sep 2021 15:14:09 GMT, Andrew Haley wrote: >> This results no instructions implementing `onspinwait` in OptoAssembly output. Why do we want to hide the details? > > It's not useful as a verification that the correct instructions were generated: you need a disassembly for that. OptoAssembly is usually a somewhat briefer format. 
Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 21 22:22:54 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 21 Sep 2021 22:22:54 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v2] In-Reply-To: References: <9aPLibCCYMKqhil77ooUe8xzeT4P8JwSHmmYez0-5TM=.c46d790d-d6fc-4925-beb8-875cbf49a62b@github.com> Message-ID: <7WVs4mxJOrqDQfdh4UrA0Ss6JTzJLqKVPVY1qy2WYlI=.9c9cdabc-c52e-438e-b1cf-20d84e6400f8@github.com> On Tue, 21 Sep 2021 08:04:15 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace 'for' loops with macros > > src/hotspot/cpu/aarch64/aarch64.ad line 14392: > >> 14390: ShouldNotReachHere(); >> 14391: } >> 14392: %} > > Please let the MacroAssembler do this. Just call `MacroAssembler::spin_wait()`. Done > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2999: > >> 2997: break; >> 2998: default: >> 2999: ShouldNotReachHere(); > > Same here. Please just call `MacroAssembler::spin_wait()`. Done > src/hotspot/cpu/aarch64/globals_aarch64.hpp line 114: > >> 112: "Value -1 means off.") \ >> 113: range(-1, 4096) \ >> 114: product(ccstr, UsePauseImpl, "none", \ > > The name "UsePauseImpl" fails to make the connection with `onSpinWait`. If you called it something like `OnSpinWaitImpl` that would make the connection. Done > src/hotspot/cpu/aarch64/globals_aarch64.hpp line 115: > >> 113: range(-1, 4096) \ >> 114: product(ccstr, UsePauseImpl, "none", \ >> 115: "Use instructions to implement pauses." \ > > Suggestion: > > "Use instructions to implement java.lang.Thread.onSpinWait() ." 
\ Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 21 22:22:55 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 21 Sep 2021 22:22:55 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3] In-Reply-To: References: <2mJybb2RLTON7AdFNCbru8cnNKiraG8zpNFccNCJCY8=.3414cf7e-444b-4160-95f7-38306650d4b9@github.com> Message-ID: On Tue, 21 Sep 2021 14:47:41 GMT, Andrew Haley wrote: >> To have assembly instructions in `-XX:+PrintAssembly` output `hsdis` needs to be provided: >> >> 0x0000ffff61ba2b5c: ; {metadata({method} {0x0000000800466ab8} 'isLatin1' '()Z' in 'java/lang/String')} >> 0x0000ffff61ba2b5c: 0857 8dd2 | c808 a0f2 | 0801 c0f2 | e807 00f9 >> ;; 0xFFFFFFFFFFFFFFFF >> 0x0000ffff61ba2b6c: 0800 8092 | e803 00f9 >> >> 0x0000ffff61ba2b74: ; {runtime_call counter_overflow Runtime1 stub} >> >> >> However it can help to skip to the place where instructions are expected and to check instructions' hex code. > > True. There's no C1 equivalent. I rewrote a test to parse `XX:+PrintAssembly` hex instructions. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From xliu at openjdk.java.net Tue Sep 21 22:32:03 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 21 Sep 2021 22:32:03 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. 
>>
>> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method/line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stack frames for all callers up to the inlining depth of the method in question.
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>>     java.lang.NullPointerException: Cannot read the array length because "" is null
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>>     java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality
is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. >> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. 
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
>> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

> for me it's not an enhancement, but a bug fix, in production an exception with no stacktrace is useless and result in hours lost trying to figure out the issue

If we treat it as a bug, shall we remove `StackFrameInFastThrow`? We can just make this the default behavior of `OmitStackTraceInFastThrow`.

Why isn't `OmitStackTraceInFastThrow` a C2-exclusive option? I think it only affects C2. That flag overrides the existing flag `StackTraceInThrowable`. Let's not introduce another flag. No one would like an exception without a pointer.

test/hotspot/jtreg/compiler/exceptions/StackFrameInFastThrow.java line 26:

> 24: /*
> 25:  * @test
> 26:  * @bug 9999999

Should this be 8273392?

'requires' supports boolean expression |. Therefore, we don't need two annotations.

test/hotspot/jtreg/compiler/exceptions/StackFrameInFastThrow.java line 110:

> 108:     private static void unload(Method m) {
> 109:         Asserts.assertEQ(WB.getMethodCompilationLevel(m), 4, "Method should be compiled at level 4.");
> 110:         if (DEBUG) System.console().readLine();

Is this "press any key" from stdin? I had a problem invoking it when I set DEBUG=1. We would be better off removing these statements in case they trip the test up.

TEST RESULT: Failed.
Execution failed: `main' threw exception: java.lang.NullPointerException: Cannot invoke "java.io.Console.readLine()" because the return value of "java.lang.System.console()" is null ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From kbarrett at openjdk.java.net Tue Sep 21 23:01:08 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 21 Sep 2021 23:01:08 GMT Subject: RFR: 8272807: Permit use of memory concurrent with pretouch [v2] In-Reply-To: References: Message-ID: > Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215. > > Please review this change which adds os::touch_memory, which is similar to > os::pretouch_memory but allows concurrent access to the memory while it is > being touched. This is accomplished by using an atomic add of zero as the > operation for touching the memory, ensuring the virtual location is backed > by physical memory while not changing any values being read or written by > other threads. > > While I was there, fixed some other lurking issues in os::pretouch_memory. > There was a potential overflow in the iteration that has been fixed. And if > the range arguments weren't page aligned then the last page might not get > touched. The latter was even mentioned in the function's description. Both > of those have been fixed by careful alignment and some extra checks. The > resulting code is a little more complicated, but more robust and complete. > > Similarly added TouchTask, which is similar to PretouchTask. Again here, > there is some cleaning up to avoid potential overflows and such. > > - The chunk size is computed using the page size after possible adjustment > for UseTransparentHugePages. We want a chunk size that reflects the actual > number of touches that will be performed. > > - The chunk claim is now done using a CAS that won't exceed the range end. > The old atomic-fetch-and-add and check the result, which is performed by > each worker thread, could lead to overflow. 
The old code has a test for > overflow, but since pointer-arithmetic overflow is UB that's not reliable. > > - The old calculation of num_chunks for parallel touching could also > potentially overflow. > > Testing: > mach5 tier1-3 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into touch_memory - simplify touch_impl, using conditional on bool arg rather than template specialization - touch task - add touch_memory ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5353/files - new: https://git.openjdk.java.net/jdk/pull/5353/files/d33934f3..3c8db1dc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5353&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5353&range=00-01 Stats: 25850 lines in 930 files changed: 18027 ins; 4839 del; 2984 mod Patch: https://git.openjdk.java.net/jdk/pull/5353.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5353/head:pull/5353 PR: https://git.openjdk.java.net/jdk/pull/5353 From xliu at openjdk.java.net Tue Sep 21 23:36:02 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 21 Sep 2021 23:36:02 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. 
>> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. >> >> For the attached JTreg test, we get the following exception in interpreter mode: >> >> java.lang.NullPointerException: Cannot read the array length because "" is null >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) >> >> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: >> >> java.lang.NullPointerException >> >> After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> >> and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> >> The new functionality 
is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. >> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. 
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace. src/hotspot/share/classfile/javaClasses.cpp line 2581: > 2579: assert(ik != NULL, "must be loaded in 1.4+"); > 2580: > 2581: // Determin the number of available frames typo ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From xliu at openjdk.java.net Tue Sep 21 23:40:05 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 21 Sep 2021 23:40:05 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. 
If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. >> >> For the attached JTreg test, we get the following exception in interpreter mode: >> >> java.lang.NullPointerException: Cannot read the array length because "" is null >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233) >> >> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows: >> >> java.lang.NullPointerException >> >> After this change, if `StackFrameInFastThrow.throwImplicitException()` will be compiled stand alone, we will get: >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> >> and if `StackFrameInFastThrow.throwImplicitException()` will be inlined into `level2()` and `level2()` into `level1()` we will get the following exception (altough we're still running with `-XX:+OmitStackTraceInFastThrow`): >> >> java.lang.NullPointerException >> at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76) >> at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95) >> at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99) >> >> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time costs because all the extra work will be done at compile time. 
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implcit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents that implicit exceptions will keep classes alive due to Java mirrors in the backtrace.

src/hotspot/share/ci/ciEnv.cpp line 407:

> 405:   if (!HAS_PENDING_EXCEPTION) {
> 406:     Handle handle = Handle(THREAD, obj);
> 407:     java_lang_Throwable::allocate_fill_stack_trace_of_implicit_exception(handle, gk);

IMHO, HotSpot fills in the stack trace when `StackTraceInThrowable` is true.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5392

From coleenp at openjdk.java.net Tue Sep 21 23:47:41 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Tue, 21 Sep 2021 23:47:41 GMT
Subject: RFR: 8273916: Remove 'special' ranking [v4]
In-Reply-To:
References:
Message-ID: <1Y3__mWGE9I1tGXX9nh5pqNkgcBDMAemQqz9tf4Q8t8=.3bd92d9e-42e6-4a81-b11f-5639a274aad2@github.com>

> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915.
> This passes tier1-6 also.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Remove special comment.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5563/files - new: https://git.openjdk.java.net/jdk/pull/5563/files/5867b9a5..6440a8c4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5563&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5563&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5563.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5563/head:pull/5563 PR: https://git.openjdk.java.net/jdk/pull/5563 From coleenp at openjdk.java.net Tue Sep 21 23:47:44 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Sep 2021 23:47:44 GMT Subject: RFR: 8273916: Remove 'special' ranking [v3] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 12:07:06 GMT, Coleen Phillimore wrote: >> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. >> This passes tier1-6 also. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Remove blank line. > - Merge branch 'master' into remove-special > - Add comment about ThreadSMRDelete_lock > - Remove "special" rank. > - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. Thanks Patricio for the code review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/5563 From coleenp at openjdk.java.net Tue Sep 21 23:47:46 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Tue, 21 Sep 2021 23:47:46 GMT Subject: RFR: 8273916: Remove 'special' ranking [v3] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 21:50:27 GMT, Patricio Chilano Mateo wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Remove blank line. >> - Merge branch 'master' into remove-special >> - Add comment about ThreadSMRDelete_lock >> - Remove "special" rank. >> - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. >> - Partition safepoint checking and nonchecking lock ranks. The nonchecking locks are always lower ranked than the safepoint checking locks because they cannot block. > > src/hotspot/share/runtime/mutexLocker.cpp line 228: > >> 226: def(StringDedupIntern_lock , PaddedMutex , nosafepoint, true, _safepoint_check_never); >> 227: def(ParGCRareEvent_lock , PaddedMutex , leaf, true, _safepoint_check_always); >> 228: def(CodeCache_lock , PaddedMonitor, nosafepoint-3, true, _safepoint_check_never); > > nit: There is a comment in mutexLocker.hpp that rank of CodeCache_lock is special. Thanks for noticing that. 
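For readers following the rank discussion, the partitioning rule quoted above (nonchecking locks always rank below safepoint-checking locks) can be sketched in a few lines of standalone C++. This is only an illustration, not HotSpot code: `Rank`, `ToyLock`, `rank_is_consistent`, `make_lock` and `can_acquire` are hypothetical names, the numeric rank values are made up, and the sketch assumes that a thread may only acquire a lock whose rank is strictly lower than every lock it already holds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical rank values for illustration only. Everything at or below
// kNoSafepoint belongs to the nonchecking partition; everything above it
// belongs to the safepoint-checking partition.
enum Rank : int {
  kNoSafepoint = 10,  // highest rank of the nonchecking partition
  kLeaf        = 20,  // a safepoint-checking rank
  kNonLeaf     = 30   // a higher safepoint-checking rank
};

struct ToyLock {
  const char* name;
  int rank;
  bool safepoint_check;  // true: acquiring it may block for a safepoint
};

// Partition rule: a lock is nonchecking if and only if its rank is at or
// below kNoSafepoint.
bool rank_is_consistent(const ToyLock& l) {
  return l.safepoint_check ? (l.rank > kNoSafepoint)
                           : (l.rank <= kNoSafepoint);
}

// Creates a lock and checks the partition rule at construction time.
ToyLock make_lock(const char* name, int rank, bool safepoint_check) {
  ToyLock l{name, rank, safepoint_check};
  assert(rank_is_consistent(l) && "rank does not match safepoint-check mode");
  return l;
}

// Ordering rule assumed by this sketch: the next lock must rank strictly
// below every lock that is already held.
bool can_acquire(const std::vector<ToyLock>& held, const ToyLock& next) {
  for (const ToyLock& h : held) {
    if (next.rank >= h.rank) return false;
  }
  return true;
}
```

Under these assumptions, a thread that holds a nonchecking lock can only go on to take other nonchecking locks, so it never has to block for a safepoint while holding one. A lock modelled on `CodeCache_lock` with rank `kNoSafepoint - 3` can be taken while a `kLeaf` lock is held, but not the other way round.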
-------------

PR: https://git.openjdk.java.net/jdk/pull/5563

From ngasson at openjdk.java.net Wed Sep 22 03:06:00 2021
From: ngasson at openjdk.java.net (Nick Gasson)
Date: Wed, 22 Sep 2021 03:06:00 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3]
In-Reply-To:
References:
Message-ID: <5yTyf-BNoSsU36WxJNxGum10rfMKf4dkZAFIVFl7zEw=.261e52fc-1d56-4ed8-942f-97335f07eca6@github.com>

On Tue, 21 Sep 2021 22:22:50 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
>> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move emitting code to MacroAssembler::spin_wait
>
>   Code emitting spin pauses is moved to MacroAssembler::spin_wait.
>   As OptoAssembly output is changed, tests are updated to parse
>   PrintAssembly.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1385:

> 1383:   // Code for java.lang.Thread::onSpinWait() intrinsic.
> 1384: void spin_wait() { > 1385: #define EMIT_N_INST(n, inst) for (int i = 0; i < (n); ++i) inst() Why use a macro here? You could just put the loop around the switch statement. And the method body seems sufficiently large that it ought to go in the .cpp file. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From ngasson at openjdk.java.net Wed Sep 22 03:24:11 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 22 Sep 2021 03:24:11 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: <053YSRWJjndpXnMFcY0hOx9SUHWrSJDtSbAcLRs_QLc=.f2914915-321e-45a8-bf1a-ead879600bca@github.com> On Fri, 17 Sep 2021 06:53:06 GMT, Ningsheng Jian wrote: >> This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. >> 3. Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. > > Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge with master > - Merge with master > - More comments from Andrew. > - Add missing part > - Address Andrew's comments > - 8267356: AArch64: Vector API SVE codegen support > > This is the integration of current SVE work done in > panama-vector/vectorIntrinscs, which includes: > > 1. 
Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on > 256-bit SVE environment could also generate optimized SVE > instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further > improvement to map mask to predicate register is under development at > https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware > with MaxVectorSize=16/32/64. Marked as reviewed by ngasson (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From njian at openjdk.java.net Wed Sep 22 03:24:11 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Wed, 22 Sep 2021 03:24:11 GMT Subject: RFR: 8267356: AArch64: Vector API SVE codegen support [v7] In-Reply-To: <053YSRWJjndpXnMFcY0hOx9SUHWrSJDtSbAcLRs_QLc=.f2914915-321e-45a8-bf1a-ead879600bca@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> <053YSRWJjndpXnMFcY0hOx9SUHWrSJDtSbAcLRs_QLc=.f2914915-321e-45a8-bf1a-ead879600bca@github.com> Message-ID: On Wed, 22 Sep 2021 03:17:47 GMT, Nick Gasson wrote: >> Ningsheng Jian has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge with master >> - Merge with master >> - More comments from Andrew. >> - Add missing part >> - Address Andrew's comments >> - 8267356: AArch64: Vector API SVE codegen support >> >> This is the integration of current SVE work done in >> panama-vector/vectorIntrinscs, which includes: >> >> 1. Code generation for Vector API c2 IR nodes with SVE. >> 2. Non-max vector size support with SVE, e.g. using *128Vector APIs on >> 256-bit SVE environment could also generate optimized SVE >> instructions with predicate feature. >> 3. 
Some more SVE assemblers (and tests) used by the codegen part. >> >> Note: VectorMask is still represented in vector register, a further >> improvement to map mask to predicate register is under development at >> https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask >> >> Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware >> with MaxVectorSize=16/32/64. > > Marked as reviewed by ngasson (Reviewer). Thank you @nick-arm for the review! ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From never at openjdk.java.net Wed Sep 22 05:48:29 2021 From: never at openjdk.java.net (Tom Rodriguez) Date: Wed, 22 Sep 2021 05:48:29 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal Message-ID: This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. ------------- Commit messages: - 8218885: Restore pop_frame and force_early_return functionality for Graal Changes: https://git.openjdk.java.net/jdk/pull/5625/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5625&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8218885 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/5625.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5625/head:pull/5625 PR: https://git.openjdk.java.net/jdk/pull/5625 From xliu at openjdk.java.net Wed Sep 22 06:01:59 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 22 Sep 2021 06:01:59 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> Message-ID: On Tue, 21 Sep 2021 05:22:00 GMT, David Holmes wrote: >> This patch allows 
the custom commands of OnError to attach to HotSpot itself. >> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). >> This prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. >> >> Without this patch, we will encounter a deadlock at safepoint synchronization. >> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. >> >> >> Aborting due to java.lang.OutOfMemoryError: Java heap space >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (debug.cpp:364), pid=94632, tid=94633 >> # fatal error: OutOfMemory encountered: Java heap space >> # >> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log >> # >> # -XX:OnError="jcmd %p Thread.print" >> # Executing /bin/sh -c "jcmd 94632 Thread.print" ... >> 94632: >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. 
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000]
>> [10.616s][warning][safepoint]    java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> src/hotspot/share/utilities/vmError.cpp line 1663:
>
>> 1661:     out.print_raw_cr("\" ...");
>> 1662:
>> 1663:     VMErrorThreadToNativeFromVM ttnfv(JavaThread::current_or_null());
>
> Surely the current thread need not be a JavaThread here.

Makes sense. I haven't seen report_and_die() called by a NonJavaThread, but I agree we should cover that case. It will be a no-op for a NonJavaThread because safepoint synchronization only checks JavaThreads.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590

From xliu at openjdk.java.net Wed Sep 22 06:59:00 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Wed, 22 Sep 2021 06:59:00 GMT
Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself
In-Reply-To: <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com>
References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com>
Message-ID:

On Tue, 21 Sep 2021 05:32:37 GMT, David Holmes wrote:

>> This patch allows the custom commands of OnError to attach to HotSpot itself.
>> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd).
>> This prevents cmds which require safepoint synchronization from deadlock.
>> eg. OnError='jcmd %p Thread.print'.
>>
>> Without this patch, we will encounter a deadlock at safepoint synchronization.
>> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`.
>> >> >> Aborting due to java.lang.OutOfMemoryError: Java heap space >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (debug.cpp:364), pid=94632, tid=94633 >> # fatal error: OutOfMemory encountered: Java heap space >> # >> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log >> # >> # -XX:OnError="jcmd %p Thread.print" >> # Executing /bin/sh -c "jcmd 94632 Thread.print" ... >> 94632: >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: >> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] >> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) > > src/hotspot/share/utilities/vmError.cpp line 1334: > >> 1332: >> 1333: if (_thread != nullptr) { >> 1334: assert(!_thread->owns_locks(), "must release all locks when leaving VM"); > > This can't be an assertion as the thread is not knowingly leaving the VM without first releasing locks. If it does hold locks then that could lead to additional problems and strange errors when running the external command. 
> We have to decide whether it is safest/best to simply not transition to native if holding locks, or whether it is okay to proceed knowing that there are risks of secondary crashes, or hangs, if we do.

In a debug build, a JavaThread can't transition to Native if it owns any lock. Even if I remove the assert here, it will hit another assert later in `ThreadStateTransition::transition_from_vm`.

    // Checks safepoint allowed and clears unhandled oops at potential safepoints.
    void JavaThread::check_possible_safepoint() {
      if (_no_safepoint_count > 0) {
        print_owned_locks();
        assert(false, "Possible safepoint reached by thread that does not allow it");
      }
      ...
    }

I'd like to make VMErrorThreadToNativeFromVM change state only if _thread doesn't own any mutex, but `Thread::owns_locks()` is only available in debug builds.

Only one test, "runtime/ErrorHandling/TestOnError.java", calls VMError::report_and_die with Threads_lock held.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590

From stuefe at openjdk.java.net Wed Sep 22 07:10:57 2021
From: stuefe at openjdk.java.net (Thomas Stuefe)
Date: Wed, 22 Sep 2021 07:10:57 GMT
Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself
In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com>
References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com>
Message-ID:

On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote:

> This patch allows the custom commands of OnError to attach to HotSpot itself.
> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd).
> This prevents cmds which require safepoint synchronization from deadlock.
> eg. OnError='jcmd %p Thread.print'.
>
> Without this patch, we will encounter a deadlock at safepoint synchronization.
> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`.
> > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... > 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) Can we limit this to the jcmd-attaches-to-me scenario? In general, the less we modify the VM state before core'ing the better. This distorts the picture and may confuse analysts of the hs-err file/core. I think we should do this only if necessary. Potentially, I would even limit it to OOM situations since for other types of errors (eg crashes) I do not see the point of attaching with jcmd. To prevent deadlock in those cases, one may just avoid calling jcmd altogether. 
------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5590 From xliu at openjdk.java.net Wed Sep 22 07:41:57 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 22 Sep 2021 07:41:57 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: <6Uyb7BNksjsYpgOZVgn8c8DWVAVPDz8lwofK_UOtH10=.2e5f8703-31e3-4063-95ef-45c2fe1dd831@github.com> On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. > > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... 
> 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) > Can we limit this to the jcmd-attaches-to-me scenario? In general, the less we modify the VM state before core'ing the better. This distorts the picture and may confuse analysts of the hs-err file/core. I think we should do this only if necessary. > > Potentially, I would even limit it to OOM situations since for other types of errors (eg crashes) I do not see the point of attaching with jcmd. To prevent deadlock in those cases, one may just avoid calling jcmd altogether. The only reason I tried this is that I would like to get a heap dump when `-XX:AbortVMOnException=java.lang.OutOfMemoryError` triggers a fatal error. Indeed, I know we can get a core file and extract the Java heap from it. Some counter-arguments are: 1) a core dump is subject to kernel and ulimit constraints; 2) the file size is too big; 3) it is not secure. I came up with the idea of using OnError=jcmd %p GC.heap_dump to simulate `HeapDumpOnOutOfMemoryError`. If neither of you thinks it's a good idea, I can drop it. As you said, it will distort VMThread for sure.
------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From dholmes at openjdk.java.net Wed Sep 22 07:49:01 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 22 Sep 2021 07:49:01 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. > > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... 
> 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) I don't see how this can be limited to the jcmd-attach-to-self issue because we have no idea what the command to be executed will be. I don't see how changing the thread state to thread-in-native will cause any problems or confusion - the stack will show it has gone into os::fork_and_exec. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From dholmes at openjdk.java.net Wed Sep 22 07:49:01 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 22 Sep 2021 07:49:01 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> Message-ID: On Wed, 22 Sep 2021 06:56:12 GMT, Xin Liu wrote: >> src/hotspot/share/utilities/vmError.cpp line 1334: >> >>> 1332: >>> 1333: if (_thread != nullptr) { >>> 1334: assert(!_thread->owns_locks(), "must release all locks when leaving VM"); >> >> This can't be an assertion as the thread is not knowingly leaving the VM without first releasing locks. If it does hold locks then that could lead to additional problems and strange errors when running the external command. 
We have to decide whether it is safest/best to simply not transition to native if holding locks, or whether it is okay to proceed knowing that there are risks of secondary crashes, or hangs, if we do. > > In a debug build, a JavaThread can't transition to Native if it owns any lock. Even if I remove the assert here, it will hit another assert later in `ThreadStateTransition::transition_from_vm`. > > > // Checks safepoint allowed and clears unhandled oops at potential safepoints. > void JavaThread::check_possible_safepoint() { > if (_no_safepoint_count > 0) { > print_owned_locks(); > assert(false, "Possible safepoint reached by thread that does not allow it"); > } > > > I'd like to make VMErrorThreadToNativeFromVM only change state if _thread doesn't own any mutex, but `Thread::owns_locks()` is only available in debug builds. > > Only one test, "runtime/ErrorHandling/TestOnError.java", will call VMError::report_and_die with Threads_lock. I did flag this problem originally. It is unfortunate that we can't tell if a thread holds any locks in a product build. Not sure how to deal with this. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From aph at openjdk.java.net Wed Sep 22 07:53:04 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Wed, 22 Sep 2021 07:53:04 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3] In-Reply-To: <5yTyf-BNoSsU36WxJNxGum10rfMKf4dkZAFIVFl7zEw=.261e52fc-1d56-4ed8-942f-97335f07eca6@github.com> References: <5yTyf-BNoSsU36WxJNxGum10rfMKf4dkZAFIVFl7zEw=.261e52fc-1d56-4ed8-942f-97335f07eca6@github.com> Message-ID: On Wed, 22 Sep 2021 03:03:02 GMT, Nick Gasson wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Move emitting code to MacroAssembler::spin_wait >> >> Code emitting spin pauses is moved to MacroAssembler::spin_wait. >> As OptoAssembly output is changed, tests are updated to parse >> PrintAssembly.
> > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1385: > >> 1383: // Code for java.lang.Thread::onSpinWait() intrinsic. >> 1384: void spin_wait() { >> 1385: #define EMIT_N_INST(n, inst) for (int i = 0; i < (n); ++i) inst() > > Why use a macro here? You could just put the loop around the switch statement. And the method body seems sufficiently large that it ought to go in the .cpp file. Good point. There's no significant performance advantage to having this in the header. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From stuefe at openjdk.java.net Wed Sep 22 07:55:03 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 22 Sep 2021 07:55:03 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <6Uyb7BNksjsYpgOZVgn8c8DWVAVPDz8lwofK_UOtH10=.2e5f8703-31e3-4063-95ef-45c2fe1dd831@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <6Uyb7BNksjsYpgOZVgn8c8DWVAVPDz8lwofK_UOtH10=.2e5f8703-31e3-4063-95ef-45c2fe1dd831@github.com> Message-ID: On Wed, 22 Sep 2021 07:38:22 GMT, Xin Liu wrote: > > Can we limit this to the jcmd-attaches-to-me scenario? In general, the less we modify the VM state before core'ing the better. This distorts the picture and may confuse analysts of the hs-err file/core. I think we should do this only if necessary. > > Potentially, I would even limit it to OOM situations since for other types of errors (eg crashes) I do not see the point of attaching with jcmd. To prevent deadlock in those cases, one may just avoid calling jcmd altogether. > > The only reason I try this because I would like to get heap dump when `-XX:AbortVMOnException=java.lang.OutOfMemoryError ` does trigger a fatal. > > Indeed, I know we can get a core file and extract java heap from it. Some counter-arguments are: 1) core dump is subject to kernel and ulimit constraints. 2) filesize is too big 3) not secure. 
I come up an idea to use OnError=jcmd %p GC.heap_dump to simulate `HeapDumpOnOutOfMemoryError`. > > if neither of you guys thinks it's a good idea, I can drop it. As you said, it will distort VMThread for sure. I am not against your fix. Makes sense and offers some merits compared to the usual core file analysis. All I am saying is that I would limit it to the case of OnError="jcmd ". And maybe just to OOMs. E.g. If someone does an OnError with a different tool or script, maybe something harmless like OnError="cp core /my-core-dir", I would rather we don't switch the thread to native. Subject to discussion, oc. Maybe we could limit it to OOMs but leave it for all values of OnError. (Its all a best-effort anyway. E.g. when OOMing due to thread creation error, the fork needed for OnError won't work either) ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From stuefe at openjdk.java.net Wed Sep 22 07:55:04 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 22 Sep 2021 07:55:04 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. 
> > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... > 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) After David's response, I withdraw my objection. Please go ahead with the change. 
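To illustrate the approach in the patch description quoted above (put the reporting thread into the native state before `os::fork_and_exec(cmd)`, so that a safepoint requested on behalf of the child command does not wait on it), here is a minimal, self-contained C++ sketch. `ToyThread`, `ThreadState`, `ToNativeGuard` and `run_on_error_command` are hypothetical stand-ins invented for this example; they are not HotSpot's `JavaThread` or its transition classes. Skipping the transition when locks are held is only one of the options raised in this review, where the lock check is reported to be available in debug builds only.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not HotSpot types.
enum class ThreadState { InVM, InNative };

struct ToyThread {
  ThreadState state = ThreadState::InVM;
  bool owns_locks = false;  // assumption: the toy thread knows whether it holds locks
};

// RAII guard: marks the thread as InNative for the guard's lifetime,
// but only if the thread exists and holds no locks.
class ToNativeGuard {
 public:
  explicit ToNativeGuard(ToyThread* thread)
      : _thread(nullptr), _saved(ThreadState::InVM) {
    if (thread != nullptr && !thread->owns_locks) {
      _thread = thread;
      _saved = thread->state;
      thread->state = ThreadState::InNative;
    }
  }

  ~ToNativeGuard() {
    if (_thread != nullptr) {
      // Nothing else in this toy changes the state while the guard is alive.
      assert(_thread->state == ThreadState::InNative);
      _thread->state = _saved;  // restore the state seen at construction
    }
  }

  ToNativeGuard(const ToNativeGuard&) = delete;
  ToNativeGuard& operator=(const ToNativeGuard&) = delete;

 private:
  ToyThread* _thread;
  ThreadState _saved;
};

// Toy stand-in for running the OnError command. Returns the state the
// thread was in while the (imaginary) child command was running.
ThreadState run_on_error_command(ToyThread& thread) {
  ToNativeGuard guard(&thread);
  // In the real VM, os::fork_and_exec(cmd) would run at this point.
  return thread.state;
}
```

In this toy model a thread without locks is seen as `InNative` while the command runs and is back in `InVM` afterwards; a thread that holds locks is left untouched.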
------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From stuefe at openjdk.java.net Wed Sep 22 08:01:01 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 22 Sep 2021 08:01:01 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> Message-ID: <8eSPurQN964AGBNvlGX6G3610dGkXNOeB1o5WkGDeBA=.1de32b00-c1df-4a6b-a2b9-dafccc034566@github.com> On Wed, 22 Sep 2021 07:43:43 GMT, David Holmes wrote: >> In debug build, a JavaThread can't transit to Native if it owns any lock. Even I remove the assert here, it will hit another assert later in `ThreadStateTransition::transition_from_vm`. >> >> >> // Checks safepoint allowed and clears unhandled oops at potential safepoints. >> void JavaThread::check_possible_safepoint() { >> if (_no_safepoint_count > 0) { >> print_owned_locks(); >> assert(false, "Possible safepoint reached by thread that does not allow it"); >> } >> >> >> I'd like to make VMErrorThreadToNativeFromVM only change state if _thread doesn't own any mutex, but 'Thread::own_lock()` is only available in debug build. >> >> Only one test ?runtime/ErrorHandling/TestOnError.java? will call VMError::report_and_die with Threads_lock. > > I did flag this problem originally. It is unfortunate that we can't tell if a thread holds any locks in a product build. Not sure how to deal with this. > In debug build, a JavaThread can't transit to Native if it owns any lock. Even I remove the assert here, it will hit another assert later in `ThreadStateTransition::transition_from_vm`. > > ``` > // Checks safepoint allowed and clears unhandled oops at potential safepoints. 
> void JavaThread::check_possible_safepoint() { > if (_no_safepoint_count > 0) { > print_owned_locks(); > assert(false, "Possible safepoint reached by thread that does not allow it"); > } > ``` > > I'd like to make VMErrorThreadToNativeFromVM only change state if _thread doesn't own any mutex, but 'Thread::own_lock()` is only available in debug build. Which may be fine, since asserts only fire in debug builds. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From sakatakui at oss.nttdata.com Wed Sep 22 11:25:33 2021 From: sakatakui at oss.nttdata.com (Koichi Sakata) Date: Wed, 22 Sep 2021 20:25:33 +0900 Subject: Regarding options of error and dump file paths Message-ID: <2844697c5fadfa932d7d35b50d68fa1e@oss.nttdata.com> Hi Kevin, I'm glad to hear from you. Your explanation is exactly what I was thinking. I totally agree with you. I wasn't aware of that issue in JBS. Thank you for letting me know. I'd like to propose to define subtasks of the issue. Then we can start working on the issue. I think we have three subtasks as follows. 1. Add the new option and apply it to hs_err file, heap dump file and replay file. 2. Apply it to JDK Flight Recorder. The reason why this is a separate task is that JFR has a peculiar structure. 3. Explore if the attach api can use the location set by the new option. If you agree to that, I would add subtasks to the issue in JBS and start to work on the first subtask. I'd like to add the option and apply it to hs_err file or something to have deeper discussions in this community. Regards, Koichi On 15-09-2021 06:12 PM, Kevin Walls wrote: > Hi Koichi, > > Yes, just wanted to (a little late) acknowledge that a few others were > thinking about this kind of thing. 8-) > > I was thinking from a container point of view, had not heard the > demand for this from support teams, but I can see the point somewhat. 
> > Running in a container, where you have some volume/location available > for logs to persist, we would ideally have one VM option to set a > base/root location for various output files that may currently default > to the current directory, or somewhere else. > > We really want to take applications as they are, without changing > their startup scripts etc, but adding one VM option seems reasonable. > > Recently I logged as a placeholder for exactly this kind of option... > > 8270552: Container convenience option. > https://bugs.openjdk.java.net/browse/JDK-8270552 > > ...although I didn't progress it much so far, and have not suggested > an option name. > > There are some complications I'm sure. e.g. Would the new option > provide a root, and other settings e.g. ErrorFile or HeapDumpPath > ALWAYS have the new root prepended? > Or do we let absolute paths "escape" from the new root? (which might > be more work for the users, as you may have several VM options to > change, to make use of the new option). I think the new option is > always the new root, for the affected paths. > > > Also, in a container, we want to explore if this new location can be > used for the attach api. There is currently much scanning of many > /proc dirs on Linux. That is more involved, but could make use of > the same option (the goal is to use fewer options). But this does not > necessarily have to be implemented at the same time (as long as the > new option is named appropriately). > > More to discuss... > > Thanks! > Kevin > > > -----Original Message----- > From: hotspot-dev On Behalf Of > Koichi Sakata > Sent: 14 September 2021 07:45 > To: hotspot-dev at openjdk.java.net > Subject: Re: Regarding options of error and dump file paths > > Hi all, > > I believe that the option helps us, especially people who belong to > support teams, because it lets us easily get the files required to > troubleshoot. It's also useful in container environments.
We keep those > files if we point the option at a persistent volume, even if > containers are deleted. > > So I'm thinking about how the option works. First of all, it should > deal with the following files. > - GC (heap dumps) > - JIT (replay files) > - hs_err files > - JFR (a number of files) > > Whereas it should exclude files as follows. > - jcmd/dcmd dumps > - Unified logging > > Let's see concrete usage examples of the option. Suppose we name the > option ReportDir. > > Case 1: Set no options > JVM outputs files in each default directory when we set no options. > - GC: ./java_pid%p.hprof > - JIT: ./replay_pid%p.log > - hs_err files: ./hs_err_pid%p.log > - JFR: ./hs_err_pid%p.jfr, ./hs_oom_pid%p.jfr, ./hs_soe_pid%p.jfr > > Case 2: Set the option only > We just run `java -XX:ReportDir=/foo/bar/ ...`, then those files are > put in the /foo/bar/ directory. > - GC: /foo/bar/java_pid%p.hprof > - JIT: /foo/bar/replay_pid%p.log > - hs_err files: /foo/bar/hs_err_pid%p.log > - JFR: /foo/bar/hs_err_pid%p.jfr, /foo/bar/hs_oom_pid%p.jfr, > /foo/bar/hs_soe_pid%p.jfr > > Case 3: Set the option with a relative path > Suppose the working > directory is /home/duke, run `java -XX:ReportDir=./foo/bar/ ...`. JVM > finds the output directory from the working directory and the relative > path. > - GC: /home/duke/foo/bar/java_pid%p.hprof > - JIT: /home/duke/foo/bar/replay_pid%p.log > - hs_err files: /home/duke/foo/bar/hs_err_pid%p.log > - JFR: /home/duke/foo/bar/hs_err_pid%p.jfr, > /home/duke/foo/bar/hs_oom_pid%p.jfr, > /home/duke/foo/bar/hs_soe_pid%p.jfr > > Case 4: Set the option with the existing path option > Run `java > -XX:ReportDir=/foo/bar/ -XX:ErrorFile=/home/duke/hs_err_pid%p.log > ...`. The path of ErrorFile overrides the value of ReportDir.
> - GC: /foo/bar/java_pid%p.hprof > - JIT: /foo/bar/replay_pid%p.log > - hs_err files: /home/duke/hs_err_pid%p.log <- It differs from the > others > - JFR: /foo/bar/hs_err_pid%p.jfr, /foo/bar/hs_oom_pid%p.jfr, > /foo/bar/hs_soe_pid%p.jfr > > Case 5: Set the option with the existing path option which has a > relative path > Suppose the working directory is /home/duke, run `java > -XX:ReportDir=./foo/bar/ -XX:HeapDumpPath=./baz/ > -XX:+HeapDumpOnOutOfMemoryError ...`. > - GC: /home/duke/foo/bar/baz/java_pid%p.hprof <- It differs from the > others > - JIT: /home/duke/foo/bar/replay_pid%p.log > - hs_err files: /home/duke/foo/bar/hs_err_pid%p.log > - JFR: /home/duke/foo/bar/hs_err_pid%p.jfr, > /home/duke/foo/bar/hs_oom_pid%p.jfr, > /home/duke/foo/bar/hs_soe_pid%p.jfr > > The above example derives the heap dump path by combining the > working directory, the relative path of ReportDir and the relative > path of HeapDumpPath. > As an alternative idea, we can ignore the relative path of ReportDir > when HeapDumpPath has a relative path. In that case, the heap dump > path is as follows. > - GC: /home/duke/baz/java_pid%p.hprof > > In either case, I recognize that using relative paths will be slightly > complicated... > > Last but not least, I would be pleased if we could go ahead with this > topic. > > Regards, > Koichi > > > On 03-09-2021 05:41 PM, Koichi Sakata wrote: >> Hi David, >> >> I'm sorry for the late reply. Thank you for your great advice. >> >>> Having an explicit option override the default directory option is a >>> good idea, but I'm not sure it is that clear cut. If you can specify >>> a relative directory and file name for a given dump file, might you >>> not want that to be relative to the specified default path, rather >>> than relative to the pwd? >> >> I occasionally want to use a relative path from the specified default >> path.
This usage might cause confusion about where files are output and be >> complicated to fix, so we should probably prohibit relative paths when >> we use the default path. We can choose the specification after we find >> detailed expectations. >> >>> And we actually have quite a lot of potential output files from: >>> - GC (heap dumps) >>> - JIT (replay files) >>> - hs_err files >>> - JFR (a number of files) >>> - jcmd/dcmd dumps? >>> - Unified logging? >>> >>> I think figuring out the exact details of how this should work, and >>> interact with all the different files involved may be more involved >>> than just prepending a path component. >> >> I completely agree with you. Enabling the new option needs a lot of >> work from us, but that will improve convenience for users, I believe. >> Being able to easily gather error-related files in one place helps us >> to troubleshoot. Not so many users set all these path options. If they >> use the new option, all they have to do will be sending files in the >> directory to their support personnel. In addition, it will be >> easier for them to keep files even in container environments. >> >>> I also think I would need to hear much greater demand, with detailed >>> usage expectations, before supporting this. >> >> I think so, too. I'd like to hear various people's points of view. >> >> Regards, >> Koichi >> >> >> On 2021/08/26 15:23, David Holmes wrote: >>> Hi Koichi, >>> >>> On 23/08/2021 1:29 pm, Koichi Sakata wrote: >>>> Hi all, >>>> >>>> I'm writing to get feedback on my idea about options for error and >>>> dump file paths. >>>> >>>> First of all, we can specify several options related to error and >>>> dump files. For example, the HeapDumpPath option sets the heap dump >>>> file and the ErrorFile option sets the hs_err file. >>>> >>>> I've found that inconvenient because we need to write all path >>>> options to put those files in a specific directory.
I also recognize >>>> that they are outputted in the working directory when I run an >>>> application with no options. But I'd like to keep them in any >>>> directory. So the new option that sets the directory where those >>>> files are outputted would be useful, I think. >>>> >>>> The new option helps us especially to run applications on containers >>>> like Docker, Kubernetes etc. If we run them without those existing >>>> options on containers, files will be put in the local directory of >>>> each container. We lose files after we operate the container such as >>>> deleting it. The option enables us to keep certainly all error and >>>> dump files if we just specify the path of the persistent volume for >>>> the new option. >>>> >>>> As a concrete example, when we specify >>>> -XX:ErrorAndDumpPath=/foo/bar/ (This option name is tentative), >>>> -XX:+HeapDumpOnOutOfMemoryError and -XX:StartFlightRecording etc., >>>> files are generated in the /foo/bar directory. From my point of >>>> view, the option will deal with the following files: >>>> - heap dump file (java_pid%p.hprof) >>>> - error log file (hs_err_pid%p.log) >>>> - JFR emergency dumps (hs_err_pid%p.jfr, hs_oom_pid%p.jfr, >>>> hs_soe_pid%p.jfr) >>>> - replay file (replay_pid%p.log) >>>> >>>> The existing path options should override the new option. If I set >>>> -XX:ErrorAndDumpPath=/foo/bar/ and -XX:HeapDumpPath=/foo/baz/, a >>>> heap dump file will be in the /foo/baz directory and other files >>>> will be created in the /foo/bar. >>>> >>>> I would like to hear your point of view. If some people agree to >>>> this idea, I will write a patch. >>> >>> My initial reaction was that this seemed something better handled in >>> a launch script because I figured if you had complex needs in >>> relation to where these files were being placed, then you'd use a >>> launch script to help manage that anyway. 
>>> >>> But I can see there would be some convenience to controlling the >>> output directory without also having to restate the default file >>> names. >>> >>> Having an explicit option override the default directory option is a >>> good idea, but I'm not sure it is that clear cut. If you can specify >>> a relative directory and file name for a given dump file, might you >>> not want that to be relative to the specified default path, rather >>> than relative to the pwd? >>> >>> And we actually have quite a lot of potential output files from: >>> - GC (heap dumps) >>> - JIT (replay files) >>> - hs_err files >>> - JFR (a number of files) >>> - jcmd/dcmd dumps? >>> - Unified logging? >>> >>> I think figuring out the exact details of how this should work, and >>> interact with all the different files involved may be more involved >>> than just prepending a path component. >>> >>> I also think I would need to hear much greater demand, with detailed >>> usage expectations, before supporting this. >>> >>> Just my 2c. >>> >>> Cheers, >>> David >>> ----- >>> >>>> Regards, >>>> Koichi From github.com+42899633+eastig at openjdk.java.net Wed Sep 22 12:32:02 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 22 Sep 2021 12:32:02 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3] In-Reply-To: References: <5yTyf-BNoSsU36WxJNxGum10rfMKf4dkZAFIVFl7zEw=.261e52fc-1d56-4ed8-942f-97335f07eca6@github.com> Message-ID: On Wed, 22 Sep 2021 07:49:36 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1385: >> >>> 1383: // Code for java.lang.Thread::onSpinWait() intrinsic. >>> 1384: void spin_wait() { >>> 1385: #define EMIT_N_INST(n, inst) for (int i = 0; i < (n); ++i) inst() >> >> Why use a macro here? You could just put the loop around the switch statement. And the method body seems sufficiently large that it ought to go in the .cpp file. > > Good point.
There's no significant performance advantage to having this in the header. > Why use a macro here? You could just put the loop around the switch statement. And the method body seems sufficiently large that it ought to go in the .cpp file. :) That comes from compiler engineering experience: compilers have trouble applying the unswitching optimization to loop-invariant switch statements. I'll update the code as suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Wed Sep 22 12:36:58 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 22 Sep 2021 12:36:58 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 22:22:50 GMT, Evgeny Astigeevich wrote: >> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds the option `OnSpinWaitImpl=value`, where `value` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. >> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. >> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Move emitting code to MacroAssembler::spin_wait > > Code emitting spin pauses is moved to MacroAssembler::spin_wait. > As OptoAssembly output is changed, tests are updated to parse > PrintAssembly. @theRealAph, when I was writing a test I noticed a strange thing in the `PrintAssembly` output: # {method} {0x0000ffff6ac00370} 'test' '()V' in 'compiler/onSpinWait/TestOnSpinWaitImplAArch64$Launcher' # [sp+0x40] (sp of caller) 0x0000ffff9d557680: 1f20 03d5 | e953 40d1 | 3f01 00f9 | ff03 01d1 | fd7b 03a9 | 1f20 03d5 | 1f20 03d5 | 1f20 03d5 0x0000ffff9d5576a0: 1f20 03d5 | 1f20 03d5 | 1f20 03d5 0x0000ffff9d5576ac: ;*invokestatic onSpinWait {reexecute=0 rethrow=0 return_oop=0} ; - compiler.onSpinWait.TestOnSpinWaitImplAArch64$Launcher::test@0 (line 161) 0x0000ffff9d5576ac: 1f20 03d5 | fd7b 43a9 | ff03 0191 The code is for the case when 7 NOPs are used for a spin pause. In the output, only one instruction comes after `invokestatic onSpinWait`; the other 6 instructions come before it. Is this expected behaviour or a bug? ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Wed Sep 22 13:48:40 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Wed, 22 Sep 2021 13:48:40 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4] In-Reply-To: References: Message-ID: > This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds the option `OnSpinWaitImpl=value`, where `value` can be: > > - `none`: no implementation for spin pauses.
This is the default value.
> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>
> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>
> Testing:
>
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Move spin_wait in cpp file with removal of loop macro

  In addition, comments are added to a checking method of a test.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5562/files
  - new: https://git.openjdk.java.net/jdk/pull/5562/files/fc55a682..4db361e1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=02-03

  Stats: 49 lines in 3 files changed: 30 ins; 18 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5562.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562

PR: https://git.openjdk.java.net/jdk/pull/5562

From aph at openjdk.java.net Wed Sep 22 13:48:43 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 22 Sep 2021 13:48:43 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3]
In-Reply-To:
References:
Message-ID: <5nm0e8NHUdcYJJzhxbBq8rlJcn3Y6fSYzlRrGlm_RtE=.6177e231-f9e7-42df-861f-6974d6648e5c@github.com>

On Tue, 21 Sep 2021 22:22:50 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
>> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move emitting code to MacroAssembler::spin_wait
>
>   Code emitting spin pauses is moved to MacroAssembler::spin_wait.
>   As OptoAssembly output is changed, tests are updated to parse
>   PrintAssembly.

It's pretty much expected, yes. The debuginfo isn't all that precise.
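
For readers of the archive, the option syntax quoted above can be sketched in a few lines of C++. This is only an illustration of the `none` / `Nnop` / `Nisb` / `Nyield` grammar as the PR description states it. The names `SpinWaitInst`, `SpinWaitSpec` and `parse_spin_wait_impl` are hypothetical and do not come from the patch, and treating a missing `N` as a count of one is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical types for illustration; not taken from the HotSpot patch.
enum class SpinWaitInst { None, Nop, Isb, Yield };

struct SpinWaitSpec {
  SpinWaitInst inst;
  int count;  // number of instructions per spin pause; 0 for `none`
};

// Accepts "none", or "nop" / "isb" / "yield" with an optional single-digit
// prefix in the range 2..9. Returns false for any other input and leaves
// *out untouched in that case.
inline bool parse_spin_wait_impl(const std::string& value, SpinWaitSpec* out) {
  if (value == "none") {
    *out = SpinWaitSpec{SpinWaitInst::None, 0};
    return true;
  }
  std::size_t pos = 0;
  int count = 1;  // assumption: no prefix means a single instruction
  if (!value.empty() && std::isdigit(static_cast<unsigned char>(value[0]))) {
    count = value[0] - '0';
    if (count < 2 || count > 9) {
      return false;  // only 2..9 is allowed as an explicit prefix
    }
    pos = 1;
  }
  const std::string name = value.substr(pos);
  SpinWaitInst inst;
  if (name == "nop") {
    inst = SpinWaitInst::Nop;
  } else if (name == "isb") {
    inst = SpinWaitInst::Isb;
  } else if (name == "yield") {
    inst = SpinWaitInst::Yield;
  } else {
    return false;
  }
  *out = SpinWaitSpec{inst, count};
  return true;
}
```

Under these assumptions `7nop` parses to seven `nop` instructions, while `1nop` is rejected because the description only allows `2..9` as a prefix.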
------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From rrich at openjdk.java.net Wed Sep 22 14:03:55 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Wed, 22 Sep 2021 14:03:55 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>> java.lang.NullPointerException: Cannot read the array length because "" is null
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>     at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>> java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

Hi Volker,

> Fortunately, I think the solution is pretty simple. I don't think we need the backtrace at all. In the end it is just an optimization to save some space and not construct the full StackTraceElement[] right at the creation time of an exception. But the implicit exceptions which we are creating here are "nmethod-singletons" and as such I don't think we lose much if we create the array of StackTraceElements right away instead of creating a backtrace (see my last push). The StackTraceElements only contain Strings and therefore don't keep any classes unnecessarily alive. What do you think?

I agree. Thanks for fixing!

> And once you're on it, would you mind reviewing the whole PR :) :)

I'll be out of office next week. Maybe I'll get to review the related https://github.com/openjdk/jdk/pull/5488 after that if needed.

Cheers, Richard.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5392

From pchilanomate at openjdk.java.net Wed Sep 22 19:44:58 2021
From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo)
Date: Wed, 22 Sep 2021 19:44:58 GMT
Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself
In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com>
References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com>
Message-ID: <6pffdSxbdmf8m_wslp0HWlnENY-PuV-bDG2L_vrV4TM=.9583da02-8384-4d84-b1bd-f36b1457e362@github.com>

On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote:

> This patch allows the custom commands of OnError to attach to HotSpot itself.
> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. > > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... > 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. 
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) Hi Xin, src/hotspot/share/utilities/vmError.cpp line 1341: > 1339: ~VMErrorThreadToNativeFromVM() { > 1340: if (_thread != nullptr) { > 1341: ThreadStateTransition::transition_from_native(_thread, _thread_in_vm); Just some thought on this part. Ideally we should avoid calling process_if_requested_with_exit_check() since attempting to process handshakes/stackwatermarks at this point might lead to all sorts of other issues. An alternative could be to just set the original state back and continue. But maybe we don't care about this because we are almost done with the error reporting and the OnError commands were already executed. That last part would argue to move the wrapper before the while loop. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From pchilanomate at openjdk.java.net Wed Sep 22 19:44:59 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Wed, 22 Sep 2021 19:44:59 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <8eSPurQN964AGBNvlGX6G3610dGkXNOeB1o5WkGDeBA=.1de32b00-c1df-4a6b-a2b9-dafccc034566@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> <8eSPurQN964AGBNvlGX6G3610dGkXNOeB1o5WkGDeBA=.1de32b00-c1df-4a6b-a2b9-dafccc034566@github.com> Message-ID: On Wed, 22 Sep 2021 07:57:05 GMT, Thomas Stuefe wrote: >> I did flag this problem originally. 
It is unfortunate that we can't tell if a thread holds any locks in a product build. Not sure how to deal with this.
>
>> In a debug build, a JavaThread can't transition to Native if it owns any lock. Even if I remove the assert here, it will hit another assert later in `ThreadStateTransition::transition_from_vm`.
>>
>> ```
>> // Checks safepoint allowed and clears unhandled oops at potential safepoints.
>> void JavaThread::check_possible_safepoint() {
>>   if (_no_safepoint_count > 0) {
>>     print_owned_locks();
>>     assert(false, "Possible safepoint reached by thread that does not allow it");
>>   }
>> ```
>>
>> I'd like to make VMErrorThreadToNativeFromVM only change state if _thread doesn't own any mutex, but `Thread::own_lock()` is only available in debug builds.
>
> Which may be fine, since asserts only fire in debug builds.

We can always walk _mutex_array like we do in print_owned_locks_on_error(). Note that locks created outside mutex_init() will not be visible though. Maybe we should fix that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590

From phh at openjdk.java.net Wed Sep 22 20:07:05 2021
From: phh at openjdk.java.net (Paul Hohensee)
Date: Wed, 22 Sep 2021 20:07:05 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 13:48:40 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
>> - `Nyield`: use `yield` instruction for spin pauses.
Optional `N` can be `2..9` to specify a number of `yield` instructions.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move spin_wait in cpp file with removal of loop macro
>
>   In addition, comments are added to a checking method of a test.

In pause_aarch64.hpp, I'd put the definition of PauseInst inside the definition of PauseImplDesc in order to not clutter the global namespace more than needed.

-------------

Changes requested by phh (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5562

From phh at openjdk.java.net Wed Sep 22 20:10:59 2021
From: phh at openjdk.java.net (Paul Hohensee)
Date: Wed, 22 Sep 2021 20:10:59 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v10]
In-Reply-To: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com>
References: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com>
Message-ID:

On Tue, 14 Sep 2021 16:07:45 GMT, Andrew Haley wrote:

>> An interleaved version of AES/GCM.
>>
>> Performance, now and then:
>>
>>
>> Apple M1, 3.2 GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   3108.881 ±  119.675  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   3109.685 ±    4.206  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   3122.144 ±  113.379  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   3119.568 ±  192.877  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  89123.942 ±  111.977  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  91034.697 ±  161.469  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  89732.397 ±  106.370  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  89382.300 ±  139.300  ns/op
>>
>> Neoverse N1, 2.5GHz:
>>
>> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
>> AESGCMBench.decrypt                 8192          256              avgt    6   6296.575 ±   37.995  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6   7380.326 ±   10.987  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6   6293.090 ±   52.972  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6   6357.536 ±   42.925  ns/op
>>
>> AESGCMBench.decrypt                 8192          256              avgt    6  48745.085 ±  125.612  ns/op
>> AESGCMBench.decryptMultiPart        8192          256              avgt    6  45062.599 ± 1548.950  ns/op
>> AESGCMBench.encrypt                 8192          256              avgt    6  42230.857 ±  520.562  ns/op
>> AESGCMBench.encryptMultiPart        8192          256              avgt    6  45124.171 ± 1417.927  ns/op
>>
>>
>>
>> A note about the implementation for the reviewers:
>>
>> Unrolled and hand-scheduled intrinsics are often written in a way that
>> I don't find satisfactory. Often they are a conglomeration of
>> copy-and-paste programming and C macros, which makes them hard to
>> understand and hard to maintain. I won't name any names, but there are
>> many examples to be found in free software across the Internet.
>>
>> I spent a while thinking about a structured way to develop and
>> implement them, and I think I've got something better. The idea is
>> that you transform a pre-existing implementation into a generator for
>> the interleaved version. The transformation shouldn't be too hard to
>> do, but more importantly it should be possible for a reader to verify
>> that the interleaved and unrolled version performs the same function.
>>
>> A generator takes the form of a subclass of `KernelGenerator`. The
>> core idea is that the programmer defines the base case of the
>> intrinsic and a method to generate a clone of it, shifted to a
>> different set of registers. `KernelGenerator` will then generate
>> several interleaved copies of the function, with each one using a
>> different set of registers.
>>
>> The subclass must implement three methods: `length()`, which is the
>> number of instruction bundles in the intrinsic, `generate(int n)`
>> which emits the nth instruction bundle in the intrinsic, and `next()`
>> which takes an instance of the generator and returns a version of it,
>> shifted to a new set of registers.
>>
>> As an example, here's the inner loop of AES encryption:
>>
>> (Some details elided for clarity.)
>>
>>
>>     BIND(L_aes_loop);
>>       ld1(v0, T16B, post(from, 16));
>>
>>       cmpw(keylen, 52);
>>       br(Assembler::CC, L_rounds_44);
>>       br(Assembler::EQ, L_rounds_52);
>>
>>       aes_round(v0, v17);
>>       aes_round(v0, v18);
>>     BIND(L_rounds_52);
>>       aes_round(v0, v19);
>>       aes_round(v0, v20);
>>     BIND(L_rounds_44);
>>       ...
>>
>>
>> The generator for the unrolled version looks like:
>>
>>
>>     virtual void generate(int index) {
>>       switch (index) {
>>       case 0:
>>         ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input
>>         break;
>>       case 1:
>>         if (_once) {
>>           cmpw(_keylen, 52);
>>           br(Assembler::LO, _rounds_44);
>>           br(Assembler::EQ, _rounds_52);
>>         }
>>         break;
>>       case 2: aes_round(_data, _subkeys + 0); break;
>>       case 3: aes_round(_data, _subkeys + 1); break;
>>       case 4:
>>         if (_once) bind(_rounds_52);
>>         break;
>>       case 5: aes_round(_data, _subkeys + 2); break;
>>       case 6: aes_round(_data, _subkeys + 3); break;
>>       case 7:
>>         if (_once) bind(_rounds_44);
>>         break;
>>       ...
>>
>>
>> The job of converting a single inline intrinsic is, as you can see,
>> not much more than adding a switch statement. Some instructions should
>> only be emitted once, rather than several times, such as the labels
>> and branches. (You can use a list of C++ lambdas rather than a switch
>> statement to do the same thing, very LISP, but that seems a bit of a
>> sledgehammer. YMMV.)
>> >> I believe that this approach will be more maintainable and easier to >> understand than other approaches we've seen. Also, the number of >> unrolls is just a number that can be tweaked as required. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup I'd commit it now in order to get experience with it, and fix time-to-safepoint later. There's still plenty of time left in the Java 18 schedule for the latter. ------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From jrose at openjdk.java.net Wed Sep 22 21:18:55 2021 From: jrose at openjdk.java.net (John R Rose) Date: Wed, 22 Sep 2021 21:18:55 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: <3iOUpNM2PQuE_i52zjBx72fHyXWZdmkq-a0hAA7Omlc=.0ac26dff-990c-4604-8f53-da449f0e3d34@github.com> On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>> java.lang.NullPointerException: Cannot read the array length because "" is null
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>     at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>> java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

To me this looks like a very clever mess. The mess comes from the trickiness (it's a tricky problem!) and even more from forcing various parts of the system that are usually isolated to come into contact.

Adding reasons to GC during a JIT task is a smell. Adding objects which are pieced together at compile time is a smell. (And you can't run Java code from the JIT; it's an architectural limitation.) Having JavaClasses talk directly to C2 GraphKit (without even a CI class between) is a smell. Adding a new section to nmethods just to make a poorly-understood life cycle for an odd (non-Java-created) group of exceptions is a smell. If we need a new section on nmethods it should be something more like "Java structures the JIT has made", with clearly separated concerns from the rest of the system, rather than "my very special section for an intrusive RFE".

This is not even close to being ready to integrate.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5392

From sspitsyn at openjdk.java.net Wed Sep 22 21:24:51 2021
From: sspitsyn at openjdk.java.net (Serguei Spitsyn)
Date: Wed, 22 Sep 2021 21:24:51 GMT
Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez wrote:

> This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed.

Hi Tom,

The fix looks good in general. We disabled the can_pop_frame and can_force_early_return capabilities with Graal because some jvmti jck tests failed intermittently. So, I'll submit jvmti jck tests on mach5 to make sure they do not fail anymore.
------------- PR: https://git.openjdk.java.net/jdk/pull/5625 From jrose at openjdk.java.net Wed Sep 22 21:31:58 2021 From: jrose at openjdk.java.net (John R Rose) Date: Wed, 22 Sep 2021 21:31:58 GMT Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v10] In-Reply-To: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com> References: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com> Message-ID: <7cL7mxSBZvplTMqHZkKLKvwioN-Kku_2QRWZoze1CxM=.842df9c8-f4a6-48da-bcec-14677d017582@github.com> On Tue, 14 Sep 2021 16:07:45 GMT, Andrew Haley wrote: >> An interleaved version of AES/GCM. >> >> Performance, now and then: >> >> >> Apple M1, 3.2 GHz: >> >> Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESGCMBench.decrypt 8192 256 avgt 6 3108.881 ? 119.675 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 3109.685 ? 4.206 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 3122.144 ? 113.379 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 3119.568 ? 192.877 ns/op >> >> AESGCMBench.decrypt 8192 256 avgt 6 89123.942 ? 111.977 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 91034.697 ? 161.469 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 89732.397 ? 106.370 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 89382.300 ? 139.300 ns/op >> >> Neoverse N1, 2.5GHz: >> >> Benchmark (dataSize) (keyLength) (provider) Mode Cnt Score Error Units >> AESGCMBench.decrypt 8192 256 avgt 6 6296.575 ? 37.995 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 7380.326 ? 10.987 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 6293.090 ? 52.972 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 6357.536 ? 42.925 ns/op >> >> AESGCMBench.decrypt 8192 256 avgt 6 48745.085 ? 125.612 ns/op >> AESGCMBench.decryptMultiPart 8192 256 avgt 6 45062.599 ? 1548.950 ns/op >> AESGCMBench.encrypt 8192 256 avgt 6 42230.857 ? 
520.562 ns/op >> AESGCMBench.encryptMultiPart 8192 256 avgt 6 45124.171 ? 1417.927 ns/op >> >> >> >> A note about the implementation for the reviewers: >> >> Unrolled and hand-scheduled intrinsics are often written in a way that >> I don't find satisfactory. Often they are a conglomeration of >> copy-and-paste programming and C macros, which makes them hard to >> understand and hard to maintain. I won't name any names, but there are >> many examples to be found in free software across the Internet, >> >> I spent a while thinking about a structured way to develop and >> implement them, and I think I've got something better. The idea is >> that you transform a pre-existing implementation into a generator for >> the interleaved version. The transformation shouldn't be too hard to >> do, but more importantly it should be possible for a reader to verify >> that the interleaved and unrolled version performs the same function. >> >> A generator takes the form of a subclass of `KernelGenerator`. The >> core idea is that the programmer defines the base case of the >> intrinsic and a method to generate a clone of it, shifted to a >> different set of registers. `KernelGenerator` will then generate >> several interleaved copies of the function, with each one using a >> different set of registers. >> >> The subclass must implement three methods: `length()`, which is the >> number of instruction bundles in the intrinsic, `generate(int n)` >> which emits the nth instruction bundle in the intrinsic, and `next()` >> which takes an instance of the generator and returns a version of it, >> shifted to a new set of registers. >> >> As an example, here's the inner loop of AES encryption: >> >> (Some details elided for clarity.) 
>>
>>
>>     BIND(L_aes_loop);
>>     ld1(v0, T16B, post(from, 16));
>>
>>     cmpw(keylen, 44);
>>     br(Assembler::CC, L_rounds_44);
>>     br(Assembler::EQ, L_rounds_52);
>>
>>     aes_round(v0, v17);
>>     aes_round(v0, v18);
>>     BIND(L_rounds_52);
>>     aes_round(v0, v19);
>>     aes_round(v0, v20);
>>     BIND(L_rounds_44);
>>     ...
>>
>>
>> The generator for the unrolled version looks like:
>>
>>
>>     virtual void generate(int index) {
>>       switch (index) {
>>       case 0:
>>         ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input
>>         break;
>>       case 1:
>>         if (_once) {
>>           cmpw(_keylen, 52);
>>           br(Assembler::LO, _rounds_44);
>>           br(Assembler::EQ, _rounds_52);
>>         }
>>         break;
>>       case 2: aes_round(_data, _subkeys + 0); break;
>>       case 3: aes_round(_data, _subkeys + 1); break;
>>       case 4:
>>         if (_once) bind(_rounds_52);
>>         break;
>>       case 5: aes_round(_data, _subkeys + 2); break;
>>       case 6: aes_round(_data, _subkeys + 3); break;
>>       case 7:
>>         if (_once) bind(_rounds_44);
>>         break;
>>       ...
>>
>>
>> The job of converting a single inline intrinsic is, as you can see,
>> not much more than adding a switch statement. Some instructions should
>> only be emitted once, rather than several times, such as the labels
>> and branches. (You can use a list of C++ lambdas rather than a switch
>> statement to do the same thing, very LISP, but that seems a bit of a
>> sledgehammer. YMMV.)
>>
>> I believe that this approach will be more maintainable and easier to
>> understand than other approaches we've seen. Also, the number of
>> unrolls is just a number that can be tweaked as required.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
>   Cleanup

Not a review, but that's the best assembly code I think I've ever seen. Probably the only way to make it decisively better would be to code it in Java, using the Vector API on top of the (as yet uninvented) statically compiled but self-hosting System Java dialect.
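
For readers following the thread, the three-method contract in the quoted note (`length()`, `generate(int n)`, `next()`) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `KernelSketch`, `AesRoundsSketch` and `KernelSketchDemo` are hypothetical names, strings stand in for emitted instructions, and the interleaving order (bundle by bundle across all copies) is an assumption about how such a generator might be driven, not a copy of the HotSpot `KernelGenerator` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generator idea in the quoted note; not the HotSpot class.
abstract class KernelSketch {
    // Number of instruction bundles in one copy of the kernel.
    abstract int length();

    // Emit the n-th bundle of this copy (here: append text to `out`).
    abstract void generate(int n, List<String> out);

    // Return a copy of this generator shifted to a fresh register set.
    abstract KernelSketch next();

    // Emit `unrolls` copies of the kernel, interleaved bundle by bundle.
    final List<String> unroll(int unrolls) {
        List<KernelSketch> copies = new ArrayList<>();
        KernelSketch k = this;
        for (int i = 0; i < unrolls; i++) {
            copies.add(k);
            k = k.next();
        }
        List<String> out = new ArrayList<>();
        for (int n = 0; n < length(); n++) {
            for (KernelSketch copy : copies) {
                copy.generate(n, out);
            }
        }
        return out;
    }
}

// Toy kernel: load a block, compare once, run one round (all illustrative).
final class AesRoundsSketch extends KernelSketch {
    private final int dataReg;   // register holding this copy's data block
    private final boolean once;  // true only for the first copy

    AesRoundsSketch(int dataReg, boolean once) {
        this.dataReg = dataReg;
        this.once = once;
    }

    @Override
    int length() {
        return 3;
    }

    @Override
    void generate(int n, List<String> out) {
        switch (n) {
            case 0:
                out.add("ld1 v" + dataReg);
                break;
            case 1:
                // Branch-related instructions are emitted by the first copy only.
                if (once) {
                    out.add("cmpw keylen");
                }
                break;
            case 2:
                out.add("aes_round v" + dataReg);
                break;
            default:
                throw new IllegalArgumentException("no bundle " + n);
        }
    }

    @Override
    KernelSketch next() {
        return new AesRoundsSketch(dataReg + 1, false);
    }
}

final class KernelSketchDemo {
    public static void main(String[] args) {
        // Two interleaved copies of the toy kernel.
        for (String line : new AesRoundsSketch(0, true).unroll(2)) {
            System.out.println(line);
        }
    }
}
```

The point of the shape is the one made in the note: the per-copy body stays a plain switch over bundle indices, and the unroll count is just an argument.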
------------- PR: https://git.openjdk.java.net/jdk/pull/5390 From svkamath at openjdk.java.net Wed Sep 22 22:48:32 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Wed, 22 Sep 2021 22:48:32 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v5] In-Reply-To: References: Message-ID: <-FJux8XZG1ra8W4DNa31PEP_pWxoIkVvf9qHyYRSEBM=.5b57d887-c321-4fd0-9c64-9ea2202ac774@github.com> > Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. > > The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: Added htbl_entries constant to other architectures ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5402/files - new: https://git.openjdk.java.net/jdk/pull/5402/files/8756d301..59b1b910 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5402&range=03-04 Stats: 16 lines in 6 files changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5402.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5402/head:pull/5402 PR: https://git.openjdk.java.net/jdk/pull/5402 From dholmes at openjdk.java.net Thu Sep 23 01:37:57 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 23 Sep 2021 01:37:57 GMT Subject: RFR: 8273916: Remove 'special' ranking [v4] In-Reply-To: <1Y3__mWGE9I1tGXX9nh5pqNkgcBDMAemQqz9tf4Q8t8=.3bd92d9e-42e6-4a81-b11f-5639a274aad2@github.com> References: 
<1Y3__mWGE9I1tGXX9nh5pqNkgcBDMAemQqz9tf4Q8t8=.3bd92d9e-42e6-4a81-b11f-5639a274aad2@github.com> Message-ID: On Tue, 21 Sep 2021 23:47:41 GMT, Coleen Phillimore wrote: >> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915. >> This passes tier1-6 also. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove special comment. Hi Coleen, Thanks again for the offlist discussion. Sorry it took me so long to "get it". Cheers, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5563 From njian at openjdk.java.net Thu Sep 23 03:03:00 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Thu, 23 Sep 2021 03:03:00 GMT Subject: Integrated: 8267356: AArch64: Vector API SVE codegen support In-Reply-To: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> References: <04_lDZDCcLLfXx6XmrbVdGXkPmqNrYzQuHBJzQ9Oa5k=.f3ef1398-c8b4-4162-9631-3dbfb6594250@github.com> Message-ID: On Thu, 20 May 2021 07:32:52 GMT, Ningsheng Jian wrote: > This is the integration of current SVE work done in panama-vector/vectorIntrinscs, which includes: > > 1. Code generation for Vector API c2 IR nodes with SVE. > 2. Non-max vector size support with SVE, e.g. using *128Vector (and *64Vector) APIs on 256-bit SVE environment could also generate optimized SVE instructions with predicate feature. > 3. Some more SVE assemblers (and tests) used by the codegen part. > > Note: VectorMask is still represented in vector register, a further improvement to map mask to predicate register is under development at https://github.com/openjdk/panama-vector/tree/vectorIntrinsics+mask > > > Test: tier1-3 with vector api test cases passed on 512-bit SVE hardware with MaxVectorSize=16/32/64. This pull request has now been integrated. 
Changeset: 9d3379b9 Author: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/9d3379b9755e9739f0b8f5c29deb1d28d0f3aa81 Stats: 5761 lines in 13 files changed: 4576 ins; 195 del; 990 mod 8267356: AArch64: Vector API SVE codegen support Co-authored-by: Xiaohong Gong Co-authored-by: Wang Huang Co-authored-by: Ningsheng Jian Co-authored-by: Xuejin He Co-authored-by: Ai Jiaming Co-authored-by: Eric Liu Reviewed-by: aph, ngasson ------------- PR: https://git.openjdk.java.net/jdk/pull/4122 From dlong at openjdk.java.net Thu Sep 23 03:28:57 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 23 Sep 2021 03:28:57 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: <7v19NZqVzDlIpGm_JEGFW9ynn-7fz2xZ2kIkI2lelL8=.7eef3216-9700-4dc6-b267-5d3f7f172f16@github.com> On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274): >> >> public static boolean isAlpha(int c) { >> try { >> return IS_ALPHA[c]; >> } catch (ArrayIndexOutOfBoundsException ex) { >> return false; >> } >> } >> >> >> ### Solution >> >> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. 
This results in a ten-times performance improvement for the above code:
>>
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster. > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr How about introduce a public rangeCheck() method that returns true/false and would be a compiler intrinsic. Then we don't have to create an exception at all. It could go some place like jdk/internal/util/ArraysSupport. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From dlong at openjdk.java.net Thu Sep 23 03:50:54 2021 From: dlong at openjdk.java.net (Dean Long) Date: Thu, 23 Sep 2021 03:50:54 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote: >> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening. >> >> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. 
A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a similar trick like for method handle intrinsics where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Minor updates as requested by @TheRealMDoerr

Ok, on 2nd thought, change code like isAlpha() to do the range check online. It would be nice if the compiler could do that automatically, but I don't think the spec would allow omitting the exception, even though it would be difficult to tell without exception logging or JVMTI turned on.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From never at openjdk.java.net Thu Sep 23 04:10:57 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Thu, 23 Sep 2021 04:10:57 GMT
Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez wrote:

> This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed.
Since Graal is gone I don't think it's possible to run those tests any more. It might be possible to run those tests on some source base after https://bugs.openjdk.java.net/browse/JDK-8219403 when adjustCompilationLevel was removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5625 From stuefe at openjdk.java.net Thu Sep 23 05:54:53 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 23 Sep 2021 05:54:53 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: <2C2ZjJtbXq06HV3JNSBw35iy3otCExPJQ8k2YXhFgTQ=.115e3d34-162e-4e2f-b656-cdb874aa8246@github.com> On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. > > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... > 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) Hi Xin, Comments inline. As I said, I think this is useful (and probably should be backported at least to 17). Can you please provide a regression test for this? This is not just for aesthetics, these switches are actually used a lot more than one thinks, and knowing that they work and did not bitrot would be reassuring :) You could maybe expand runtime/ErrorHandling/TestOnError. It would be nice to have tests for: `jcmd XXX` (e.g. call `VM.info` and then scan for the pid). TestOnError uses `-XX:ErrorHandlerTest`, which does a voluntary crash in CreateJavaVM, after VM initialization. It would be nice to have a second mode for TestOnError to test OnError in OOM situations. Then we have covered both variants (real crashes and OOMs). The latter could also be tested in release VMs (`-XX:ErrorHandlerTest` is debug only). Finally (and optionally, depending on how far you want to go) we should test OnError with a sequence of commands too. 
Thanks, Thomas src/hotspot/share/utilities/vmError.cpp line 1330: > 1328: _thread = JavaThread::cast(t); > 1329: assert(_thread == Thread::current(), "must be current thread"); > 1330: assert(_thread->thread_state() == _thread_in_vm, "must be in VM"); Don't assert here. We are in the middle of error handling. This would just lead to recursive errors and very probably to "too many errors, abort". ------------- Changes requested by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5590 From stuefe at openjdk.java.net Thu Sep 23 05:54:54 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 23 Sep 2021 05:54:54 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <6pffdSxbdmf8m_wslp0HWlnENY-PuV-bDG2L_vrV4TM=.9583da02-8384-4d84-b1bd-f36b1457e362@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <6pffdSxbdmf8m_wslp0HWlnENY-PuV-bDG2L_vrV4TM=.9583da02-8384-4d84-b1bd-f36b1457e362@github.com> Message-ID: On Wed, 22 Sep 2021 19:26:28 GMT, Patricio Chilano Mateo wrote: >> This patch allows the custom commands of OnError to attach to HotSpot itself. >> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). >> This prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. >> >> Without this patch, we will encounter a deadlock at safepoint synchronization. >> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. 
>> >> >> Aborting due to java.lang.OutOfMemoryError: Java heap space >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (debug.cpp:364), pid=94632, tid=94633 >> # fatal error: OutOfMemory encountered: Java heap space >> # >> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log >> # >> # -XX:OnError="jcmd %p Thread.print" >> # Executing /bin/sh -c "jcmd 94632 Thread.print" ... >> 94632: >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: >> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] >> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) > > src/hotspot/share/utilities/vmError.cpp line 1341: > >> 1339: ~VMErrorThreadToNativeFromVM() { >> 1340: if (_thread != nullptr) { >> 1341: ThreadStateTransition::transition_from_native(_thread, _thread_in_vm); > > Just some thought on this part. Ideally we should avoid calling process_if_requested_with_exit_check() since attempting to process handshakes/stackwatermarks at this point might lead to all sorts of other issues. 
An alternative could be to just set the original state back and continue. But maybe we don't care about this because we are almost done with the error reporting and the OnError commands were already executed. That last part would argue to move the wrapper before the while loop. I agree with @pchilano (if I understand him correctly) in that I think an RAII object here is not even needed. There is no need to re-instate the prior thread state. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From stuefe at openjdk.java.net Thu Sep 23 05:54:55 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 23 Sep 2021 05:54:55 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> Message-ID: On Tue, 21 Sep 2021 05:35:14 GMT, David Holmes wrote: >> This patch allows the custom commands of OnError to attach to HotSpot itself. >> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). >> This prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. >> >> Without this patch, we will encounter a deadlock at safepoint synchronization. >> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. 
>>
>>
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (debug.cpp:364), pid=94632, tid=94633
>> # fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> # Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000]
>> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> src/hotspot/share/utilities/vmError.cpp line 1343:
>
>> 1341:     ThreadStateTransition::transition_from_native(_thread, _thread_in_vm);
>> 1342:     assert(!_thread->is_pending_jni_exception_check(), "Pending JNI Exception Check");
>> 1343:     // We don't need to clear_walkable because it will happen automagically when we return to java
>
> We are not executing JNI code when we do the fork_and_exec so this does not seem necessary.
> The comment about `clear_walkable` also doesn't make sense here - we are crashing so we are not returning to Java at all. Also, recursive assertions don't work here. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From stuefe at openjdk.java.net Thu Sep 23 05:54:55 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 23 Sep 2021 05:54:55 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <_MjzQyxu0xw_hkQcsKe3kIq3mdrpkKRKJ7vgVf2TNWA=.2fde4c46-30a5-4a61-8723-75ddd3c84df3@github.com> Message-ID: On Wed, 22 Sep 2021 05:59:01 GMT, Xin Liu wrote: >> src/hotspot/share/utilities/vmError.cpp line 1663: >> >>> 1661: out.print_raw_cr("\" ..."); >>> 1662: >>> 1663: VMErrorThreadToNativeFromVM ttnfv(JavaThread::current_or_null()); >> >> Surely the current thread need not be a JavaThread here. > > Make sense. I haven't seen report_and_die() is called by NonJavaThread, but I agree we should cover that case. it will be no-op for NonJavaThread because safepoint synchronization only checks JavaThreads. I would move the RAII object out of this scope into the enclosing scope. Do it at the start of `if OnError[0] != NULL`. There is no need to repeatedly do this for every command. (See my earlier comment, maybe we don't even need an RAII but a simple transition, done once, would be sufficient). ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From sspitsyn at openjdk.java.net Thu Sep 23 06:28:58 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Thu, 23 Sep 2021 06:28:58 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez wrote: > This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. 
Yes, I got errors since Graal is gone: `JVMCI compiler 'graal' specified by jvmci.Compiler not found` If you have a repository with Graal can you run this mach5 command? : mach5 remote-build-and-test --email tom.rodriguez at oracle.com \ --id-tag jck-jvmti-graal-Xcomp --comment jvmti-graal-Xcomp --test "jck:vm/jvmti" \ -a "-Xcomp -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=graal" \ -b linux-x64-debug,windows-x64-debug,macosx-x64-debug,linux-x64,windows-x64,macosx-x64 Otherwise, I'll try to extract a jdk 13 version with the adjustCompilationLevel removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5625 From xliu at openjdk.java.net Thu Sep 23 07:28:56 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 23 Sep 2021 07:28:56 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: On Mon, 20 Sep 2021 22:02:37 GMT, Xin Liu wrote: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. 
>
>
> Aborting due to java.lang.OutOfMemoryError: Java heap space
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (debug.cpp:364), pid=94632, tid=94633
> # fatal error: OutOfMemory encountered: Java heap space
> #
> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
> #
> # -XX:OnError="jcmd %p Thread.print"
> # Executing /bin/sh -c "jcmd 94632 Thread.print" ...
> 94632:
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000]
> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE
> [10.616s][warning][safepoint]
> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)

Hi reviewers,

Thanks for the comments.

> Can you please provide a regression test for this?

Yes, it should have a test. I am working on this.

> We can always walk _mutex_array like we do in print_owned_locks_on_error(). Note that locks created outside mutex_init() will not be visible though. Maybe we should fix that.

Thanks to Patricio for this idea. I jotted down a quick scan.
I would like to have Thread::owns_locks() in release build, but as you said, it's different from `Thread::owns_locks() { return owned_locks() != NULL; }`. It only covers mutex_init(). I think it should be a standalone issue.

Could you also take a look at `jfrEmergencyDump::on_vm_shutdown` in jfrEmergencyDump.cpp? I think it's very similar logic. Even if we don't use RAII, I think it's still possible to have a reusable procedure.

    // PreConds: 1) current != null 2) current->is_Java_Thread 3) current state is VM 4) successfully unlock all owning locks.
    // return: true if succeed.
    bool transition_current_to_native()

-------------

PR: https://git.openjdk.java.net/jdk/pull/5590

From aph at openjdk.java.net Thu Sep 23 08:30:01 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 23 Sep 2021 08:30:01 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 13:48:40 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
>> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
> Move spin_wait in cpp file with removal of loop macro
>
> In addition, comments are added to a checking method of a test.

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 68:

> 66:     return PauseImplDesc(YIELD, count);
> 67:   } else if (strcmp(s, "none") != 0) {
> 68:     vm_exit_during_initialization("Invalid value for OnSpinWaitImpl", OnSpinWaitImpl);

Suggestion:

    vm_exit_during_initialization("The options for OnSpinWaitImpl are nop, isb, yield, and none", OnSpinWaitImpl);

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From tschatzl at openjdk.java.net Thu Sep 23 08:30:01 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Thu, 23 Sep 2021 08:30:01 GMT
Subject: RFR: 8273508: Support archived heap objects in SerialGC [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 17:35:13 GMT, Ioi Lam wrote:

>> When `-XX:+UseSerialGC` is enabled, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap. This improves VM start-up time, mostly because the module graph can be loaded from the archive.
>>
>>
>> $ perf stat -r 40 java -XX:+UseSerialGC -version
>>
>> Before: 0.042484507 seconds time elapsed ( +- 0.72% )
>> After:  0.031671000 seconds time elapsed ( +- 0.72% )
>>
>>
>> Changes in the gc subdirectories are contributed by @tschatzl
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
> @tschatzl comments

GC changes (and the bits around them) seem good to me.

We already discussed in private the (existing) pervasive usage of `uintptr_t` in other code calling the GC code for addresses in the Java heap, and of size_t/int/intx/whatever for offsets, which I do not recommend. However, if there is any change to be made, it's a separate issue.

-------------

Marked as reviewed by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5596

From aph at openjdk.java.net Thu Sep 23 08:40:55 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 23 Sep 2021 08:40:55 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 13:48:40 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
>> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
> Move spin_wait in cpp file with removal of loop macro
>
> In addition, comments are added to a checking method of a test.

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 60:

> 58:         }
> 59:         s += 1;
> 60:       }

Suggestion:

    while (isdigit(*s++));
    count = atoi(OnSpinWaitImpl);

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From aph at openjdk.java.net Thu Sep 23 09:04:02 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 23 Sep 2021 09:04:02 GMT
Subject: RFR: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions [v10]
In-Reply-To: <7cL7mxSBZvplTMqHZkKLKvwioN-Kku_2QRWZoze1CxM=.842df9c8-f4a6-48da-bcec-14677d017582@github.com>
References: <5AXJjp4GMtL1NVj0hcCjqJ5ZHrdLObtjjkuyXAko-Ac=.30ffb412-fb26-419c-9d83-545d358f5eb7@github.com> <7cL7mxSBZvplTMqHZkKLKvwioN-Kku_2QRWZoze1CxM=.842df9c8-f4a6-48da-bcec-14677d017582@github.com>
Message-ID: <486Xy7jXIH0Cqs9yKXYZ-QmsYxwSP3zEP0trVXQ4YoM=.3739db28-def4-4dfc-b4f3-c679c678f7b3@github.com>

On Wed, 22 Sep 2021 21:28:55 GMT, John R Rose wrote:

> Not a review, but that's the best assembly code I think I've ever seen.

I'm going to frame that and put it on my wall.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5390

From aph at openjdk.java.net Thu Sep 23 09:04:04 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 23 Sep 2021 09:04:04 GMT
Subject: Integrated: JDK-8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions
In-Reply-To:
References:
Message-ID:

On Tue, 7 Sep 2021 14:23:12 GMT, Andrew Haley wrote:

> An interleaved version of AES/GCM.
>
> Performance, now and then:
>
>
> Apple M1, 3.2 GHz:
>
> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
> AESGCMBench.decrypt                 8192          256              avgt    6   3108.881 ±  119.675  ns/op
> AESGCMBench.decryptMultiPart        8192          256              avgt    6   3109.685 ±    4.206  ns/op
> AESGCMBench.encrypt                 8192          256              avgt    6   3122.144 ±  113.379  ns/op
> AESGCMBench.encryptMultiPart        8192          256              avgt    6   3119.568 ±  192.877  ns/op
>
> AESGCMBench.decrypt                 8192          256              avgt    6  89123.942 ±  111.977  ns/op
> AESGCMBench.decryptMultiPart        8192          256              avgt    6  91034.697 ±  161.469  ns/op
> AESGCMBench.encrypt                 8192          256              avgt    6  89732.397 ±  106.370  ns/op
> AESGCMBench.encryptMultiPart        8192          256              avgt    6  89382.300 ±  139.300  ns/op
>
> Neoverse N1, 2.5GHz:
>
> Benchmark                     (dataSize)  (keyLength)  (provider)  Mode  Cnt      Score      Error  Units
> AESGCMBench.decrypt                 8192          256              avgt    6   6296.575 ±   37.995  ns/op
> AESGCMBench.decryptMultiPart        8192          256              avgt    6   7380.326 ±   10.987  ns/op
> AESGCMBench.encrypt                 8192          256              avgt    6   6293.090 ±   52.972  ns/op
> AESGCMBench.encryptMultiPart        8192          256              avgt    6   6357.536 ±   42.925  ns/op
>
> AESGCMBench.decrypt                 8192          256              avgt    6  48745.085 ±  125.612  ns/op
> AESGCMBench.decryptMultiPart        8192          256              avgt    6  45062.599 ± 1548.950  ns/op
> AESGCMBench.encrypt                 8192          256              avgt    6  42230.857 ±  520.562  ns/op
> AESGCMBench.encryptMultiPart        8192          256              avgt    6  45124.171 ± 1417.927  ns/op
>
>
>
> A note about the implementation for the reviewers:
>
> Unrolled and hand-scheduled intrinsics are often written in a way that
> I don't find satisfactory. Often they are a conglomeration of
> copy-and-paste programming and C macros, which makes them hard to
> understand and hard to maintain. I won't name any names, but there are
> many examples to be found in free software across the Internet.
>
> I spent a while thinking about a structured way to develop and
> implement them, and I think I've got something better. The idea is
> that you transform a pre-existing implementation into a generator for
> the interleaved version. The transformation shouldn't be too hard to
> do, but more importantly it should be possible for a reader to verify
> that the interleaved and unrolled version performs the same function.
>
> A generator takes the form of a subclass of `KernelGenerator`. The
> core idea is that the programmer defines the base case of the
> intrinsic and a method to generate a clone of it, shifted to a
> different set of registers. `KernelGenerator` will then generate
> several interleaved copies of the function, with each one using a
> different set of registers.
>
> The subclass must implement three methods: `length()`, which is the
> number of instruction bundles in the intrinsic, `generate(int n)`
> which emits the nth instruction bundle in the intrinsic, and `next()`
> which takes an instance of the generator and returns a version of it,
> shifted to a new set of registers.
>
> As an example, here's the inner loop of AES encryption:
>
> (Some details elided for clarity.)
>
>
>   BIND(L_aes_loop);
>     ld1(v0, T16B, post(from, 16));
>
>     cmpw(keylen, 44);
>     br(Assembler::CC, L_rounds_44);
>     br(Assembler::EQ, L_rounds_52);
>
>     aes_round(v0, v17);
>     aes_round(v0, v18);
>   BIND(L_rounds_52);
>     aes_round(v0, v19);
>     aes_round(v0, v20);
>   BIND(L_rounds_44);
>     ...
>
>
> The generator for the unrolled version looks like:
>
>
>   virtual void generate(int index) {
>     switch (index) {
>     case 0:
>       ld1(_data, T16B, post(_from, 16)); // get 16 bytes of input
>       break;
>     case 1:
>       if (_once) {
>         cmpw(_keylen, 52);
>         br(Assembler::LO, _rounds_44);
>         br(Assembler::EQ, _rounds_52);
>       }
>       break;
>     case 2: aes_round(_data, _subkeys + 0); break;
>     case 3: aes_round(_data, _subkeys + 1); break;
>     case 4:
>       if (_once) bind(_rounds_52);
>       break;
>     case 5: aes_round(_data, _subkeys + 2); break;
>     case 6: aes_round(_data, _subkeys + 3); break;
>     case 7:
>       if (_once) bind(_rounds_44);
>       break;
>     ...
>
>
> The job of converting a single inline intrinsic is, as you can see,
> not much more than adding a switch statement. Some instructions should
> only be emitted once, rather than several times, such as the labels
> and branches. (You can use a list of C++ lambdas rather than a switch
> statement to do the same thing, very LISP, but that seems a bit of a
> sledgehammer. YMMV.)
>
> I believe that this approach will be more maintainable and easier to
> understand than other approaches we've seen. Also, the number of
> unrolls is just a number that can be tweaked as required.

This pull request has now been integrated.

Changeset: 4f3b626a
Author:    Andrew Haley
URL:       https://git.openjdk.java.net/jdk/commit/4f3b626a36319cbbbbdcb1c02a84486a3d4eddb6
Stats:     1378 lines in 7 files changed: 1153 ins; 210 del; 15 mod

8271567: AArch64: AES Galois CounterMode (GCM) interleaved implementation using vector instructions

Reviewed-by: ngasson, adinn, xliu

-------------

PR: https://git.openjdk.java.net/jdk/pull/5390

From aph at openjdk.java.net Thu Sep 23 10:45:52 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 23 Sep 2021 10:45:52 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Sep 2021 08:38:05 GMT, Andrew Haley wrote:

>> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Move spin_wait in cpp file with removal of loop macro
>>
>> In addition, comments are added to a checking method of a test.
>
> src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 60:
>
>> 58:         }
>> 59:         s += 1;
>> 60:       }
>
> Suggestion:
>
>     while (isdigit(*s++));
>     count = atoi(OnSpinWaitImpl);

As far as I know, this combination of digits and named option is unusual in HotSpot; it may be unique. For the sake of not doing something so unfamiliar to our users, it may be worth separating the count and the option string.
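
To make the two option shapes under discussion concrete, here is a minimal Java sketch of the parsing logic only. It is not HotSpot code (which is C++), and everything except the grammar itself (`none`, `Nnop`, `Nisb`, `Nyield` with `N` in `2..9`) is an assumption made for illustration: the class `SpinWaitOptionSketch`, the methods `parseCombined` and `parseSeparate`, the error texts, and the choice to reject a count in front of `none`.

```java
/** Illustrative sketch only; names and error handling are assumptions, not HotSpot code. */
final class SpinWaitOptionSketch {
    enum Inst { NONE, NOP, ISB, YIELD }

    final Inst inst;   // which instruction to emit
    final int count;   // how many copies; 0 when inst is NONE

    SpinWaitOptionSketch(Inst inst, int count) {
        this.inst = inst;
        this.count = count;
    }

    /** Maps an instruction name to the enum, rejecting anything unknown. */
    static Inst parseInst(String name) {
        switch (name) {
            case "none":  return Inst.NONE;
            case "nop":   return Inst.NOP;
            case "isb":   return Inst.ISB;
            case "yield": return Inst.YIELD;
            default:
                throw new IllegalArgumentException(
                    "The options are nop, isb, yield, and none: " + name);
        }
    }

    /** Combined form: an optional single digit 2..9 followed by the name, e.g. "3nop". */
    static SpinWaitOptionSketch parseCombined(String value) {
        int count = 1;
        String name = value;
        if (!value.isEmpty() && value.charAt(0) >= '0' && value.charAt(0) <= '9') {
            count = value.charAt(0) - '0';
            if (count < 2) {
                throw new IllegalArgumentException("N must be 2..9: " + value);
            }
            name = value.substring(1);
        }
        Inst inst = parseInst(name);
        if (inst == Inst.NONE) {
            // Assumption: a count in front of "none" is treated as an error.
            if (count != 1) {
                throw new IllegalArgumentException("none takes no count: " + value);
            }
            count = 0;
        }
        return new SpinWaitOptionSketch(inst, count);
    }

    /** Separate form: the name and the count arrive as two independent values. */
    static SpinWaitOptionSketch parseSeparate(String name, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be positive: " + count);
        }
        Inst inst = parseInst(name);
        return new SpinWaitOptionSketch(inst, inst == Inst.NONE ? 0 : count);
    }

    public static void main(String[] args) {
        SpinWaitOptionSketch combined = parseCombined("3nop");
        SpinWaitOptionSketch separate = parseSeparate("nop", 20);
        System.out.println(combined.inst + " x" + combined.count);
        System.out.println(separate.inst + " x" + separate.count);
    }
}
```

In the combined form the count is tied to a single leading digit, so the string parser has to own both validation steps. In the separate form the count is an ordinary integer that can be range-checked on its own.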

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From qpzhang at openjdk.java.net Thu Sep 23 10:50:03 2021
From: qpzhang at openjdk.java.net (Patrick Zhang)
Date: Thu, 23 Sep 2021 10:50:03 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 13:48:40 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
>> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
> Move spin_wait in cpp file with removal of loop macro
>
> In addition, comments are added to a checking method of a test.

src/hotspot/cpu/aarch64/globals_aarch64.hpp line 116:

> 114:   product(ccstr, OnSpinWaitImpl, "none",                                \
> 115:           "Use instructions to implement java.lang.Thread.onSpinWait()."\
> 116:           "Options: none, Nnop, Nisb, Nyield, where optional N is 2..9.")

Could this N be changed to 2..99 instead of 2..9? I tested this with SpinWaitBench.pong:totalSpins (ops/us), 9nop or 9yield is unable to provide a similar pause time as 1isb, maybe ~20 can do. A larger range can be more convenient for future experiments/tunings. Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From github.com+42899633+eastig at openjdk.java.net Thu Sep 23 11:07:58 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 23 Sep 2021 11:07:58 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Sep 2021 10:45:10 GMT, Patrick Zhang wrote:

>> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Move spin_wait in cpp file with removal of loop macro
>>
>> In addition, comments are added to a checking method of a test.
>
> src/hotspot/cpu/aarch64/globals_aarch64.hpp line 116:
>
>> 114:   product(ccstr, OnSpinWaitImpl, "none",                                \
>> 115:           "Use instructions to implement java.lang.Thread.onSpinWait()."\
>> 116:           "Options: none, Nnop, Nisb, Nyield, where optional N is 2..9.")
>
> Could this N be changed to 2..99 instead of 2..9? I tested this with SpinWaitBench.pong:totalSpins (ops/us), 9nop or 9yield is unable to provide a similar pause time as 1isb, maybe ~20 can do. A larger range can be more convenient for future experiments/tunings. Thanks.

As Andrew suggested to have separated options for the name and the count, this won't be an issue.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From simonis at openjdk.java.net Thu Sep 23 11:16:56 2021
From: simonis at openjdk.java.net (Volker Simonis)
Date: Thu, 23 Sep 2021 11:16:56 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter if such exceptions are happening.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example for such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a ten-times performance improvement for the above code:
>>
>> -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>> -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>> Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>> ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>> ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>> ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>> ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
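
The `isAlpha()` pattern quoted in the description can be reproduced in isolation. The following is a minimal sketch, not code from the PR or from Tomcat: the class name `ImplicitExceptionSketch`, the 128-entry table and its contents, and the `isAlphaChecked` variant are assumptions for illustration. It shows the implicit `ArrayIndexOutOfBoundsException` path, which is the case the description is concerned with, next to an explicit range check that returns the same results without raising an exception.

```java
/** Illustrative sketch; the table contents and all names are assumptions. */
final class ImplicitExceptionSketch {
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) IS_ALPHA[c] = true;
        for (int c = 'A'; c <= 'Z'; c++) IS_ALPHA[c] = true;
    }

    /**
     * Same shape as the quoted method: out-of-range input is handled by
     * catching the implicit exception raised by the array access.
     */
    static boolean isAlphaImplicit(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    /** Same result with an explicit range check, so no exception is thrown. */
    static boolean isAlphaChecked(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = { 'a', 'Z', '1', -1, 128, 4096 };
        for (int c : samples) {
            System.out.println(c + ": " + isAlphaImplicit(c) + " " + isAlphaChecked(c));
        }
    }
}
```

Both methods agree on every input; they differ only in whether an out-of-range index goes through exception creation and handling.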
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor updates as requested by @TheRealMDoerr

Hi Dean,

thank you for looking at this change. Unfortunately I don't completely understand your point.

Are you saying we should change all user code which leads to "hot" exceptions such that they don't throw exceptions any more? That would certainly be a possibility, but I think it is neither practical nor customer obsessed. With that argument you wouldn't need the `-XX:+OmitStackTraceInFastThrow` optimization (which is turned on by default) in the first place :)

I think it is a quite simple and pragmatic solution to also optimize implicit exceptions for users who run with `-XX:-OmitStackTraceInFastThrow` because they require full stack traces.

Best regards,
Volker

-------------

PR: https://git.openjdk.java.net/jdk/pull/5488

From coleenp at openjdk.java.net Thu Sep 23 11:24:58 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 23 Sep 2021 11:24:58 GMT
Subject: Integrated: 8273916: Remove 'special' ranking
In-Reply-To:
References:
Message-ID:

On Fri, 17 Sep 2021 11:50:22 GMT, Coleen Phillimore wrote:

> This change removes the special ranking and folds it into nosafepoint. You have to look at commit #3 to see this actual part of the change that doesn't include JDK-8273915.
> This passes tier1-6 also.

This pull request has now been integrated.

Changeset: d0987513
Author:    Coleen Phillimore
URL:       https://git.openjdk.java.net/jdk/commit/d0987513665def1b6b2981ab5932b6f1b8b310d8
Stats:     45 lines in 5 files changed: 0 ins; 5 del; 40 mod

8273916: Remove 'special' ranking

Reviewed-by: dholmes, pchilanomate

-------------

PR: https://git.openjdk.java.net/jdk/pull/5563

From coleenp at openjdk.java.net Thu Sep 23 11:24:58 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 23 Sep 2021 11:24:58 GMT
Subject: RFR: 8273916: Remove 'special' ranking [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 20 Sep 2021 23:32:12 GMT, David Holmes wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add comment about ThreadSMRDelete_lock
>
> So there is more to this than just removing the "special" ranking - you've also changed some locks that are safepoint_never, that used to have ranks above what is now nosafepoint, so that they instead have ranks below nosafepoint - is that right? As long as all relative rankings of locks that can be taken together are maintained, then that is okay - but it is very hard to see that just by looking at the changes.

Thank you @dholmes-ora for the discussions and review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/5563

From github.com+42899633+eastig at openjdk.java.net Thu Sep 23 12:02:30 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 23 Sep 2021 12:02:30 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v5]
In-Reply-To:
References:
Message-ID:

> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>
> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>
> - `none`: no implementation for spin pauses. This is the default value.
> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>
> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>
> Testing:
>
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Rename PauseImpl to SpinWait

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5562/files
  - new: https://git.openjdk.java.net/jdk/pull/5562/files/4db361e1..2beb6382

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=03-04

  Stats: 46 lines in 5 files changed: 2 ins; 1 del; 43 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5562.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562

PR: https://git.openjdk.java.net/jdk/pull/5562

From github.com+42899633+eastig at openjdk.java.net Thu Sep 23 12:10:08 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 23 Sep 2021 12:10:08 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v6]
In-Reply-To:
References:
Message-ID:

> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>
> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>
> - `none`: no implementation for spin pauses. This is the default value.
> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>
> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>
> Testing:
>
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Rename pause_aarch64.hpp to spin_wait_aarch64.hpp

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5562/files
  - new: https://git.openjdk.java.net/jdk/pull/5562/files/2beb6382..2f8dc2ae

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=04-05

  Stats: 1 line in 2 files changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5562.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562

PR: https://git.openjdk.java.net/jdk/pull/5562

From github.com+42899633+eastig at openjdk.java.net Thu Sep 23 12:19:51 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 23 Sep 2021 12:19:51 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v7]
In-Reply-To:
References:
Message-ID:

> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>
> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>
> - `none`: no implementation for spin pauses. This is the default value.
> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>
> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>
> Testing:
>
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Remove redundancy in names

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5562/files
  - new: https://git.openjdk.java.net/jdk/pull/5562/files/2f8dc2ae..335a1813

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=05-06

  Stats: 381 lines in 3 files changed: 188 ins; 189 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5562.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562

PR: https://git.openjdk.java.net/jdk/pull/5562

From coleenp at openjdk.java.net Thu Sep 23 12:31:51 2021
From: coleenp at openjdk.java.net (Coleen Phillimore)
Date: Thu, 23 Sep 2021 12:31:51 GMT
Subject: RFR: 8274024: Use regular accessors to internal fields of oopDesc
In-Reply-To:
References:
Message-ID: <0GT-wt5Y3dgau16L6NzmeCJ8mhV7Uo7tA0K0qlYGDQ8=.40ff1fdd-f7aa-4da3-8b27-afab4a6baab1@github.com>

On Mon, 20 Sep 2021 18:59:34 GMT, Roman Kennke wrote:

> Currently, we are using 'raw' accessors to initialize the mark, Klass*, (array-)length and klass_gap of oops. This is ugly and confusing and we should just use the regular accessors.
>
> Testing:
> - [ ] tier1
> - [ ] tier2
> - [ ] hotspot_gc

The places that Roman changed are the boundaries of where HeapWord* becomes oops so the cast_to_oops() here are appropriate. This is a nice cleanup and changes the sources to what they were before the shenandoah changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5585

From github.com+42899633+eastig at openjdk.java.net Thu Sep 23 14:14:38 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 23 Sep 2021 14:14:38 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8]
In-Reply-To:
References:
Message-ID:

> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>
> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>
> - `none`: no implementation for spin pauses. This is the default value.
> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>
> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>
> Testing:
>
> - `make test TEST="gtest"`: Passed
> - `make run-test TEST="tier1"`: Passed
> - `make run-test TEST="tier2"`: Passed
> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed

Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:

  Separate OnSpinWaitImpl into OnSpinWaitInst and OnSpinWaitInstCount

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/5562/files
  - new: https://git.openjdk.java.net/jdk/pull/5562/files/335a1813..d4a5183a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=06-07

  Stats: 34 lines in 4 files changed: 4 ins; 5 del; 25 mod
  Patch: https://git.openjdk.java.net/jdk/pull/5562.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562

PR: https://git.openjdk.java.net/jdk/pull/5562

From aph at openjdk.java.net Thu Sep 23 14:14:39 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Thu, 23 Sep 2021 14:14:39 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Sep 2021 11:04:32 GMT, Evgeny Astigeevich wrote:

>> src/hotspot/cpu/aarch64/globals_aarch64.hpp line 116:
>>
>>> 114:   product(ccstr, OnSpinWaitImpl, "none",                                \
>>> 115:           "Use instructions to implement java.lang.Thread.onSpinWait()."\
>>> 116:           "Options: none, Nnop, Nisb, Nyield, where optional N is 2..9.")
>>
>> Could this N be changed to 2..99 instead of 2..9? I tested this with SpinWaitBench.pong:totalSpins (ops/us), 9nop or 9yield is unable to provide a similar pause time as 1isb, maybe ~20 can do. A larger range can be more convenient for future experiments/tunings. Thanks.
>
> As Andrew suggested to have separated options for the name and the count, this won't be an issue.

It was more of a question than a suggestion. I'm not sure, and I'd like you to have a look at existing options to see if there's a precedent.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From github.com+42899633+eastig at openjdk.java.net Thu Sep 23 15:11:07 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Thu, 23 Sep 2021 15:11:07 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Sep 2021 14:09:12 GMT, Andrew Haley wrote:

>> As Andrew suggested to have separated options for the name and the count, this won't be an issue.
>
> It was more of a question than a suggestion. I'm not sure, and I'd like you to have a look at existing options to see if there's a precedent.

I like the suggestion to have them separated. It makes the code simpler. A separate `OnSpinWaitInstCount` can be defined in the standard way, as an integer with a range, so I don't need to write code for this. Problems with the combined approach start when the number of instructions to use needs to be more than 9.
I have found no precedent for combined options.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From stefank at openjdk.java.net Thu Sep 23 15:16:01 2021
From: stefank at openjdk.java.net (Stefan Karlsson)
Date: Thu, 23 Sep 2021 15:16:01 GMT
Subject: RFR: 8274024: Use regular accessors to internal fields of oopDesc
In-Reply-To:
References:
Message-ID:

On Mon, 20 Sep 2021 18:59:34 GMT, Roman Kennke wrote:

> Currently, we are using 'raw' accessors to initialize the mark, Klass*, (array-)length and klass_gap of oops. This is ugly and confusing and we should just use the regular accessors.
>
> Testing:
> - [ ] tier1
> - [ ] tier2
> - [ ] hotspot_gc

I think I agree with David that we should try to stay away from casting something to an oop before the object is fully created.

-------------

Changes requested by stefank (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5585 From simonis at openjdk.java.net Thu Sep 23 16:34:53 2021 From: simonis at openjdk.java.net (Volker Simonis) Date: Thu, 23 Sep 2021 16:34:53 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: <4vVjDnZ2_joqYIMVwTm2p5ZIecp1i4t4o6_3o8BCtgY=.27940f77-156c-4e36-99a3-0a57ebb0914e@github.com> On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. If the method in question is being inlined (which often happens), we can add stackframes for all callers up to the inlining depth of the method in question. 
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>>     java.lang.NullPointerException: Cannot read the array length because "" is null
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>         at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>>     java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>>     java.lang.NullPointerException
>>         at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>         at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>         at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
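The stack-less exceptions discussed in this thread can be observed with a small experiment: throw the same implicit `NullPointerException` many times and count how many arrive without any stack frames. The following is only a minimal sketch, assuming a HotSpot JVM with C2 enabled and default flags; the class and method names are hypothetical and are not taken from the PR's JTreg test. Whether and when the count becomes non-zero depends on the JIT, so no particular output is claimed.

```java
public class FastThrowSketch {

    // Reading the length of a null array raises an implicit NullPointerException.
    static int arrayLength(int[] a) {
        return a.length;
    }

    // Throws the implicit exception repeatedly and counts how many of the
    // caught exceptions carry no stack frames at all.
    static int countStackless(int iterations) {
        int stackless = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                arrayLength(null);
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    stackless++;
                }
            }
        }
        return stackless;
    }

    public static void main(String[] args) {
        // The iteration count is an arbitrary choice meant to give the JIT
        // a chance to compile the hot path.
        int iterations = 200_000;
        int stackless = countStackless(iterations);
        System.out.println(stackless + " of " + iterations
                + " NullPointerExceptions had no stack trace");
    }
}
```

Running the same class once with default flags and once with `-XX:-OmitStackTraceInFastThrow` should make the difference between the two modes visible; with the latter flag the count is expected to stay at zero.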
>> >> ## Implementation details >> >> - Already the current implementation of `-XX:+OmitStackTraceInFastThrow` potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`). >> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them. >> - In order to avoid a memory leak we have to release these global JNI handles once a nmethod gets unloaded. In order to achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles. >> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()` and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but missed to add it to the corresponding stats and printing routines so I've added that section as well. >> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds. >> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions. >> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed. >> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues. 
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace. Hi Jon, thanks for looking at this PR. Let me reply to your comments inline: > To me this looks like a very clever mess. The mess comes from the trickiness (it's a tricky problem!) and even more from forcing various parts of the system that are usually isolated to come into contact. Adding reasons to GC during a JIT task is a smell. Adding objects which are pieced together at compile time is a smell. (And you can't run Java code from the JIT; it's an architectural limitation.) I agree. But we already have all this mess today with the current implementation of `-XX:+OmitStackTraceInFastThrow` which already lazily creates empty exceptions and introduces all the problems you describe (see `ciEnv::get_or_create_exception()`). It's to a much lesser extent compared to this change, but fundamentally it is not different. > Having JavaClasses talk directly to C2 GraphKit (without even a CI class between) is a smell. Adding a new section to nmethods just to make a poorly-understood life cycle for an odd (non-Java-created) group of exceptions is a smell. > > If we need a new section on nmethods it should be something more like "Java structures the JIT has made", with clearly separated concerns from the rest of the system, rather than "my very special section for an intrusive RFE". > That's a good proposal and I'm happy to work on such a solution. What do you mean exactly by ".._clearly separated concerns from the rest of the system_.."? > This is not even close to being ready to integrate. 
As I said, I'm happy to invest more work and improve this PR based on your suggestion if there's a chance for this feature to be accepted (even if only in a heavily revised form). But in general I think the **biggest mess** is really that users still get empty exceptions without any information at all and I think it is time to fix that. Unfortunately I can't see into the history of this code before jdk 6, but from [JDK-4292742: NullPointerException with no stack trace](https://bugs.openjdk.java.net/browse/JDK-4292742) it looks like you already worked on this issue almost 20 years ago :) So what about removing JavaClasses' dependency on GraphKit and making the new nmethod section more generally usable as you suggested? Are there any other pain points before reconsidering this PR? Any other suggestions you like me to integrate? Thank you and best regards, Volker ------------- PR: https://git.openjdk.java.net/jdk/pull/5392 From pchilanomate at openjdk.java.net Thu Sep 23 16:52:00 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Thu, 23 Sep 2021 16:52:00 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <6pffdSxbdmf8m_wslp0HWlnENY-PuV-bDG2L_vrV4TM=.9583da02-8384-4d84-b1bd-f36b1457e362@github.com> Message-ID: On Thu, 23 Sep 2021 05:27:58 GMT, Thomas Stuefe wrote: >> src/hotspot/share/utilities/vmError.cpp line 1341: >> >>> 1339: ~VMErrorThreadToNativeFromVM() { >>> 1340: if (_thread != nullptr) { >>> 1341: ThreadStateTransition::transition_from_native(_thread, _thread_in_vm); >> >> Just some thought on this part. Ideally we should avoid calling process_if_requested_with_exit_check() since attempting to process handshakes/stackwatermarks at this point might lead to all sorts of other issues. An alternative could be to just set the original state back and continue. 
But maybe we don't care about this because we are almost done with the error reporting and the OnError commands were already executed. That last part would argue to move the wrapper before the while loop. > > I agree with @pchilano (if I understand him correctly) in that I think an RAII object here is not even needed. There is no need to re-instate the prior thread state. As long as we don't call process_if_requested_with_exit_check() I think either way should be fine, i.e. manually restoring the state or just continue, since after this the only thing left is to call os::die/os::abort. If we reuse this logic with jfrEmergencyDump::on_vm_shutdown() then we would have to restore it, since I see print_bug_submit_message() which happens after that checks the state. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From pchilanomate at openjdk.java.net Thu Sep 23 17:00:56 2021 From: pchilanomate at openjdk.java.net (Patricio Chilano Mateo) Date: Thu, 23 Sep 2021 17:00:56 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: On Thu, 23 Sep 2021 07:26:05 GMT, Xin Liu wrote: > hi, Reviewers, > Thanks for the comments. > > > Can you please provide a regression test for this? > > yes, it should have a test. I am working on this. > > > We can always walk _mutex_array like we do in print_owned_locks_on_error(). Note that locks created outside mutex_init() will not be visible though. Maybe we should fix that. > > Thank Patricio for this idea. I jog down a quick scan. I would like to have Thread::owns_locks() in release build, but as you said, it's different from `Thread::owns_locks() { return owned_locks() != NULL; }`. It only covers mutex_init(). I think it should be a standalone issue. > > Could you also take a look at `jfrEmergencyDump::on_vm_shutdown` in jfrEmergencyDump.cpp ? > I think it's very similar logic. 
Even if we don't use RAII, I think it's still possible to have a reusable procedure. > > ``` > // PreConds: 1) current != null 2) current->is_Java_Thread 3. current state is VM 4) successfully unlock all owning locks. > // return: true if succeed. > bool transition_current_to_native() > ``` Yes, it's very similar except they need to be in _thread_in_vm initially and then switch to native. If we do reuse it maybe the wrapper should receive the state to transition to on construction to avoid that extra call to transition_current_to_native(). ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From xliu at openjdk.java.net Thu Sep 23 18:21:55 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 23 Sep 2021 18:21:55 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself In-Reply-To: <2C2ZjJtbXq06HV3JNSBw35iy3otCExPJQ8k2YXhFgTQ=.115e3d34-162e-4e2f-b656-cdb874aa8246@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <2C2ZjJtbXq06HV3JNSBw35iy3otCExPJQ8k2YXhFgTQ=.115e3d34-162e-4e2f-b656-cdb874aa8246@github.com> Message-ID: <2YUlZU9QzZoLdrkTluIYA8OGhQ48-g2xjfCMXtysZMs=.ccb75999-49df-4200-a265-aaa117d8ef23@github.com> On Thu, 23 Sep 2021 05:23:01 GMT, Thomas Stuefe wrote: >> This patch allows the custom commands of OnError to attach to HotSpot itself. >> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). >> This prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. >> >> Without this patch, we will encounter a deadlock at safepoint synchronization. >> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. 
>> >> >> Aborting due to java.lang.OutOfMemoryError: Java heap space >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (debug.cpp:364), pid=94632, tid=94633 >> # fatal error: OutOfMemory encountered: Java heap space >> # >> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log >> # >> # -XX:OnError="jcmd %p Thread.print" >> # Executing /bin/sh -c "jcmd 94632 Thread.print" ... >> 94632: >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: >> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] >> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) > > src/hotspot/share/utilities/vmError.cpp line 1330: > >> 1328: _thread = JavaThread::cast(t); >> 1329: assert(_thread == Thread::current(), "must be current thread"); >> 1330: assert(_thread->thread_state() == _thread_in_vm, "must be in VM"); > > Don't assert here. We are in the middle of error handling. This would just lead to recursive errors and very probably to "too many errors, abort". Acknowledge! 
I will avoid asserts and reverting the state back. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From svkamath at openjdk.java.net Thu Sep 23 22:23:52 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Thu, 23 Sep 2021 22:23:52 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v2] In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 16:44:58 GMT, Anthony Scarpino wrote: >> Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: >> >> Added a wrapper around aes-gcm intrinsic, changed data size in TestAESMain and added a new constant for htbl entries > > I approve the jdk changes. You'll need a hotspot reviewer to approve the other changes @ascarpino Is it okay to integrate this patch? ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From sviswanathan at openjdk.java.net Thu Sep 23 22:23:52 2021 From: sviswanathan at openjdk.java.net (Sandhya Viswanathan) Date: Thu, 23 Sep 2021 22:23:52 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v5] In-Reply-To: <-FJux8XZG1ra8W4DNa31PEP_pWxoIkVvf9qHyYRSEBM=.5b57d887-c321-4fd0-9c64-9ea2202ac774@github.com> References: <-FJux8XZG1ra8W4DNa31PEP_pWxoIkVvf9qHyYRSEBM=.5b57d887-c321-4fd0-9c64-9ea2202ac774@github.com> Message-ID: On Wed, 22 Sep 2021 22:48:32 GMT, Smita Kamath wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. 
> > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Added htbl_entries constant to other architectures Hotspot changes look good. ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From coleenp at openjdk.java.net Thu Sep 23 22:44:04 2021 From: coleenp at openjdk.java.net (Coleen Phillimore) Date: Thu, 23 Sep 2021 22:44:04 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 22:05:09 GMT, Kim Barrett wrote: >> Please review this proposal to permit the use of lambda expressions in >> HotSpot code, with some restrictions and suggestions for good usage within >> HotSpot code. Lambda expressions were added in C++11, and provide a more >> expressive syntax for local functions, with a number of use-cases where they >> can improve readability by eliminating a lot of uninteresting boilerplate. >> >> Some example uses are included, but are not part of the proposed change. >> They will be removed from the PR before it is pushed. (In particular, the >> ScopeGuard utility uses move semantics, the use of which hasn't been >> approved or even discussed.) They are given to show some of the benefits >> that might accrue from permitting the use of lambdas. In particular, they >> highlight some of the code reduction that is possible. Some of these code >> changes might be proposed in the future, using the normal PR process. >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will >> not be made before Wednesday 1-Sep-2021 at 12h00 UTC. 
>> >> Since we're piggybacking on github PRs here, please use the PR review >> process to approve (click on Review Changes > Approve), rather than sending >> a "vote: yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into permit_lambda > - terminology fix > - add scope guard and some example uses > - G1 SATB filter lambda > - new local functions section I approve of adding lambdas as specified in the coding standard. src/hotspot/share/compiler/compilerOracle.cpp line 787: > 785: > 786: char* original_line = os::strdup(line, mtInternal); > 787: auto g = make_guard([&] { os::free(original_line); }); I have to admit that this usage is very mysterious. I know you're going to take it out but using lambdas for something that's really hard to parse doesn't seem like a win. Does save lines of code though. src/hotspot/share/gc/g1/g1SATBMarkQueueSet.cpp line 104: > 102: return !requires_marking(entry, _g1h) || _g1h->is_marked_next(cast_to_oop(entry)); > 103: }; > 104: apply_filter(requires_discard, queue); This on the other hand, is a nice usage and saves a lot of boilerplate, and it's not like operator() was that easy to read in the first place. ------------- Marked as reviewed by coleenp (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/5144

From dlong at openjdk.java.net  Thu Sep 23 22:55:54 2021
From: dlong at openjdk.java.net (Dean Long)
Date: Thu, 23 Sep 2021 22:55:54 GMT
Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 16 Sep 2021 17:00:20 GMT, Volker Simonis wrote:

>> Currently, if running with `-XX:-OmitStackTraceInFastThrow`, C2 has no possibility to create implicit exceptions like AIOOBE, NullPointerExceptions, etc. in compiled code. This means that such methods will always be deoptimized and re-executed in the interpreter whenever such exceptions occur.
>>
>> If implicit exceptions are used for normal control flow, that can have a dramatic impact on performance. A prominent example of such code is [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274):
>>
>>     public static boolean isAlpha(int c) {
>>         try {
>>             return IS_ALPHA[c];
>>         } catch (ArrayIndexOutOfBoundsException ex) {
>>             return false;
>>         }
>>     }
>>
>> ### Solution
>>
>> Instead of deoptimizing and resorting to the interpreter, we can generate code which allocates and initializes the corresponding exceptions right in compiled code. This results in a tenfold performance improvement for the above code:
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:-OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.430 ±    0.353  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5   3563.038 ±   77.358  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5   8609.693 ± 1205.104  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5  12842.401 ± 1022.728  ns/op
>>
>>     -XX:-OmitStackTraceInFastThrow -XX:+OptimizeImplicitExceptions
>>     Benchmark                 (exceptionProbability)  Mode  Cnt      Score      Error  Units
>>     ImplicitExceptions.bench                     0.0  avgt    5      1.432 ±    0.352  ns/op
>>     ImplicitExceptions.bench                    0.33  avgt    5    355.723 ±   16.641  ns/op
>>     ImplicitExceptions.bench                    0.66  avgt    5    887.068 ±  166.728  ns/op
>>     ImplicitExceptions.bench                    1.00  avgt    5   1274.418 ±   88.235  ns/op
>>
>> ### Implementation details
>>
>> - The new optimization is guarded by the option `OptimizeImplicitExceptions` which is on by default.
>> - In `GraphKit::builtin_throw()` we can't simply use `CallGenerator::for_direct_call()` to create a `DirectCallGenerator` for the call to the exception's `<init>` function because `DirectCallGenerator` assumes in various places that calls are only issued at `invoke*` bytecodes. This is not true in general for bytecode which can cause an implicit exception.
>> - Instead, we manually wire up the call based on the code in `DirectCallGenerator::generate()`.
>> - We use a trick similar to the one for method handle intrinsics, where the callee from the bytecode is replaced by a direct call and this fact is recorded in the call's `_override_symbolic_info` field. For calling constructors of implicit exceptions I've introduced the new field `_implicit_exception_init`. This field is also used in various assertions to prevent queries for the bytecode's symbolic method information which doesn't exist because we're not at an `invoke*` bytecode at the place where we generate the call.
>> - The PR contains a micro-benchmark which compares the old and the new implementation for [Tomcat's `HttpParser::isAlpha()` method](https://github.com/apache/tomcat/blob/26ba86cdbd40ca718e43b82e62b3eb49d004c3d6/java/org/apache/tomcat/util/http/parser/HttpParser.java#L266-L274). Except for the trivial case where the exception probability is 0 (i.e. no exceptions are happening at all) the new implementation is about 10 times faster.
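To make the "implicit exception as control flow" pattern from the quoted description concrete, here is a minimal sketch that places the exception-based lookup next to an explicit range check. Only the body of `isAlphaWithException` follows the method quoted above; the class name, the table size of 128, the table contents, and the range-check variant are assumptions for illustration and not Tomcat's actual code.

```java
public class IsAlphaSketch {

    // Hypothetical lookup table: true for ASCII letters only.
    private static final boolean[] IS_ALPHA = new boolean[128];

    static {
        for (int c = 'a'; c <= 'z'; c++) {
            IS_ALPHA[c] = true;
        }
        for (int c = 'A'; c <= 'Z'; c++) {
            IS_ALPHA[c] = true;
        }
    }

    // Pattern from the quoted description: an out-of-range index is handled
    // by catching the implicit ArrayIndexOutOfBoundsException.
    public static boolean isAlphaWithException(int c) {
        try {
            return IS_ALPHA[c];
        } catch (ArrayIndexOutOfBoundsException ex) {
            return false;
        }
    }

    // Hypothetical alternative: an explicit range check, so no exception
    // is ever created for out-of-range input.
    public static boolean isAlphaWithRangeCheck(int c) {
        return c >= 0 && c < IS_ALPHA.length && IS_ALPHA[c];
    }

    public static void main(String[] args) {
        int[] samples = {'a', 'Z', '1', -1, 128, 1000};
        for (int c : samples) {
            System.out.println(c + " -> " + isAlphaWithException(c)
                    + " / " + isAlphaWithRangeCheck(c));
        }
    }
}
```

Both variants return the same answers; they differ only in how out-of-range input is rejected, which is the case the benchmark's `exceptionProbability` parameter appears to vary.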
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Minor updates as requested by @TheRealMDoerr OK, so the user/customer wants or needs to run with -XX:-OmitStackTraceInFastThrow, and there is code like isAlpha() throwing a hot exception. Does the user really care about the stack trace and -XX:-OmitStackTraceInFastThrow setting for this method? If the compiler could eliminate the stack trace for this and similar methods, or even better, eliminate the exception too, like it does for other allocations through escape analysis, would that solve your use cases? Or are there examples where the hot exception escapes and we really need to create it with a stack trace and throw it? I guess the amount of effort the JVM puts into supporting "hot exceptions" (which seems like an oxymoron to me) surprises me, so the thought of adding even more complexity concerns me. But I'm not an expert on this part of the code, so let's see what other JIT experts think. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From dholmes at openjdk.java.net Thu Sep 23 23:24:15 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 23 Sep 2021 23:24:15 GMT Subject: RFR: 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running Message-ID: Please see bug report for more detailed discussion. We introduce `os::_exit()` to call `_exit()` to allow us to terminate without running the at_exit handlers and global destructors, which lead to the crashes during termination. 
Testing: tiers 1-3 (includes the ExitOnOutOfMemoryError test) Thanks, David ------------- Commit messages: - 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running Changes: https://git.openjdk.java.net/jdk/pull/5668/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5668&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274136 Stats: 16 lines in 4 files changed: 12 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5668.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5668/head:pull/5668 PR: https://git.openjdk.java.net/jdk/pull/5668 From stuefe at openjdk.java.net Fri Sep 24 07:46:53 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 24 Sep 2021 07:46:53 GMT Subject: RFR: 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 23:15:28 GMT, David Holmes wrote: > Please see bug report for more detailed discussion. > > We introduce `os::_exit()` to call `_exit()` to allow us to terminate without running the at_exit handlers and global destructors, which lead to the crashes during termination. > > Testing: tiers 1-3 (includes the ExitOnOutOfMemoryError test) > > Thanks, > David LGTM. Your assumption that `-XX:+ExitOnOutOfMemoryError` should stop the VM painlessly is what I think too. Our customers use it in scenarios where the VM should go down quickly, with a minimum of fuss, e.g. in cloud scenarios, where you want to restart the VM as fast as possible. OTOH, `-XX:+CrashOnOutOfMemoryError` should give you a hs-err file and a core; creating either may hang or at least delay matters. Incidentally, in our SapMachine we added some subtle behavioral changes (https://github.com/SAP/SapMachine/wiki/Handling-of-OnOutOfMemoryError-switches-in-the-SapMachine, see italics). I know we talked about handling Thread exhaustion events, but what about simple stack printing to stdout, do you think that would be useful upstream? 
------------- Marked as reviewed by stuefe (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5668 From lkorinth at openjdk.java.net Fri Sep 24 08:52:54 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Fri, 24 Sep 2021 08:52:54 GMT Subject: RFR: 8269537: memset() is called after operator new [v2] In-Reply-To: References: Message-ID: On Wed, 8 Sep 2021 12:57:18 GMT, Leo Korinth wrote: >> Hmm, u8 was not what I was thinking, I will change that to a uint8_t in the next update... > > I hit the new assert when not on Linux, I guess it has to do with the initialization of the thread local variable. Thanks Ioi for making me add the assert!!! The sequencing of the allocation function and the arguments to the constructor is not what I thought, so my "solution" is not working. I am unsure how to resolve this in a good way. We could probably have a small thread local collection of (type, address) pairs, but I wonder whether it would be better to remove this debug information altogether. It is to my knowledge only used in GrowableArray (to limit the type of its internal allocations) and to hinder delete on resource allocated objects. ------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From hseigel at openjdk.java.net Fri Sep 24 12:18:51 2021 From: hseigel at openjdk.java.net (Harold Seigel) Date: Fri, 24 Sep 2021 12:18:51 GMT Subject: RFR: 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running In-Reply-To: References: Message-ID: <8R3oJm5hG-wnTZiLZgSwmkJwXQD7Y7o7s6EZiYxu1UU=.325f3c3d-bb78-4f84-b558-703b66a8c6be@github.com> On Thu, 23 Sep 2021 23:15:28 GMT, David Holmes wrote: > Please see bug report for more detailed discussion. > > We introduce `os::_exit()` to call `_exit()` to allow us to terminate without running the at_exit handlers and global destructors, which lead to the crashes during termination. > > Testing: tiers 1-3 (includes the ExitOnOutOfMemoryError test) > > Thanks, > David The changes look good. Thanks for fixing it. 
Harold ------------- Marked as reviewed by hseigel (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5668 From rrich at openjdk.java.net Fri Sep 24 13:33:55 2021 From: rrich at openjdk.java.net (Richard Reingruber) Date: Fri, 24 Sep 2021 13:33:55 GMT Subject: RFR: 8273563: Improve performance of implicit exceptions with -XX:-OmitStackTraceInFastThrow [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 18:09:48 GMT, Martin Doerr wrote: > Would be interesting to know what else benefits from it. Maybe startup performance (class loading may use many Exceptions). That's true. Many exceptions are thrown in classloading but these are not builtin exceptions and therefore not affected. ------------- PR: https://git.openjdk.java.net/jdk/pull/5488 From ascarpino at openjdk.java.net Fri Sep 24 16:06:54 2021 From: ascarpino at openjdk.java.net (Anthony Scarpino) Date: Fri, 24 Sep 2021 16:06:54 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v5] In-Reply-To: <-FJux8XZG1ra8W4DNa31PEP_pWxoIkVvf9qHyYRSEBM=.5b57d887-c321-4fd0-9c64-9ea2202ac774@github.com> References: <-FJux8XZG1ra8W4DNa31PEP_pWxoIkVvf9qHyYRSEBM=.5b57d887-c321-4fd0-9c64-9ea2202ac774@github.com> Message-ID: <-WygjWwDsLnk3pjW6ukRb_TukFOY20t_Y7JHQM2rG4U=.6e401872-ce13-411b-bcd3-9a986e8ce45a@github.com> On Wed, 22 Sep 2021 22:48:32 GMT, Smita Kamath wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. 
> > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Added htbl_entries constant to other architectures I think it's ready to integrate ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From aph at openjdk.java.net Fri Sep 24 17:07:56 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Fri, 24 Sep 2021 17:07:56 GMT Subject: RFR: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 [v5] In-Reply-To: <-FJux8XZG1ra8W4DNa31PEP_pWxoIkVvf9qHyYRSEBM=.5b57d887-c321-4fd0-9c64-9ea2202ac774@github.com> References: <-FJux8XZG1ra8W4DNa31PEP_pWxoIkVvf9qHyYRSEBM=.5b57d887-c321-4fd0-9c64-9ea2202ac774@github.com> Message-ID: On Wed, 22 Sep 2021 22:48:32 GMT, Smita Kamath wrote: >> Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. >> >> The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. > > Smita Kamath has updated the pull request incrementally with one additional commit since the last revision: > > Added htbl_entries constant to other architectures Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From svkamath at openjdk.java.net Fri Sep 24 19:25:57 2021 From: svkamath at openjdk.java.net (Smita Kamath) Date: Fri, 24 Sep 2021 19:25:57 GMT Subject: Integrated: 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 In-Reply-To: References: Message-ID: On Tue, 7 Sep 2021 22:31:30 GMT, Smita Kamath wrote: > Performance dropped up to 10% for 1k data after 8267125 for CPUs that do not support the new intrinsic. 
Tests run were crypto.full.AESGCMBench and crypto.full.AESGCMByteBuffer from the jmh micro benchmarks. > > The problem is each instance of GHASH allocates 96 extra longs for the AVX512+VAES intrinsic regardless if the intrinsic is used. This extra table space should be allocated differently so that non-supporting CPUs do not suffer this penalty. This issue also affects non-Intel CPUs too. This pull request has now been integrated. Changeset: 13e9ea9e Author: Smita Kamath Committer: Anthony Scarpino URL: https://git.openjdk.java.net/jdk/commit/13e9ea9e922030927775345b1abde1313a6ec03f Stats: 116 lines in 16 files changed: 60 ins; 2 del; 54 mod 8273297: AES/GCM non-AVX512+VAES CPUs suffer after 8267125 Reviewed-by: ascarpino, sviswanathan, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/5402 From dlong at openjdk.java.net Sat Sep 25 00:08:53 2021 From: dlong at openjdk.java.net (Dean Long) Date: Sat, 25 Sep 2021 00:08:53 GMT Subject: RFR: 8273392: Improve usability of stack-less exceptions due to -XX:+OmitStackTraceInFastThrow [v3] In-Reply-To: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> References: <02TswvRnnXEQeNetqwBh8XiivQsMFUCEIt758AdjrHk=.6216480f-2031-481d-8bf6-e94e8c5f1977@github.com> Message-ID: On Tue, 21 Sep 2021 10:09:11 GMT, Volker Simonis wrote: >> If running with `-XX:+OmitStackTraceInFastThrow` (which is the default) C2 will optimize certain "hot" implicit exceptions (i.e. AIOOBE, NullPointerExceptions,..) and replace them by a static, pre-allocated exception without any stacktrace. >> >> However, we can actually do better. Instead of using a single, pre-allocated exception object for all methods we can let the compiler allocate specific exceptions for each compilation unit (i.e. nmethod) and fill them with at least one stack frame with the method /line-number information of the currently compiled method. 
If the method in question is being inlined (which often happens), we can add stack frames for all callers up to the inlining depth of the method in question.
>>
>> For the attached JTreg test, we get the following exception in interpreter mode:
>>
>> java.lang.NullPointerException: Cannot read the array length because "" is null
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>     at compiler.exceptions.StackFrameInFastThrow.main(StackFrameInFastThrow.java:233)
>>
>> Once the method gets compiled with `-XX:+OmitStackTraceInFastThrow` the same exception will look as follows:
>>
>> java.lang.NullPointerException
>>
>> After this change, if `StackFrameInFastThrow.throwImplicitException()` is compiled standalone, we will get:
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>
>> and if `StackFrameInFastThrow.throwImplicitException()` is inlined into `level2()` and `level2()` into `level1()` we will get the following exception (although we're still running with `-XX:+OmitStackTraceInFastThrow`):
>>
>> java.lang.NullPointerException
>>     at compiler.exceptions.StackFrameInFastThrow.throwImplicitException(StackFrameInFastThrow.java:76)
>>     at compiler.exceptions.StackFrameInFastThrow.level2(StackFrameInFastThrow.java:95)
>>     at compiler.exceptions.StackFrameInFastThrow.level1(StackFrameInFastThrow.java:99)
>>
>> The new functionality is guarded by `-XX:+/-StackFrameInFastThrow`, but switched on by default (I'll create a CSR for the new option once reviewers are comfortable with the change). Notice that the optimization comes at no run-time cost because all the extra work will be done at compile time.
>>
>> ## Implementation details
>>
>> - The current implementation of `-XX:+OmitStackTraceInFastThrow` already potentially lazy-allocates the empty singleton exceptions like AIOOBE in `ciEnv::ArrayStoreException_instance()`. With this change, if running with `-XX:+StackFrameInFastThrow` we will always allocate new exception objects and populate them with the stack frames which are statically available at compile time (see `java_lang_Throwable::fill_in_stack_trace_of_implicit_exception()`).
>> - Because nmethods don't act as strong GC roots, we have to create a global JNI handle for every newly generated exception to prevent GC from collecting them.
>> - In order to avoid a memory leak we have to release these global JNI handles once an nmethod gets unloaded. To achieve this, I've added a new section "implicit exceptions" to the nmethod which holds these JNI handles.
>> - While adding the new "implicit exceptions" section to the corresponding stats (`print_nmethod_stats()`) and printing routines (`nmethod::print()`) I realized that a previous change ([JDK-8254231: Implementation of Foreign Linker API (Incubator)](https://bugs.openjdk.java.net/browse/JDK-8254231)) had already introduced a new nmethod section ("native invokers") but did not add it to the corresponding stats and printing routines, so I've added that section as well.
>> - The `#ifdef COMPILER2` guards are only required to not break the `zero`/`minimal` builds.
>> - The JTreg test is using `-XX:PerMethodTrapLimit=0` to handle all implicit exceptions as "hot". This makes the test simpler and at the same time provokes the allocation of more implicit exceptions.
>> - Manually verified that the created Exception objects are freed by GC once the corresponding nmethods have been flushed.
>> - Manual "stress" test with a very small heap and continuous recompilation of methods with explicit exceptions to provoke GCs during compilation didn't reveal any issues.
>
> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create implicit exceptions with an array of StackTraceElements right away instead of creating a backtrace. This prevents implicit exceptions from keeping classes alive due to Java mirrors in the backtrace.

If this was the interpreter, it seems like these exception objects could go into the class constant pool as condy objects. But because the ideal backtrace includes inlining information, it is a use case for something I'll call nmethod constant pools. GC would probably need to scan it as strong roots (I'm not a GC expert), but it would get rid of the need for the JNI global refs. And the exception objects could be allocated on a slow path when first needed rather than eagerly at compile time. I have other uses in mind for nmethod constant pools, so I'm showing my bias here, but I also think a condy approach extends the system in a more elegant way.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5392

From kbarrett at openjdk.java.net Sat Sep 25 00:18:04 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Sat, 25 Sep 2021 00:18:04 GMT
Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Sep 2021 22:39:29 GMT, Coleen Phillimore wrote:

>> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision:
>>
>>  - Merge branch 'master' into permit_lambda
>>  - terminology fix
>>  - add scope guard and some example uses
>>  - G1 SATB filter lambda
>>  - new local functions section
>
> src/hotspot/share/compiler/compilerOracle.cpp line 787:
>
>> 785:
>> 786:   char* original_line = os::strdup(line, mtInternal);
>> 787:   auto g = make_guard([&] { os::free(original_line); });
>
> I have to admit that this usage is very mysterious. I know you're going to take it out but using lambdas for something that's really hard to parse doesn't seem like a win. Does save lines of code though.

I think a different approach may be possible, that would have usage like this:

    char* original_line = os::strdup(line, mtInternal);
    ScopeGuard guard([&] { os::free(original_line); });

This would use something like (or maybe directly use) a solution to the type-erased lambda capturing problem a la std::function.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5144

From github.com+741251+turbanoff at openjdk.java.net Sat Sep 25 11:15:20 2021
From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov)
Date: Sat, 25 Sep 2021 11:15:20 GMT
Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v3]
In-Reply-To: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com>
References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com>
Message-ID:

> Collections.sort is just a wrapper, so it is better to use an instance method directly.

Andrey Turbanov has updated the pull request incrementally with one additional commit since the last revision:

  8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules

  extracted jdk.jfr changes into separate PR. Rollback them here.
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5230/files - new: https://git.openjdk.java.net/jdk/pull/5230/files/fcf53eda..2ce0f043 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5230&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5230&range=01-02 Stats: 30 lines in 10 files changed: 10 ins; 1 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/5230.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5230/head:pull/5230 PR: https://git.openjdk.java.net/jdk/pull/5230 From github.com+741251+turbanoff at openjdk.java.net Sat Sep 25 11:15:24 2021 From: github.com+741251+turbanoff at openjdk.java.net (Andrey Turbanov) Date: Sat, 25 Sep 2021 11:15:24 GMT Subject: RFR: 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules [v2] In-Reply-To: References: <9xbhI0rwD3XbAHZFfQAkJHYivbC5F4N085RuSVWx8HU=.8a470c93-5fee-4981-97e4-afb6cb1147b9@github.com> Message-ID: On Tue, 14 Sep 2021 07:46:12 GMT, Andrey Turbanov wrote: >> Collections.sort is just a wrapper, so it is better to use an instance method directly. > > Andrey Turbanov has updated the pull request incrementally with one additional commit since the last revision: > > 8272992: Replace usages of Collections.sort with List.sort call in jdk.* modules Extracted jdk.jfr changes into separate PR https://github.com/openjdk/jdk/pull/5696 (JDK-8274319) ------------- PR: https://git.openjdk.java.net/jdk/pull/5230 From jrose at openjdk.java.net Sat Sep 25 20:16:00 2021 From: jrose at openjdk.java.net (John R Rose) Date: Sat, 25 Sep 2021 20:16:00 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 22:05:09 GMT, Kim Barrett wrote: >> Please review this proposal to permit the use of lambda expressions in >> HotSpot code, with some restrictions and suggestions for good usage within >> HotSpot code. 
Lambda expressions were added in C++11, and provide a more >> expressive syntax for local functions, with a number of use-cases where they >> can improve readability by eliminating a lot of uninteresting boilerplate. >> >> Some example uses are included, but are not part of the proposed change. >> They will be removed from the PR before it is pushed. (In particular, the >> ScopeGuard utility uses move semantics, the use of which hasn't been >> approved or even discussed.) They are given to show some of the benefits >> that might accrue from permitting the use of lambdas. In particular, they >> highlight some of the code reduction that is possible. Some of these code >> changes might be proposed in the future, using the normal PR process. >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will >> not be made before Wednesday 1-Sep-2021 at 12h00 UTC. >> >> Since we're piggybacking on github PRs here, please use the PR review >> process to approve (click on Review Changes > Approve), rather than sending >> a "vote: yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into permit_lambda > - terminology fix > - add scope guard and some example uses > - G1 SATB filter lambda > - new local functions section Reviewed. Thank you for collecting so much practical and useful advice. 
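
For readers unfamiliar with the scope-guard idiom that the example uses in this PR rely on, here is a minimal, self-contained sketch in standard C++11. It is not the `ScopeGuard` utility from the PR: the class below, the `make_guard` helper as written here, and the `count_commas` function are illustrative assumptions, and `std::malloc`/`std::free` stand in for HotSpot's `os::strdup`/`os::free`. Note that it depends on move semantics, whose use in HotSpot the proposal says has not been approved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical guard type for illustration only (not HotSpot code).
// Runs the stored callable once, when the guard goes out of scope.
template <typename F>
class ScopeGuard {
  F _f;
  bool _active;

 public:
  explicit ScopeGuard(F f) : _f(std::move(f)), _active(true) {}

  // Moving transfers responsibility for running the cleanup.
  ScopeGuard(ScopeGuard&& other)
      : _f(std::move(other._f)), _active(other._active) {
    other._active = false;
  }

  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

  ~ScopeGuard() {
    if (_active) _f();
  }
};

// Deduces the lambda's closure type so callers can write `auto g = ...`.
template <typename F>
ScopeGuard<F> make_guard(F f) {
  return ScopeGuard<F>(std::move(f));
}

// Example use: copy a line, count its commas, and free the copy on every
// exit path. `cleanup_runs` is incremented each time the cleanup runs.
std::size_t count_commas(const char* line, int* cleanup_runs) {
  assert(line != nullptr && cleanup_runs != nullptr);
  char* copy = static_cast<char*>(std::malloc(std::strlen(line) + 1));
  if (copy == nullptr) return 0;
  std::strcpy(copy, line);
  auto g = make_guard([&] { std::free(copy); ++*cleanup_runs; });

  std::size_t n = 0;
  for (const char* p = copy; *p != '\0'; ++p) {
    if (*p == ',') ++n;
  }
  return n;  // the guard's destructor frees `copy` after this point
}
```

This sketch follows the templated `make_guard` form. The non-template `ScopeGuard guard([&] { ... });` spelling suggested elsewhere in the thread would instead need some form of type erasure, which is not attempted here.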
------------- PR: https://git.openjdk.java.net/jdk/pull/5144 From kbarrett at openjdk.java.net Sat Sep 25 22:42:50 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 25 Sep 2021 22:42:50 GMT Subject: RFR: 8274024: Use regular accessors to internal fields of oopDesc In-Reply-To: References: Message-ID: On Mon, 20 Sep 2021 18:59:34 GMT, Roman Kennke wrote: > Currently, we are using 'raw' accessors to initialize the mark, Klass*, (array-)length and klass_gap of oops. This is ugly and confusing and we should just use the regular accessors. > > Testing: > - [ ] tier1 > - [ ] tier2 > - [ ] hotspot_gc I agree with Stefan and David - don't treat the values as oops until they are well-formed. It turns out there's a pre-existing problem about that; both the proposed changes here and pre-existing code technically invoke UB. See JDK-8274322. ------------- PR: https://git.openjdk.java.net/jdk/pull/5585 From iklam at openjdk.java.net Sun Sep 26 04:49:16 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sun, 26 Sep 2021 04:49:16 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 22:05:09 GMT, Kim Barrett wrote: >> Please review this proposal to permit the use of lambda expressions in >> HotSpot code, with some restrictions and suggestions for good usage within >> HotSpot code. Lambda expressions were added in C++11, and provide a more >> expressive syntax for local functions, with a number of use-cases where they >> can improve readability by eliminating a lot of uninteresting boilerplate. >> >> Some example uses are included, but are not part of the proposed change. >> They will be removed from the PR before it is pushed. (In particular, the >> ScopeGuard utility uses move semantics, the use of which hasn't been >> approved or even discussed.) They are given to show some of the benefits >> that might accrue from permitting the use of lambdas. 
In particular, they >> highlight some of the code reduction that is possible. Some of these code >> changes might be proposed in the future, using the normal PR process. >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will >> not be made before Wednesday 1-Sep-2021 at 12h00 UTC. >> >> Since we're piggybacking on github PRs here, please use the PR review >> process to approve (click on Review Changes > Approve), rather than sending >> a "vote: yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into permit_lambda > - terminology fix > - add scope guard and some example uses > - G1 SATB filter lambda > - new local functions section Marked as reviewed by iklam (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5144 From kbarrett at openjdk.java.net Sun Sep 26 04:53:07 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 26 Sep 2021 04:53:07 GMT Subject: RFR: 8269537: memset() is called after operator new [v2] In-Reply-To: References: Message-ID: On Fri, 24 Sep 2021 08:50:18 GMT, Leo Korinth wrote: >> I hit the new assert when not on Linux, I guess it has to do with the initialization of the thread local variable. > > Thanks Ioi for making me adding the assert!!! The sequencing of the allocation function and the arguments to the constructor is not what I thought, so my "solution" is not working. I am unsure how to resolve this in a good way. 
We could probably have a small thread local collection of (type, address) pairs, but I wonder if it is not better removing this debug information altogether. It is to my knowledge only used in GrowableArray (to limit the type of its internal allocations) and to hinder delete on resource allocated objects. A collection of (type, address) pairs is still problematic. It still requires assuming that the address of the derived object is the same as the address of the ResourceObj subobject, which isn't guaranteed, though the current mechanism also depends on that being true. I think there might be other places in HotSpot where we're making that assumption too, unfortunately. ------------- PR: https://git.openjdk.java.net/jdk/pull/5387 From jiefu at openjdk.java.net Sun Sep 26 17:18:14 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 26 Sep 2021 17:18:14 GMT Subject: RFR: 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160 Message-ID: Hi all, I'd like to fix the C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160. The reason is that there are non-ASCII chars in the comments of vm_version_x86.cpp after JDK-8234160. It makes the code to be less portable. It would be better to fix it. The fix just removing those chars in the comments. Thanks. 
Best regards, Jie ------------- Commit messages: - 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160 Changes: https://git.openjdk.java.net/jdk/pull/5701/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5701&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274325 Stats: 37 lines in 1 file changed: 0 ins; 0 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/5701.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5701/head:pull/5701 PR: https://git.openjdk.java.net/jdk/pull/5701 From stuefe at openjdk.java.net Sun Sep 26 17:34:23 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sun, 26 Sep 2021 17:34:23 GMT Subject: RFR: JDK-8274320: os::fork_and_exec() should be using posix_spawn Message-ID: <5hepN-Ub74DMHnpbM_TdPaHmI1zEQddSobpy6AIlT98=.11c35eb4-02c2-4bcc-8153-d07b282f3da3@github.com> Hi, may I have reviews for this small patch please? `os::fork_and_exec()`, used in the hotspot to spawn child programs (scripts etc) in error situations, should be using `posix_spawn()`. ATM it uses either `fork()` or `vfork()`. `vfork()` got deprecated on MacOS and we get build errors (JDK-8274293) - even though in this case it would be completely fine to use. This leaves us with `fork()` for MacOS, which has the known problems with large-footprint-parents. This matters here especially since we also use os::fork_and_exec to implement `-XX:OnError` for OOM situations. We already use posix_spawn() as default for Runtime.exec() since JDK 15, and it is available on all our Unices. We also should use it here. I kept the name of the function (fork_and_exec) since people know it, even though it's more incorrect now than before. 
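
As a rough illustration of the approach described above (a sketch under stated assumptions, not the code in this patch), spawning a shell command with `posix_spawn()` and waiting for it could look like the following. The helper name `spawn_and_wait`, the choice to run the command through `/bin/sh -c`, and the convention of returning -1 on failure are all assumptions made for the example.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cassert>
#include <cerrno>

extern char** environ;  // POSIX: the calling process's environment

// Hypothetical helper (not HotSpot's os::fork_and_exec): runs `cmd` via
// /bin/sh using posix_spawn() and returns the child's exit status, or -1
// if the child could not be spawned, waited for, or did not exit normally.
int spawn_and_wait(const char* cmd) {
  assert(cmd != nullptr);
  const char* argv[] = { "sh", "-c", cmd, nullptr };
  pid_t pid = -1;

  // No file actions or spawn attributes; pass our environment unchanged.
  int rc = posix_spawn(&pid, "/bin/sh", nullptr, nullptr,
                       const_cast<char* const*>(argv), environ);
  if (rc != 0) {
    return -1;  // posix_spawn reports failure through its return value
  }

  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) return -1;  // retry only when interrupted
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing `environ` through unchanged is what lets the child see the parent's environment variables, which is the behavior the manual `-XX:OnError` tests below check.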
Tests: - manual tests using -XX:OnError with various scripts, including checking that env variables are passed correctly - manually ran runtime/ErrorHandling tests - GHAs ------------- Commit messages: - Merge branch 'master' into JDK-8274320-os-fork-and-exec-should-be-using-posix-spawn - JDK-8274320-os-fork-and-exec-should-be-using-posix-spawn Changes: https://git.openjdk.java.net/jdk/pull/5698/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5698&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274320 Stats: 43 lines in 4 files changed: 4 ins; 28 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5698.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5698/head:pull/5698 PR: https://git.openjdk.java.net/jdk/pull/5698 From dholmes at openjdk.java.net Mon Sep 27 03:39:53 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 27 Sep 2021 03:39:53 GMT Subject: RFR: 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160 In-Reply-To: References: Message-ID: <5RTcUGUYI5U8WHCQMgtz2DQ7q14X43c8F5u44eyetH4=.5410676e-c404-46da-8af9-7d55645a8cb7@github.com> On Sun, 26 Sep 2021 02:47:06 GMT, Jie Fu wrote: > Hi all, > > I'd like to fix the C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160. > > The reason is that there are non-ASCII chars in the comments of vm_version_x86.cpp after JDK-8234160. > It makes the code to be less portable. > > It would be better to fix it. > > The fix just removing those chars in the comments. > > Thanks. 
> Best regards, > Jie I suggest replacing with ascii equivalents: Intel(R) Core(TM) ------------- PR: https://git.openjdk.java.net/jdk/pull/5701 From jiefu at openjdk.java.net Mon Sep 27 03:58:58 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 27 Sep 2021 03:58:58 GMT Subject: RFR: 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160 [v2] In-Reply-To: References: Message-ID: > Hi all, > > I'd like to fix the C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160. > > The reason is that there are non-ASCII chars in the comments of vm_version_x86.cpp after JDK-8234160. > It makes the code to be less portable. > > It would be better to fix it. > > The fix just removing those chars in the comments. > > Thanks. > Best regards, > Jie Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Use (R) and (TM) ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5701/files - new: https://git.openjdk.java.net/jdk/pull/5701/files/9e7a1cfe..5b3ea4bf Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5701&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5701&range=00-01 Stats: 37 lines in 1 file changed: 0 ins; 0 del; 37 mod Patch: https://git.openjdk.java.net/jdk/pull/5701.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5701/head:pull/5701 PR: https://git.openjdk.java.net/jdk/pull/5701 From jiefu at openjdk.java.net Mon Sep 27 03:58:58 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 27 Sep 2021 03:58:58 GMT Subject: RFR: 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160 In-Reply-To: <5RTcUGUYI5U8WHCQMgtz2DQ7q14X43c8F5u44eyetH4=.5410676e-c404-46da-8af9-7d55645a8cb7@github.com> References: <5RTcUGUYI5U8WHCQMgtz2DQ7q14X43c8F5u44eyetH4=.5410676e-c404-46da-8af9-7d55645a8cb7@github.com> Message-ID: On Mon, 27 Sep 2021 03:36:35 GMT, David Holmes wrote: > I suggest replacing with ascii equivalents: Intel(R) Core(TM) 
Updated. Thanks @dholmes-ora . ------------- PR: https://git.openjdk.java.net/jdk/pull/5701 From dholmes at openjdk.java.net Mon Sep 27 06:12:09 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Mon, 27 Sep 2021 06:12:09 GMT Subject: RFR: 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160 [v2] In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 03:58:58 GMT, Jie Fu wrote: >> Hi all, >> >> I'd like to fix the C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160. >> >> The reason is that there are non-ASCII chars in the comments of vm_version_x86.cpp after JDK-8234160. >> It makes the code to be less portable. >> >> It would be better to fix it. >> >> The fix just removing those chars in the comments. >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Use (R) and (TM) Looks good and trivial. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5701 From sjohanss at openjdk.java.net Mon Sep 27 07:59:20 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 27 Sep 2021 07:59:20 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 22:05:09 GMT, Kim Barrett wrote: >> Please review this proposal to permit the use of lambda expressions in >> HotSpot code, with some restrictions and suggestions for good usage within >> HotSpot code. Lambda expressions were added in C++11, and provide a more >> expressive syntax for local functions, with a number of use-cases where they >> can improve readability by eliminating a lot of uninteresting boilerplate. >> >> Some example uses are included, but are not part of the proposed change. >> They will be removed from the PR before it is pushed. 
(In particular, the >> ScopeGuard utility uses move semantics, the use of which hasn't been >> approved or even discussed.) They are given to show some of the benefits >> that might accrue from permitting the use of lambdas. In particular, they >> highlight some of the code reduction that is possible. Some of these code >> changes might be proposed in the future, using the normal PR process. >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will >> not be made before Wednesday 1-Sep-2021 at 12h00 UTC. >> >> Since we're piggybacking on github PRs here, please use the PR review >> process to approve (click on Review Changes > Approve), rather than sending >> a "vote: yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into permit_lambda > - terminology fix > - add scope guard and some example uses > - G1 SATB filter lambda > - new local functions section Vote: yes ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5144 From jiefu at openjdk.java.net Mon Sep 27 09:27:32 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 27 Sep 2021 09:27:32 GMT Subject: RFR: 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160 [v2] In-Reply-To: References: Message-ID: On Mon, 27 Sep 2021 06:08:59 GMT, David Holmes wrote: > Looks good and trivial. > > Thanks, > David Thanks @dholmes-ora . 
-------------

PR: https://git.openjdk.java.net/jdk/pull/5701

From jiefu at openjdk.java.net Mon Sep 27 09:42:28 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Mon, 27 Sep 2021 09:42:28 GMT
Subject: Integrated: 8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160
In-Reply-To:
References:
Message-ID:

On Sun, 26 Sep 2021 02:47:06 GMT, Jie Fu wrote:

> Hi all,
>
> I'd like to fix the C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160.
>
> The reason is that there are non-ASCII chars in the comments of vm_version_x86.cpp after JDK-8234160.
> It makes the code to be less portable.
>
> It would be better to fix it.
>
> The fix just removing those chars in the comments.
>
> Thanks.
> Best regards,
> Jie

This pull request has now been integrated.

Changeset: 7426fd4c
Author: Jie Fu
URL: https://git.openjdk.java.net/jdk/commit/7426fd4c9c0428411d2c4a2c675fcad6646ea90a
Stats: 37 lines in 1 file changed: 0 ins; 0 del; 37 mod

8274325: C4819 warning at vm_version_x86.cpp on Windows after JDK-8234160

Reviewed-by: dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/5701

From eosterlund at openjdk.java.net Mon Sep 27 09:57:31 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Mon, 27 Sep 2021 09:57:31 GMT
Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 22:05:09 GMT, Kim Barrett wrote:

>> Please review this proposal to permit the use of lambda expressions in
>> HotSpot code, with some restrictions and suggestions for good usage within
>> HotSpot code. Lambda expressions were added in C++11, and provide a more
>> expressive syntax for local functions, with a number of use-cases where they
>> can improve readability by eliminating a lot of uninteresting boilerplate.
>>
>> Some example uses are included, but are not part of the proposed change.
>> They will be removed from the PR before it is pushed.
(In particular, the
>> ScopeGuard utility uses move semantics, the use of which hasn't been
>> approved or even discussed.) They are given to show some of the benefits
>> that might accrue from permitting the use of lambdas. In particular, they
>> highlight some of the code reduction that is possible. Some of these code
>> changes might be proposed in the future, using the normal PR process.
>>
>> This is a modification of the Style Guide, so rough consensus among the
>> HotSpot Group members is required to make this change. Only Group members
>> should vote for approval (via the github PR), though reasoned objections or
>> comments from anyone will be considered. A decision on this proposal will
>> not be made before Wednesday 1-Sep-2021 at 12h00 UTC.
>>
>> Since we're piggybacking on github PRs here, please use the PR review
>> process to approve (click on Review Changes > Approve), rather than sending
>> a "vote: yes" email reply that would be normal for a CFV.
>
> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Merge branch 'master' into permit_lambda
>  - terminology fix
>  - add scope guard and some example uses
>  - G1 SATB filter lambda
>  - new local functions section

Marked as reviewed by eosterlund (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/5144

From ccheung at openjdk.java.net Mon Sep 27 18:10:31 2021
From: ccheung at openjdk.java.net (Calvin Cheung)
Date: Mon, 27 Sep 2021 18:10:31 GMT
Subject: RFR: 8273508: Support archived heap objects in SerialGC [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 17:35:13 GMT, Ioi Lam wrote:

>> When `-XX:+UseSerialGC` is enabled, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap.
This improves VM start-up time, mostly because the module graph can be loaded from the archive.
>>
>>
>> $ perf stat -r 40 java -XX:+UseSerialGC -version
>>
>> Before: 0.042484507 seconds time elapsed ( +- 0.72% )
>> After:  0.031671000 seconds time elapsed ( +- 0.72% )
>>
>>
>> Changes in the gc subdirectories are contributed by @tschatzl
>
> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision:
>
>   @tschatzl comments

I've reviewed the CDS related changes. Just one minor comment below.

src/hotspot/share/cds/heapShared.cpp line 1762:

> 1760:   if (_loaded_heap_bottom != 0) {
> 1761:     HeapWord* bottom = (HeapWord*)_loaded_heap_bottom;
> 1762:     HeapWord* top = (HeapWord*)_loaded_heap_top;

Maybe add `assert(_loaded_heap_top != 0, "must be")` ?

-------------

Marked as reviewed by ccheung (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5596

From iveresov at openjdk.java.net Mon Sep 27 20:36:16 2021
From: iveresov at openjdk.java.net (Igor Veresov)
Date: Mon, 27 Sep 2021 20:36:16 GMT
Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 21 Sep 2021 22:05:09 GMT, Kim Barrett wrote:

>> Please review this proposal to permit the use of lambda expressions in
>> HotSpot code, with some restrictions and suggestions for good usage within
>> HotSpot code. Lambda expressions were added in C++11, and provide a more
>> expressive syntax for local functions, with a number of use-cases where they
>> can improve readability by eliminating a lot of uninteresting boilerplate.
>>
>> Some example uses are included, but are not part of the proposed change.
>> They will be removed from the PR before it is pushed. (In particular, the
>> ScopeGuard utility uses move semantics, the use of which hasn't been
>> approved or even discussed.) They are given to show some of the benefits
>> that might accrue from permitting the use of lambdas.
In particular, they >> highlight some of the code reduction that is possible. Some of these code >> changes might be proposed in the future, using the normal PR process. >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will >> not be made before Wednesday 1-Sep-2021 at 12h00 UTC. >> >> Since we're piggybacking on github PRs here, please use the PR review >> process to approve (click on Review Changes > Approve), rather than sending >> a "vote: yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into permit_lambda > - terminology fix > - add scope guard and some example uses > - G1 SATB filter lambda > - new local functions section Marked as reviewed by iveresov (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5144 From kvn at openjdk.java.net Mon Sep 27 22:56:39 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 27 Sep 2021 22:56:39 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2] In-Reply-To: References: Message-ID: <-YCLk8eXR7RRGzgTAG3JbNAm5UJkiHRVfrWQ6t-FNIQ=.1605a4e5-ecc1-495a-a04c-fb5d8137a90b@github.com> On Tue, 21 Sep 2021 22:05:09 GMT, Kim Barrett wrote: >> Please review this proposal to permit the use of lambda expressions in >> HotSpot code, with some restrictions and suggestions for good usage within >> HotSpot code. 
Lambda expressions were added in C++11, and provide a more >> expressive syntax for local functions, with a number of use-cases where they >> can improve readability by eliminating a lot of uninteresting boilerplate. >> >> Some example uses are included, but are not part of the proposed change. >> They will be removed from the PR before it is pushed. (In particular, the >> ScopeGuard utility uses move semantics, the use of which hasn't been >> approved or even discussed.) They are given to show some of the benefits >> that might accrue from permitting the use of lambdas. In particular, they >> highlight some of the code reduction that is possible. Some of these code >> changes might be proposed in the future, using the normal PR process. >> >> This is a modification of the Style Guide, so rough consensus among the >> HotSpot Group members is required to make this change. Only Group members >> should vote for approval (via the github PR), though reasoned objections or >> comments from anyone will be considered. A decision on this proposal will >> not be made before Wednesday 1-Sep-2021 at 12h00 UTC. >> >> Since we're piggybacking on github PRs here, please use the PR review >> process to approve (click on Review Changes > Approve), rather than sending >> a "vote: yes" email reply that would be normal for a CFV. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into permit_lambda > - terminology fix > - add scope guard and some example uses > - G1 SATB filter lambda > - new local functions section Approved. ------------- Marked as reviewed by kvn (Reviewer). 
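The boilerplate reduction mentioned in the proposal can be illustrated with a small sketch. It is not taken from the PR or from the Style Guide: `for_each_element`, `CountAbove`, `count_above_functor` and `count_above_lambda` are hypothetical names, and the capture style used here is only one possible choice.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical iteration helper that applies a callable to each element.
template <typename T, typename F>
void for_each_element(const T* data, std::size_t n, F f) {
  assert(data != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
    f(data[i]);
  }
}

// Pre-lambda style: a named function object that carries its state by hand.
class CountAbove {
  int _threshold;
  std::size_t* _count;
 public:
  CountAbove(int threshold, std::size_t* count)
      : _threshold(threshold), _count(count) {}
  void operator()(int v) const {
    if (v > _threshold) ++*_count;
  }
};

inline std::size_t count_above_functor(const int* data, std::size_t n, int threshold) {
  std::size_t count = 0;
  for_each_element(data, n, CountAbove(threshold, &count));
  return count;
}

// Lambda style: the same local function, with the state captured by reference.
inline std::size_t count_above_lambda(const int* data, std::size_t n, int threshold) {
  std::size_t count = 0;
  for_each_element(data, n, [&](int v) { if (v > threshold) ++count; });
  return count;
}
```

Both functions compute the same result; the lambda version simply drops the separate class and its constructor.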
PR: https://git.openjdk.java.net/jdk/pull/5144 From xliu at openjdk.java.net Mon Sep 27 23:27:39 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Mon, 27 Sep 2021 23:27:39 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. > > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... 
> 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Add a new testcase for OutOfMemoryError thrown from NIO. - Make state changer one way in VMError. Add a test to show that jcmd %p won't get stuck. - Merge branch 'master' into JDK-8273608 - 8273608: Deadlock when jcmd of OnError attaches to itself Allow custom command of OnError to attach to HotSpot itself. This patch sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). This prevents cmds which require safepoint synchronization from deadlock. eg. OnError='jcmd %p Thread.print'. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5590/files - new: https://git.openjdk.java.net/jdk/pull/5590/files/afd1610d..bf684e5b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5590&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5590&range=00-01 Stats: 31232 lines in 1010 files changed: 21697 ins; 4654 del; 4881 mod Patch: https://git.openjdk.java.net/jdk/pull/5590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5590/head:pull/5590 PR: https://git.openjdk.java.net/jdk/pull/5590 From kbarrett at openjdk.java.net Tue Sep 28 03:23:16 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 28 Sep 2021 03:23:16 GMT Subject: RFR: 8274322: Problems with oopDesc construction Message-ID: Please review this change to the default constructor for markWord and associated "change" to construction of oopDesc. The current code never invokes the constructor for oopDesc or any of its derived classes. For that to be permissible according to the Standard, those classes must be trivially default constructible. And for that to be the case, the markWord default constructor must be trivial. This change consists of three parts. (1) The markWord default constructor is changed to be trivial, so the default constructors for oopDesc and classes derived from it will also be trivial. It wasn't previously trivial because the mechanism for making it so (a default definition) is a C++11 feature that wasn't yet supported when the previous constructor was defined. (2) This change also adds static asserts to verify the relevant classes have trivial default constructors, to prevent later changes from unintentionally breaking this. (3) This change also makes oopDesc noncopyable, to prevent inadvertent usage of these operations that don't make any sense. A different approach would be to always use placement new with an appropriate constructor to perform the initialization, perhaps encapsulated in factory functions. 
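The three parts of the change described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual HotSpot code: `MarkWordLike`, `HeaderLike` and `round_trip` are hypothetical stand-ins for `markWord`, `oopDesc` and a caller.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for markWord.
class MarkWordLike {
  std::uintptr_t _value;
 public:
  MarkWordLike() = default;  // (1) defaulted, hence trivial (a C++11 feature)
  explicit MarkWordLike(std::uintptr_t v) : _value(v) {}
  std::uintptr_t value() const { return _value; }
};

// Hypothetical stand-in for oopDesc.
class HeaderLike {
  MarkWordLike _mark;
 public:
  HeaderLike() = default;
  // (3) Copying makes no sense for such objects, so forbid it.
  HeaderLike(const HeaderLike&) = delete;
  HeaderLike& operator=(const HeaderLike&) = delete;

  void set_mark(MarkWordLike m) { _mark = m; }
  MarkWordLike mark() const { return _mark; }
};

// (2) Guard against later changes that would make construction non-trivial.
static_assert(std::is_trivially_default_constructible<MarkWordLike>::value,
              "MarkWordLike must be trivially default constructible");
static_assert(std::is_trivially_default_constructible<HeaderLike>::value,
              "HeaderLike must be trivially default constructible");
static_assert(!std::is_copy_constructible<HeaderLike>::value,
              "HeaderLike must not be copyable");

// The mark is indeterminate after default construction and is set explicitly.
inline std::uintptr_t round_trip(std::uintptr_t v) {
  HeaderLike h;
  h.set_mark(MarkWordLike(v));
  assert(h.mark().value() == v);
  return h.mark().value();
}
```

Deleting the copy operations does not affect the triviality of the default constructor, which is why the first two static asserts still hold.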
I did some exploration in that direction. It's a much larger and more complex change, though the final behavior (use constructors for initialization) is simpler. Testing: tier1 ------------- Commit messages: - improve oopDesc and markWord construction Changes: https://git.openjdk.java.net/jdk/pull/5729/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5729&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274322 Stats: 32 lines in 5 files changed: 28 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5729.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5729/head:pull/5729 PR: https://git.openjdk.java.net/jdk/pull/5729 From iklam at openjdk.java.net Tue Sep 28 03:56:31 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 28 Sep 2021 03:56:31 GMT Subject: RFR: 8273508: Support archived heap objects in SerialGC [v3] In-Reply-To: References: Message-ID: > When `-XX:+UseSerialGC is enabled`, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap. This improves VM start-up time, mostly because the module graph can be loaded from the archive. > > > $ perf stat -r 40 java -XX:+UseSerialGC -version > > Before: 0.042484507 seconds time elapsed ( +- 0.72% ) > After: 0.031671000 seconds time elapsed ( +- 0.72% ) > > > Changes in the gc subdirectories are contributed by @tschatzl Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Exclude TestSerialGCWithCDS.java from hotspot_appcds_dynamic test group - Comments from @calvinccheung - Merge branch 'master' into 8273508-archived-heap-objects-for-serial-gc - @tschatzl comments - 8273508: Support archived heap objects in SerialGC ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5596/files - new: https://git.openjdk.java.net/jdk/pull/5596/files/7d841aae..41ce0bf0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5596&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5596&range=01-02 Stats: 21481 lines in 572 files changed: 15377 ins; 2333 del; 3771 mod Patch: https://git.openjdk.java.net/jdk/pull/5596.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5596/head:pull/5596 PR: https://git.openjdk.java.net/jdk/pull/5596 From kbarrett at openjdk.java.net Tue Sep 28 05:27:06 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 28 Sep 2021 05:27:06 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v2] In-Reply-To: <-YCLk8eXR7RRGzgTAG3JbNAm5UJkiHRVfrWQ6t-FNIQ=.1605a4e5-ecc1-495a-a04c-fb5d8137a90b@github.com> References: <-YCLk8eXR7RRGzgTAG3JbNAm5UJkiHRVfrWQ6t-FNIQ=.1605a4e5-ecc1-495a-a04c-fb5d8137a90b@github.com> Message-ID: On Mon, 27 Sep 2021 22:52:31 GMT, Vladimir Kozlov wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into permit_lambda >> - terminology fix >> - add scope guard and some example uses >> - G1 SATB filter lambda >> - new local functions section > > Approved. Thanks @vnkozlov and all the other reviewers and commenters. As promised, I'll be backing out the example code changes before integrating. 
I'll also regenerate the html version of the style guide and include that in the integration. ------------- PR: https://git.openjdk.java.net/jdk/pull/5144 From kbarrett at openjdk.java.net Tue Sep 28 05:33:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 28 Sep 2021 05:33:54 GMT Subject: RFR: 8274024: Use regular accessors to internal fields of oopDesc In-Reply-To: References: Message-ID: On Sat, 25 Sep 2021 22:39:56 GMT, Kim Barrett wrote: > It turns out there's a pre-existing problem about that; both the proposed changes here and pre-existing code technically invoke UB. See JDK-8274322. See https://github.com/openjdk/jdk/pull/5729 ------------- PR: https://git.openjdk.java.net/jdk/pull/5585 From kbarrett at openjdk.java.net Tue Sep 28 06:00:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 28 Sep 2021 06:00:40 GMT Subject: RFR: 8264707: HotSpot Style Guide should permit use of lambda [v3] In-Reply-To: References: Message-ID: > Please review this proposal to permit the use of lambda expressions in > HotSpot code, with some restrictions and suggestions for good usage within > HotSpot code. Lambda expressions were added in C++11, and provide a more > expressive syntax for local functions, with a number of use-cases where they > can improve readability by eliminating a lot of uninteresting boilerplate. > > Some example uses are included, but are not part of the proposed change. > They will be removed from the PR before it is pushed. (In particular, the > ScopeGuard utility uses move semantics, the use of which hasn't been > approved or even discussed.) They are given to show some of the benefits > that might accrue from permitting the use of lambdas. In particular, they > highlight some of the code reduction that is possible. Some of these code > changes might be proposed in the future, using the normal PR process. 
> > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will > not be made before Wednesday 1-Sep-2021 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review > process to approve (click on Review Changes > Approve), rather than sending > a "vote: yes" email reply that would be normal for a CFV. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - update html version of style guide - Merge branch 'master' into permit_lambda - Revert "G1 SATB filter lambda" This reverts commit a43ffa3af8706b1a3300840d5f549a3b30a42b3e. - Revert "add scope guard and some example uses" This reverts commit cc08f8b435eb0862d07a6ec6fc0af70b60c3b70e. 
- Merge branch 'master' into permit_lambda - terminology fix - add scope guard and some example uses - G1 SATB filter lambda - new local functions section ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5144/files - new: https://git.openjdk.java.net/jdk/pull/5144/files/1fd7efbc..aceef453 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5144&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5144&range=01-02 Stats: 16823 lines in 474 files changed: 10917 ins; 2349 del; 3557 mod Patch: https://git.openjdk.java.net/jdk/pull/5144.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5144/head:pull/5144 PR: https://git.openjdk.java.net/jdk/pull/5144 From kbarrett at openjdk.java.net Tue Sep 28 06:00:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 28 Sep 2021 06:00:41 GMT Subject: Integrated: 8264707: HotSpot Style Guide should permit use of lambda In-Reply-To: References: Message-ID: On Tue, 17 Aug 2021 13:49:43 GMT, Kim Barrett wrote: > Please review this proposal to permit the use of lambda expressions in > HotSpot code, with some restrictions and suggestions for good usage within > HotSpot code. Lambda expressions were added in C++11, and provide a more > expressive syntax for local functions, with a number of use-cases where they > can improve readability by eliminating a lot of uninteresting boilerplate. > > Some example uses are included, but are not part of the proposed change. > They will be removed from the PR before it is pushed. (In particular, the > ScopeGuard utility uses move semantics, the use of which hasn't been > approved or even discussed.) They are given to show some of the benefits > that might accrue from permitting the use of lambdas. In particular, they > highlight some of the code reduction that is possible. Some of these code > changes might be proposed in the future, using the normal PR process. 
> > This is a modification of the Style Guide, so rough consensus among the > HotSpot Group members is required to make this change. Only Group members > should vote for approval (via the github PR), though reasoned objections or > comments from anyone will be considered. A decision on this proposal will > not be made before Wednesday 1-Sep-2021 at 12h00 UTC. > > Since we're piggybacking on github PRs here, please use the PR review > process to approve (click on Review Changes > Approve), rather than sending > a "vote: yes" email reply that would be normal for a CFV. This pull request has now been integrated. Changeset: 3eca9c36 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/3eca9c36a63595baee0659ac818fd5bedc528db1 Stats: 399 lines in 2 files changed: 386 ins; 11 del; 2 mod 8264707: HotSpot Style Guide should permit use of lambda Reviewed-by: stefank, dholmes, coleenp, iklam, sjohanss, eosterlund, iveresov, kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5144 From iklam at openjdk.java.net Tue Sep 28 06:30:00 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 28 Sep 2021 06:30:00 GMT Subject: RFR: 8273508: Support archived heap objects in SerialGC [v4] In-Reply-To: References: Message-ID: > When `-XX:+UseSerialGC is enabled`, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap. This improves VM start-up time, mostly because the module graph can be loaded from the archive. > > > $ perf stat -r 40 java -XX:+UseSerialGC -version > > Before: 0.042484507 seconds time elapsed ( +- 0.72% ) > After: 0.031671000 seconds time elapsed ( +- 0.72% ) > > > Changes in the gc subdirectories are contributed by @tschatzl Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into 8273508-archived-heap-objects-for-serial-gc - Exclude TestSerialGCWithCDS.java from hotspot_appcds_dynamic test group - Comments from @calvinccheung - Merge branch 'master' into 8273508-archived-heap-objects-for-serial-gc - @tschatzl comments - 8273508: Support archived heap objects in SerialGC ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5596/files - new: https://git.openjdk.java.net/jdk/pull/5596/files/41ce0bf0..1ae36713 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5596&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5596&range=02-03 Stats: 1785 lines in 29 files changed: 1293 ins; 360 del; 132 mod Patch: https://git.openjdk.java.net/jdk/pull/5596.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5596/head:pull/5596 PR: https://git.openjdk.java.net/jdk/pull/5596 From iklam at openjdk.java.net Tue Sep 28 06:30:01 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 28 Sep 2021 06:30:01 GMT Subject: RFR: 8273508: Support archived heap objects in SerialGC [v2] In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 08:27:03 GMT, Thomas Schatzl wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @tschatzl comments > > GC changes (and the bits around them) seem good to me. > > We already discussed the (existing) pervasive usage of `uintptr_t` in other code calling the gc code for addresses in the java heap and using size_t/int/intx/whatever for offsets in private which I do not recommend to do. However if there is any change to be made, it's a separate issue. Thanks @tschatzl and @calvinccheung for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5596 From iklam at openjdk.java.net Tue Sep 28 06:30:03 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 28 Sep 2021 06:30:03 GMT Subject: Integrated: 8273508: Support archived heap objects in SerialGC In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 06:11:58 GMT, Ioi Lam wrote: > When `-XX:+UseSerialGC is enabled`, load the CDS archived heap objects into `SerialHeap::old_gen()` during VM bootstrap. This improves VM start-up time, mostly because the module graph can be loaded from the archive. > > > $ perf stat -r 40 java -XX:+UseSerialGC -version > > Before: 0.042484507 seconds time elapsed ( +- 0.72% ) > After: 0.031671000 seconds time elapsed ( +- 0.72% ) > > > Changes in the gc subdirectories are contributed by @tschatzl This pull request has now been integrated. Changeset: 6a573b88 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/6a573b888d4d3322b9165562f85e1b7b781a5ff1 Stats: 213 lines in 13 files changed: 180 ins; 2 del; 31 mod 8273508: Support archived heap objects in SerialGC Reviewed-by: tschatzl, ccheung ------------- PR: https://git.openjdk.java.net/jdk/pull/5596 From xliu at openjdk.java.net Tue Sep 28 07:10:13 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 28 Sep 2021 07:10:13 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: <5Ry4BP9qhGIZqNRcF7RTKqvTlA7GLXAfGawHrHLftNQ=.9a3ff272-36a0-41dc-96d1-6770729a1546@github.com> On Mon, 27 Sep 2021 23:27:39 GMT, Xin Liu wrote: >> This patch allows the custom commands of OnError to attach to HotSpot itself. >> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). >> This prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. 
>> >> Without this patch, we will encounter a deadlock at safepoint synchronization. >> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. >> >> >> Aborting due to java.lang.OutOfMemoryError: Java heap space >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (debug.cpp:364), pid=94632, tid=94633 >> # fatal error: OutOfMemory encountered: Java heap space >> # >> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again >> # >> # An error report file with more information is saved as: >> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log >> # >> # -XX:OnError="jcmd %p Thread.print" >> # Executing /bin/sh -c "jcmd 94632 Thread.print" ... >> 94632: >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: >> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] >> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE >> [10.616s][warning][safepoint] >> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Add a new testcase for OutOfMemoryError thrown from NIO. > - Make state changer one way in VMError. > > Add a test to show that jcmd %p won't get stuck. > - Merge branch 'master' into JDK-8273608 > - 8273608: Deadlock when jcmd of OnError attaches to itself > > Allow custom command of OnError to attach to HotSpot itself. This patch sets > the thread of report_and_die() to Native before os::fork_and_exec(cmd). This > prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. Hi reviewers, I added a dedicated test `TestOutOfMemoryErrorFromNIO.java`. TestOnError is too special. `ErrorHandlerTest` crashes HotSpot right after JNI_Create_JVM before Attach Listener Thread. As a result, it can't trigger the deadlock. In TestOutOfMemoryErrorFromNIO.java, I insert 2 jcmd %p dcmds which require safepoint synchronization before and after echo. jcmd %p VM.info doesn't require safepoint. It is trivial for OnError. If this 'transition_into_native' works, I would like to use it in JfrEmergencyDump::on_vm_shutdown as a follow-up issue. Could you take a look at this revision? ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 08:06:12 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 28 Sep 2021 08:06:12 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4] In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 20:03:30 GMT, Paul Hohensee wrote: > In pause_aarch64.hpp, I'd put the definition of PauseInst inside the definition of PauseImplDesc in order to not clutter the global namespace more than needed.
Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 08:06:13 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 28 Sep 2021 08:06:13 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v4] In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 08:27:01 GMT, Andrew Haley wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Move spin_wait in cpp file with removal of loop macro >> >> In addition, comments are added to a checking method of a test. > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 68: > >> 66: return PauseImplDesc(YIELD, count); >> 67: } else if (strcmp(s, "none") != 0) { >> 68: vm_exit_during_initialization("Invalid value for OnSpinWaitImpl", OnSpinWaitImpl); > > Suggestion: > > vm_exit_during_initialization("The options for OnSpinWaitImpl are nop, isb, yield, and none", OnSpinWaitImpl); Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 08:06:12 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 28 Sep 2021 08:06:12 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v3] In-Reply-To: References: <5yTyf-BNoSsU36WxJNxGum10rfMKf4dkZAFIVFl7zEw=.261e52fc-1d56-4ed8-942f-97335f07eca6@github.com> Message-ID: <7THs0B5bDz1qUd8H95kdcQo7ytnLzsNdIq3jpKjYucg=.bb592b04-448a-4945-9ecf-4479f25ead7f@github.com> On Wed, 22 Sep 2021 12:28:42 GMT, Evgeny Astigeevich wrote: >> Good point. There's no significant performance advantage to having this in the header. > >> Why use a macro here? You could just put the loop around the switch statement. And the method body seems sufficiently large that it ought to go in the .cpp file. > > :) compiler engineering experience. 
Compilers have a problem to apply unswitching optimization to loop-invariant SWITCHes. > I'll update the code as suggested. Done ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Tue Sep 28 08:49:57 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 28 Sep 2021 08:49:57 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 14:14:38 GMT, Evgeny Astigeevich wrote: >> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds the option `OnSpinWaitImpl=value`, where `value` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. >> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. >> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`. >> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Separate OnSpinWaitImpl into OnSpinWaitInst and OnSpinWaitInstCount Looks good. I think we're done. ------------- Marked as reviewed by aph (Reviewer).
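The `OnSpinWaitImpl=value` grammar quoted above (`none`, or `nop`/`isb`/`yield` with an optional count prefix `2..9`) could be parsed along the following lines. This is only a sketch of the option format as described in the PR text, not the code in `vm_version_aarch64.cpp`; `SpinInst`, `SpinWaitDesc` and `parse_spin_wait` are hypothetical names, and rejecting a count in front of `none` is an assumption.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical description of the chosen spin-pause implementation.
enum class SpinInst { None, Nop, Isb, Yield };

struct SpinWaitDesc {
  SpinInst inst;
  int count;   // number of instructions to emit; 0 for None
  bool valid;  // false if the option string was not recognized
};

inline SpinWaitDesc parse_spin_wait(const char* s) {
  assert(s != nullptr);
  int count = 1;            // default when no count prefix is given
  bool has_prefix = false;
  if (s[0] >= '2' && s[0] <= '9') {  // optional single-digit count, 2..9
    count = s[0] - '0';
    has_prefix = true;
    ++s;
  }
  if (std::strcmp(s, "nop") == 0)   return {SpinInst::Nop, count, true};
  if (std::strcmp(s, "isb") == 0)   return {SpinInst::Isb, count, true};
  if (std::strcmp(s, "yield") == 0) return {SpinInst::Yield, count, true};
  // Assumption: a count in front of "none" is treated as an error.
  if (!has_prefix && std::strcmp(s, "none") == 0) return {SpinInst::None, 0, true};
  return {SpinInst::None, 0, false};
}
```

The final revision of the PR splits the option into `OnSpinWaitInst` and `OnSpinWaitInstCount`, which would make the prefix handling above unnecessary.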
PR: https://git.openjdk.java.net/jdk/pull/5562 From tschatzl at openjdk.java.net Tue Sep 28 10:07:05 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 28 Sep 2021 10:07:05 GMT Subject: RFR: 8272807: Permit use of memory concurrent with pretouch [v2] In-Reply-To: References: Message-ID: On Tue, 21 Sep 2021 23:01:08 GMT, Kim Barrett wrote: >> Note that this PR replaces the withdrawn https://github.com/openjdk/jdk/pull/5215. >> >> Please review this change which adds os::touch_memory, which is similar to >> os::pretouch_memory but allows concurrent access to the memory while it is >> being touched. This is accomplished by using an atomic add of zero as the >> operation for touching the memory, ensuring the virtual location is backed >> by physical memory while not changing any values being read or written by >> other threads. >> >> While I was there, fixed some other lurking issues in os::pretouch_memory. >> There was a potential overflow in the iteration that has been fixed. And if >> the range arguments weren't page aligned then the last page might not get >> touched. The latter was even mentioned in the function's description. Both >> of those have been fixed by careful alignment and some extra checks. The >> resulting code is a little more complicated, but more robust and complete. >> >> Similarly added TouchTask, which is similar to PretouchTask. Again here, >> there is some cleaning up to avoid potential overflows and such. >> >> - The chunk size is computed using the page size after possible adjustment >> for UseTransparentHugePages. We want a chunk size that reflects the actual >> number of touches that will be performed. >> >> - The chunk claim is now done using a CAS that won't exceed the range end. >> The old atomic-fetch-and-add and check the result, which is performed by >> each worker thread, could lead to overflow. The old code has a test for >> overflow, but since pointer-arithmetic overflow is UB that's not reliable. 
>> >> - The old calculation of num_chunks for parallel touching could also >> potentially overflow. >> >> Testing: >> mach5 tier1-3 > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into touch_memory > - simplify touch_impl, using conditional on bool arg rather than template specialization > - touch task > - add touch_memory Other than the comments, the change looks good. src/hotspot/share/gc/shared/pretouchTask.cpp line 105: > 103: size_t page_size, > 104: WorkGang* pretouch_gang) { > 105: PretouchTask task{task_name, start, end, page_size}; I would prefer to use the regular braces (also in TouchTask::touch()) for consistency with other code here, but is fine with me. Just unusual to see in Hotspot code. src/hotspot/share/runtime/os.cpp line 1869: > 1867: > 1868: void os::touch_memory(void* start, void* end, size_t page_size) { > 1869: check_touch_memory_args(start, end, page_size); Fwiw, the refactorings look good, thanks for fixing the issues, but due to the amount of code changed (compared to the actual addition of pretouch vs. touch) I would also have preferred this to be a separate change. src/hotspot/share/runtime/os.hpp line 376: > 374: // precondition: start <= end. > 375: static void pretouch_memory(void* start, void* end, size_t page_size = vm_page_size()); > 376: static void touch_memory(void* start, void* end, size_t page_size = vm_page_size()); Imho the naming is really bad. "Pretouch" and "touch" to me do not differ in the strength of the touch which I assume is intended here, but when the action is done. So this is really confusing to me. I am not a linguist, and looking up synonyms did not yield anything good (that is commonly recognized too) though. 
Did you consider adding a parameter to the `os::pretouch` method instead of two so confusingly named ones (something like `destructive` to indicate that one actually changes the memory that is touched)? Maybe this allows flattening the `BasicTouchTask` hierarchy to just pass along that flag too, potentially saving a lot of code implementing just that (i.e. the only difference of the two child classes is that they call the respective touch method) unless I'm mistaken. There may still be two different subclasses or helper methods if desired. But maybe there is a reason to have the subclasses in some follow-up?

E.g. one option (which may be too much change here since it takes information hiding even further, something for an extra CR maybe) could be to only have a public `PretouchHelper` class in the .hpp file with one (or two) methods and move all implementation detail into the .cpp file. Like

    class PretouchHelper {
      static void pretouch(const char* task_name, char* start_address, char* end_address, size_t page_size, WorkGang* pretouch_gang, bool destructive = false); /* safe but slow */

    OR

      static void pretouch(const char* task_name, char* start_address, char* end_address, size_t page_size, WorkGang* pretouch_gang);
      static void touch(const char* task_name, char* start_address, char* end_address, size_t page_size, WorkGang* pretouch_gang);
    }

and implement these methods as needed. As mentioned, due to naming issues I would prefer the version with the bool parameter.
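
For readers following the thread, the single-entry-point idea with a `destructive` flag can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the PR: `touch_byte` and `touch_range` are hypothetical names, the atomic add of zero uses the GCC/Clang `__atomic_fetch_add` builtin rather than HotSpot's own Atomic API, and the loop counts remaining bytes instead of advancing past `end`, so no pointer arithmetic can overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Touch one byte. With `destructive` set, a plain store of zero is used,
// which is only safe before anyone else uses the memory. Otherwise an atomic
// add of zero is used, which forces the page to be backed by physical memory
// without changing values that other threads may be reading or writing.
// Assumption: GCC/Clang builtin; HotSpot would go through its Atomic API.
inline void touch_byte(char* p, bool destructive) {
  if (destructive) {
    *static_cast<volatile char*>(p) = 0;
  } else {
    __atomic_fetch_add(p, static_cast<char>(0), __ATOMIC_RELAXED);
  }
}

// Hypothetical helper: touch one byte in every page that [start, end)
// intersects, including a partial first and last page.
// Precondition: page_size > 0. Returns the number of pages touched.
std::size_t touch_range(char* start, char* end, std::size_t page_size,
                        bool destructive) {
  if (start >= end) {
    return 0;
  }
  std::size_t touched = 0;
  char* p = start;
  while (true) {
    touch_byte(p, destructive);
    ++touched;
    // Distance from p to the next page boundary.
    std::size_t offset = reinterpret_cast<std::uintptr_t>(p) % page_size;
    std::size_t step = page_size - offset;
    // Stop if the next page boundary is at or beyond end; comparing sizes
    // avoids ever forming a pointer past end.
    if (static_cast<std::size_t>(end - p) <= step) {
      break;
    }
    p += step;
  }
  return touched;
}
```

With a 4096-byte page size, a range that starts 10 bytes into a page-aligned buffer and spans 8197 bytes intersects three pages, so `touch_range` touches three bytes; in the non-destructive mode the buffer contents are left unchanged.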
-------------

PR: https://git.openjdk.java.net/jdk/pull/5353

From smonteith at openjdk.java.net Tue Sep 28 11:31:09 2021
From: smonteith at openjdk.java.net (Stuart Monteith)
Date: Tue, 28 Sep 2021 11:31:09 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8]
In-Reply-To:
References:
Message-ID: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com>

On Thu, 23 Sep 2021 14:14:38 GMT, Evgeny Astigeevich wrote:

>> This PR is a follow-up on the discussion ["RFC: AArch64: Implementing spin pauses with ISB"](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html).
>>
>> It adds the option `OnSpinWaitImpl=value`, where `value` can be:
>>
>> - `none`: no implementation for spin pauses. This is the default value.
>> - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions.
>> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions.
>> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions.
>>
>> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`.
>>
>> Testing:
>>
>> - `make test TEST="gtest"`: Passed
>> - `make run-test TEST="tier1"`: Passed
>> - `make run-test TEST="tier2"`: Passed
>> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed
>
> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision:
>
>   Separate OnSpinWaitImpl into OnSpinWaitInst and OnSpinWaitInstCount

Overall looks fine, but the tests look like they need to work with hsdis.

test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 87:

> 85: private static String getSpinWaitInstHex(String spinWaitInst) {
> 86: if ("nop".equals(spinWaitInst)) {
> 87: return "1f20 03d5";

I'm getting the following when running these tests:

STDERR:
java.lang.RuntimeException: Wrong instruction 1f20 03d5 count 0!
 -- expecting 7
	at compiler.onSpinWait.TestOnSpinWaitAArch64.checkOutput(TestOnSpinWaitAArch64.java:163)
	at compiler.onSpinWait.TestOnSpinWaitAArch64.main(TestOnSpinWaitAArch64.java:82)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
	at java.base/java.lang.Thread.run(Thread.java:833)

I have hsdis installed as a matter of course, is the test written assuming hsdis is not present?

-------------

Changes requested by smonteith (Author).

PR: https://git.openjdk.java.net/jdk/pull/5562

From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 12:18:09 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Tue, 28 Sep 2021 12:18:09 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8]
In-Reply-To:
References:
Message-ID: <9i4FW6TUbkbrbQ6QWhUnVcaYr3KctatffOQjev8Rc0M=.45a10573-6e7e-4f9d-99e1-b02a5d3502cd@github.com>

On Tue, 28 Sep 2021 08:47:07 GMT, Andrew Haley wrote:

> Looks good. I think we're done.

Thank you. I'll create a CSR.
------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From adinn at openjdk.java.net Tue Sep 28 12:58:06 2021 From: adinn at openjdk.java.net (Andrew Dinn) Date: Tue, 28 Sep 2021 12:58:06 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> Message-ID: On Tue, 28 Sep 2021 11:26:37 GMT, Stuart Monteith wrote: >> Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: >> >> Separate OnSpinWaitImpl into OnSpinWaitInst and OnSpinWaitInstCount > > test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 87: > >> 85: private static String getSpinWaitInstHex(String spinWaitInst) { >> 86: if ("nop".equals(spinWaitInst)) { >> 87: return "1f20 03d5"; > > I'm getting the following when running these tests: > STDERR: > java.lang.RuntimeException: Wrong instruction 1f20 03d5 count 0! > -- expecting 7 > at compiler.onSpinWait.TestOnSpinWaitAArch64.checkOutput(TestOnSpinWaitAArch64.java:163) > at compiler.onSpinWait.TestOnSpinWaitAArch64.main(TestOnSpinWaitAArch64.java:82) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) > at java.base/java.lang.Thread.run(Thread.java:833) > > I have hsdis installed as a matter of course, is the test written assuming hsdis is not present? 
Yes, currently it is assumed that hsdis is not present and hence that the disassembly produces hex insns. However, if you install hsdis as part of the build (as I always do) then these tests will fail. It would really be better if the test looked for the correct hex insns or disassembled insns. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 13:17:45 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 28 Sep 2021 13:17:45 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> Message-ID: On Tue, 28 Sep 2021 12:54:47 GMT, Andrew Dinn wrote: >> test/hotspot/jtreg/compiler/onSpinWait/TestOnSpinWaitAArch64.java line 87: >> >>> 85: private static String getSpinWaitInstHex(String spinWaitInst) { >>> 86: if ("nop".equals(spinWaitInst)) { >>> 87: return "1f20 03d5"; >> >> I'm getting the following when running these tests: >> STDERR: >> java.lang.RuntimeException: Wrong instruction 1f20 03d5 count 0! >> -- expecting 7 >> at compiler.onSpinWait.TestOnSpinWaitAArch64.checkOutput(TestOnSpinWaitAArch64.java:163) >> at compiler.onSpinWait.TestOnSpinWaitAArch64.main(TestOnSpinWaitAArch64.java:82) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) >> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.base/java.lang.reflect.Method.invoke(Method.java:568) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312) >> at java.base/java.lang.Thread.run(Thread.java:833) >> >> I have hsdis installed as a matter of course, is the test written assuming hsdis is not present? 
> > Yes, currently it is assumed that hsdis is not present and hence that the disassembly produces hex insns. However, if you install hsdis as part of the build (as I always do) then these tests will fail. It would really be better if the test looked for the correct hex insns or disassembled insns. @adinn is correct. I am updating the test to support hsdis. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From smonteith at openjdk.java.net Tue Sep 28 13:17:49 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Tue, 28 Sep 2021 13:17:49 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> Message-ID: On Tue, 28 Sep 2021 13:10:16 GMT, Evgeny Astigeevich wrote: >> Yes, currently it is assumed that hsdis is not present and hence that the disassembly produces hex insns. However, if you install hsdis as part of the build (as I always do) then these tests will fail. It would really be better if the test looked for the correct hex insns or disassembled insns. > > @adinn is correct. I am updating the test to support hsdis. I'd suggest the most straightforward way to deal with this would be to check for the hex and fail if neither are found. An alternative could be to switch behaviour if the message "Loaded disassembler from hsdis-aarch64.so" is found. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From aph at openjdk.java.net Tue Sep 28 16:32:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Tue, 28 Sep 2021 16:32:41 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> Message-ID: On Tue, 28 Sep 2021 13:10:26 GMT, Stuart Monteith wrote: >> @adinn is correct. I am updating the test to support hsdis. 
> > I'd suggest the most straightforward way to deal with this would be to check for the hex and fail if neither are found. An alternative could be to switch behaviour if the message "Loaded disassembler from hsdis-aarch64.so" is found. I guess so. This should be fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 17:23:36 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 28 Sep 2021 17:23:36 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> Message-ID: On Tue, 28 Sep 2021 16:29:26 GMT, Andrew Haley wrote: >> I'd suggest the most straightforward way to deal with this would be to check for the hex and fail if neither are found. An alternative could be to switch behaviour if the message "Loaded disassembler from hsdis-aarch64.so" is found. > > I guess so. This should be fixed. I have a fix which makes the test working with or without hsdis. I am testing it. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From phh at openjdk.java.net Tue Sep 28 21:05:46 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 28 Sep 2021 21:05:46 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: References: Message-ID: <4fV5FvhswwUI-2c6ten6LJfQMJkpvVnWxhhY9f8xPTA=.fb0e8242-8967-4516-a0e7-40856d730c62@github.com> On Thu, 23 Sep 2021 14:14:38 GMT, Evgeny Astigeevich wrote: >> This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). >> >> It adds the option `OnSpinWaitImpl=value`, where `value` can be: >> >> - `none`: no implementation for spin pauses. This is the default value. >> - `Nnop`: use `nop` instruction for spin pauses. 
Optional `N` can be `2..9` to specify a number of `nop` instructions. >> - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. >> - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. >> >> The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`. >> >> Testing: >> >> - `make test TEST="gtest"`: Passed >> - `make run-test TEST="tier1"`: Passed >> - `make run-test TEST="tier2"`: Passed >> - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed > > Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: > > Separate OnSpinWaitImpl into OnSpinWaitInst and OnSpinWaitInstCount Subject to CSR approval, lgtm. ------------- Marked as reviewed by phh (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 21:38:53 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 28 Sep 2021 21:38:53 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v9] In-Reply-To: References: Message-ID: > This PR is a follow-up on the discussion [?RFC: AArch64: Implementing spin pauses with ISB?](https://mail.openjdk.java.net/pipermail/hotspot-dev/2021-August/054033.html). > > It adds the option `OnSpinWaitImpl=value`, where `value` can be: > > - `none`: no implementation for spin pauses. This is the default value. > - `Nnop`: use `nop` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `nop` instructions. > - `Nisb`: use `isb` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `isb` instructions. > - `Nyield`: use `yield` instruction for spin pauses. Optional `N` can be `2..9` to specify a number of `yield` instructions. 
> > The code for the `Thread.onSpinWait` intrinsic is generated based on the value of `OnSpinWaitImpl`. > > Testing: > > - `make test TEST="gtest"`: Passed > - `make run-test TEST="tier1"`: Passed > - `make run-test TEST="tier2"`: Passed > - `make run-test TEST=hotspot/jtreg/compiler/onSpinWait`: Passed Evgeny Astigeevich has updated the pull request incrementally with one additional commit since the last revision: Add support of hsdis output to test When JVM finds the hsdis library it uses it to disassemble code. The output contains assembly instructions instead of hex codes. This change adds support of hsdis output to the test. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5562/files - new: https://git.openjdk.java.net/jdk/pull/5562/files/d4a5183a..122ea2e9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=08 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5562&range=07-08 Stats: 40 lines in 1 file changed: 29 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/5562.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5562/head:pull/5562 PR: https://git.openjdk.java.net/jdk/pull/5562 From github.com+42899633+eastig at openjdk.java.net Tue Sep 28 21:38:54 2021 From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich) Date: Tue, 28 Sep 2021 21:38:54 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> Message-ID: <57NBCKVSwh09wtVC2X3uWiei7QgKOlpdzI-MO_KyZbI=.c702c7e7-8301-4269-92fa-e5e2c7387221@github.com> On Tue, 28 Sep 2021 17:19:59 GMT, Evgeny Astigeevich wrote: >> I guess so. This should be fixed. > > I have a fix which makes the test working with or without hsdis. I am testing it. Done. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From dholmes at openjdk.java.net Tue Sep 28 23:24:48 2021
From: dholmes at openjdk.java.net (David Holmes)
Date: Tue, 28 Sep 2021 23:24:48 GMT
Subject: RFR: 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running
In-Reply-To:
References:
Message-ID:

On Fri, 24 Sep 2021 07:44:19 GMT, Thomas Stuefe wrote:

>> Please see bug report for more detailed discussion.
>>
>> We introduce `os::_exit()` to call `_exit()` to allow us to terminate without running the at_exit handlers and global destructors, which lead to the crashes during termination.
>>
>> Testing: tiers 1-3 (includes the ExitOnOutOfMemoryError test)
>>
>> Thanks,
>> David
>
> LGTM.
>
> Your assumption that `-XX:+ExitOnOutOfMemoryError` should stop the VM painlessly is what I think too. Our customers use it in scenarios where the VM should go down, quickly, with a minimum of fuss. E.g. in cloud scenarios, where you want to restart the VM as fast as possible. OTOH, `-XX:+CrashOnOutOfMemoryError` should give you a hs-err file and a core; creating either may hang or at least delay matters.
>
> Incidentally, in our SapMachine we added some subtle behavioral changes (https://github.com/SAP/SapMachine/wiki/Handling-of-OnOutOfMemoryError-switches-in-the-SapMachine, see italics). I know we talked about handling Thread exhaustion events, but what about simple stack printing to stdout, do you think that would be useful upstream?

Thanks for the reviews @tstuefe and @hseigel.

@tstuefe : personally I don't think printing the stack to stdout is a necessary thing to add.
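
As background for the `os::_exit()` change quoted above, the difference between `exit()` and `_exit()` can be demonstrated with a small POSIX sketch. This is an illustration only, not HotSpot code: `atexit_handler_ran` and `on_exit_handler` are hypothetical names, and the example assumes a POSIX system with `fork`, `pipe`, and `waitpid`. A child process registers an `atexit` handler that writes one byte to a pipe; the parent reports whether that byte arrived.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace {

int g_pipe_write_fd = -1;

// Runs only if the process terminates through exit(); signals the parent.
void on_exit_handler() {
  const char mark = 'x';
  ssize_t ignored = write(g_pipe_write_fd, &mark, 1);
  (void)ignored;
}

}  // namespace

// Hypothetical helper: fork a child that registers an atexit handler and then
// terminates via _exit() or exit(). Returns true if the handler ran.
bool atexit_handler_ran(bool use_underscore_exit) {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;
  }
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return false;
  }
  if (pid == 0) {
    // Child: keep only the write end.
    close(fds[0]);
    g_pipe_write_fd = fds[1];
    std::atexit(on_exit_handler);
    if (use_underscore_exit) {
      _exit(0);      // terminates immediately; atexit handlers are skipped
    }
    std::exit(0);    // runs atexit handlers and static destructors first
  }
  // Parent: a one-byte read means the handler ran; EOF means it did not.
  close(fds[1]);
  char c = 0;
  ssize_t n = read(fds[0], &c, 1);
  close(fds[0]);
  int status = 0;
  waitpid(pid, &status, 0);
  return n == 1;
}
```

Calling `atexit_handler_ran(false)` exercises the `exit()` path, where the handler runs, and `atexit_handler_ran(true)` exercises the `_exit()` path, where it does not. Skipping those handlers and the global destructors is the behaviour the PR relies on to avoid crashes while other threads are still running.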
------------- PR: https://git.openjdk.java.net/jdk/pull/5668 From dholmes at openjdk.java.net Tue Sep 28 23:29:45 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 28 Sep 2021 23:29:45 GMT Subject: Integrated: 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running In-Reply-To: References: Message-ID: On Thu, 23 Sep 2021 23:15:28 GMT, David Holmes wrote: > Please see bug report for more detailed discussion. > > We introduce `os::_exit()` to `call _exit()` to allow us to terminate without running the at_exit handlers and global destructors, which lead to the crashes during termination. > > Testing: tiers 1-3 (includes the ExitOnOutOfMemoryError test) > > Thanks, > David This pull request has now been integrated. Changeset: 2657bcbd Author: David Holmes URL: https://git.openjdk.java.net/jdk/commit/2657bcbd9965d8af83f4063e3602c409735493d1 Stats: 16 lines in 4 files changed: 12 ins; 0 del; 4 mod 8274136: -XX:+ExitOnOutOfMemoryError calls exit while threads are running Reviewed-by: stuefe, hseigel ------------- PR: https://git.openjdk.java.net/jdk/pull/5668 From dholmes at openjdk.java.net Wed Sep 29 01:25:51 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 29 Sep 2021 01:25:51 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> On Mon, 27 Sep 2021 23:27:39 GMT, Xin Liu wrote: >> This patch allows the custom commands of OnError to attach to HotSpot itself. >> It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). >> This prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. >> >> Without this patch, we will encounter a deadlock at safepoint synchronization. 
>> `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`.
>>
>>
>> Aborting due to java.lang.OutOfMemoryError: Java heap space
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (debug.cpp:364), pid=94632, tid=94633
>> # fatal error: OutOfMemory encountered: Java heap space
>> #
>> # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk)
>> # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
>> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
>> #
>> # An error report file with more information is saved as:
>> # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log
>> #
>> # -XX:OnError="jcmd %p Thread.print"
>> # Executing /bin/sh -c "jcmd 94632 Thread.print" ...
>> 94632:
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected:
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint:
>> [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000]
>> [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE
>> [10.616s][warning][safepoint]
>> [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list)
>
> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
>  - Add a new testcase for OutOfMemoryError thrown from NIO.
>  - Make state changer one way in VMError.
> > Add a test to show that jcmd %p won't get stuck. > - Merge branch 'master' into JDK-8273608 > - 8273608: Deadlock when jcmd of OnError attaches to itself > > Allow custom command of OnError to attach to HotSpot itself. This patch sets > the thread of report_and_die() to Native before os::fork_and_exec(cmd). This > prevents cmds which require safepoint synchronization from deadlock. > eg. OnError='jcmd %p Thread.print'. Hi Xin, I still have a few concerns about the details here. See below. Thanks, David src/hotspot/share/runtime/mutexLocker.cpp line 379: > 377: assert(thread != NULL, "can't be owned by NULL"); > 378: if (thread->is_Watcher_thread()) { > 379: // need WatcherThread as a safeguard against potential deadlocks You only call this for JavaThreads so you can't see the WatcherThread. src/hotspot/share/runtime/mutexLocker.cpp line 390: > 388: owned_lock = next; > 389: } > 390: #endif // ASSERT Should we also clear `_owned_locks` after this so that it is still correct? src/hotspot/share/runtime/mutexLocker.cpp line 396: > 394: _mutex_array[i]->unlock(); > 395: } > 396: } Surely we don't need this in a debug build as we already unlocked all owned locks? src/hotspot/share/runtime/mutexLocker.hpp line 175: > 173: // by fatal error handler. > 174: void print_owned_locks_on_error(outputStream* st); > 175: void unlock_locks_owned_by(Thread* t); // Unlock all Mutex/Monitors currently owned by a JavaThread when executing // `OnError` actions. void unlock_locks_on_error(JavaThread t); test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 32: > 30: * @library /test/lib > 31: * @run main/othervm TestOutOfMemoryErrorFromNIO > 32: * @bug 8155004 8273608 Please move this to after the @test line test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 65: > 63: + after.toString(); > 64: > 65: // else this is the main test else ??? 
test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 89: > 87: output_single.stdoutShouldMatch("^" + msg); // match start of line only > 88: > 89: System.out.println("PASSED"); Should we also check for output related to execution of the OnError command? ------------- Changes requested by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5590 From ngasson at openjdk.java.net Wed Sep 29 02:15:41 2021 From: ngasson at openjdk.java.net (Nick Gasson) Date: Wed, 29 Sep 2021 02:15:41 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: <57NBCKVSwh09wtVC2X3uWiei7QgKOlpdzI-MO_KyZbI=.c702c7e7-8301-4269-92fa-e5e2c7387221@github.com> References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> <57NBCKVSwh09wtVC2X3uWiei7QgKOlpdzI-MO_KyZbI=.c702c7e7-8301-4269-92fa-e5e2c7387221@github.com> Message-ID: <7u0gO49QpF80Rfs3TQZsFgGc7zgmHiurWP3Q00xEQic=.07209741-d84f-4beb-83ce-52804a607f1f@github.com> On Tue, 28 Sep 2021 21:34:39 GMT, Evgeny Astigeevich wrote: >> I have a fix which makes the test working with or without hsdis. I am testing it. > > Done. The [IR test framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md) can parse the C2 opto assembly output, and also control compilation level and flags through method annotations. I'm not sure if we could use that to check the C1 output though. Maybe @chhagedorn has some ideas? ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From dholmes at openjdk.java.net Wed Sep 29 04:20:35 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 29 Sep 2021 04:20:35 GMT Subject: RFR: 8274322: Problems with oopDesc construction In-Reply-To: References: Message-ID: On Tue, 28 Sep 2021 03:12:52 GMT, Kim Barrett wrote: > Please review this change to the default constructor for markWord and > associated "change" to construction of oopDesc. 
> > The current code never invokes the constructor for oopDesc or any of its > derived classes. For that to be permissible according to the Standard, > those classes must be trivially default constructible. And for that to be > the case, the markWord default constructor must be trivial. > > This change consists of three parts. > > (1) The markWord default constructor is changed to be trivial, so the > default constructors for oopDesc and classes derived from it will also be > trivial. It wasn't previously trivial because the mechanism for making it so > (a default definition) is a C++11 feature that wasn't yet supported when the > previous constructor was defined. > > (2) This change also adds static asserts to verify the relevant classes have > trivial default constructors, to prevent later changes from unintentionally > breaking this. > > (3) This change also makes oopDesc noncopyable, to prevent inadvertent usage > of these operations that don't make any sense. > > A different approach would be to always use placement new with an > appropriate constructor to perform the initialization, perhaps encapsulated > in factory functions. I did some exploration in that direction. It's a much > larger and more complex change, though the final behavior (use constructors > for initialization) is simpler. > > Testing: > tier1 Hi Kim, Based on your detailed description the changes look good. A couple of minor comments. Thanks, David src/hotspot/share/oops/oop.hpp line 35: > 33: #include "runtime/atomic.hpp" > 34: #include "utilities/macros.hpp" > 35: #include "utilities/globalDefinitions.hpp" Nit: not included in alphabetic order (and it will include macros.hpp itself anyway). src/hotspot/share/oops/oop.hpp line 326: > 324: // the Java heap, and static functions provided here on HeapWord* are used > 325: // to fill in certain parts of that memory. For that to be valid, the > 326: // object must not have non-trivial initialization (C++14 3.8). 
For that to

Can we avoid the double-negative and state it "must have trivial initialization"?

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5729

From xliu at openjdk.java.net Wed Sep 29 04:42:36 2021
From: xliu at openjdk.java.net (Xin Liu)
Date: Wed, 29 Sep 2021 04:42:36 GMT
Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2]
In-Reply-To: <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com>
References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com>
Message-ID:

On Wed, 29 Sep 2021 01:11:16 GMT, David Holmes wrote:

>> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>>  - Add a new testcase for OutOfMemoryError thrown from NIO.
>>  - Make state changer one way in VMError.
>>
>>    Add a test to show that jcmd %p won't get stuck.
>>  - Merge branch 'master' into JDK-8273608
>>  - 8273608: Deadlock when jcmd of OnError attaches to itself
>>
>>    Allow custom command of OnError to attach to HotSpot itself. This patch sets
>>    the thread of report_and_die() to Native before os::fork_and_exec(cmd). This
>>    prevents cmds which require safepoint synchronization from deadlock.
>>    eg. OnError='jcmd %p Thread.print'.
>
> src/hotspot/share/runtime/mutexLocker.cpp line 390:
>
>> 388: owned_lock = next;
>> 389: }
>> 390: #endif // ASSERT
>
> Should we also clear `_owned_locks` after this so that it is still correct?

In a debug build, a Thread keeps track of the mutexes it owns and a Mutex keeps track of its owner.
E.g. Mutex::unlock() -> Mutex::set_owner(NULL) -> Mutex::set_owner_implementation(NULL).

Here it deletes owned_lock from its owner.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/mutex.cpp#L483 ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From dholmes at openjdk.java.net Wed Sep 29 04:59:50 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 29 Sep 2021 04:59:50 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> Message-ID: On Wed, 29 Sep 2021 04:38:59 GMT, Xin Liu wrote: >> src/hotspot/share/runtime/mutexLocker.cpp line 390: >> >>> 388: owned_lock = next; >>> 389: } >>> 390: #endif // ASSERT >> >> Should we also clear `_owned_locks` after this so that it is still correct? > > In debug build, Thread keeps tracks its owning mutexes and Mutex keeps tracks its owner. > eg. Mutex::unlock() -> Mutex::set_owner(NULL) -> Mutex::set_owner_implementation(NULL). > > here it deletes owned_lock from its owner. > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/mutex.cpp#L483 Thanks - I missed that housekeeping. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From xliu at openjdk.java.net Wed Sep 29 05:12:35 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 29 Sep 2021 05:12:35 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> Message-ID: On Wed, 29 Sep 2021 01:10:00 GMT, David Holmes wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>> - Add a new testcase for OutOfMemoryError thrown from NIO.
>> - Make state changer one way in VMError.
>>
>>   Add a test to show that jcmd %p won't get stuck.
>> - Merge branch 'master' into JDK-8273608
>> - 8273608: Deadlock when jcmd of OnError attaches to itself
>>
>>   Allow custom command of OnError to attach to HotSpot itself. This patch sets
>>   the thread of report_and_die() to Native before os::fork_and_exec(cmd). This
>>   prevents cmds which require safepoint synchronization from deadlock.
>>   eg. OnError='jcmd %p Thread.print'.
>
> src/hotspot/share/runtime/mutexLocker.cpp line 396:
>
>> 394: _mutex_array[i]->unlock();
>> 395: }
>> 396: }
>
> Surely we don't need this in a debug build as we already unlocked all owned locks?

This logic is for the release build. Yes, we don't need it in a debug build: we should already have released all owned mutexes above. It is a no-op in a debug build because owner() should be NULL. `thread` isn't NULL in this function.

Here is the current bookkeeping. From a thread, we can't find all the mutexes it owns. @pchilano said we can try _mutex_array in the release build. I think the idea is that we are supposed to capture failures in the debug build.

| class       | debug                      | release      |
|-------------|----------------------------|--------------|
| Thread      | _owned_locks (linked list) | N/A          |
| Mutex       | _owner                     | _owner       |
| MutexLocker | _mutex_array               | _mutex_array |

Shall we fix _mutex_array in release? I think we can change it to a GrowableArray and keep track of all mutexes at runtime.
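To make the bookkeeping above concrete, here is a minimal toy sketch of the two ways to find the mutexes a thread owns: walking a per-thread list (what the table shows for debug builds) and scanning a global registry for a matching owner (the `_mutex_array` fallback). All names here (`ToyThread`, `ToyMutex`, `g_mutex_registry`, `toy_unlock_all_owned_by`) are hypothetical and this is not HotSpot code; it only models the idea under the assumption that every mutex records its owner.

```cpp
#include <cassert>
#include <vector>

// Toy model only: hypothetical stand-ins, not HotSpot's Thread/Mutex classes.
struct ToyThread;

struct ToyMutex {
  ToyThread* owner = nullptr;       // owner is tracked in both debug and release
  ToyMutex*  next_owned = nullptr;  // link in the owner's list (debug-only in the table)
};

struct ToyThread {
  ToyMutex* owned_locks = nullptr;  // head of the list of mutexes this thread owns
};

// Stand-in for a global registry of known mutexes (the role _mutex_array plays).
static std::vector<ToyMutex*> g_mutex_registry;

void toy_lock(ToyThread* t, ToyMutex* m) {
  assert(m->owner == nullptr);
  m->owner = t;
  m->next_owned = t->owned_locks;   // push onto the owner's list
  t->owned_locks = m;
}

void toy_unlock(ToyMutex* m) {
  ToyThread* t = m->owner;
  assert(t != nullptr);
  // Unlink m from its owner's list, mirroring the housekeeping done on unlock.
  ToyMutex** link = &t->owned_locks;
  while (*link != nullptr && *link != m) {
    link = &(*link)->next_owned;
  }
  if (*link == m) {
    *link = m->next_owned;
  }
  m->next_owned = nullptr;
  m->owner = nullptr;
}

// Releases every mutex owned by t and returns how many were released.
// use_owned_list == true models the debug path; the registry scan models
// the release fallback and finds nothing if the list walk already ran.
int toy_unlock_all_owned_by(ToyThread* t, bool use_owned_list) {
  int released = 0;
  if (use_owned_list) {
    while (t->owned_locks != nullptr) {
      toy_unlock(t->owned_locks);
      ++released;
    }
  }
  for (ToyMutex* m : g_mutex_registry) {
    if (m->owner == t) {
      toy_unlock(m);
      ++released;
    }
  }
  return released;
}
```

Note that the registry scan only finds mutexes that were registered, which is the gap the question about fixing `_mutex_array` in release builds is pointing at.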
------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From xliu at openjdk.java.net Wed Sep 29 05:18:43 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 29 Sep 2021 05:18:43 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> Message-ID: On Wed, 29 Sep 2021 01:16:06 GMT, David Holmes wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Add a new testcase for OutOfMemoryError thrown from NIO. >> - Make state changer one way in VMError. >> >> Add a test to show that jcmd %p won't get stuck. >> - Merge branch 'master' into JDK-8273608 >> - 8273608: Deadlock when jcmd of OnError attaches to itself >> >> Allow custom command of OnError to attach to HotSpot itself. This patch sets >> the thread of report_and_die() to Native before os::fork_and_exec(cmd). This >> prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. > > src/hotspot/share/runtime/mutexLocker.hpp line 175: > >> 173: // by fatal error handler. >> 174: void print_owned_locks_on_error(outputStream* st); >> 175: void unlock_locks_owned_by(Thread* t); > > // Unlock all Mutex/Monitors currently owned by a JavaThread when executing > // `OnError` actions. > void unlock_locks_on_error(JavaThread t); Okay. I will remove `thread->is_Watcher_thread()`. I keep it for `prepare_for_emergency_dump` I think the logic fits all general Thread. 
Can I keep `unlock_locks_on_error(Thread* t);` ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From dholmes at openjdk.java.net Wed Sep 29 05:38:34 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 29 Sep 2021 05:38:34 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> Message-ID: On Wed, 29 Sep 2021 05:15:27 GMT, Xin Liu wrote: >> src/hotspot/share/runtime/mutexLocker.hpp line 175: >> >>> 173: // by fatal error handler. >>> 174: void print_owned_locks_on_error(outputStream* st); >>> 175: void unlock_locks_owned_by(Thread* t); >> >> // Unlock all Mutex/Monitors currently owned by a JavaThread when executing >> // `OnError` actions. >> void unlock_locks_on_error(JavaThread t); > > Okay. I will remove `thread->is_Watcher_thread()`. I keep it for `prepare_for_emergency_dump` > I think the logic fits all general Thread. Can I keep `unlock_locks_on_error(Thread* t);` The only time we need an unlock-all-mutexes function is for a JavaThread in the error handling code when we need to transition to native. This is not general purpose code and we shouldn't make it look like it is. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From xliu at openjdk.java.net Wed Sep 29 05:38:37 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Wed, 29 Sep 2021 05:38:37 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> Message-ID: <_vL5qlJ8iV_1Ho7Tmd0p8MibZf9vUj2qpR-ojhBloEM=.cb20576a-37fd-48f7-8976-3fc96f1640b8@github.com> On Wed, 29 Sep 2021 01:22:05 GMT, David Holmes wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Add a new testcase for OutOfMemoryError thrown from NIO. >> - Make state changer one way in VMError. >> >> Add a test to show that jcmd %p won't get stuck. >> - Merge branch 'master' into JDK-8273608 >> - 8273608: Deadlock when jcmd of OnError attaches to itself >> >> Allow custom command of OnError to attach to HotSpot itself. This patch sets >> the thread of report_and_die() to Native before os::fork_and_exec(cmd). This >> prevents cmds which require safepoint synchronization from deadlock. >> eg. OnError='jcmd %p Thread.print'. > > test/hotspot/jtreg/runtime/ErrorHandling/TestOutOfMemoryErrorFromNIO.java line 89: > >> 87: output_single.stdoutShouldMatch("^" + msg); // match start of line only >> 88: >> 89: System.out.println("PASSED"); > > Should we also check for output related to execution of the OnError command? This testcase will get stuck if I delete `transition_into_native()` in vmErrror.cpp I will try to verify outputs from before command and after command. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From chagedorn at openjdk.java.net Wed Sep 29 06:47:35 2021 From: chagedorn at openjdk.java.net (Christian Hagedorn) Date: Wed, 29 Sep 2021 06:47:35 GMT Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8] In-Reply-To: <7u0gO49QpF80Rfs3TQZsFgGc7zgmHiurWP3Q00xEQic=.07209741-d84f-4beb-83ce-52804a607f1f@github.com> References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> <57NBCKVSwh09wtVC2X3uWiei7QgKOlpdzI-MO_KyZbI=.c702c7e7-8301-4269-92fa-e5e2c7387221@github.com> <7u0gO49QpF80Rfs3TQZsFgGc7zgmHiurWP3Q00xEQic=.07209741-d84f-4beb-83ce-52804a607f1f@github.com> Message-ID: On Wed, 29 Sep 2021 02:12:31 GMT, Nick Gasson wrote: >> Done. > > The [IR test framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md) can parse the C2 opto assembly output, and also control compilation level and flags through method annotations. I'm not sure if we could use that to check the C1 output though. Maybe @chhagedorn has some ideas? The framework only supports regex matching on C2's ideal and opto assembly output. You would still need the current checks for C1's output. So, I'm not sure if it's worth transforming the C2 checks only to use the framework. ------------- PR: https://git.openjdk.java.net/jdk/pull/5562 From kbarrett at openjdk.java.net Wed Sep 29 06:57:10 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 29 Sep 2021 06:57:10 GMT Subject: RFR: 8274322: Problems with oopDesc construction [v2] In-Reply-To: References: Message-ID: > Please review this change to the default constructor for markWord and > associated "change" to construction of oopDesc. > > The current code never invokes the constructor for oopDesc or any of its > derived classes. For that to be permissible according to the Standard, > those classes must be trivially default constructible. 
And for that to be > the case, the markWord default constructor must be trivial. > > This change consists of three parts. > > (1) The markWord default constructor is changed to be trivial, so the > default constructors for oopDesc and classes derived from it will also be > trivial. It wasn't previously trivial because the mechanism for making it so > (a default definition) is a C++11 feature that wasn't yet supported when the > previous constructor was defined. > > (2) This change also adds static asserts to verify the relevant classes have > trivial default constructors, to prevent later changes from unintentionally > breaking this. > > (3) This change also makes oopDesc noncopyable, to prevent inadvertent usage > of these operations that don't make any sense. > > A different approach would be to always use placement new with an > appropriate constructor to perform the initialization, perhaps encapsulated > in factory functions. I did some exploration in that direction. It's a much > larger and more complex change, though the final behavior (use constructors > for initialization) is simpler. 
> > Testing: > tier1 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: dholmes review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5729/files - new: https://git.openjdk.java.net/jdk/pull/5729/files/3f4cc094..5197b2c6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5729&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5729&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/5729.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5729/head:pull/5729 PR: https://git.openjdk.java.net/jdk/pull/5729 From kbarrett at openjdk.java.net Wed Sep 29 07:03:32 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 29 Sep 2021 07:03:32 GMT Subject: RFR: 8274322: Problems with oopDesc construction [v2] In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 04:13:19 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> dholmes review > > src/hotspot/share/oops/oop.hpp line 35: > >> 33: #include "runtime/atomic.hpp" >> 34: #include "utilities/macros.hpp" >> 35: #include "utilities/globalDefinitions.hpp" > > Nit: not included in alphabetic order (and it will include macros.hpp itself anyway). Fixed the order. In the interest of "include what you use" I left macros.hpp in the list. > src/hotspot/share/oops/oop.hpp line 326: > >> 324: // the Java heap, and static functions provided here on HeapWord* are used >> 325: // to fill in certain parts of that memory. For that to be valid, the >> 326: // object must not have non-trivial initialization (C++14 3.8). For that to > > Can we avoid the double-negative and state it "must have trivial initialization"? I've improved the wording, including eliminating the double negative. 
I followed the wording in the standard a little too closely; that's where "non-trivial initialization" came from. I think "must have trivial initialization" isn't right, since "initialization" in this context involves invocation of a constructor, and we're dealing with a case where that doesn't happen. ------------- PR: https://git.openjdk.java.net/jdk/pull/5729 From dholmes at openjdk.java.net Wed Sep 29 07:25:37 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Wed, 29 Sep 2021 07:25:37 GMT Subject: RFR: 8274322: Problems with oopDesc construction [v2] In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 06:57:10 GMT, Kim Barrett wrote: >> Please review this change to the default constructor for markWord and >> associated "change" to construction of oopDesc. >> >> The current code never invokes the constructor for oopDesc or any of its >> derived classes. For that to be permissible according to the Standard, >> those classes must be trivially default constructible. And for that to be >> the case, the markWord default constructor must be trivial. >> >> This change consists of three parts. >> >> (1) The markWord default constructor is changed to be trivial, so the >> default constructors for oopDesc and classes derived from it will also be >> trivial. It wasn't previously trivial because the mechanism for making it so >> (a default definition) is a C++11 feature that wasn't yet supported when the >> previous constructor was defined. >> >> (2) This change also adds static asserts to verify the relevant classes have >> trivial default constructors, to prevent later changes from unintentionally >> breaking this. >> >> (3) This change also makes oopDesc noncopyable, to prevent inadvertent usage >> of these operations that don't make any sense. >> >> A different approach would be to always use placement new with an >> appropriate constructor to perform the initialization, perhaps encapsulated >> in factory functions. 
I did some exploration in that direction. It's a much
>> larger and more complex change, though the final behavior (use constructors
>> for initialization) is simpler.
>>
>> Testing:
>> tier1
>
> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision:
>
> dholmes review

Looks good!

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/5729

From aph at openjdk.java.net Wed Sep 29 09:06:36 2021
From: aph at openjdk.java.net (Andrew Haley)
Date: Wed, 29 Sep 2021 09:06:36 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8]
In-Reply-To:
References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> <57NBCKVSwh09wtVC2X3uWiei7QgKOlpdzI-MO_KyZbI=.c702c7e7-8301-4269-92fa-e5e2c7387221@github.com> <7u0gO49QpF80Rfs3TQZsFgGc7zgmHiurWP3Q00xEQic=.07209741-d84f-4beb-83ce-52804a607f1f@github.com>
Message-ID:

On Wed, 29 Sep 2021 06:44:29 GMT, Christian Hagedorn wrote:

>> The [IR test framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md) can parse the C2 opto assembly output, and also control compilation level and flags through method annotations. I'm not sure if we could use that to check the C1 output though. Maybe @chhagedorn has some ideas?
>
> The framework only supports regex matching on C2's ideal and opto assembly output. You would still need the current checks for C1's output. So, I'm not sure if it's worth transforming the C2 checks only to use the framework.

You might be over-thinking this a little. You already check that the assembler macro does the right thing, so as long as you've made sure the C1 code actually calls the macro you're good. Also, this is a performance-only change (does not affect correctness) and it may not have a great deal of effect in C1-generated code anyway.
-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From github.com+42899633+eastig at openjdk.java.net Wed Sep 29 12:35:38 2021
From: github.com+42899633+eastig at openjdk.java.net (Evgeny Astigeevich)
Date: Wed, 29 Sep 2021 12:35:38 GMT
Subject: RFR: 8186670: Implement _onSpinWait() intrinsic for AArch64 [v8]
In-Reply-To: <7u0gO49QpF80Rfs3TQZsFgGc7zgmHiurWP3Q00xEQic=.07209741-d84f-4beb-83ce-52804a607f1f@github.com>
References: <5BXaz78z6bOTNs5Pi1ffTsU_WOADzU5prpsiWEtw-i8=.bea9d849-2d8d-48d0-8393-bc3b5174ab37@github.com> <57NBCKVSwh09wtVC2X3uWiei7QgKOlpdzI-MO_KyZbI=.c702c7e7-8301-4269-92fa-e5e2c7387221@github.com> <7u0gO49QpF80Rfs3TQZsFgGc7zgmHiurWP3Q00xEQic=.07209741-d84f-4beb-83ce-52804a607f1f@github.com>
Message-ID:

On Wed, 29 Sep 2021 02:12:31 GMT, Nick Gasson wrote:

>> Done.
>
> The [IR test framework](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/README.md) can parse the C2 opto assembly output, and also control compilation level and flags through method annotations. I'm not sure if we could use that to check the C1 output though. Maybe @chhagedorn has some ideas?

@nick-arm Thank you for the link.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5562

From eosterlund at openjdk.java.net Wed Sep 29 15:24:45 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 29 Sep 2021 15:24:45 GMT
Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64
Message-ID:

There was a bug in the x86_64 implementation of the c2i entry barriers. We read the CLD::_keep_alive int as a 64 bit integer, while it is of course in fact a 32 bit integer. It was fixed in the patch that ported it to x86_32 (JDK-8235262). However, somewhere in-between I think the wrong code was used as a basis for the AArch64 implementation, which now seemingly has inherited that same bug.
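As a rough illustration of why reading a 32-bit field with a 64-bit load is a bug, the sketch below reads the same address at both widths. The struct `ToyCLD` and its `neighbor` field are hypothetical; the actual layout around `CLD::_keep_alive` is not described in this thread, so this only shows the general failure mode, not the real barrier code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical layout: a 32-bit counter followed directly by another field.
struct ToyCLD {
  int32_t keep_alive;  // the 32-bit value the barrier is meant to test
  int32_t neighbor;    // whatever happens to live in the next four bytes
};
static_assert(sizeof(ToyCLD) == 8, "the sketch assumes two packed 32-bit fields");

// Correct width: reads only the four bytes of keep_alive.
int32_t load_keep_alive_32(const ToyCLD& cld) {
  int32_t value;
  std::memcpy(&value, &cld.keep_alive, sizeof value);
  return value;
}

// Wrong width: reads eight bytes starting at keep_alive, so the result
// also contains the bytes of the neighboring field.
int64_t load_keep_alive_64(const ToyCLD& cld) {
  int64_t value;
  std::memcpy(&value, &cld, sizeof value);
  return value;
}
```

With `keep_alive == 0` and a nonzero neighbor, the 32-bit load reports zero while the 64-bit load does not, so a "is it zero?" check can give the wrong answer depending on unrelated memory.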
-------------

Commit messages:
 - 8274501: c2i entry barriers read int as long on AArch64

Changes: https://git.openjdk.java.net/jdk/pull/5754/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5754&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8274501
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.java.net/jdk/pull/5754.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5754/head:pull/5754

PR: https://git.openjdk.java.net/jdk/pull/5754

From shade at openjdk.java.net Wed Sep 29 16:41:29 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 29 Sep 2021 16:41:29 GMT
Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64
In-Reply-To:
References:
Message-ID: <-oxqSmmhC7j9je1xGxVUP2KxQoxmetKowuILNtq_lcY=.0f8a5327-bc46-45bf-8a6a-bae4cab70fe0@github.com>

On Wed, 29 Sep 2021 15:12:40 GMT, Erik Österlund wrote:

> There was a bug in the x86_64 implementation of the c2i entry barriers. We read the CLD::_keep_alive int as a 64 bit integer, while it is of course in fact a 32 bit integer. It was fixed in the patch that ported it to x86_32 (JDK-8235262). However, somewhere in-between I think the wrong code was used as a basis for the AArch64 implementation, which now seemingly has inherited that same bug.

Looks good!

I suspect PPC has the same problem, @TheRealMDoerr, @GoeLin?

-------------

Marked as reviewed by shade (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5754

From eosterlund at openjdk.java.net Wed Sep 29 16:45:36 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Wed, 29 Sep 2021 16:45:36 GMT
Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64
In-Reply-To: <-oxqSmmhC7j9je1xGxVUP2KxQoxmetKowuILNtq_lcY=.0f8a5327-bc46-45bf-8a6a-bae4cab70fe0@github.com>
References: <-oxqSmmhC7j9je1xGxVUP2KxQoxmetKowuILNtq_lcY=.0f8a5327-bc46-45bf-8a6a-bae4cab70fe0@github.com>
Message-ID:

On Wed, 29 Sep 2021 16:38:12 GMT, Aleksey Shipilev wrote:

> Looks good!
>
> I suspect PPC has the same problem, @TheRealMDoerr, @GoeLin?

Thanks Aleksey!

-------------

PR: https://git.openjdk.java.net/jdk/pull/5754

From shade at openjdk.java.net Wed Sep 29 17:03:00 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Wed, 29 Sep 2021 17:03:00 GMT
Subject: RFR: 8274521: jdk/jfr/event/gc/detailed/TestGCLockerEvent.java fails when other GC is selected
Message-ID:

Simple test bug, need to check if G1 is enabled, or it is a default GC before asking `othervm` with `-XX:+UseG1GC`. Other tests in the same directory do exactly that.
Additional testing:
 - [x] Affected test skipped with Shenandoah now
 - [x] Affected test still passes with G1
 - [x] Affected test still passes with the default GC

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.java.net/jdk/pull/5756/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5756&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8274521
Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/5756.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5756/head:pull/5756

PR: https://git.openjdk.java.net/jdk/pull/5756

From kvn at openjdk.java.net Wed Sep 29 20:00:39 2021
From: kvn at openjdk.java.net (Vladimir Kozlov)
Date: Wed, 29 Sep 2021 20:00:39 GMT
Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez wrote:

> This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed.

@tkrodriguez Did you test these changes with GraalVM? It would be nice to run it with the command line that Serguei pointed out. We will be fine if it passes with the changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5625

From kbarrett at openjdk.java.net Wed Sep 29 21:57:33 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Wed, 29 Sep 2021 21:57:33 GMT
Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64
In-Reply-To:
References:
Message-ID: <-aUEV7GXVcxLmlQzrhEtCnQZTeb6i3epiOndE4sqeCw=.c318c4d6-3d5a-46e8-81ca-82f1a57fae22@github.com>

On Wed, 29 Sep 2021 15:12:40 GMT, Erik Österlund wrote:

> There was a bug in the x86_64 implementation of the c2i entry barriers. We read the CLD::_keep_alive int as a 64 bit integer, while it is of course in fact a 32 bit integer. It was fixed in the patch that ported it to x86_32 (JDK-8235262).
However, somewhere in-between I think the wrong code was used as a basis for the AArch64 implementation, which now seemingly has inherited that same bug. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5754 From jiefu at openjdk.java.net Wed Sep 29 23:57:22 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 29 Sep 2021 23:57:22 GMT Subject: RFR: 8274527: Minimal VM build fails after JDK-8273459 Message-ID: Hi all, The broken was observed when (gdb) bt #0 MacroAssembler::align (this=0x7ffff0025b98, modulus=32) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:1182 #1 0x00007ffff67fc6c5 in MacroAssembler::kernel_crc32 (this=0x7ffff0025b98, crc=0x7, buf=0x6, len=0x2, table=0x1, tmp=0xb) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:6911 #2 0x00007ffff69a3555 in StubGenerator::generate_updateBytesCRC32 (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:6532 #3 0x00007ffff69a589b in StubGenerator::generate_initial (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7583 #4 0x00007ffff69a6801 in StubGenerator::StubGenerator (this=0x7ffff5e9c900, code=0x7ffff5e9c9c0, all=false) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7909 #5 0x00007ffff697fa21 in StubGenerator_generate (code=0x7ffff5e9c9c0, all=false) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7919 #6 0x00007ffff69a6c13 in StubRoutines::initialize1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:223 #7 0x00007ffff69a790d in stubRoutines_init1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:366 #8 0x00007ffff672044d in init_globals () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/init.cpp:119 #9 0x00007ffff69fb39f in Threads::create_vm (args=0x7ffff5e9ce10, canTryAgain=0x7ffff5e9cd33) 
at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/thread.cpp:2827 #10 0x00007ffff6787879 in JNI_CreateJavaVM_inner (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3616 #11 0x00007ffff6787a72 in JNI_CreateJavaVM (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3704 #12 0x00007ffff79b8141 in InitializeJVM (pvm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, ifn=0x7ffff5e9cec0) at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:1459 #13 0x00007ffff79b4f39 in JavaMain (_args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:411 #14 0x00007ffff79bba79 in ThreadJavaMain (args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/unix/native/libjli/java_md.c:651 #15 0x00007ffff779cea5 in start_thread () from /lib64/libpthread.so.0 #16 0x00007ffff72c19fd in clone () from /lib64/libc.so.6 In this case, modulus=32 and CodeEntryAlignment=16. So this assert shouldn't be added in `align` since we may use it (modulus > CodeEntryAlignment) in highly optimized hand-crafted assembly code. Thanks. 
Best regards,
Jie

-------------

Commit messages:
 - 8274527: Minimal VM build fails after JDK-8273459

Changes: https://git.openjdk.java.net/jdk/pull/5764/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5764&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8274527
Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod
Patch: https://git.openjdk.java.net/jdk/pull/5764.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/5764/head:pull/5764

PR: https://git.openjdk.java.net/jdk/pull/5764

From never at openjdk.java.net Thu Sep 30 00:01:57 2021
From: never at openjdk.java.net (Tom Rodriguez)
Date: Thu, 30 Sep 2021 00:01:57 GMT
Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal
In-Reply-To:
References:
Message-ID:

On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez wrote:

> This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed.

I guess I'm not clear how I'm supposed to use that command line to test GraalVM. There's no JDK repository that contains both these JVMTI changes and Graal. Are these tests that can be run in 11? The JDK 11 repository doesn't have this change and also has adjustCompilationLevel removed, so if those tests passed there it would be evidence that these changes can be removed. cc @dougxc do you know how to test a JDK17 GraalVM using that command line?
------------- PR: https://git.openjdk.java.net/jdk/pull/5625 From github.com+6704669+asgibbons at openjdk.java.net Thu Sep 30 01:29:43 2021 From: github.com+6704669+asgibbons at openjdk.java.net (Scott Gibbons) Date: Thu, 30 Sep 2021 01:29:43 GMT Subject: RFR: 8274527: Minimal VM build fails after JDK-8273459 In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 23:41:06 GMT, Jie Fu wrote: > Hi all, > > The broken was observed when > > (gdb) bt > #0 MacroAssembler::align (this=0x7ffff0025b98, modulus=32) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:1182 > #1 0x00007ffff67fc6c5 in MacroAssembler::kernel_crc32 (this=0x7ffff0025b98, crc=0x7, buf=0x6, len=0x2, table=0x1, tmp=0xb) > at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:6911 > #2 0x00007ffff69a3555 in StubGenerator::generate_updateBytesCRC32 (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:6532 > #3 0x00007ffff69a589b in StubGenerator::generate_initial (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7583 > #4 0x00007ffff69a6801 in StubGenerator::StubGenerator (this=0x7ffff5e9c900, code=0x7ffff5e9c9c0, all=false) > at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7909 > #5 0x00007ffff697fa21 in StubGenerator_generate (code=0x7ffff5e9c9c0, all=false) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7919 > #6 0x00007ffff69a6c13 in StubRoutines::initialize1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:223 > #7 0x00007ffff69a790d in stubRoutines_init1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:366 > #8 0x00007ffff672044d in init_globals () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/init.cpp:119 > #9 0x00007ffff69fb39f in Threads::create_vm (args=0x7ffff5e9ce10, canTryAgain=0x7ffff5e9cd33) at 
/home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/thread.cpp:2827 > #10 0x00007ffff6787879 in JNI_CreateJavaVM_inner (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) > at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3616 > #11 0x00007ffff6787a72 in JNI_CreateJavaVM (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) > at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3704 > #12 0x00007ffff79b8141 in InitializeJVM (pvm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, ifn=0x7ffff5e9cec0) > at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:1459 > #13 0x00007ffff79b4f39 in JavaMain (_args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:411 > #14 0x00007ffff79bba79 in ThreadJavaMain (args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/unix/native/libjli/java_md.c:651 > #15 0x00007ffff779cea5 in start_thread () from /lib64/libpthread.so.0 > #16 0x00007ffff72c19fd in clone () from /lib64/libc.so.6 > > > In this case, modulus=32 and CodeEntryAlignment=16. > > So this assert shouldn't be added in `align` since we may use it (modulus > CodeEntryAlignment) in highly optimized hand-crafted assembly code. > > Thanks. > Best regards, > Jie Hi, Jie. With a value of 16 for `CodeEntryAlignment`, there is no way to ensure that the address of the byte following the `align(32)` is, in fact, 32-byte aligned. This is the exact case that I found that caused me to file the bug. I would suggest you verify this with an `assert` following your `align(32)` verifying that the alignment is correct. I think you'll discover that it will be unaligned ~50% of the time. This is because `align()` uses the **_offset_** from the beginning of the segment to determine the number of `nop`s to emit. If the segment has the starting address 0xXXXXXX10 (16-byte aligned), `align(32)` will calculate the `offset()` and align the pc to a multiple of 32 bytes from this starting address. 
This means that the address after the `align(32)` has the possibility of being 0xXXXXXX30 about half the time. I would suggest that if you absolutely require 32-byte alignment, you take a similar path that I took for 64-byte alignment. That is, to create `align32()` and have it call `align(32, pc())`. This will ensure (for stub code) that the alignment is correct. ------------- PR: https://git.openjdk.java.net/jdk/pull/5764 From jiefu at openjdk.java.net Thu Sep 30 03:06:50 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 30 Sep 2021 03:06:50 GMT Subject: RFR: 8274527: Minimal VM build fails after JDK-8273459 [v2] In-Reply-To: References: Message-ID: > Hi all, > > The broken was observed when > > (gdb) bt > #0 MacroAssembler::align (this=0x7ffff0025b98, modulus=32) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:1182 > #1 0x00007ffff67fc6c5 in MacroAssembler::kernel_crc32 (this=0x7ffff0025b98, crc=0x7, buf=0x6, len=0x2, table=0x1, tmp=0xb) > at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:6911 > #2 0x00007ffff69a3555 in StubGenerator::generate_updateBytesCRC32 (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:6532 > #3 0x00007ffff69a589b in StubGenerator::generate_initial (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7583 > #4 0x00007ffff69a6801 in StubGenerator::StubGenerator (this=0x7ffff5e9c900, code=0x7ffff5e9c9c0, all=false) > at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7909 > #5 0x00007ffff697fa21 in StubGenerator_generate (code=0x7ffff5e9c9c0, all=false) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7919 > #6 0x00007ffff69a6c13 in StubRoutines::initialize1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:223 > #7 0x00007ffff69a790d in stubRoutines_init1 () at 
/home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:366 > #8 0x00007ffff672044d in init_globals () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/init.cpp:119 > #9 0x00007ffff69fb39f in Threads::create_vm (args=0x7ffff5e9ce10, canTryAgain=0x7ffff5e9cd33) at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/thread.cpp:2827 > #10 0x00007ffff6787879 in JNI_CreateJavaVM_inner (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) > at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3616 > #11 0x00007ffff6787a72 in JNI_CreateJavaVM (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) > at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3704 > #12 0x00007ffff79b8141 in InitializeJVM (pvm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, ifn=0x7ffff5e9cec0) > at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:1459 > #13 0x00007ffff79b4f39 in JavaMain (_args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:411 > #14 0x00007ffff79bba79 in ThreadJavaMain (args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/unix/native/libjli/java_md.c:651 > #15 0x00007ffff779cea5 in start_thread () from /lib64/libpthread.so.0 > #16 0x00007ffff72c19fd in clone () from /lib64/libc.so.6 > > > In this case, modulus=32 and CodeEntryAlignment=16. > > So this assert shouldn't be added in `align` since we may use it (modulus > CodeEntryAlignment) in highly optimized hand-crafted assembly code. > > Thanks. 
> Best regards, > Jie Jie Fu has updated the pull request incrementally with one additional commit since the last revision: Use align32 ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5764/files - new: https://git.openjdk.java.net/jdk/pull/5764/files/63d635bf..cdd8743b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5764&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5764&range=00-01 Stats: 24 lines in 4 files changed: 7 ins; 0 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/5764.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5764/head:pull/5764 PR: https://git.openjdk.java.net/jdk/pull/5764 From jiefu at openjdk.java.net Thu Sep 30 03:10:30 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 30 Sep 2021 03:10:30 GMT Subject: RFR: 8274527: Minimal VM build fails after JDK-8273459 In-Reply-To: References: Message-ID: On Thu, 30 Sep 2021 01:26:45 GMT, Scott Gibbons wrote: > Hi, Jie. With a value of 16 for `CodeEntryAlignment`, there is no way to ensure that the address of the byte following the `align(32)` is, in fact, 32-byte aligned. This is the exact case that I found that caused me to file the bug. I would suggest you verify this with an `assert` following your `align(32)` verifying that the alignment is correct. I think you'll discover that it will be unaligned ~50% of the time. > > This is because `align()` uses the **_offset_** from the beginning of the segment to determine the number of `nop`s to emit. If the segment has the starting address 0xXXXXXX10 (16-byte aligned), `align(32)` will calculate the `offset()` and align the pc to a multiple of 32 bytes from this starting address. This means that the address after the `align(32)` has the possibility of being 0xXXXXXX30 about half the time. > > I would suggest that if you absolutely require 32-byte alignment, you take a similar path that I took for 64-byte alignment. That is, to create `align32()` and have it call `align(32, pc())`. 
This will ensure (for stub code) that the alignment is correct. Ah, you are right. I missed that `align` works from the `offset()`, not the `pc()`. So the assert should make sense. Updated. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/5764 From eosterlund at openjdk.java.net Thu Sep 30 05:58:32 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 30 Sep 2021 05:58:32 GMT Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64 In-Reply-To: References: <-oxqSmmhC7j9je1xGxVUP2KxQoxmetKowuILNtq_lcY=.0f8a5327-bc46-45bf-8a6a-bae4cab70fe0@github.com> Message-ID: On Wed, 29 Sep 2021 16:42:43 GMT, Erik Österlund wrote: > Looks good. Thanks, Kim. ------------- PR: https://git.openjdk.java.net/jdk/pull/5754 From dnsimon at openjdk.java.net Thu Sep 30 08:22:34 2021 From: dnsimon at openjdk.java.net (Doug Simon) Date: Thu, 30 Sep 2021 08:22:34 GMT Subject: RFR: 8218885: Restore pop_frame and force_early_return functionality for Graal In-Reply-To: References: Message-ID: On Wed, 22 Sep 2021 05:40:40 GMT, Tom Rodriguez wrote: > This logic no longer seems to be necessary since the adjustCompilationLevel callback has been removed. I sent instructions via email since it requires using internal resources. ------------- PR: https://git.openjdk.java.net/jdk/pull/5625 From xliu at openjdk.java.net Thu Sep 30 08:28:10 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Thu, 30 Sep 2021 08:28:10 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v3] In-Reply-To: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> Message-ID: > This patch allows the custom commands of OnError to attach to HotSpot itself. > It sets the thread of report_and_die() to Native before os::fork_and_exec(cmd). > This prevents cmds which require safepoint synchronization from deadlocking.
> eg. OnError='jcmd %p Thread.print'. > > Without this patch, we will encounter a deadlock at safepoint synchronization. > `"main" #1` is the very thread which executes `os::fork_and_exec(cmd)`. > > > Aborting due to java.lang.OutOfMemoryError: Java heap space > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (debug.cpp:364), pid=94632, tid=94633 > # fatal error: OutOfMemory encountered: Java heap space > # > # JRE version: OpenJDK Runtime Environment (18.0) (build 18-internal+0-adhoc.xxinliu.jdk) > # Java VM: OpenJDK 64-Bit Server VM (18-internal+0-adhoc.xxinliu.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again > # > # An error report file with more information is saved as: > # /local/home/xxinliu/JDK-2085/hs_err_pid94632.log > # > # -XX:OnError="jcmd %p Thread.print" > # Executing /bin/sh -c "jcmd 94632 Thread.print" ... > 94632: > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timeout detected: > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Timed out while spinning to reach a safepoint. > [10.616s][warning][safepoint] # SafepointSynchronize::begin: Threads which did not reach the safepoint: > [10.616s][warning][safepoint] # "main" #1 prio=5 os_prio=0 cpu=236.97ms elapsed=10.61s tid=0x00007f01b00232f0 nid=94633 runnable [0x00007f01b7a08000] > [10.616s][warning][safepoint] java.lang.Thread.State: RUNNABLE > [10.616s][warning][safepoint] > [10.616s][warning][safepoint] # SafepointSynchronize::begin: (End of list) Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Update unlock_locks_on_error() for JavaThread. changes for reviewer's feedbacks. 
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/5590/files - new: https://git.openjdk.java.net/jdk/pull/5590/files/bf684e5b..451cdda3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=5590&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=5590&range=01-02 Stats: 21 lines in 4 files changed: 10 ins; 7 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/5590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5590/head:pull/5590 PR: https://git.openjdk.java.net/jdk/pull/5590 From dholmes at openjdk.java.net Thu Sep 30 08:28:11 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 30 Sep 2021 08:28:11 GMT Subject: RFR: 8273608: Deadlock when jcmd of OnError attaches to itself [v2] In-Reply-To: References: <3mILJs2Lcq7t5gUDP70FH2LVsm-NT2UTsm1JY-rCKB0=.d4dd4b55-9476-43ad-a0cf-ce6d2a9bef4e@github.com> <8LmYbwFdozu6SSdD6j0sE4dKDiu19bvqExkE-TR_4YA=.5f7caeab-e42e-4815-bab0-6da3ca0ae514@github.com> Message-ID: On Wed, 29 Sep 2021 05:06:39 GMT, Xin Liu wrote: >> src/hotspot/share/runtime/mutexLocker.cpp line 396: >> >>> 394: _mutex_array[i]->unlock(); >>> 395: } >>> 396: } >> >> Surely we don't need this in a debug build as we already unlocked all owned locks? > > This logic is for release build. > > Yes, we don't need them in debug build. We should have released all owning mutexes above. > It's no-op in debug build because owner() should be NULL. `thread` isn't NULL in this function. > > Here is current bookkeeping. From a thread, we can't find all owning mutexes. @pchilano said we can try _mutex_array in release build. I think the idea is that we are supposed to capture failure in debug build. > > | class | debug | release | | > |-------------|---------------------------|--------------|---| > | Thread | _owned_locks (listedlist) | N/A | | > | Mutex | _owner | _owner | | > | MutexLocker | _mutex_array | _mutex_array | | > > Shall we fix _mutex_array in release? 
I think we can change it to GrowableArray and keep track of all mutexes at runtime. My point is that the code to process the `_mutex_array` should be in a `#else` so that we don't build or use it in debug builds. So a debug build uses `owned_locks()` while a release build uses `_mutex_array`. ------------- PR: https://git.openjdk.java.net/jdk/pull/5590 From aph at openjdk.java.net Thu Sep 30 09:53:32 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Thu, 30 Sep 2021 09:53:32 GMT Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64 In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 15:12:40 GMT, Erik Österlund wrote: > There was a bug in the x86_64 implementation of the c2i entry barriers. We read the CLD::_keep_alive int as a 64 bit integer, while it is of course in fact a 32 bit integer. It was fixed in the patch that ported it to x86_32 (JDK-8235262). However, somewhere in-between I think the wrong code was used as a basis for the AArch64 implementation, which now seemingly has inherited that same bug. Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/5754 From shade at openjdk.java.net Thu Sep 30 10:03:30 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 30 Sep 2021 10:03:30 GMT Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64 In-Reply-To: References: Message-ID: <4Dyq3LkqhE-TpKcG7LhuxXlKqzK45xVQlc0rRuG9tSk=.d90d46c4-e08e-49d7-b3d6-73ea7459e40c@github.com> On Wed, 29 Sep 2021 15:12:40 GMT, Erik Österlund wrote: > There was a bug in the x86_64 implementation of the c2i entry barriers. We read the CLD::_keep_alive int as a 64 bit integer, while it is of course in fact a 32 bit integer. It was fixed in the patch that ported it to x86_32 (JDK-8235262). However, somewhere in-between I think the wrong code was used as a basis for the AArch64 implementation, which now seemingly has inherited that same bug.
Submitted [JDK-8274550](https://bugs.openjdk.java.net/browse/JDK-8274550) for the suspect PPC problem. ------------- PR: https://git.openjdk.java.net/jdk/pull/5754 From stefank at openjdk.java.net Thu Sep 30 11:22:45 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 30 Sep 2021 11:22:45 GMT Subject: RFR: 8274322: Problems with oopDesc construction [v2] In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 06:57:10 GMT, Kim Barrett wrote: >> Please review this change to the default constructor for markWord and >> associated "change" to construction of oopDesc. >> >> The current code never invokes the constructor for oopDesc or any of its >> derived classes. For that to be permissible according to the Standard, >> those classes must be trivially default constructible. And for that to be >> the case, the markWord default constructor must be trivial. >> >> This change consists of three parts. >> >> (1) The markWord default constructor is changed to be trivial, so the >> default constructors for oopDesc and classes derived from it will also be >> trivial. It wasn't previously trivial because the mechanism for making it so >> (a default definition) is a C++11 feature that wasn't yet supported when the >> previous constructor was defined. >> >> (2) This change also adds static asserts to verify the relevant classes have >> trivial default constructors, to prevent later changes from unintentionally >> breaking this. >> >> (3) This change also makes oopDesc noncopyable, to prevent inadvertent usage >> of these operations that don't make any sense. >> >> A different approach would be to always use placement new with an >> appropriate constructor to perform the initialization, perhaps encapsulated >> in factory functions. I did some exploration in that direction. It's a much >> larger and more complex change, though the final behavior (use constructors >> for initialization) is simpler. 
>> >> Testing: >> tier1 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > dholmes review Marked as reviewed by stefank (Reviewer). src/hotspot/share/oops/markWord.hpp line 79: > 77: > 78: // It is critical for performance that this class be trivially > 79: // destructable, copyable, and assignable. Given the comment, would it make sense to also explicitly mark them as `= default`? ------------- PR: https://git.openjdk.java.net/jdk/pull/5729 From github.com+6704669+asgibbons at openjdk.java.net Thu Sep 30 13:55:37 2021 From: github.com+6704669+asgibbons at openjdk.java.net (Scott Gibbons) Date: Thu, 30 Sep 2021 13:55:37 GMT Subject: RFR: 8274527: Minimal VM build fails after JDK-8273459 [v2] In-Reply-To: References: Message-ID: On Thu, 30 Sep 2021 03:06:50 GMT, Jie Fu wrote: >> Hi all, >> >> The broken was observed when >> >> (gdb) bt >> #0 MacroAssembler::align (this=0x7ffff0025b98, modulus=32) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:1182 >> #1 0x00007ffff67fc6c5 in MacroAssembler::kernel_crc32 (this=0x7ffff0025b98, crc=0x7, buf=0x6, len=0x2, table=0x1, tmp=0xb) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:6911 >> #2 0x00007ffff69a3555 in StubGenerator::generate_updateBytesCRC32 (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:6532 >> #3 0x00007ffff69a589b in StubGenerator::generate_initial (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7583 >> #4 0x00007ffff69a6801 in StubGenerator::StubGenerator (this=0x7ffff5e9c900, code=0x7ffff5e9c9c0, all=false) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7909 >> #5 0x00007ffff697fa21 in StubGenerator_generate (code=0x7ffff5e9c9c0, all=false) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7919 >> #6 0x00007ffff69a6c13 in 
StubRoutines::initialize1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:223 >> #7 0x00007ffff69a790d in stubRoutines_init1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:366 >> #8 0x00007ffff672044d in init_globals () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/init.cpp:119 >> #9 0x00007ffff69fb39f in Threads::create_vm (args=0x7ffff5e9ce10, canTryAgain=0x7ffff5e9cd33) at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/thread.cpp:2827 >> #10 0x00007ffff6787879 in JNI_CreateJavaVM_inner (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3616 >> #11 0x00007ffff6787a72 in JNI_CreateJavaVM (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3704 >> #12 0x00007ffff79b8141 in InitializeJVM (pvm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, ifn=0x7ffff5e9cec0) >> at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:1459 >> #13 0x00007ffff79b4f39 in JavaMain (_args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:411 >> #14 0x00007ffff79bba79 in ThreadJavaMain (args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/unix/native/libjli/java_md.c:651 >> #15 0x00007ffff779cea5 in start_thread () from /lib64/libpthread.so.0 >> #16 0x00007ffff72c19fd in clone () from /lib64/libc.so.6 >> >> >> In this case, modulus=32 and CodeEntryAlignment=16. >> >> So this assert shouldn't be added in `align` since we may use it (modulus > CodeEntryAlignment) in highly optimized hand-crafted assembly code. >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Use align32 Marked as reviewed by asgibbons at github.com (no known OpenJDK username). Looks good to me. 
------------- PR: https://git.openjdk.java.net/jdk/pull/5764 From mdoerr at openjdk.java.net Thu Sep 30 14:21:46 2021 From: mdoerr at openjdk.java.net (Martin Doerr) Date: Thu, 30 Sep 2021 14:21:46 GMT Subject: RFR: 8274550: c2i entry barriers read int as long on PPC Message-ID: `_keep_alive` is an int. We shouldn't use a 64 bit load. ------------- Commit messages: - 8274550: c2i entry barriers read int as long on PPC Changes: https://git.openjdk.java.net/jdk/pull/5776/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=5776&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8274550 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/5776.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/5776/head:pull/5776 PR: https://git.openjdk.java.net/jdk/pull/5776 From eosterlund at openjdk.java.net Thu Sep 30 14:29:31 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 30 Sep 2021 14:29:31 GMT Subject: RFR: 8274550: c2i entry barriers read int as long on PPC In-Reply-To: References: Message-ID: <9mc0hdz8vZYHNkPekDNYKB0wadtjCt3kQPrd14-F0ts=.d63b5c4b-950d-446d-9fd8-b78bfe14ee45@github.com> On Thu, 30 Sep 2021 14:15:08 GMT, Martin Doerr wrote: > `_keep_alive` is an int. We shouldn't use a 64 bit load. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5776 From eosterlund at openjdk.java.net Thu Sep 30 15:55:38 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 30 Sep 2021 15:55:38 GMT Subject: RFR: 8274501: c2i entry barriers read int as long on AArch64 In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 15:12:40 GMT, Erik Österlund wrote: > There was a bug in the x86_64 implementation of the c2i entry barriers. We read the CLD::_keep_alive int as a 64 bit integer, while it is of course in fact a 32 bit integer.
It was fixed in the patch that ported it to x86_32 (JDK-8235262). However, somewhere in-between I think the wrong code was used as a basis for the AArch64 implementation, which now seemingly has inherited that same bug. Thanks everyone. ------------- PR: https://git.openjdk.java.net/jdk/pull/5754 From eosterlund at openjdk.java.net Thu Sep 30 15:55:39 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 30 Sep 2021 15:55:39 GMT Subject: Integrated: 8274501: c2i entry barriers read int as long on AArch64 In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 15:12:40 GMT, Erik Österlund wrote: > There was a bug in the x86_64 implementation of the c2i entry barriers. We read the CLD::_keep_alive int as a 64 bit integer, while it is of course in fact a 32 bit integer. It was fixed in the patch that ported it to x86_32 (JDK-8235262). However, somewhere in-between I think the wrong code was used as a basis for the AArch64 implementation, which now seemingly has inherited that same bug. This pull request has now been integrated. Changeset: f08180f3 Author: Erik Österlund URL: https://git.openjdk.java.net/jdk/commit/f08180f35f18263e33d96b6d1f06e5129328f01a Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8274501: c2i entry barriers read int as long on AArch64 Reviewed-by: shade, kbarrett, aph ------------- PR: https://git.openjdk.java.net/jdk/pull/5754 From shade at openjdk.java.net Thu Sep 30 16:07:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 30 Sep 2021 16:07:38 GMT Subject: RFR: 8274550: c2i entry barriers read int as long on PPC In-Reply-To: References: Message-ID: On Thu, 30 Sep 2021 14:15:08 GMT, Martin Doerr wrote: > `_keep_alive` is an int. We shouldn't use a 64 bit load. Looks good! ------------- Marked as reviewed by shade (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/5776 From kvn at openjdk.java.net Thu Sep 30 17:59:37 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Thu, 30 Sep 2021 17:59:37 GMT Subject: RFR: 8274527: Minimal VM build fails after JDK-8273459 [v2] In-Reply-To: References: Message-ID: On Thu, 30 Sep 2021 03:06:50 GMT, Jie Fu wrote: >> Hi all, >> >> The broken was observed when >> >> (gdb) bt >> #0 MacroAssembler::align (this=0x7ffff0025b98, modulus=32) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:1182 >> #1 0x00007ffff67fc6c5 in MacroAssembler::kernel_crc32 (this=0x7ffff0025b98, crc=0x7, buf=0x6, len=0x2, table=0x1, tmp=0xb) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:6911 >> #2 0x00007ffff69a3555 in StubGenerator::generate_updateBytesCRC32 (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:6532 >> #3 0x00007ffff69a589b in StubGenerator::generate_initial (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7583 >> #4 0x00007ffff69a6801 in StubGenerator::StubGenerator (this=0x7ffff5e9c900, code=0x7ffff5e9c9c0, all=false) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7909 >> #5 0x00007ffff697fa21 in StubGenerator_generate (code=0x7ffff5e9c9c0, all=false) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7919 >> #6 0x00007ffff69a6c13 in StubRoutines::initialize1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:223 >> #7 0x00007ffff69a790d in stubRoutines_init1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:366 >> #8 0x00007ffff672044d in init_globals () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/init.cpp:119 >> #9 0x00007ffff69fb39f in Threads::create_vm (args=0x7ffff5e9ce10, canTryAgain=0x7ffff5e9cd33) at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/thread.cpp:2827 >> #10 
0x00007ffff6787879 in JNI_CreateJavaVM_inner (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3616 >> #11 0x00007ffff6787a72 in JNI_CreateJavaVM (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) >> at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3704 >> #12 0x00007ffff79b8141 in InitializeJVM (pvm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, ifn=0x7ffff5e9cec0) >> at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:1459 >> #13 0x00007ffff79b4f39 in JavaMain (_args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:411 >> #14 0x00007ffff79bba79 in ThreadJavaMain (args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/unix/native/libjli/java_md.c:651 >> #15 0x00007ffff779cea5 in start_thread () from /lib64/libpthread.so.0 >> #16 0x00007ffff72c19fd in clone () from /lib64/libc.so.6 >> >> >> In this case, modulus=32 and CodeEntryAlignment=16. >> >> So this assert shouldn't be added in `align` since we may use it (modulus > CodeEntryAlignment) in highly optimized hand-crafted assembly code. >> >> Thanks. >> Best regards, >> Jie > > Jie Fu has updated the pull request incrementally with one additional commit since the last revision: > > Use align32 Good. ------------- Marked as reviewed by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/5764 From Divino.Cesar at microsoft.com Thu Sep 30 20:51:32 2021 From: Divino.Cesar at microsoft.com (Cesar Soares Lucas) Date: Thu, 30 Sep 2021 20:51:32 +0000 Subject: RFC - Improving C2 Escape Analysis Message-ID: Hi there! I've spent the past few weeks investigating the C2 Escape Analysis implementation with the goal of identifying which part(s) of it would benefit the most from a contribution. ? 
As a conclusion to that investigation, I wrote a report where I list the most evident points, accompanied with a _preliminary_ quantitative analysis of how effective the current implementation is for finding opportunities for Scalar Replacement. I'd like to invite you all to read the document and please provide feedback on any issues that you find relevant to the topic. I'm particularly interested in getting your thoughts on the points I outline in the Questions and Future Work sections. Feel free to provide feedback by email or as a comment on the Gist page. Here is the link to the document: https://gist.github.com/JohnTortugo/c2607821202634a6509ec3c321ebf370 Thank you, Cesar Soares From mark.reinhold at oracle.com Thu Sep 30 21:03:35 2021 From: mark.reinhold at oracle.com (mark.reinhold at oracle.com) Date: Thu, 30 Sep 2021 14:03:35 -0700 Subject: RFC - Improving C2 Escape Analysis In-Reply-To: References: Message-ID: <20210930140335.648146897@eggemoggin.niobe.net> 2021/9/30 13:51:32 -0700, divino.cesar at microsoft.com: > I've spent the past few weeks investigating the C2 Escape Analysis > implementation with the goal of identifying which part(s) of it would benefit > the most from a contribution. > > As a conclusion to that investigation, I wrote a report where I list the most > evident points, accompanied with a _preliminary_ quantitative analysis of how > effective the current implementation is for finding opportunities for Scalar > Replacement. > > ... > > Here is the link to the document: > https://gist.github.com/JohnTortugo/c2607821202634a6509ec3c321ebf370 Thanks for writing this up! For IP clarity, could you please post a copy of this document either to this mailing list or to your directory on cr.openjdk.java.net [1]?
- Mark [1] https://openjdk.java.net/guide/codeReview.html From jiefu at openjdk.java.net Thu Sep 30 23:15:56 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 30 Sep 2021 23:15:56 GMT Subject: RFR: 8274527: Minimal VM build fails after JDK-8273459 [v2] In-Reply-To: References: Message-ID: On Thu, 30 Sep 2021 13:52:03 GMT, Scott Gibbons wrote: >> Jie Fu has updated the pull request incrementally with one additional commit since the last revision: >> >> Use align32 > > Looks good to me. Thanks @asgibbons and @vnkozlov for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/5764 From jiefu at openjdk.java.net Thu Sep 30 23:15:58 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 30 Sep 2021 23:15:58 GMT Subject: Integrated: 8274527: Minimal VM build fails after JDK-8273459 In-Reply-To: References: Message-ID: On Wed, 29 Sep 2021 23:41:06 GMT, Jie Fu wrote: > Hi all, > > The broken was observed when > > (gdb) bt > #0 MacroAssembler::align (this=0x7ffff0025b98, modulus=32) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:1182 > #1 0x00007ffff67fc6c5 in MacroAssembler::kernel_crc32 (this=0x7ffff0025b98, crc=0x7, buf=0x6, len=0x2, table=0x1, tmp=0xb) > at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/macroAssembler_x86.cpp:6911 > #2 0x00007ffff69a3555 in StubGenerator::generate_updateBytesCRC32 (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:6532 > #3 0x00007ffff69a589b in StubGenerator::generate_initial (this=0x7ffff5e9c900) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7583 > #4 0x00007ffff69a6801 in StubGenerator::StubGenerator (this=0x7ffff5e9c900, code=0x7ffff5e9c9c0, all=false) > at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7909 > #5 0x00007ffff697fa21 in StubGenerator_generate (code=0x7ffff5e9c9c0, all=false) at /home/jvm/jiefu/docker/jdk/src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:7919 > #6 
0x00007ffff69a6c13 in StubRoutines::initialize1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:223 > #7 0x00007ffff69a790d in stubRoutines_init1 () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/stubRoutines.cpp:366 > #8 0x00007ffff672044d in init_globals () at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/init.cpp:119 > #9 0x00007ffff69fb39f in Threads::create_vm (args=0x7ffff5e9ce10, canTryAgain=0x7ffff5e9cd33) at /home/jvm/jiefu/docker/jdk/src/hotspot/share/runtime/thread.cpp:2827 > #10 0x00007ffff6787879 in JNI_CreateJavaVM_inner (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) > at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3616 > #11 0x00007ffff6787a72 in JNI_CreateJavaVM (vm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, args=0x7ffff5e9ce10) > at /home/jvm/jiefu/docker/jdk/src/hotspot/share/prims/jni.cpp:3704 > #12 0x00007ffff79b8141 in InitializeJVM (pvm=0x7ffff5e9ce68, penv=0x7ffff5e9ce70, ifn=0x7ffff5e9cec0) > at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:1459 > #13 0x00007ffff79b4f39 in JavaMain (_args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/share/native/libjli/java.c:411 > #14 0x00007ffff79bba79 in ThreadJavaMain (args=0x7fffffffb1a0) at /home/jvm/jiefu/docker/jdk/src/java.base/unix/native/libjli/java_md.c:651 > #15 0x00007ffff779cea5 in start_thread () from /lib64/libpthread.so.0 > #16 0x00007ffff72c19fd in clone () from /lib64/libc.so.6 > > > In this case, modulus=32 and CodeEntryAlignment=16. > > So this assert shouldn't be added in `align` since we may use it (modulus > CodeEntryAlignment) in highly optimized hand-crafted assembly code. > > Thanks. > Best regards, > Jie This pull request has now been integrated. 
Changeset: a8edd1b3 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/a8edd1b360d4e5f35aff371a91fda42eeb00d395 Stats: 22 lines in 4 files changed: 5 ins; 0 del; 17 mod 8274527: Minimal VM build fails after JDK-8273459 Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/5764
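The alignment point discussed in the 8274527 thread (padding computed from the segment offset versus from the pc) can be illustrated with a small standalone sketch. This is not HotSpot code: `ToyAssembler`, `pc_after_offset_align` and `pc_after_pc_align` are hypothetical names invented for this example, and the segment start `0x1010` is just an assumed address that is 16-byte aligned but not 32-byte aligned, matching the `CodeEntryAlignment=16`, `modulus=32` case from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an assembler buffer (not the HotSpot class):
// it only tracks where the segment starts and how many bytes were emitted.
struct ToyAssembler {
    std::uintptr_t segment_start;  // address of the first byte of the segment
    std::uintptr_t emitted;        // bytes emitted so far

    std::uintptr_t offset() const { return emitted; }
    std::uintptr_t pc() const { return segment_start + emitted; }

    // Emit imaginary nops until `target` would be a multiple of `modulus`.
    void align(std::uintptr_t modulus, std::uintptr_t target) {
        std::uintptr_t rem = target % modulus;
        if (rem != 0) {
            emitted += modulus - rem;
        }
    }
};

// Pad based on the offset from the segment start, then report the pc.
std::uintptr_t pc_after_offset_align(std::uintptr_t start,
                                     std::uintptr_t emitted,
                                     std::uintptr_t modulus) {
    ToyAssembler a{start, emitted};
    a.align(modulus, a.offset());
    return a.pc();
}

// Pad based on the absolute pc, then report the pc.
std::uintptr_t pc_after_pc_align(std::uintptr_t start,
                                 std::uintptr_t emitted,
                                 std::uintptr_t modulus) {
    ToyAssembler a{start, emitted};
    a.align(modulus, a.pc());
    return a.pc();
}
```

With a segment starting at `0x1010` and 5 bytes already emitted, `pc_after_offset_align(0x1010, 5, 32)` lands on `0x1030`, which is only 16-byte aligned, while `pc_after_pc_align(0x1010, 5, 32)` lands on `0x1020`, which is 32-byte aligned. When the segment start itself happens to be 32-byte aligned, both variants agree, which is consistent with the "about half the time" observation above.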