From kbarrett at openjdk.java.net Fri Jan 1 03:52:03 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 1 Jan 2021 03:52:03 GMT Subject: RFR: 8258382: Fix optimization-unstable code involving pointer overflow [v2] In-Reply-To: References: Message-ID: On Fri, 25 Dec 2020 10:32:09 GMT, Hao Sun wrote: >> Optimization-unstable code refers to code that is unexpectedly discarded >> by compiler optimizations due to undefined behavior in the program. >> >> We applied a static checker called STACK (prototype from SOSP'13 paper >> [1]) to OpenJDK source code and found the following two sites of >> potential unstable code involving pointer overflow. >> >> Removing undefined behaviors would make the code stable. >> >> [1] https://css.csail.mit.edu/stack/ >> >> -------- >> Note that Jtreg tests (tier1 and jdk::tier3) passed locally on Linux x86/aarch64 machines after applying this patch. > > Hao Sun has updated the pull request incrementally with one additional commit since the last revision: > > Fix unstable code involving pointer overflow only > > Move the patch involving signed integer overflow into another PR. > In this patch we only fix optimization-unstable code involving pointer > overflow. > > Update the patch based on feedback from upstream. > 1) Remove unnecessary comment. > 2) Remove unnecessary check between end() and top() > 3) Use pointer_delta() to compute the offset between two addresses. > > Change-Id: Icade8e1a4b684081036c85fd2a2b65b5c3b27f54 > CustomizedGitHooks: yes Marked as reviewed by kbarrett (Reviewer).
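For readers unfamiliar with the term, the kind of bounds check that the description calls optimization-unstable can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `byte_delta` and `fits` are hypothetical names, and `byte_delta` only imitates the idea of computing the offset between two addresses (as item 3 above does with pointer_delta()) instead of forming a pointer that may lie outside the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not HotSpot's pointer_delta): distance in bytes
// between two addresses, computed on integers rather than on pointers.
// Precondition: lo <= hi.
static inline size_t byte_delta(const void* hi, const void* lo) {
  return reinterpret_cast<uintptr_t>(hi) - reinterpret_cast<uintptr_t>(lo);
}

// Unstable form, shown only as a comment:
//
//   return top + size <= end;
//
// If `top + size` points past one-past-the-end of the buffer, forming it
// is undefined behavior, so a compiler may assume it never wraps and
// simplify or drop a check that relied on the wrap-around.

// Stable form: compare the requested size with the remaining room.
// No out-of-range pointer is ever formed. Precondition: top <= end.
static inline bool fits(const char* top, const char* end, size_t size) {
  return size <= byte_delta(end, top);
}
```

With a 16-byte buffer, `fits(buf, buf + 16, 16)` holds and `fits(buf, buf + 16, 17)` does not, even for sizes close to SIZE_MAX, because the comparison never leaves the integer domain.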
------------- PR: https://git.openjdk.java.net/jdk/pull/1886 From kbarrett at openjdk.java.net Fri Jan 1 03:52:04 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 1 Jan 2021 03:52:04 GMT Subject: RFR: 8258382: Fix optimization-unstable code involving pointer overflow [v2] In-Reply-To: References: Message-ID: On Fri, 1 Jan 2021 03:48:25 GMT, Kim Barrett wrote: >> Hao Sun has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix unstable code involving pointer overflow only >> >> Move the patch involving signed integer overflow into another PR. >> In this patch we only fix optimization-unstable code involving pointer >> overflow. >> >> Update the patch based on feedback from upstream. >> 1) Remove unnecessary comment. >> 2) Remove unnecessary check between end() and top() >> 3) Use pointer_delta() to compute the offset between two addresses. >> >> Change-Id: Icade8e1a4b684081036c85fd2a2b65b5c3b27f54 >> CustomizedGitHooks: yes > > Marked as reviewed by kbarrett (Reviewer). Thanks for splitting up the changes. The GC changes look good. ------------- PR: https://git.openjdk.java.net/jdk/pull/1886 From kbarrett at openjdk.java.net Fri Jan 1 03:54:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 1 Jan 2021 03:54:02 GMT Subject: RFR: 8259020: null-check of g1 write_ref_field_pre_entry is not necessary In-Reply-To: References: Message-ID: On Thu, 31 Dec 2020 08:12:46 GMT, Xin Liu wrote: > orig is not null because G1BarrierSetC2 won't invoke write_ref_field_pre_entry > if pre_val is NULL. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1913 From github.com+16932759+shqking at openjdk.java.net Fri Jan 1 07:14:02 2021 From: github.com+16932759+shqking at openjdk.java.net (Hao Sun) Date: Fri, 1 Jan 2021 07:14:02 GMT Subject: RFR: 8258382: Fix optimization-unstable code involving pointer overflow [v2] In-Reply-To: References: Message-ID: <9b_OfSainLug_7jZpXrvK-XfQjFtuJSkcrMWFV9AH1I=.e338785d-d102-4582-9571-f5fcb8da829e@github.com> On Fri, 1 Jan 2021 03:49:16 GMT, Kim Barrett wrote: >> Marked as reviewed by kbarrett (Reviewer). > > Thanks for splitting up the changes. The GC changes look good. @kimbarrett thanks for your review. ------------- PR: https://git.openjdk.java.net/jdk/pull/1886 From kbarrett at openjdk.java.net Fri Jan 1 10:08:17 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 1 Jan 2021 10:08:17 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads Message-ID: Please review this fix to the parallel WeakProcessor's computation of the number of worker threads to use. It was previously limited by the current value of active_workers(), whatever that happens to be. It should be limited by total_workers(), just as with the parallel ReferenceProcessor. (Both are subject to ReferencesPerThread.) Testing mach5 tier1 Some hand testing (Linux-x64) to verify the expected number of threads are being used. Note: That hand testing suggests some further tuning of ReferencesPerThread might be in order. With the current default of 1000, I often saw in testing that some threads were started late enough that no work was left for them. I'll file a separate RFE for that. 
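The effect of the cap described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the WeakProcessor code: `ergo_workers`, `kReferencesPerThread`, the rounding rule and the treatment of a zero setting are all hypothetical. Only the default of 1000 and the idea of limiting the result by a worker count come from the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the ReferencesPerThread setting; 1000 is the
// default mentioned in the message.
constexpr size_t kReferencesPerThread = 1000;

// Pick a worker count for `ref_count` items: roughly one worker per
// `refs_per_thread` items, never more than `max_workers`.
// Assumption: a setting of 0 means "use every available worker".
inline unsigned ergo_workers(size_t ref_count,
                             unsigned max_workers,
                             size_t refs_per_thread = kReferencesPerThread) {
  if (refs_per_thread == 0) {
    return max_workers;
  }
  size_t wanted = 1 + ref_count / refs_per_thread;
  return static_cast<unsigned>(std::min<size_t>(wanted, max_workers));
}
```

In this model, 5000 references with a pool of 8 workers of which only 2 happen to be active give `ergo_workers(5000, 2) == 2` when capped by the active count, but `ergo_workers(5000, 8) == 6` when capped by the total, which is the difference the fix is about.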
------------- Commit messages: - Use total workers rather than active Changes: https://git.openjdk.java.net/jdk16/pull/75/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=75&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258985 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk16/pull/75.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/75/head:pull/75 PR: https://git.openjdk.java.net/jdk16/pull/75 From ayang at openjdk.java.net Fri Jan 1 18:10:52 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 1 Jan 2021 18:10:52 GMT Subject: RFR: 8259020: null-check of g1 write_ref_field_pre_entry is not necessary In-Reply-To: References: Message-ID: On Thu, 31 Dec 2020 08:12:46 GMT, Xin Liu wrote: > orig is not null because G1BarrierSetC2 won't invoke write_ref_field_pre_entry > if pre_val is NULL. Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/1913 From xliu at openjdk.java.net Sat Jan 2 22:20:00 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Sat, 2 Jan 2021 22:20:00 GMT Subject: RFR: 8259020: null-check of g1 write_ref_field_pre_entry is not necessary In-Reply-To: References: Message-ID: On Fri, 1 Jan 2021 18:07:51 GMT, Albert Mingkun Yang wrote: >> orig is not null because G1BarrierSetC2 won't invoke write_ref_field_pre_entry >> if pre_val is NULL. > > Marked as reviewed by ayang (Author). Regression tests are clear. https://github.com/navyxliu/jdk/actions/runs/454177250 I can confirm it's safe for hotspot/c2, and I haven't found evidence that C1 generates a call to write_ref_field_pre_entry. On the other hand, it is exported externally via JVMCIRuntime::write_barrier_pre. I am not familiar with JVMCI. I guess it's used by the Graal compiler, but I don't understand it much. Can a Graal expert tell me if it's safe on your side?
------------- PR: https://git.openjdk.java.net/jdk/pull/1913 From github.com+16932759+shqking at openjdk.java.net Mon Jan 4 02:24:55 2021 From: github.com+16932759+shqking at openjdk.java.net (Hao Sun) Date: Mon, 4 Jan 2021 02:24:55 GMT Subject: Integrated: 8258382: Fix optimization-unstable code involving pointer overflow In-Reply-To: References: Message-ID: <8fgOtc2lp3-gKpHhyjqsdqtQRzABxQibvoVwLDIaUss=.e3b081f3-29c9-4d17-8d9a-254aafc0d9e9@github.com> On Thu, 24 Dec 2020 00:24:32 GMT, Hao Sun wrote: > Optimization-unstable code refers to code that is unexpectedly discarded > by compiler optimizations due to undefined behavior in the program. > > We applied a static checker called STACK (prototype from SOSP'13 paper > [1]) to OpenJDK source code and found the following two sites of > potential unstable code involving pointer overflow. > > Removing undefined behaviors would make the code stable. > > [1] https://css.csail.mit.edu/stack/ > > -------- > Note that Jtreg tests (tier1 and jdk::tier3) passed locally on Linux x86/aarch64 machines after applying this patch. This pull request has now been integrated.
Changeset: f351e155 Author: Hao Sun Committer: Ningsheng Jian URL: https://git.openjdk.java.net/jdk/commit/f351e155 Stats: 8 lines in 2 files changed: 0 ins; 2 del; 6 mod 8258382: Fix optimization-unstable code involving pointer overflow Reviewed-by: kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/1886 From kbarrett at openjdk.java.net Mon Jan 4 04:04:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 4 Jan 2021 04:04:54 GMT Subject: RFR: 8258382: Fix optimization-unstable code involving pointer overflow [v2] In-Reply-To: <9b_OfSainLug_7jZpXrvK-XfQjFtuJSkcrMWFV9AH1I=.e338785d-d102-4582-9571-f5fcb8da829e@github.com> References: <9b_OfSainLug_7jZpXrvK-XfQjFtuJSkcrMWFV9AH1I=.e338785d-d102-4582-9571-f5fcb8da829e@github.com> Message-ID: <5nXxnAAsUR3BG-mCCnmUg5s9gXJVdEzGFg4Mh3YVedY=.5c61f647-ae7e-4c18-a431-ef76d2ea179c@github.com> On Fri, 1 Jan 2021 07:11:32 GMT, Hao Sun wrote: >> Thanks for splitting up the changes. The GC changes look good. > > @kimbarrett thanks for your review. This change should not have been pushed with only one review. HotSpot changes normally require two reviews. https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change I know the skara bots said it was ready to go. They haven't yet been taught about such project-specific tailorings of the base process. (The information in that page has supposedly been superseded by the new Developers' Guide (https://openjdk.java.net/guide/index.html), but the HotSpot reviewer requirements seem to have not made the transition. I'll bring that up with the dev-guide folks.) 
------------- PR: https://git.openjdk.java.net/jdk/pull/1886 From njian at openjdk.java.net Mon Jan 4 05:37:56 2021 From: njian at openjdk.java.net (Ningsheng Jian) Date: Mon, 4 Jan 2021 05:37:56 GMT Subject: RFR: 8258382: Fix optimization-unstable code involving pointer overflow [v2] In-Reply-To: <5nXxnAAsUR3BG-mCCnmUg5s9gXJVdEzGFg4Mh3YVedY=.5c61f647-ae7e-4c18-a431-ef76d2ea179c@github.com> References: <9b_OfSainLug_7jZpXrvK-XfQjFtuJSkcrMWFV9AH1I=.e338785d-d102-4582-9571-f5fcb8da829e@github.com> <5nXxnAAsUR3BG-mCCnmUg5s9gXJVdEzGFg4Mh3YVedY=.5c61f647-ae7e-4c18-a431-ef76d2ea179c@github.com> Message-ID: On Mon, 4 Jan 2021 04:02:28 GMT, Kim Barrett wrote: > This change should not have been pushed with only one review. HotSpot changes normally require two reviews. > https://wiki.openjdk.java.net/display/HotSpot/Pushing+a+HotSpot+change > > I know the skara bots said it was ready to go. They haven't yet been taught about such project-specific tailorings of the base process. (The information in that page has supposedly been superseded by the new Developers' Guide (https://openjdk.java.net/guide/index.html), but the HotSpot reviewer requirements seem to have not made the transition. I'll bring that up with the dev-guide folks.) Thanks for the reminder! Sorry, I (mistakenly) thought that this is trivial change. I've also reviewed the patch internally and should have marked it reviewed by me before sponsoring. ------------- PR: https://git.openjdk.java.net/jdk/pull/1886 From shade at openjdk.java.net Mon Jan 4 09:43:55 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 4 Jan 2021 09:43:55 GMT Subject: RFR: 8258534: Epsilon: clean up unused includes In-Reply-To: References: Message-ID: On Fri, 11 Dec 2020 05:03:56 GMT, Lehua Ding wrote: > Hi all, > > CLion IDE shows two warnings of unused includes(`#include "utilities/macros.hpp"`) in EpsilonGC's code. these maybe can be removed. 
> > Testing: macosx-x86_64-server-{release,fastdebug,slowdebug} On second look, the `#include` for `macros.hpp` is indeed excessive: `COMPILER1` and `COMPILER2` are already defined, so the rest of the `#include`-s should work fine. The only reason we would want that include is if Epsilon used any of the extended definitions like `COMPILER2_ONLY`, but it does not. The patch looks good then. Please make sure you run the pre-integration tests. To do that, GH Actions should be enabled here: https://github.com/lhtin/jdk/actions -- and then probably triggered manually on your branch. Then the "Checks" tab should have the test results. Once that is done, I would formally approve. ------------- PR: https://git.openjdk.java.net/jdk/pull/1745 From shade at openjdk.java.net Mon Jan 4 12:02:57 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 4 Jan 2021 12:02:57 GMT Subject: RFR: 8258490: Shenandoah: Full GC does not need to remark threads and drain SATB buffers [v2] In-Reply-To: References: Message-ID: On Wed, 16 Dec 2020 19:04:09 GMT, Zhengyu Gu wrote: >> Full GC marks heap at a pause with SATB deactivated, therefore, we don't need to remark threads and drain SATB buffers during final mark phase. >> >> - [x] hotspot_gc_shenandoah > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Minor update Looks fine to me, modulo minor nit. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 230: > 228: class ShenandoahClaimThreadClosure : public ThreadClosure { > 229: private: > 230: uintx _claim_token; Should be `const`, maybe? ------------- Marked as reviewed by shade (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1805 From zgu at openjdk.java.net Mon Jan 4 15:17:17 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 4 Jan 2021 15:17:17 GMT Subject: RFR: 8258490: Shenandoah: Full GC does not need to remark threads and drain SATB buffers [v3] In-Reply-To: References: Message-ID: > Full GC marks heap at a pause with SATB deactivated, therefore, we don't need to remark threads and drain SATB buffers during final mark phase. > > - [x] hotspot_gc_shenandoah Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: @shade's comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1805/files - new: https://git.openjdk.java.net/jdk/pull/1805/files/602347da..acdb27e8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1805&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1805&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1805.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1805/head:pull/1805 PR: https://git.openjdk.java.net/jdk/pull/1805 From zgu at openjdk.java.net Mon Jan 4 17:45:07 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 4 Jan 2021 17:45:07 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v22] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. 
Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. > 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge - Merge branch 'master' into JDK-8255019-sh-mark - Concurrent mark does not expect forwarded objects - Merge branch 'master' into JDK-8255019-sh-mark - Merge branch 'master' into JDK-8255019-sh-mark - Silent valgrind on potential memory leak - Merge branch 'master' into JDK-8255019-sh-mark - Removed ShenandoahConcurrentMark parameter from concurrent GC entry/op, etc. - Merge branch 'master' into JDK-8255019-sh-mark - Merge - ... 
and 19 more: https://git.openjdk.java.net/jdk/compare/d679caa2...cde20115 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1009/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=21 Stats: 1954 lines in 22 files changed: 1070 ins; 739 del; 145 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Mon Jan 4 18:14:03 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 4 Jan 2021 18:14:03 GMT Subject: Integrated: 8258490: Shenandoah: Full GC does not need to remark threads and drain SATB buffers In-Reply-To: References: Message-ID: On Wed, 16 Dec 2020 17:34:43 GMT, Zhengyu Gu wrote: > Full GC marks heap at a pause with SATB deactivated, therefore, we don't need to remark threads and drain SATB buffers during final mark phase. > > - [x] hotspot_gc_shenandoah This pull request has now been integrated. Changeset: f80c63b3 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/f80c63b3 Stats: 44 lines in 1 file changed: 21 ins; 15 del; 8 mod 8258490: Shenandoah: Full GC does not need to remark threads and drain SATB buffers Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1805 From shade at openjdk.java.net Mon Jan 4 18:15:25 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 4 Jan 2021 18:15:25 GMT Subject: RFR: 8251944: Add Shenandoah test config to compiler/gcbarriers/UnsafeIntrinsicsTest.java [v3] In-Reply-To: References: Message-ID: > There used to be failures in Shenandoah CAS handling code like that were caught by this test. Those were fixed in JDK-8255401. This change turns the test into regression test for it. 
> > Additional testing: > - [x] Affected test on `x86_64` fastdebug, release > - [x] Affected test on `x86_32` fastdebug > - [x] Affected test on `aarch64` fastdebug Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8251944-shenandoah-test-unsafe - Merge branch 'master' into JDK-8251944-shenandoah-test-unsafe - Mention 8255401 in @bug - Make test pass in release - 8251944: Add Shenandoah test config to compiler/gcbarriers/UnsafeIntrinsicsTest.java ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1693/files - new: https://git.openjdk.java.net/jdk/pull/1693/files/073e166c..899b899c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1693&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1693&range=01-02 Stats: 40014 lines in 1320 files changed: 27859 ins; 8021 del; 4134 mod Patch: https://git.openjdk.java.net/jdk/pull/1693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1693/head:pull/1693 PR: https://git.openjdk.java.net/jdk/pull/1693 From zgu at openjdk.java.net Mon Jan 4 18:22:13 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 4 Jan 2021 18:22:13 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v23] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. 
> > It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. 
> 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Update copyright years ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1009/files - new: https://git.openjdk.java.net/jdk/pull/1009/files/cde20115..4b367ed6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=22 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=21-22 Stats: 23 lines in 21 files changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From github.com+1631061+javawithjiva at openjdk.java.net Mon Jan 4 19:27:55 2021 From: github.com+1631061+javawithjiva at openjdk.java.net (Azeem Jiva) Date: Mon, 4 Jan 2021 19:27:55 GMT Subject: RFR: 8259020: null-check of g1 write_ref_field_pre_entry is not necessary In-Reply-To: References: Message-ID: On Thu, 31 Dec 2020 08:12:46 GMT, Xin Liu wrote: > orig is not null because G1BarrierSetC2 won't invoke write_ref_field_pre_entry > if pre_val is NULL. Marked as reviewed by javawithjiva at github.com (no known OpenJDK username). ------------- PR: https://git.openjdk.java.net/jdk/pull/1913 From phh at openjdk.java.net Tue Jan 5 00:01:56 2021 From: phh at openjdk.java.net (Paul Hohensee) Date: Tue, 5 Jan 2021 00:01:56 GMT Subject: RFR: 8259020: null-check of g1 write_ref_field_pre_entry is not necessary In-Reply-To: References: Message-ID: On Thu, 31 Dec 2020 08:12:46 GMT, Xin Liu wrote: > orig is not null because G1BarrierSetC2 won't invoke write_ref_field_pre_entry > if pre_val is NULL. Marked as reviewed by phh (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/1913 From xliu at openjdk.java.net Tue Jan 5 00:01:57 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 5 Jan 2021 00:01:57 GMT Subject: Integrated: 8259020: null-check of g1 write_ref_field_pre_entry is not necessary In-Reply-To: References: Message-ID: On Thu, 31 Dec 2020 08:12:46 GMT, Xin Liu wrote: > orig is not null because G1BarrierSetC2 won't invoke write_ref_field_pre_entry > if pre_val is NULL. This pull request has now been integrated. Changeset: f0aae81e Author: Xin Liu Committer: Paul Hohensee URL: https://git.openjdk.java.net/jdk/commit/f0aae81e Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod 8259020: null-check of g1 write_ref_field_pre_entry is not necessary Reviewed-by: kbarrett, ayang, phh ------------- PR: https://git.openjdk.java.net/jdk/pull/1913 From xliu at openjdk.java.net Tue Jan 5 00:14:53 2021 From: xliu at openjdk.java.net (Xin Liu) Date: Tue, 5 Jan 2021 00:14:53 GMT Subject: RFR: 8259020: null-check of g1 write_ref_field_pre_entry is not necessary In-Reply-To: References: Message-ID: <4bI-5xhDCeVpUpVyEfUTw6VK1w94Io7T-SNdoSlWmgg=.abe01c80-8474-4e58-b42c-df8c1a231926@github.com> On Mon, 4 Jan 2021 23:58:13 GMT, Paul Hohensee wrote: >> orig is not null because G1BarrierSetC2 won't invoke write_ref_field_pre_entry >> if pre_val is NULL. > > Marked as reviewed by phh (Reviewer). Thank you all reviewers and @phohensee for sponsoring it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1913 From github.com+13173904+lhtin at openjdk.java.net Tue Jan 5 04:47:56 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Tue, 5 Jan 2021 04:47:56 GMT Subject: RFR: 8258534: Epsilon: clean up unused includes In-Reply-To: References: Message-ID: <0_euEe1G4735lXKduMHvL9OChmqp7-yczfPL8SrNVUc=.9f657841-f034-46c0-9be5-cc177ab7de87@github.com> On Mon, 4 Jan 2021 09:41:23 GMT, Aleksey Shipilev wrote: > On second look, the `#include` for `macros.hpp` is indeed excessive: `COMPILER1` and `COMPILER2` are already defined, so the rest of the `#include`-s should work fine. The only reason we would want that include is if Epsilon used any of the extended definitions like `COMPILER2_ONLY`, but it does not. > > The patch looks good then. Please make sure you run the pre-integration tests. To do that, GH Actions should be enabled here: https://github.com/lhtin/jdk/actions -- and then probably triggered manually on your branch. Then the "Checks" tab should have the test results. Once that is done, I would formally approve. Thank you for your review and your guidance on testing. The pre-integration tests have finished. ------------- PR: https://git.openjdk.java.net/jdk/pull/1745 From shade at openjdk.java.net Tue Jan 5 08:24:55 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 5 Jan 2021 08:24:55 GMT Subject: RFR: 8258534: Epsilon: clean up unused includes In-Reply-To: References: Message-ID: On Fri, 11 Dec 2020 05:03:56 GMT, Lehua Ding wrote: > Hi all, > > CLion IDE shows two warnings of unused includes(`#include "utilities/macros.hpp"`) in EpsilonGC's code. these maybe can be removed. > > Testing: macosx-x86_64-server-{release,fastdebug,slowdebug} Excellent, you are good to go. ------------- Marked as reviewed by shade (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1745 From jiefu at openjdk.java.net Tue Jan 5 08:28:58 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 5 Jan 2021 08:28:58 GMT Subject: RFR: 8258534: Epsilon: clean up unused includes In-Reply-To: References: Message-ID: On Fri, 11 Dec 2020 05:03:56 GMT, Lehua Ding wrote: > Hi all, > > CLion IDE shows two warnings of unused includes(`#include "utilities/macros.hpp"`) in EpsilonGC's code. these maybe can be removed. > > Testing: macosx-x86_64-server-{release,fastdebug,slowdebug} Marked as reviewed by jiefu (Committer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1745 From github.com+13173904+lhtin at openjdk.java.net Tue Jan 5 08:36:54 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Tue, 5 Jan 2021 08:36:54 GMT Subject: Integrated: 8258534: Epsilon: clean up unused includes In-Reply-To: References: Message-ID: On Fri, 11 Dec 2020 05:03:56 GMT, Lehua Ding wrote: > Hi all, > > CLion IDE shows two warnings of unused includes(`#include "utilities/macros.hpp"`) in EpsilonGC's code. these maybe can be removed. > > Testing: macosx-x86_64-server-{release,fastdebug,slowdebug} This pull request has now been integrated. 
Changeset: 3817c32f Author: Lehua Ding Committer: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/3817c32f Stats: 2 lines in 2 files changed: 0 ins; 2 del; 0 mod 8258534: Epsilon: clean up unused includes Reviewed-by: shade, jiefu ------------- PR: https://git.openjdk.java.net/jdk/pull/1745 From shade at openjdk.java.net Tue Jan 5 08:39:53 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 5 Jan 2021 08:39:53 GMT Subject: Integrated: 8251944: Add Shenandoah test config to compiler/gcbarriers/UnsafeIntrinsicsTest.java In-Reply-To: References: Message-ID: <9o8BbtcVezKGIMDLxEYE1UfSjRHvgTtdlvBS3qykk4w=.bc09e436-3596-459f-8511-9298e3cf3f60@github.com> On Tue, 8 Dec 2020 10:42:08 GMT, Aleksey Shipilev wrote: > There used to be failures in Shenandoah CAS handling code like that were caught by this test. Those were fixed in JDK-8255401. This change turns the test into regression test for it. > > Additional testing: > - [x] Affected test on `x86_64` fastdebug, release > - [x] Affected test on `x86_32` fastdebug > - [x] Affected test on `aarch64` fastdebug This pull request has now been integrated. Changeset: db6f3930 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/db6f3930 Stats: 24 lines in 1 file changed: 22 ins; 0 del; 2 mod 8251944: Add Shenandoah test config to compiler/gcbarriers/UnsafeIntrinsicsTest.java Reviewed-by: rkennke, adityam ------------- PR: https://git.openjdk.java.net/jdk/pull/1693 From github.com+13173904+lhtin at openjdk.java.net Tue Jan 5 12:12:02 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Tue, 5 Jan 2021 12:12:02 GMT Subject: RFR: 8259231: Fix the chance to allocate failure and improve the speed and quality of EpsilonHeap::allocate_work Message-ID: Hi all, The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this: 1. 
it can prevent an allocation failure by retrying `_space->par_allocate` before expanding the virtual space, for the case where there was not enough virtual space but another thread has just expanded it successfully and there is now enough space. 2. it can reduce the time the lock is held by moving `res = _space->par_allocate(size);` out of the lock scope. Tested on macosx-x86_64-server-{release, fastdebug, slowdebug} with the current test cases. ------------- Commit messages: - add a comment - Epsilon: improve speed and quality of EpsilonHeap::allocate_work Changes: https://git.openjdk.java.net/jdk/pull/1794/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1794&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259231 Stats: 32 lines in 2 files changed: 12 ins; 2 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/1794.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1794/head:pull/1794 PR: https://git.openjdk.java.net/jdk/pull/1794 From github.com+13173904+lhtin at openjdk.java.net Tue Jan 5 12:12:03 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Tue, 5 Jan 2021 12:12:03 GMT Subject: RFR: 8259231: Fix the chance to allocate failure and improve the speed and quality of EpsilonHeap::allocate_work In-Reply-To: References: Message-ID: <-TYiDHFIXiS5HbiUng9Rq6bypIg8IKIyDmbO8KVuVSQ=.ad05b87a-355c-4810-bc87-e9b66ea76d06@github.com> On Wed, 16 Dec 2020 01:29:55 GMT, Lehua Ding wrote: > Hi all, > > The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this: > 1. it can prevent an allocation failure by retrying `_space->par_allocate` before expanding the virtual space, for the case where there was not enough virtual space but another thread has just expanded it successfully and there is now enough space. > 2. it can reduce the time the lock is held by moving `res = _space->par_allocate(size);` out of the lock scope. > > Tested on macosx-x86_64-server-{release, fastdebug, slowdebug} with the current test cases.
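The two points in the description above (retry the allocation under the lock before expanding, and allocate outside the lock after expanding) can be sketched with a toy bump-pointer space. This is a rough model under stated assumptions, not the EpsilonHeap code: `ToySpace`, `kFail`, the offset-based interface and the expansion step are all hypothetical, and only the name `par_allocate` is borrowed from the message.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical failure marker for this toy model.
constexpr size_t kFail = static_cast<size_t>(-1);

// Toy bump-pointer space: offsets [0, committed) are usable, and the
// committed part can grow in steps up to `reserved`.
class ToySpace {
  std::atomic<size_t> _top{0};
  std::atomic<size_t> _committed;
  const size_t _reserved;
  const size_t _step;
  std::mutex _expand_lock;

public:
  ToySpace(size_t committed, size_t reserved, size_t step)
    : _committed(committed), _reserved(reserved), _step(step) {}

  // Lock-free bump allocation; returns the start offset or kFail.
  size_t par_allocate(size_t size) {
    size_t old_top = _top.load();
    for (;;) {
      size_t limit = _committed.load();
      if (size > limit - old_top) return kFail;
      // On failure the CAS reloads old_top, so the loop retries.
      if (_top.compare_exchange_weak(old_top, old_top + size)) return old_top;
    }
  }

  size_t allocate(size_t size) {
    size_t res = par_allocate(size);       // fast path, no lock taken
    while (res == kFail) {
      {
        std::lock_guard<std::mutex> guard(_expand_lock);
        // Another thread may have expanded while we waited for the
        // lock, so retry before expanding again.
        res = par_allocate(size);
        if (res != kFail) break;
        size_t committed = _committed.load();
        size_t free_now = committed - _top.load();
        size_t room = _reserved - committed;
        if (size > free_now + room) return kFail;   // truly out of space
        size_t want = std::max(_step, size - free_now);
        _committed.store(committed + std::min(want, room));
      }
      // The lock is released here; allocate outside the lock scope.
      res = par_allocate(size);
    }
    return res;
  }
};
```

A single-threaded walk-through: with 16 units committed, 64 reserved and a step of 16, allocating 16, 8 and 40 units succeeds at offsets 0, 16 and 24, and a further 1-unit request fails because the reserved range is exhausted.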
Tencent ------------- PR: https://git.openjdk.java.net/jdk/pull/1794 From zgu at openjdk.java.net Tue Jan 5 13:43:06 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 5 Jan 2021 13:43:06 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. 
> 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. > 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: - Merge - Update copyright years - Merge - Merge branch 'master' into JDK-8255019-sh-mark - Concurrent mark does not expect forwarded objects - Merge branch 'master' into JDK-8255019-sh-mark - Merge branch 'master' into JDK-8255019-sh-mark - Silent valgrind on potential memory leak - Merge branch 'master' into JDK-8255019-sh-mark - Removed ShenandoahConcurrentMark parameter from concurrent GC entry/op, etc. - ... and 21 more: https://git.openjdk.java.net/jdk/compare/a6c08813...b7390c08 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1009/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=23 Stats: 1982 lines in 21 files changed: 1078 ins; 753 del; 151 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From shade at openjdk.java.net Tue Jan 5 13:44:55 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 5 Jan 2021 13:44:55 GMT Subject: RFR: 8259231: Fix the chance to allocate failure and improve the speed and quality of EpsilonHeap::allocate_work In-Reply-To: References: Message-ID: On Wed, 16 Dec 2020 01:29:55 GMT, Lehua Ding wrote: > Hi all, > > The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this: > 1. 
it can prevent allocate failure by retry `_space->par_allocate` before expanding virtual space, when there not enough virtual space but another thread expanding succeeded just and has enough space.
> 2. it can reduce the lock time by move `res = _space->par_allocate(size);` out of lock scope.
>
> Test on macosx-x86_64-server-{release, fastdebug, slowdebug} with current test case.

Good find! Comments below:

src/hotspot/share/gc/epsilon/epsilon_globals.hpp line 67:

> 65: \
> 66: product(bool, EpsilonElasticTLABDecay, true, EXPERIMENTAL, \
> 67: "Use timed decays to shrink TLAB sizes. This conserves memory " \

Let's not do the typo fixes in this PR. Maybe there are other spelling problems elsewhere in `gc/epsilon` that we could fix wholesale in another PR?

src/hotspot/share/gc/epsilon/epsilonHeap.cpp line 113:

> 111: while (res == NULL) {
> 112: // Allocation failed, attempt expansion, and retry:
> 113: {

I see what you are trying to do, and it makes sense. I believe this form would be cleaner:

    HeapWord* res = NULL;
    while (true) {
      // Try to allocate, assume space is available
      res = par_allocate(size);
      if (res != NULL) {
        break;
      }

      MutexLocker ml(Heap_Lock);

      // Try to allocate under the lock, assume another thread was able to expand
      res = par_allocate(size);
      if (res != NULL) {
        break;
      }

      // Expand and loop back if space is available
      size_t space_left = max_capacity() - capacity();
      size_t want_space = MAX2(size, EpsilonMinHeapExpand);
      ...
    }

-------------

Changes requested by shade (Reviewer).
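For readers following the thread, the retry-under-lock shape suggested above can be sketched as a small standalone program. This is a minimal sketch under stated assumptions, not the HotSpot code: `ToyHeap`, `kFailed`, and the unit-less sizes are hypothetical stand-ins, `std::mutex` replaces `Heap_Lock`, and an atomic counter replaces the real virtual space.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

using std::size_t;

// Hypothetical stand-in for a bump-pointer heap whose committed capacity
// grows in steps up to a fixed maximum. None of these names are HotSpot's.
class ToyHeap {
public:
  static constexpr size_t kFailed = static_cast<size_t>(-1);

  ToyHeap(size_t initial, size_t max, size_t min_expand)
    : _top(0), _capacity(initial), _max(max), _min_expand(min_expand) {
    assert(initial <= max);
  }

  // Returns the offset of the allocated block, or kFailed when the heap
  // cannot grow enough to satisfy the request.
  size_t allocate(size_t size) {
    while (true) {
      // Fast path: lock-free attempt, assuming space is available.
      size_t res = par_allocate(size);
      if (res != kFailed) return res;

      // The guard is released at the end of each loop iteration.
      std::lock_guard<std::mutex> lock(_expand_lock);

      // Retry under the lock: another thread may have expanded already.
      res = par_allocate(size);
      if (res != kFailed) return res;

      // Expand if room is left, then loop back and retry the fast path.
      size_t space_left = _max - _capacity.load();
      size_t want_space = std::max(size, _min_expand);
      if (want_space <= space_left) {
        _capacity.fetch_add(want_space);   // expand in bulk
      } else if (size <= space_left) {
        _capacity.fetch_add(space_left);   // take all the remaining space
      } else {
        return kFailed;                    // cannot expand enough
      }
    }
  }

  size_t capacity() const { return _capacity.load(); }
  size_t used() const { return _top.load(); }

private:
  // Lock-free bump allocation using a compare-and-swap loop.
  size_t par_allocate(size_t size) {
    size_t old_top = _top.load();
    while (true) {
      if (size > _capacity.load() - old_top) return kFailed;
      // On failure, compare_exchange_weak reloads old_top and we retry.
      if (_top.compare_exchange_weak(old_top, old_top + size)) return old_top;
    }
  }

  std::atomic<size_t> _top;
  std::atomic<size_t> _capacity;
  const size_t _max;
  const size_t _min_expand;
  std::mutex _expand_lock;
};
```

With `ToyHeap h(8, 32, 16)`, the first `h.allocate(8)` is served from the initial capacity, and a following `h.allocate(4)` takes the lock, expands by 16 units, and retries. The second attempt under the lock is what keeps a thread from expanding again, or failing, when another thread has just made room.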
PR: https://git.openjdk.java.net/jdk/pull/1794 From shade at openjdk.java.net Tue Jan 5 14:11:06 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 5 Jan 2021 14:11:06 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: References: Message-ID: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> On Tue, 5 Jan 2021 13:43:06 GMT, Zhengyu Gu wrote: >> This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). >> >> Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. >> >> It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. >> >> First step, I would like to split STW and concurrent mark, so that: >> 1) Code has to special case for STW and concurrent mark. >> 2) STW mark does not need to rendezvous workers between root mark and the rest of mark >> 3) STW mark does not need to activate SATB barrier and drain SATB buffers. >> 4) STW mark does not need to remark some of roots. >> >> The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. >> >> A few changes: >> 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. 
Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. >> 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner >> 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. >> 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: > > - Merge > - Update copyright years > - Merge > - Merge branch 'master' into JDK-8255019-sh-mark > - Concurrent mark does not expect forwarded objects > - Merge branch 'master' into JDK-8255019-sh-mark > - Merge branch 'master' into JDK-8255019-sh-mark > - Silent valgrind on potential memory leak > - Merge branch 'master' into JDK-8255019-sh-mark > - Removed ShenandoahConcurrentMark parameter from concurrent GC entry/op, etc. > - ... and 21 more: https://git.openjdk.java.net/jdk/compare/a6c08813...b7390c08 First read review follows. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 161: > 159: // threads, and performance-wise it doesn't really matter. Adds about 1ms to > 160: // full-gc. > 161: { This seems to revert JDK-8258490? src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 168: > 166: while (satb_mq_set.apply_closure_to_completed_buffer(&cl)); > 167: bool do_nmethods = heap->unload_classes() && !ShenandoahConcurrentRoots::can_do_concurrent_class_unloading(); > 168: assert(!heap->has_forwarded_objects(), "Not expected"); Do you need to move this assert? src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 314: > 312: ShenandoahReferenceProcessor* rp, > 313: ShenandoahPhaseTimings::Phase phase, > 314: uint nworkers) : This indenting seems wrong? The original one was correct, I think. 
src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 325: > 323: ShenandoahConcurrentWorkerSession worker_session(worker_id); > 324: ShenandoahObjToScanQueue* q = _queue_set->queue(worker_id); > 325: ShenandoahMarkResolveRefsClosure cl(q, _rp); Why `ShenandoahMarkRefsClosure` -> `ShenandoahMarkResolveRefsClosure` change? src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 335: > 333: ShenandoahReferenceProcessor* rp = _heap->ref_processor(); > 334: task_queues()->reserve(workers->active_workers()); > 335: ShenandoahMarkConcurrentRootsTask task(task_queues(), rp, ShenandoahPhaseTimings::conc_mark_roots, workers->active_workers()); Excess space: `rp, ShenandoahPhaseTimings`. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 344: > 342: uint nworkers = workers->active_workers(); > 343: task_queues()->reserve(nworkers); > 344: TaskTerminator terminator(nworkers, task_queues()); There is another `TaskTerminator` right below it, is it correct? src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.hpp line 58: > 56: // TODO: where to put them > 57: static void update_roots(ShenandoahPhaseTimings::Phase root_phase); > 58: static void update_thread_roots(ShenandoahPhaseTimings::Phase root_phase); Sounds like these better to be shared in `ShenandoahMark`? src/hotspot/share/gc/shenandoah/shenandoahMark.cpp line 38: > 36: #include "gc/shenandoah/shenandoahUtils.hpp" > 37: #include "gc/shenandoah/shenandoahVerifier.hpp" > 38: Excess newline? src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp line 306: > 304: ShenandoahHeap* ShenandoahMark::heap() const { > 305: return _heap; > 306: } Do we really need this method? `ShenandoahHeap::heap()` is supposed to be as fast. src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp line 249: > 247: rp->set_soft_reference_policy(true); // forcefully purge all soft references > 248: > 249: Excess newline? 
src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp line 2: > 1: /* > 2: * Copyright (c) 2019, 2019, Red Hat, Inc. All rights reserved. Odd change: 2020 -> 2019. src/hotspot/share/gc/shenandoah/shenandoahSTWMark.cpp line 81: > 79: task_queues()->reserve(nworkers); > 80: > 81: Excess new-line, drop one. src/hotspot/share/gc/shenandoah/shenandoahSTWMark.cpp line 96: > 94: TASKQUEUE_STATS_ONLY(task_queues()->print_taskqueue_stats()); > 95: TASKQUEUE_STATS_ONLY(task_queues()->reset_taskqueue_stats()); > 96: Excess newline ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Tue Jan 5 14:59:10 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 5 Jan 2021 14:59:10 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v25] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. 
Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. > 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: @shade's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1009/files - new: https://git.openjdk.java.net/jdk/pull/1009/files/b7390c08..8cd3f9dc Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=24 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=23-24 Stats: 51 lines in 7 files changed: 7 ins; 13 del; 31 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Tue Jan 5 14:59:12 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 5 Jan 2021 14:59:12 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> References: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> Message-ID: On Tue, 5 Jan 2021 14:00:53 GMT, 
Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: >> >> - Merge >> - Update copyright years >> - Merge >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Concurrent mark does not expect forwarded objects >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Silent valgrind on potential memory leak >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Removed ShenandoahConcurrentMark parameter from concurrent GC entry/op, etc. >> - ... and 21 more: https://git.openjdk.java.net/jdk/compare/a6c08813...b7390c08 > > src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp line 306: > >> 304: ShenandoahHeap* ShenandoahMark::heap() const { >> 305: return _heap; >> 306: } > > Do we really need this method? `ShenandoahHeap::heap()` is supposed to be as fast. No. Quite messy on how to access heap in mark code ... removed _heap member and all access via ShenandoahHeap::heap() ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Tue Jan 5 18:23:13 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 5 Jan 2021 18:23:13 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v26] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. 
Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. 
> 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Silent MacOSX build ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1009/files - new: https://git.openjdk.java.net/jdk/pull/1009/files/8cd3f9dc..dd57c073 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=25 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Tue Jan 5 20:36:08 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 5 Jan 2021 20:36:08 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v27] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. 
> 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. > 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - Merge branch 'master' into JDK-8255019-sh-mark - Silent MacOSX build - @shade's comments - Merge - Update copyright years - Merge - Merge branch 'master' into JDK-8255019-sh-mark - Concurrent mark does not expect forwarded objects - Merge branch 'master' into JDK-8255019-sh-mark - Merge branch 'master' into JDK-8255019-sh-mark - ... 
and 24 more: https://git.openjdk.java.net/jdk/compare/4d3d5991...a6540b99 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1009/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=26 Stats: 1978 lines in 21 files changed: 1078 ins; 759 del; 141 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From github.com+13173904+lhtin at openjdk.java.net Wed Jan 6 01:10:57 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Wed, 6 Jan 2021 01:10:57 GMT Subject: RFR: 8259231: Fix the chance to allocate failure and improve the speed and quality of EpsilonHeap::allocate_work In-Reply-To: References: Message-ID: On Tue, 5 Jan 2021 13:34:24 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this: >> 1. it can prevent allocate failure by retry `_space->par_allocate` before expanding virtual space, when there not enough virtual space but another thread expanding succeeded just and has enough space. >> 2. it can reduce the lock time by move `res = _space->par_allocate(size);` out of lock scope. >> >> Test on macosx-x86_64-server-{release, fastdebug, slowdebug} with current test case. > > src/hotspot/share/gc/epsilon/epsilon_globals.hpp line 67: > >> 65: \ >> 66: product(bool, EpsilonElasticTLABDecay, true, EXPERIMENTAL, \ >> 67: "Use timed decays to shrink TLAB sizes. This conserves memory " \ > > Let's not do the typo fixes in this PR. Maybe there are other spelling problems elsewhere in `gc/epsilon` that we could fix wholesale in another PR? OK, I will revert the typo fixes change. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1794 From github.com+13173904+lhtin at openjdk.java.net Wed Jan 6 01:17:58 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Wed, 6 Jan 2021 01:17:58 GMT Subject: RFR: 8259231: Fix the chance to allocate failure and improve the speed and quality of EpsilonHeap::allocate_work In-Reply-To: References: Message-ID: On Tue, 5 Jan 2021 13:39:04 GMT, Aleksey Shipilev wrote: >> Hi all, >> >> The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this: >> 1. it can prevent allocate failure by retry `_space->par_allocate` before expanding virtual space, when there not enough virtual space but another thread expanding succeeded just and has enough space. >> 2. it can reduce the lock time by move `res = _space->par_allocate(size);` out of lock scope. >> >> Test on macosx-x86_64-server-{release, fastdebug, slowdebug} with current test case. > > src/hotspot/share/gc/epsilon/epsilonHeap.cpp line 113: > >> 111: while (res == NULL) { >> 112: // Allocation failed, attempt expansion, and retry: >> 113: { > > I see what you are trying to do, and it makes sense. I believe this form would be cleaner: > > HeapWord* res = NULL; > while (true) { > // Try to allocate, assume space is available > res = par_allocate(size); > if (res != NULL) { > break; > } > > MutexLocker ml(Heap_Lock); > > // Try to allocate under the lock, assume another thread was able to expand > res = par_allocate(size); > if (res != NULL) { > break; > } > > // Expand and loop back if space is available > size_t space_left = max_capacity() - capacity(); > size_t want_space = MAX2(size, EpsilonMinHeapExpand); > ... > } Yes, the new form is cleaner very. And I think it would be a little cleaner if wrap the lock scope in curly braces. 
like this:

    HeapWord* res = NULL;
    while (true) {
      // Try to allocate, assume space is available
      res = par_allocate(size);
      if (res != NULL) {
        break;
      }

      {
        MutexLocker ml(Heap_Lock);

        // Try to allocate under the lock, assume another thread was able to expand
        res = par_allocate(size);
        if (res != NULL) {
          break;
        }

        // Expand and loop back if space is available
        size_t space_left = max_capacity() - capacity();
        size_t want_space = MAX2(size, EpsilonMinHeapExpand);
        ...
      }
    }

-------------

PR: https://git.openjdk.java.net/jdk/pull/1794

From github.com+13173904+lhtin at openjdk.java.net Wed Jan 6 01:28:11 2021
From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding)
Date: Wed, 6 Jan 2021 01:28:11 GMT
Subject: RFR: 8259231: Fix the chance to allocate failure and improve the speed and quality of EpsilonHeap::allocate_work [v2]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this:
> 1. it can prevent allocate failure by retry `_space->par_allocate` before expanding virtual space, when there not enough virtual space but another thread expanding succeeded just and has enough space.
> 2. it can reduce the lock time by move `res = _space->par_allocate(size);` out of lock scope.
>
> Test on macosx-x86_64-server-{release, fastdebug, slowdebug} with current test case.
Lehua Ding has updated the pull request incrementally with two additional commits since the last revision: - Epsilon: revert irrelevant typo fixes - Epsilon: refactor to a cleaner form ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1794/files - new: https://git.openjdk.java.net/jdk/pull/1794/files/7c97808d..188c41a1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1794&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1794&range=00-01 Stats: 14 lines in 2 files changed: 6 ins; 3 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/1794.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1794/head:pull/1794 PR: https://git.openjdk.java.net/jdk/pull/1794 From github.com+13173904+lhtin at openjdk.java.net Wed Jan 6 01:31:10 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Wed, 6 Jan 2021 01:31:10 GMT Subject: RFR: 8259231: Fix the chance to allocate failure and improve the speed and quality of EpsilonHeap::allocate_work [v3] In-Reply-To: References: Message-ID: > Hi all, > > The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this: > 1. it can prevent allocate failure by retry `_space->par_allocate` before expanding virtual space, when there not enough virtual space but another thread expanding succeeded just and has enough space. > 2. it can reduce the lock time by move `res = _space->par_allocate(size);` out of lock scope. > > Test on macosx-x86_64-server-{release, fastdebug, slowdebug} with current test case. 
Lehua Ding has updated the pull request incrementally with one additional commit since the last revision: Epsilon: clean trailing whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1794/files - new: https://git.openjdk.java.net/jdk/pull/1794/files/188c41a1..4ae547f0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1794&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1794&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1794.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1794/head:pull/1794 PR: https://git.openjdk.java.net/jdk/pull/1794 From shade at openjdk.java.net Wed Jan 6 09:13:04 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 6 Jan 2021 09:13:04 GMT Subject: RFR: 8259231: Epsilon: improve performance under contention during virtual space expansion [v3] In-Reply-To: References: Message-ID: On Wed, 6 Jan 2021 01:31:10 GMT, Lehua Ding wrote: >> Hi all, >> >> The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this: >> 1. it can prevent allocate failure by retry `_space->par_allocate` before expanding virtual space, when there not enough virtual space but another thread expanding succeeded just and has enough space. >> 2. it can reduce the lock time by move `res = _space->par_allocate(size);` out of lock scope. >> >> Test on macosx-x86_64-server-{release, fastdebug, slowdebug} with current test case. > > Lehua Ding has updated the pull request incrementally with one additional commit since the last revision: > > Epsilon: clean trailing whitespace Okay, this is good. Please also run `make run-test TEST=gc/epsilon` explicitly; it is supposed to run in tier1 already, but better be safe than sorry. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1794

From srinayak666 at gmail.com Wed Jan 6 11:28:13 2021
From: srinayak666 at gmail.com (Srikanth Nayak)
Date: Wed, 6 Jan 2021 16:58:13 +0530
Subject: issue with GC in RE 1.8.0_275 build
Message-ID:

Hi All,

Below are the details about the issue with JRE 1.8.0_275 build.
-GC is not executing as expected.

Platform: Mac OS X (mojave, catalina etc)

JRE-1:(Open JDK)
openjdk version "1.8.0_275"
OpenJDK Runtime Environment (build 1.8.0_275-b01)
Eclipse OpenJ9 VM (build openj9-0.23.0, JRE 1.8.0 Mac OS X amd64-64-Bit 20201112_584 (JIT enabled, AOT enabled)
OpenJ9 - 0394ef754
OMR - 582366ae5
JCL - b52d2ff7ee based on jdk8u275-b01)

JRE-2:(IBM JDK)
java version "1.8.0_181"
Java(TM) 2 Runtime Environment, Standard Edition (IBM build 1.8.0_181-b13 25_Jul_2018_11_11 Mac OS X x64(SR5 FP20))
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
IBM Java ORB build orb80-20180617.00
XML build XL TXE Java 1.0.60
XML build IBM JAXP 1.6.1
XML build XML4J 4.5.30

gcpolicy: concurrentScavenge (tried with default gencon too)

Problem:
with JRE-1 which we use with our application.
-GC is not executing as expected.
-There is a significant increase in memory even when the application is idle.
-performing tests causes an increase in memory - but not released as expected.

when using JRE-2 shows significant advantage compared to JRE-1 with our application.

Analysis:

Testing scenario: 1
JRE-1:
starting our application in consumes: 292.0MB
leave application idle for 1 hour: 28.9MB CONSUMED
After testing and final memory will be: 693.5MB

Testing scenario: 2
JRE-2:
starting our application in consumes: 486.7MB
leave application idle for 1 hour: 7.7MB RELEASED
After testing and final memory will be: 315.1MB

-Testing scenarios are the same.
-Mac OS X is installed in the VM environment for testing.
-Our application is a standalone application which runs on Mac as well as Windows.
-The issue is not seen with the Windows version of JRE.

can you please help with the issue so that we can continue with JRE-1 ?

--
*Regards,*
___________________________________
"*freedom exists in the world of ideas*"
___________________________________

From github.com+13173904+lhtin at openjdk.java.net Wed Jan 6 15:40:02 2021
From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding)
Date: Wed, 6 Jan 2021 15:40:02 GMT
Subject: RFR: 8259231: Epsilon: improve performance under contention during virtual space expansion [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 6 Jan 2021 09:09:59 GMT, Aleksey Shipilev wrote:

>> Lehua Ding has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Epsilon: clean trailing whitespace
>
> Okay, this is good. Please also run `make run-test TEST=gc/epsilon` explicitly; it is supposed to run in tier1 already, but better be safe than sorry.

Tests of gc/epsilon have passed on {macosx,linux}-x86_64-server-{release,fastdebug,slowdebug}.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1794

From github.com+13173904+lhtin at openjdk.java.net Wed Jan 6 15:40:04 2021
From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding)
Date: Wed, 6 Jan 2021 15:40:04 GMT
Subject: Integrated: 8259231: Epsilon: improve performance under contention during virtual space expansion
In-Reply-To:
References:
Message-ID: <349eScSk5QqlaDKlko3fCLHUo2Yk6uAlEglRiXReGFo=.34a99f83-3da2-469b-a318-09745bdb7ee5@github.com>

On Wed, 16 Dec 2020 01:29:55 GMT, Lehua Ding wrote:

> Hi all,
>
> The `EpsilonHeap::allocate_work` method maybe can be fixed and improved by this:
> 1. it can prevent allocate failure by retry `_space->par_allocate` before expanding virtual space, when there not enough virtual space but another thread expanding succeeded just and has enough space.
> 2. it can reduce the lock time by move `res = _space->par_allocate(size);` out of lock scope.
>
> Test on macosx-x86_64-server-{release, fastdebug, slowdebug} with current test case.
This pull request has now been integrated. Changeset: 722f2361 Author: Lehua Ding Committer: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/722f2361 Stats: 39 lines in 1 file changed: 17 ins; 4 del; 18 mod 8259231: Epsilon: improve performance under contention during virtual space expansion Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1794 From github.com+13173904+lhtin at openjdk.java.net Wed Jan 6 15:51:57 2021 From: github.com+13173904+lhtin at openjdk.java.net (Lehua Ding) Date: Wed, 6 Jan 2021 15:51:57 GMT Subject: RFR: 8259231: Epsilon: improve performance under contention during virtual space expansion [v3] In-Reply-To: References: Message-ID: On Wed, 6 Jan 2021 09:09:59 GMT, Aleksey Shipilev wrote: >> Lehua Ding has updated the pull request incrementally with one additional commit since the last revision: >> >> Epsilon: clean trailing whitespace > > Okay, this is good. Please also run `make run-test TEST=gc/epsilon` explicitly; it is supposed to run in tier1 already, but better be safe than sorry. Thank you @shipilev for reviewing and sponsoring. ------------- PR: https://git.openjdk.java.net/jdk/pull/1794 From qpzhang at openjdk.java.net Thu Jan 7 17:02:02 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Thu, 7 Jan 2021 17:02:02 GMT Subject: RFR: 8259380: Correct pretouch chunk size to cap with actual page size Message-ID: This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. 
This issue was introduced during a refactoring of the chunk calculations, [JDK-8254972](https://bugs.openjdk.java.net/browse/JDK-8254972) (https://github.com/openjdk/jdk/commit/2c7fc85be92c60f4262aff3bc80e704792c1e810), but did not cause any problem immediately, since the default PreTouchParallelChunkSize on all platforms was 1GB, which covers all popular large page sizes in use by most kernel variations. Later on, [JDK-8254699](https://bugs.openjdk.java.net/browse/JDK-8254699) (https://github.com/openjdk/jdk/commit/805d05812c5e831947197419d163f9c83d55634a) set the default to 4MB on Linux, which helps startup time on some platforms, for example most x64 systems, where the popular default large page size (e.g. on CentOS) is 2MB. In contrast, the default large page size on most aarch64 platforms/kernels (e.g. CentOS) is 512MB, so walking the pages inside a 512MB large page in 4MB chunks hurts startup time.

In addition, a similar problem shows up on x64 Linux if -XX:PreTouchParallelChunkSize=4k is set: startup slows down as well.

Tests:
https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt
The 4 before-after comparisons show the JVM startup time going back to normal.
1). 33.381s to 0.870s
2). 20.333s to 2.740s
3). 15.090s to 6.268s
4). 38.983s to 6.709s
(The start time of pretouching the first Survivor space is used as a rough measurement; \time or GCTraceTime give similar results.)

-------------

Commit messages:
 - 8259380: Correct pretouch chunk size to cap with actual page size

Changes: https://git.openjdk.java.net/jdk/pull/1978/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1978&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8259380
  Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1978.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1978/head:pull/1978

PR: https://git.openjdk.java.net/jdk/pull/1978

From zgu at openjdk.java.net Thu Jan 7 18:45:05 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 7 Jan 2021 18:45:05 GMT
Subject: RFR: 8259377: Shenandoah: Enhance weak reference processing timing tacking
Message-ID:

Please review this enhancement for tracking weak references processing.

Test:
- [x] hotspot_gc_shenandoah

-------------

Commit messages:
 - cleanup
 - JDK-8259377: init update

Changes: https://git.openjdk.java.net/jdk/pull/1979/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1979&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8259377
  Stats: 41 lines in 5 files changed: 12 ins; 2 del; 27 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1979.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1979/head:pull/1979

PR: https://git.openjdk.java.net/jdk/pull/1979

From fw at deneb.enyo.de Thu Jan 7 19:11:38 2021
From: fw at deneb.enyo.de (Florian Weimer)
Date: Thu, 07 Jan 2021 20:11:38 +0100
Subject: issue with GC in RE 1.8.0_275 build
In-Reply-To: (Srikanth Nayak's message of "Wed, 6 Jan 2021 16:58:13 +0530")
References:
Message-ID: <87tursk1ed.fsf@mid.deneb.enyo.de>

* Srikanth Nayak:

> Eclipse OpenJ9 VM (build openj9-0.23.0, JRE 1.8.0 Mac OS X amd64-64-Bit

This isn't a Hotspot build, so you need to contact the Eclipse OpenJ9 project to report this.
From zgu at openjdk.java.net Thu Jan 7 19:56:16 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 7 Jan 2021 19:56:16 GMT Subject: RFR: 8259377: Shenandoah: Enhance weak reference processing timing tacking [v2] In-Reply-To: References: Message-ID: > Please review this enhancement for tracking weak references processing. > > Test: > - [x] hotspot_gc_shenandoah Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Fix indentations ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1979/files - new: https://git.openjdk.java.net/jdk/pull/1979/files/a01325d9..baa53fa8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1979&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1979&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/1979.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1979/head:pull/1979 PR: https://git.openjdk.java.net/jdk/pull/1979 From tschatzl at openjdk.java.net Fri Jan 8 10:54:54 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 10:54:54 GMT Subject: RFR: 8258481: gc.g1.plab.TestPLABPromotion fails on Linux x86 [v2] In-Reply-To: References: Message-ID: On Sun, 20 Dec 2020 22:14:05 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> kbarrett review > > Marked as reviewed by kbarrett (Reviewer). 
Thanks @kimbarrett @sjohanss for your reviews

-------------

PR: https://git.openjdk.java.net/jdk/pull/1842

From tschatzl at openjdk.java.net Fri Jan 8 10:54:56 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Fri, 8 Jan 2021 10:54:56 GMT
Subject: Integrated: 8258481: gc.g1.plab.TestPLABPromotion fails on Linux x86
In-Reply-To:
References:
Message-ID:

On Fri, 18 Dec 2020 15:17:26 GMT, Thomas Schatzl wrote:

> Hi all,
>
> can I have reviews for this test bug fix on x86 (but it did not do the correct thing on 64 bit platforms either)?
>
> There are some test cases that allocate byte arrays of ~3500 bytes, and expect that almost all allocations occur in the PLABs given a PLAB waste threshold of some percentage, in this case 20%.
>
> On x64 this is good, as the PLAB size is 4096 *words*, i.e. 32kb, and 20% of that is ~6.5kb. So all objects are allocated in PLABs as expected.
>
> On x86 the PLAB size of 4096 words is only 16kb, and 20% of that is ~3.2kb. This threshold is less than these 3500 bytes, so the test fails.
>
> It does not fail always (but very often) because of the broken calculation for meeting the threshold: unless really *all* objects copied are of that 3500 byte size (and hence directly allocated), the current checking using integer calculation results in 0% waste, which is below the expected 20%.
>
> The suggested fix is to lower the size of that array to 3250 bytes, which meets the criteria on both 32 and 64 bit platforms (and fix the broken calculations).
>
> Note that we should not change this array size to much lower, because there is another test that fails otherwise.
>
> Testing:
> 100 successful test runs on x86 and x64 linux each
>
> Thanks,
> Thomas

This pull request has now been integrated.
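[Editor's illustration] The threshold arithmetic in the description above can be checked with a few lines of C++. The function names are hypothetical and the model is the simplified one from the message (PLAB size in words times word size times waste percentage); object headers and the real G1 checks are ignored.

```cpp
#include <cstddef>

// Largest object size (in bytes) still expected to be allocated inside a
// PLAB, under the simplified model: plab_words * word_size * percent / 100.
size_t plab_waste_threshold_bytes(size_t plab_words, size_t word_size,
                                  unsigned waste_percent) {
  return plab_words * word_size * waste_percent / 100;
}

// True if an object of object_bytes is at or below that threshold.
bool fits_in_plab(size_t object_bytes, size_t plab_words, size_t word_size,
                  unsigned waste_percent) {
  return object_bytes <= plab_waste_threshold_bytes(plab_words, word_size,
                                                    waste_percent);
}
```

With 4096 words and 20%, the threshold is 6553 bytes for 8-byte words and 3276 bytes for 4-byte words, so a 3500-byte array fits only on the 64-bit platform while a 3250-byte array fits on both.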
Changeset: b549cbd3 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/b549cbd3 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod 8258481: gc.g1.plab.TestPLABPromotion fails on Linux x86 Reviewed-by: sjohanss, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/1842 From tschatzl at openjdk.java.net Fri Jan 8 11:08:57 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 11:08:57 GMT Subject: RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: Message-ID: On Thu, 7 Jan 2021 16:56:37 GMT, Patrick Zhang wrote: > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). > > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. > > This issue was introduced during a refactor on chunk calculations [JDK-8254972](https://bugs.openjdk.java.net/browse/JDK-8254972) (https://github.com/openjdk/jdk/commit/2c7fc85be92c60f4262aff3bc80e704792c1e810) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, [JDK-8254699](https://bugs.openjdk.java.net/browse/JDK-8254699) (https://github.com/openjdk/jdk/commit/805d05812c5e831947197419d163f9c83d55634a) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. 
> > In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. > > Tests: > https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt > The 4 before-after comparisons show the JVM startup time go back to normal. > 1). 33.381s to 0.870s > 2). 20.333s to 2.740s > 3). 15.090s to 6.268s > 4). 38.983s to 6.709s > (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) Thanks for finding and reporting this issue and even providing a patch. After having looked at the issue we (in the Oracle GC team) think this problem is serious enough to actually go into JDK16. Since backporting after having this pushed to some (this) repo is some extra effort, would you mind closing this PR here on openjdk/jdk and reopening a new one on openjdk/jdk16? It will then be automatically forward ported to this repo. Not only is backporting some additional effort, there is concern that it won't make it into jdk16 otherwise - Jan 14 is cutoff date for bugs of this seriousness, and we'd need to get an exception for this otherwise. As one of the persons typically triaging new issues in the bug tracker, I would also like to ask you to not open new issues immediately. We are looking at these issues three times a week, and if you open them yourselves, issues might not be handled correctly (i.e. like in this case immediately put into openjdk/jdk16). You can still create a PR and everything even if an issue is in "New" state. I'll start looking at your change immediately. 
Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1978 From tschatzl at openjdk.java.net Fri Jan 8 11:08:58 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 11:08:58 GMT Subject: RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: Message-ID: On Fri, 8 Jan 2021 11:03:55 GMT, Thomas Schatzl wrote: >> This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). >> >> The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. >> >> This issue was introduced during a refactor on chunk calculations [JDK-8254972](https://bugs.openjdk.java.net/browse/JDK-8254972) (https://github.com/openjdk/jdk/commit/2c7fc85be92c60f4262aff3bc80e704792c1e810) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, [JDK-8254699](https://bugs.openjdk.java.net/browse/JDK-8254699) (https://github.com/openjdk/jdk/commit/805d05812c5e831947197419d163f9c83d55634a) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. >> >> In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. 
>> >> Tests: >> https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt >> The 4 before-after comparisons show the JVM startup time go back to normal. >> 1). 33.381s to 0.870s >> 2). 20.333s to 2.740s >> 3). 15.090s to 6.268s >> 4). 38.983s to 6.709s >> (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) > > Thanks for finding and reporting this issue and even providing a patch. > > After having looked at the issue we (in the Oracle GC team) think this problem is serious enough to actually go into JDK16. Since backporting after having this pushed to some (this) repo is some extra effort, would you mind closing this PR here on openjdk/jdk and reopening a new one on openjdk/jdk16? > > It will then be automatically forward ported to this repo. Not only is backporting some additional effort, there is concern that it won't make it into jdk16 otherwise - Jan 14 is cutoff date for bugs of this seriousness, and we'd need to get an exception for this otherwise. > > As one of the persons typically triaging new issues in the bug tracker, I would also like to ask you to not open new issues immediately. We are looking at these issues three times a week, and if you open them yourselves, issues might not be handled correctly (i.e. like in this case immediately put into openjdk/jdk16). You can still create a PR and everything even if an issue is in "New" state. > > I'll start looking at your change immediately. > > Thanks, > Thomas Fwiw, the most appropriate label for this change would probably be "hotspot-gc", but "hotspot" is fine too. 
-------------

PR: https://git.openjdk.java.net/jdk/pull/1978

From qpzhang at openjdk.java.net Fri Jan 8 11:15:59 2021
From: qpzhang at openjdk.java.net (Patrick Zhang)
Date: Fri, 8 Jan 2021 11:15:59 GMT
Subject: RFR: 8259380: Correct pretouch chunk size to cap with actual page size
In-Reply-To:
References:
Message-ID:

On Fri, 8 Jan 2021 11:06:27 GMT, Thomas Schatzl wrote:

>> Thanks for finding and reporting this issue and even providing a patch.
>>
>> After having looked at the issue we (in the Oracle GC team) think this problem is serious enough to actually go into JDK16. Since backporting after having this pushed to some (this) repo is some extra effort, would you mind closing this PR here on openjdk/jdk and reopening a new one on openjdk/jdk16?
>>
>> It will then be automatically forward ported to this repo. Not only is backporting some additional effort, there is concern that it won't make it into jdk16 otherwise - Jan 14 is cutoff date for bugs of this seriousness, and we'd need to get an exception for this otherwise.
>>
>> As one of the persons typically triaging new issues in the bug tracker, I would also like to ask you to not open new issues immediately. We are looking at these issues three times a week, and if you open them yourselves, issues might not be handled correctly (i.e. like in this case immediately put into openjdk/jdk16). You can still create a PR and everything even if an issue is in "New" state.
>>
>> I'll start looking at your change immediately.
>>
>> Thanks,
>> Thomas
>
> Fwiw, the most appropriate label for this change would probably be "hotspot-gc", but "hotspot" is fine too.

Understood, I will do this today.

> would you mind closing this PR here on openjdk/jdk and reopening a new one on openjdk/jdk16?

OK. Can I use the same tracker (8259380) for the PR to jdk16?

> As one of the persons typically triaging new issues in the bug tracker, I would also like to ask you to not open new issues immediately.
------------- PR: https://git.openjdk.java.net/jdk/pull/1978 From tschatzl at openjdk.java.net Fri Jan 8 11:15:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 11:15:59 GMT Subject: RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: Message-ID: <8jBja_MDcykHxaVbrH297uflunc2knkCO-qF5cgwNTc=.529038e6-4557-4a9f-b385-14f858591203@github.com> On Fri, 8 Jan 2021 11:11:40 GMT, Patrick Zhang wrote: >> Fwiw, the most appropriate label for this change would probably be "hotspot-gc", but "hotspot" is fine too. > > Understood, I will do this today. > >> would you mind closing this PR here on openjdk/jdk and reopening a new one on openjdk/jdk16? > > OK. Can I use the same tracker (8259380 for the PR to jdk16? > >> As one of the persons typically triaging new issues in the bug tracker, I would also like to ask you to not open new issues immediately. Yes, keep everything the same, the only difference is to create a pull request for openjdk/jdk16, not openjdk/jdk. ------------- PR: https://git.openjdk.java.net/jdk/pull/1978 From tschatzl at openjdk.java.net Fri Jan 8 11:29:02 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 11:29:02 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads In-Reply-To: References: Message-ID: On Fri, 1 Jan 2021 10:02:10 GMT, Kim Barrett wrote: > Please review this fix to the parallel WeakProcessor's computation of the > number of worker threads to use. It was previously limited by the current > value of active_workers(), whatever that happens to be. It should be > limited by total_workers(), just as with the parallel ReferenceProcessor. > (Both are subject to ReferencesPerThread.) > > Testing > mach5 tier1 > Some hand testing (Linux-x64) to verify the expected number of threads are > being used. > > Note: That hand testing suggests some further tuning of ReferencesPerThread > might be in order. 
With the current default of 1000, I often saw in testing > that some threads were started late enough that no work was left for them. > I'll file a separate RFE for that. Lgtm although the comments in the documentation for both `static void weak_oops_do` method declarations mentions: // Parallel version. Uses ergo_workers(), active workers, and // phase_time's max_threads to determine the number of threads to use. which should be fixed. One option is to put something like "Uses max workers and the total number of weak references to determine the number of threads to use" as description for `ergo_workers` and remove the details in the description of the `weak_oops_do` descriptions, but just removing the mention of `active_workers` there could be fine too. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/75 From kbarrett at openjdk.java.net Fri Jan 8 12:48:14 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 8 Jan 2021 12:48:14 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2] In-Reply-To: References: Message-ID: > Please review this fix to the parallel WeakProcessor's computation of the > number of worker threads to use. It was previously limited by the current > value of active_workers(), whatever that happens to be. It should be > limited by total_workers(), just as with the parallel ReferenceProcessor. > (Both are subject to ReferencesPerThread.) > > Testing > mach5 tier1 > Some hand testing (Linux-x64) to verify the expected number of threads are > being used. > > Note: That hand testing suggests some further tuning of ReferencesPerThread > might be in order. With the current default of 1000, I often saw in testing > that some threads were started late enough that no work was left for them. > I'll file a separate RFE for that. 
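[Editor's illustration] The sizing rule described in the request above (one worker per ReferencesPerThread references, bounded by the total number of workers rather than the currently active ones) can be sketched as a free function. The name `ergo_workers` is borrowed from the thread, but this signature, the rounding, and the treatment of 0 as "use all workers" are assumptions of the sketch, not the HotSpot implementation.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical sketch: choose a worker count for weak reference processing.
//   ref_count       - number of weak references to process
//   refs_per_thread - references handled per worker (1000 in the message)
//   max_workers     - upper bound; the fix passes the total worker count
unsigned ergo_workers(size_t ref_count, size_t refs_per_thread,
                      unsigned max_workers) {
  if (refs_per_thread == 0) {
    return max_workers;   // assumption: 0 means "always use all workers"
  }
  // +1 approximately rounds up the division and guarantees one worker.
  size_t wanted = 1 + ref_count / refs_per_thread;
  return static_cast<unsigned>(
      std::min(wanted, static_cast<size_t>(max_workers)));
}
```

For example, 2500 references with 1000 per thread and 8 total workers gives 3 workers. If the bound were a stale active-worker count of 1 instead, the same call would be limited to a single worker, which is the problem the request describes.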
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: fix doc comments about number of threads used ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/75/files - new: https://git.openjdk.java.net/jdk16/pull/75/files/6859e5f6..0ed088d8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=75&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=75&range=00-01 Stats: 9 lines in 1 file changed: 5 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk16/pull/75.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/75/head:pull/75 PR: https://git.openjdk.java.net/jdk16/pull/75 From kbarrett at openjdk.java.net Fri Jan 8 12:48:15 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 8 Jan 2021 12:48:15 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2] In-Reply-To: References: Message-ID: On Fri, 8 Jan 2021 11:26:11 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> fix doc comments about number of threads used > > Lgtm although the comments in the documentation for both `static void weak_oops_do` method declarations mentions: > > // Parallel version. Uses ergo_workers(), active workers, and > // phase_time's max_threads to determine the number of threads to use. > which should be fixed. > > One option is to put something like "Uses max workers and the total number of weak references to determine the number of threads to use" as description for `ergo_workers` and remove the details in the description of the `weak_oops_do` descriptions, but just removing the mention of `active_workers` there could be fine too. Thanks @tschatzl . I've updated the descriptions of the weak_oops_do functions, and also added a description for ergo_workers. 
-------------

PR: https://git.openjdk.java.net/jdk16/pull/75

From ayang at openjdk.java.net Fri Jan 8 13:01:05 2021
From: ayang at openjdk.java.net (Albert Mingkun Yang)
Date: Fri, 8 Jan 2021 13:01:05 GMT
Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 8 Jan 2021 12:45:12 GMT, Kim Barrett wrote:

>> Lgtm although the comments in the documentation for both `static void weak_oops_do` method declarations mentions:
>>
>> // Parallel version. Uses ergo_workers(), active workers, and
>> // phase_time's max_threads to determine the number of threads to use.
>> which should be fixed.
>>
>> One option is to put something like "Uses max workers and the total number of weak references to determine the number of threads to use" as description for `ergo_workers` and remove the details in the description of the `weak_oops_do` descriptions, but just removing the mention of `active_workers` there could be fine too.
>
> Thanks @tschatzl . I've updated the descriptions of the weak_oops_do functions, and also added a description for ergo_workers.

What's the argument for using `total_workers()` here? BTW, `G1ConcurrentMark::weak_refs_work` is in the caller chain. According to its name, it happens outside a pause.

-------------

PR: https://git.openjdk.java.net/jdk16/pull/75

From amith.pawar at gmail.com Fri Jan 8 13:08:54 2021
From: amith.pawar at gmail.com (Amit Pawar)
Date: Fri, 8 Jan 2021 18:38:54 +0530
Subject: RFR: Convert old-gen single threaded pretouch to multi-threaded during
Message-ID:

Hi

I am trying to improve the pre-touch time taken during old-gen resizing. I need your suggestions on whether the following change would be accepted or not.

What is happening?
Every GC thread resizes the old-gen during object promotion if there is not enough room for the object. After expanding, the GC thread pre-touches the pages alone; it can't pre-touch in parallel using PretouchTask because it is already executing a GC task. The total GC pause time depends upon the resize size and the number of resizes.

What is the fix?
Create another WorkGang; a GC thread can then execute the pre-touch task with this new WorkGang to reduce the pre-touch time taken. The code change is given below.

Improvement:
1. Pre-touch improved by 50-70% for the SPECjbb composite test.
2. This depends upon the number of resize requests and the resize size. SPECjbb composite testing shows old-gen resized with sizes like 2MB-32MB with G1GC and up to 64MB with ParallelGC. Also, the number of resizes is more than 100-200.
3. The PretouchTask class uses PreTouchParallelChunkSize (current default is 4MB for x86) to split the pre-touch task. So the time taken depends upon the old-gen resize size, and this change won't help if that size is smaller than the PreTouchParallelChunkSize value.
4. Please refer to the excel file from the bug report for more details on the improvement for different sizes. https://bugs.openjdk.java.net/browse/JDK-8254699

Though it helps to reduce the pre-touch time taken, I am not sure whether adding another WorkGang is allowed. Please suggest.

diff --git a/src/hotspot/share/gc/shared/gc_globals.hpp b/src/hotspot/share/gc/shared/gc_globals.hpp
index aca8d6b6c34..b5d40b47480 100644
--- a/src/hotspot/share/gc/shared/gc_globals.hpp
+++ b/src/hotspot/share/gc/shared/gc_globals.hpp
@@ -200,6 +200,12 @@
   product(bool, AlwaysPreTouch, false,                                      \
           "Force all freshly committed pages to be pre-touched")            \
                                                                             \
+  product(size_t, OldGenPreTouchWorkers, 1,                                 \
+          "During object promotion old-gen can be expanded as required by"  \
+          "ParallelGCThreads. OldGenPreTouchWorkers can be used to "        \
+          "pre-touch the pages by ParallelGCThreads")                       \
+          range(1, 1024)                                                    \
+                                                                            \
   product_pd(size_t, PreTouchParallelChunkSize,                             \
           "Per-thread chunk size for parallel memory pre-touch.")           \
           range(4*K, SIZE_MAX / 2)                                          \
diff --git a/src/hotspot/share/gc/shared/pretouchTask.cpp b/src/hotspot/share/gc/shared/pretouchTask.cpp
index 4398d3924cc..435ec2ee76f 100644
--- a/src/hotspot/share/gc/shared/pretouchTask.cpp
+++ b/src/hotspot/share/gc/shared/pretouchTask.cpp
@@ -27,6 +27,7 @@
 #include "runtime/atomic.hpp"
 #include "runtime/globals.hpp"
 #include "runtime/os.hpp"
+#include "utilities/ticks.hpp"
 
 PretouchTask::PretouchTask(const char* task_name,
                            char* start_address,
@@ -62,6 +63,8 @@ void PretouchTask::work(uint worker_id) {
   }
 }
 
+#define TIME_FORMAT "%0.3lfms"
+
 void PretouchTask::pretouch(const char* task_name, char* start_address, char* end_address,
                             size_t page_size, WorkGang* pretouch_gang) {
 
@@ -83,14 +86,30 @@ void PretouchTask::pretouch(const char* task_name, char* start_address, char* en
     size_t num_chunks = (total_bytes + chunk_size - 1) / chunk_size;
 
     uint num_workers = (uint)MIN2(num_chunks, (size_t)pretouch_gang->total_workers());
-    log_debug(gc, heap)("Running %s with %u workers for " SIZE_FORMAT " work units pre-touching " SIZE_FORMAT "B.",
-                        task.name(), num_workers, num_chunks, total_bytes);
-
+    Ticks mark_start = Ticks::now();
     pretouch_gang->run_task(&task, num_workers);
+    Ticks mark_end = Ticks::now();
+    log_debug(gc, heap)("Running %s with %u workers for " SIZE_FORMAT " work units pre-touching " SIZE_FORMAT "B. " TIME_FORMAT ,
+                        task.name(), num_workers, num_chunks, total_bytes, (mark_end-mark_start).seconds());
+
   } else {
-    log_debug(gc, heap)("Running %s pre-touching " SIZE_FORMAT "B.",
-                        task.name(), total_bytes);
-    task.work(0);
+    if(OldGenPreTouchWorkers > 1) {
+      const char *oldgen_workers="Old-gen Pre-touch workers";
+      static WorkGang *pretouch_workers= NULL ;
+      if (! pretouch_workers) {
+        // pretouch_workers are used when pretouch_gang is null. This usually happens during old-gen
+        // resizing due to object promotion.
+        pretouch_workers = new WorkGang(oldgen_workers, OldGenPreTouchWorkers, true, false);
+        pretouch_workers->initialize_workers();
+      }
+      pretouch(oldgen_workers, start_address, end_address, page_size, pretouch_workers);
+    } else {
+      Ticks mark_start = Ticks::now();
+      task.work(0);
+      Ticks mark_end = Ticks::now();
+      log_debug(gc, heap)("Running %s pre-touching " SIZE_FORMAT "B. " TIME_FORMAT,
+                          task.name(), total_bytes, (mark_end-mark_start).seconds());
+    }
   }
 }
 
Thanks,
Amit Pawar

From kim.barrett at oracle.com Fri Jan 8 13:18:35 2021
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 8 Jan 2021 08:18:35 -0500
Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2]
In-Reply-To:
References:
Message-ID:

> On Jan 8, 2021, at 8:01 AM, Albert Mingkun Yang wrote:
>
> On Fri, 8 Jan 2021 12:45:12 GMT, Kim Barrett wrote:
>
>>> Lgtm although the comments in the documentation for both `static void weak_oops_do` method declarations mentions:
>>>
>>> // Parallel version. Uses ergo_workers(), active workers, and
>>> // phase_time's max_threads to determine the number of threads to use.
>>> which should be fixed.
>>>
>>> One option is to put something like "Uses max workers and the total number of weak references to determine the number of threads to use" as description for `ergo_workers` and remove the details in the description of the `weak_oops_do` descriptions, but just removing the mention of `active_workers` there could be fine too.
>>
>> Thanks @tschatzl . I've updated the descriptions of the weak_oops_do functions, and also added a description for ergo_workers.
>
> What's the argument for using `total_workers()` here? BTW, `G1ConcurrentMark::weak_refs_work` is in the caller chain. According to its name, it happens outside a pause.

active_workers is semi-random from the POV of being used here.
Its value is whatever was set by the last call to update_active_workers, which was probably for some entirely different usage context. Using total_workers is consistent with, for example, ReferenceProcessor usage by G1 and ParallelGC.

weak_refs_work is called in the remark pause.

> -------------
>
> PR: https://git.openjdk.java.net/jdk16/pull/75

From qpzhang at openjdk.java.net Fri Jan 8 13:46:03 2021
From: qpzhang at openjdk.java.net (Patrick Zhang)
Date: Fri, 8 Jan 2021 13:46:03 GMT
Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size
Message-ID:

This is actually a regression: an extreme slowdown of JVM startup time, initially found on an aarch64 platform (Ampere Altra core).

The pretouch chunk size should be no smaller than the input page size, which is probably the large page size if UseLargePages is set; otherwise, processing chunks much smaller than the large pages they sit in hurts performance.

This issue was introduced during a refactoring of the chunk calculations, JDK-8254972 (2c7fc85), but did not cause any problem immediately, since the default PreTouchParallelChunkSize on all platforms was 1GB, which covers all popular large page sizes in use by most kernel variations. Later on, JDK-8254699 (805d058) set the default to 4MB on Linux, which helps startup time on some platforms, for example most x64 systems, where the popular default large page size (e.g. on CentOS) is 2MB. In contrast, the default large page size on most aarch64 platforms/kernels (e.g. CentOS) is 512MB, so walking the pages inside a 512MB large page in 4MB chunks hurts startup time.

In addition, a similar problem shows up on x64 Linux if -XX:PreTouchParallelChunkSize=4k is set: startup slows down as well.
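[Editor's illustration] The sizing rule described above can be sketched as follows. The helper names are hypothetical and this is not the patch itself; only the round-up chunk count matches the formula used in `PretouchTask::pretouch` as quoted elsewhere in this archive, and clamping with a maximum is an assumption about how "no smaller than the page size" would be expressed.

```cpp
#include <algorithm>
#include <cstddef>

constexpr size_t K = 1024;
constexpr size_t M = K * K;

// Hypothetical helper: the effective chunk size is never smaller than
// the page size actually backing the memory being pre-touched.
size_t pretouch_chunk_size(size_t configured_chunk, size_t page_size) {
  return std::max(configured_chunk, page_size);
}

// Number of work units for a range of total_bytes (round-up division).
size_t num_chunks(size_t total_bytes, size_t chunk_size) {
  return (total_bytes + chunk_size - 1) / chunk_size;
}
```

For a 1GB range backed by 512MB pages, a 4MB chunk size would give 256 work units, 128 per large page, while the clamped size gives 2, one per page. With 2MB pages, the configured 4MB chunk size is left unchanged.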
Tests: https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt The 4 before-after comparisons show the JVM startup time go back to normal. 1). 33.381s to 0.870s 2). 20.333s to 2.740s 3). 15.090s to 6.268s 4). 38.983s to 6.709s (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) ------------- Commit messages: - 8259380: Correct pretouch chunk size to cap with actual page size Changes: https://git.openjdk.java.net/jdk16/pull/97/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=97&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259380 Stats: 6 lines in 1 file changed: 4 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/97.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/97/head:pull/97 PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Fri Jan 8 13:46:54 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Fri, 8 Jan 2021 13:46:54 GMT Subject: RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: <8jBja_MDcykHxaVbrH297uflunc2knkCO-qF5cgwNTc=.529038e6-4557-4a9f-b385-14f858591203@github.com> References: <8jBja_MDcykHxaVbrH297uflunc2knkCO-qF5cgwNTc=.529038e6-4557-4a9f-b385-14f858591203@github.com> Message-ID: On Fri, 8 Jan 2021 11:13:06 GMT, Thomas Schatzl wrote: > Yes, keep everything the same, the only difference is to create a pull request for openjdk/jdk16, not openjdk/jdk. Done. Please review the copied https://github.com/openjdk/jdk16/pull/97, and I am going to close this as suggested. Thanks. 
@tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/1978 From tschatzl at openjdk.java.net Fri Jan 8 13:51:53 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 13:51:53 GMT Subject: RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: <8jBja_MDcykHxaVbrH297uflunc2knkCO-qF5cgwNTc=.529038e6-4557-4a9f-b385-14f858591203@github.com> Message-ID: On Fri, 8 Jan 2021 13:44:08 GMT, Patrick Zhang wrote: >> Yes, keep everything the same, the only difference is to create a pull request for openjdk/jdk16, not openjdk/jdk. > >> Yes, keep everything the same, the only difference is to create a pull request for openjdk/jdk16, not openjdk/jdk. > > Done. Please review the copied https://github.com/openjdk/jdk16/pull/97, and I am going to close this as suggested. Thanks. @tschatzl Saw it. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/1978 From qpzhang at openjdk.java.net Fri Jan 8 13:51:54 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Fri, 8 Jan 2021 13:51:54 GMT Subject: Withdrawn: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: Message-ID: On Thu, 7 Jan 2021 16:56:37 GMT, Patrick Zhang wrote: > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). > > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. 
> > This issue was introduced during a refactor on chunk calculations [JDK-8254972](https://bugs.openjdk.java.net/browse/JDK-8254972) (https://github.com/openjdk/jdk/commit/2c7fc85be92c60f4262aff3bc80e704792c1e810) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, [JDK-8254699](https://bugs.openjdk.java.net/browse/JDK-8254699) (https://github.com/openjdk/jdk/commit/805d05812c5e831947197419d163f9c83d55634a) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. > > In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. > > Tests: > https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt > The 4 before-after comparisons show the JVM startup time go back to normal. > 1). 33.381s to 0.870s > 2). 20.333s to 2.740s > 3). 15.090s to 6.268s > 4). 38.983s to 6.709s > (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1978 From ayang at openjdk.java.net Fri Jan 8 14:05:06 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 8 Jan 2021 14:05:06 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2] In-Reply-To: References: Message-ID: <7K_H64IToRs0Z4S6wxyTPM28gtSQ72G06qakVZbfScI=.a835700a-4720-460f-bef8-5255d0bc224f@github.com> On Fri, 8 Jan 2021 12:48:14 GMT, Kim Barrett wrote: >> Please review this fix to the parallel WeakProcessor's computation of the >> number of worker threads to use. It was previously limited by the current >> value of active_workers(), whatever that happens to be. It should be >> limited by total_workers(), just as with the parallel ReferenceProcessor. >> (Both are subject to ReferencesPerThread.) >> >> Testing >> mach5 tier1 >> Some hand testing (Linux-x64) to verify the expected number of threads are >> being used. >> >> Note: That hand testing suggests some further tuning of ReferencesPerThread >> might be in order. With the current default of 1000, I often saw in testing >> that some threads were started late enough that no work was left for them. >> I'll file a separate RFE for that. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > fix doc comments about number of threads used Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk16/pull/75 From ayang at openjdk.java.net Fri Jan 8 14:05:06 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 8 Jan 2021 14:05:06 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2] In-Reply-To: References: Message-ID: On Fri, 8 Jan 2021 12:57:42 GMT, Albert Mingkun Yang wrote: >> Thanks @tschatzl . I've updated the descriptions of the weak_oops_do functions, and also added a description for ergo_workers. > > What's the argument for using `total_workers()` here? 
BTW, `G1ConcurrentMark::weak_refs_work` is in the caller chain. According to its name, it happens outside a pause. > weak_refs_work is called in the remark pause. Indeed, I should have followed the call chain one more step. Then, `WeakProcessor::weak_oops_do` is always called in a pause, right? Maybe this is worth mentioning in the comment. ------------- PR: https://git.openjdk.java.net/jdk16/pull/75 From tschatzl at openjdk.java.net Fri Jan 8 14:24:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 14:24:00 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2] In-Reply-To: <7K_H64IToRs0Z4S6wxyTPM28gtSQ72G06qakVZbfScI=.a835700a-4720-460f-bef8-5255d0bc224f@github.com> References: <7K_H64IToRs0Z4S6wxyTPM28gtSQ72G06qakVZbfScI=.a835700a-4720-460f-bef8-5255d0bc224f@github.com> Message-ID: On Fri, 8 Jan 2021 14:02:12 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> fix doc comments about number of threads used > > Marked as reviewed by ayang (Author). Still good. Thanks. ------------- PR: https://git.openjdk.java.net/jdk16/pull/75 From tschatzl at openjdk.java.net Fri Jan 8 15:49:05 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 15:49:05 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: Message-ID: <5F61KXg-uv-UJGrcJWaLEK2Xc-hYxcXQp8pT_NKROA8=.06d83fe2-5ada-4c23-8c3b-a8893d31742f@github.com> On Fri, 8 Jan 2021 13:41:06 GMT, Patrick Zhang wrote: > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). 
> > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. > > This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. > > In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. > > Tests: > https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt > The 4 before-after comparisons show the JVM startup time go back to normal. > 1). 33.381s to 0.870s > 2). 20.333s to 2.740s > 3). 15.090s to 6.268s > 4). 38.983s to 6.709s > (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) Thanks for moving this issue to JDK16. I looked a bit into what could cause this, and one thing that I particularly noticed is that the tests are enabling THP. With THP, the (original) code sets updages the page size to os::vm_page_size(): #ifdef LINUX // When using THP we need to always pre-touch using small pages as the OS will // initially always use small pages. page_size = UseTransparentHugePages ? 
(size_t)os::vm_page_size() : page_size; #endif size_t chunk_size = MAX2(PretouchTask::chunk_size(), page_size); After having looked at the code, I am not completely sure whether the analysis about the issue is correct or what the change fixes. To me it looks like on aarch64 the default chunk size should be much higher than on x64. Example: `page_size` is the size of a page, that is 512M in your case; `os::vm_page_size()` is the small page size, 64k in that configuration. `chunk_size` is then set to 4M (MAX(PreTouchParallelChunkSize, 64k)) - because with THP, as the comment indicates, we do not know whether the reservation is a large or a small page - so the code must use the small page size for actual pretouch within a chunk. I am also not sure about the statement about the introduction of this issue in JDK-8254972: the only difference seems to be where the page size for the `PretouchTask` is initialized, in the `PretouchTask` constructor there, and the calculation of the chunk size in the `PretouchTask::work` method done by every thread separately. The only thing I could see is that, in case the OS already gave us large pages (i.e. 512M), iterating over the same page using multiple threads may cause performance issues, although for the startup case, x64 does not seem to care (for me, for 20g heaps) and the default of 4M seems to be fastest as shown in [https://bugs.openjdk.java.net/browse/JDK-8254699][JDK-8254699] (and afaik with THP you always get small pages at first). I can't see how setting chunk size to 4k shows "the same problem" on x64 as it does not show with 4M (default) chunk size and 1g (huge) pages. E.g.
chunk size = 4M $ time java -Xmx20g -Xms20g -XX:+UseLargePages -XX:LargePageSizeInBytes=1g -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:PreTouchParallelChunkSize=4m Hello [0.001s][warning][gc] LargePageSizeInBytes=1073741824 large_page_size 1073741824 [0.053s][warning][gc] pretouch 21474836480 chunk 4194304 page 4096 [0.406s][warning][gc] pretouch 335544320 chunk 4194304 page 4096 [0.413s][warning][gc] pretouch 335544320 chunk 4194304 page 4096 [0.421s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 [0.423s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 [0.432s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 Hello World! real 0m0.708s user 0m0.367s sys 0m9.983s and chunk size = 1g: $ time java -Xmx20g -Xms20g -XX:+UseLargePages -XX:LargePageSizeInBytes=1g -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:PreTouchParallelChunkSize=1g Hello [0.001s][warning][gc] LargePageSizeInBytes=1073741824 large_page_size 1073741824 [0.054s][warning][gc] pretouch 21474836480 chunk 1073741824 page 4096 [1.141s][warning][gc] pretouch 335544320 chunk 1073741824 page 4096 [1.216s][warning][gc] pretouch 335544320 chunk 1073741824 page 4096 [1.289s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 [1.299s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 [1.320s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 Hello World! real 0m1.613s user 0m0.420s sys 0m16.666s Even without THP using 4M chunks (and still using 1g pages for the Java heap) still seems to be consistently faster. I would suggest that in this case the correct fix would be to do the same testing as done in JDK-8254699 and add an aarch64 specific default for `-XX:PreTouchParallelChunkSize`. The suggested change (to increase chunk size based on page size, particularly with THP enabled) seems to not fix the issue (suboptimal default chunk size) and also regress performance on x64 which I would prefer to avoid. 
(There is still the issue whether it makes sense to have a smaller chunk size than page size *without* THP, but that is not the issue here afaict) ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From tschatzl at openjdk.java.net Fri Jan 8 16:11:07 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 8 Jan 2021 16:11:07 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: <5F61KXg-uv-UJGrcJWaLEK2Xc-hYxcXQp8pT_NKROA8=.06d83fe2-5ada-4c23-8c3b-a8893d31742f@github.com> References: <5F61KXg-uv-UJGrcJWaLEK2Xc-hYxcXQp8pT_NKROA8=.06d83fe2-5ada-4c23-8c3b-a8893d31742f@github.com> Message-ID: <6wfrNGRB7L5KuDUIlaDsm0D9DIrYEVtC-itXY7wauuo=.03575400-d391-495c-a9d4-32c885d8e065@github.com> On Fri, 8 Jan 2021 15:46:16 GMT, Thomas Schatzl wrote: >> This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). >> >> The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. >> >> This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. 
>> >> In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. >> >> Tests: >> https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt >> The 4 before-after comparisons show the JVM startup time go back to normal. >> 1). 33.381s to 0.870s >> 2). 20.333s to 2.740s >> 3). 15.090s to 6.268s >> 4). 38.983s to 6.709s >> (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) > > Thanks for moving this issue to JDK16. > > I looked a bit into what could cause this, and one thing that I particularly noticed is that the tests are enabling THP. > > With THP, the (original) code sets updages the page size to os::vm_page_size(): > > #ifdef LINUX > // When using THP we need to always pre-touch using small pages as the OS will > // initially always use small pages. > page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size; > #endif > size_t chunk_size = MAX2(PretouchTask::chunk_size(), page_size); > After having looked at the code, I am not completely sure whether the analysis about the issue is correct or what the change fixes. To me it looks like that on aarch64 the default chunk size should be much higher than on x64. > > Example: > `page_size` is the size of a page, that 512M in your case; `os::vm_page_size()` is the small size page, 64k in that configuration. > > `chunk_size` is then set to 4M (MAX(PreTouchParallelChunkSize, 64k)) - because with THP, as the comment indicates, we do not know whether the reservation is a large or a small page - so the code must use the small page size for actual pretouch within a chunk. 
> > I am also not sure about the statement about the introduction of this issue in JDK-8254972: the only difference seems to be where the page size for the `PretouchTask` is initialized, in the `PretouchTask` constructor there, and the calculation of the chunk size in the `PretouchTask::work` method done by every thread seperately. > > The only thing I could see that in case the OS already gave us large pages (i.e. 512M), and iterating over the same page using multiple threads may cause performance issues, although for the startup case, x64 does not seem to care (for me, for 20g heaps) and the default of 4M seems to be fastest as shown in [https://bugs.openjdk.java.net/browse/JDK-8254699][JDK-8254699] (and afaik with THP you always get small pages at first). > > I can't see how setting chunk size to 4k using the shows "the same problem" on x64 as it does not show with 4M (default) chunk size and 1g (huge) pages. E.g. chunk size = 4M > > $ time java -Xmx20g -Xms20g -XX:+UseLargePages -XX:LargePageSizeInBytes=1g -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:PreTouchParallelChunkSize=4m Hello > [0.001s][warning][gc] LargePageSizeInBytes=1073741824 large_page_size 1073741824 > [0.053s][warning][gc] pretouch 21474836480 chunk 4194304 page 4096 > [0.406s][warning][gc] pretouch 335544320 chunk 4194304 page 4096 > [0.413s][warning][gc] pretouch 335544320 chunk 4194304 page 4096 > [0.421s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 > [0.423s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 > [0.432s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 > Hello World! 
> > real 0m0.708s > user 0m0.367s > sys 0m9.983s > > and chunk size = 1g: > > $ time java -Xmx20g -Xms20g -XX:+UseLargePages -XX:LargePageSizeInBytes=1g -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:PreTouchParallelChunkSize=1g Hello > [0.001s][warning][gc] LargePageSizeInBytes=1073741824 large_page_size 1073741824 > [0.054s][warning][gc] pretouch 21474836480 chunk 1073741824 page 4096 > [1.141s][warning][gc] pretouch 335544320 chunk 1073741824 page 4096 > [1.216s][warning][gc] pretouch 335544320 chunk 1073741824 page 4096 > [1.289s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 > [1.299s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 > [1.320s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 > Hello World! > > real 0m1.613s > user 0m0.420s > sys 0m16.666s > > Even without THP using 4M chunks (and still using 1g pages for the Java heap) still seems to be consistently faster. > > I would suggest that in this case the correct fix would be to do the same testing as done in JDK-8254699 and add an aarch64 specific default for `-XX:PreTouchParallelChunkSize`. > > The suggested change (to increase chunk size based on page size, particularly with THP enabled) seems to not fix the issue (suboptimal default chunk size) and also regress performance on x64 which I would prefer to avoid. > > (There is still the issue whether it makes sense to have a smaller chunk size than page size *without* THP, but that is not the issue here afaict) Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Sat Jan 9 04:18:59 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Sat, 9 Jan 2021 04:18:59 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: <6wfrNGRB7L5KuDUIlaDsm0D9DIrYEVtC-itXY7wauuo=.03575400-d391-495c-a9d4-32c885d8e065@github.com> References: <5F61KXg-uv-UJGrcJWaLEK2Xc-hYxcXQp8pT_NKROA8=.06d83fe2-5ada-4c23-8c3b-a8893d31742f@github.com> <6wfrNGRB7L5KuDUIlaDsm0D9DIrYEVtC-itXY7wauuo=.03575400-d391-495c-a9d4-32c885d8e065@github.com> Message-ID: On Fri, 8 Jan 2021 16:08:43 GMT, Thomas Schatzl wrote: >> Thanks for moving this issue to JDK16. >> >> I looked a bit into what could cause this, and one thing that I particularly noticed is that the tests are enabling THP. >> >> With THP, the (original) code sets updages the page size to os::vm_page_size(): >> >> #ifdef LINUX >> // When using THP we need to always pre-touch using small pages as the OS will >> // initially always use small pages. >> page_size = UseTransparentHugePages ? (size_t)os::vm_page_size() : page_size; >> #endif >> size_t chunk_size = MAX2(PretouchTask::chunk_size(), page_size); >> After having looked at the code, I am not completely sure whether the analysis about the issue is correct or what the change fixes. To me it looks like that on aarch64 the default chunk size should be much higher than on x64. >> >> Example: >> `page_size` is the size of a page, that 512M in your case; `os::vm_page_size()` is the small size page, 64k in that configuration. >> >> `chunk_size` is then set to 4M (MAX(PreTouchParallelChunkSize, 64k)) - because with THP, as the comment indicates, we do not know whether the reservation is a large or a small page - so the code must use the small page size for actual pretouch within a chunk. 
>> >> I am also not sure about the statement about the introduction of this issue in JDK-8254972: the only difference seems to be where the page size for the `PretouchTask` is initialized, in the `PretouchTask` constructor there, and the calculation of the chunk size in the `PretouchTask::work` method done by every thread seperately. >> >> The only thing I could see that in case the OS already gave us large pages (i.e. 512M), and iterating over the same page using multiple threads may cause performance issues, although for the startup case, x64 does not seem to care (for me, for 20g heaps) and the default of 4M seems to be fastest as shown in [https://bugs.openjdk.java.net/browse/JDK-8254699][JDK-8254699] (and afaik with THP you always get small pages at first). >> >> I can't see how setting chunk size to 4k using the shows "the same problem" on x64 as it does not show with 4M (default) chunk size and 1g (huge) pages. E.g. chunk size = 4M >> >> $ time java -Xmx20g -Xms20g -XX:+UseLargePages -XX:LargePageSizeInBytes=1g -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:PreTouchParallelChunkSize=4m Hello >> [0.001s][warning][gc] LargePageSizeInBytes=1073741824 large_page_size 1073741824 >> [0.053s][warning][gc] pretouch 21474836480 chunk 4194304 page 4096 >> [0.406s][warning][gc] pretouch 335544320 chunk 4194304 page 4096 >> [0.413s][warning][gc] pretouch 335544320 chunk 4194304 page 4096 >> [0.421s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 >> [0.423s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 >> [0.432s][warning][gc] pretouch 41943040 chunk 4194304 page 4096 >> Hello World! 
>> >> real 0m0.708s >> user 0m0.367s >> sys 0m9.983s >> >> and chunk size = 1g: >> >> $ time java -Xmx20g -Xms20g -XX:+UseLargePages -XX:LargePageSizeInBytes=1g -XX:+UseTransparentHugePages -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:PreTouchParallelChunkSize=1g Hello >> [0.001s][warning][gc] LargePageSizeInBytes=1073741824 large_page_size 1073741824 >> [0.054s][warning][gc] pretouch 21474836480 chunk 1073741824 page 4096 >> [1.141s][warning][gc] pretouch 335544320 chunk 1073741824 page 4096 >> [1.216s][warning][gc] pretouch 335544320 chunk 1073741824 page 4096 >> [1.289s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 >> [1.299s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 >> [1.320s][warning][gc] pretouch 41943040 chunk 1073741824 page 4096 >> Hello World! >> >> real 0m1.613s >> user 0m0.420s >> sys 0m16.666s >> >> Even without THP using 4M chunks (and still using 1g pages for the Java heap) still seems to be consistently faster. >> >> I would suggest that in this case the correct fix would be to do the same testing as done in JDK-8254699 and add an aarch64 specific default for `-XX:PreTouchParallelChunkSize`. >> >> The suggested change (to increase chunk size based on page size, particularly with THP enabled) seems to not fix the issue (suboptimal default chunk size) and also regress performance on x64 which I would prefer to avoid. >> >> (There is still the issue whether it makes sense to have a smaller chunk size than page size *without* THP, but that is not the issue here afaict) > > Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later. Thanks for the comments. First of all, I am not objecting to https://github.com/openjdk/jdk16/commit/805d05812c5e831947197419d163f9c83d55634a, which does helps most cases. If we have an aarch64 system with 2MB large page configured for the kernel, certainly we can share the benefit as well. 
> I am also not sure about the statement about the introduction of this issue in JDK-8254972: the only difference seems to be where the page size for the `PretouchTask` is initialized, in the `PretouchTask` constructor there, and the calculation of the chunk size in the `PretouchTask::work` method done by every thread seperately. Before https://github.com/openjdk/jdk16/commit/2c7fc85be92c60f4262aff3bc80e704792c1e810, `PretouchTask` instance gets initialized firstly, then doing the `cap with page size` when calculating `num_chunks`. In contrast, after https://github.com/openjdk/jdk16/commit/2c7fc85be92c60f4262aff3bc80e704792c1e810, `PretouchTask` instance initialization followed the calculation of `chunk_size`. This is the diff. > The only thing I could see that in case the OS already gave us large pages (i.e. 512M), and iterating over the same page using multiple threads may cause performance issues, although for the startup case, x64 does not seem to care (for me, for 20g heaps) and the default of 4M seems to be fastest as shown in [https://bugs.openjdk.java.net/browse/JDK-8254699][JDK-8254699] (and afaik with THP you always get small pages at first). Please see https://github.com/torvalds/linux/blob/a09b1d78505eb9fe27597a5174c61a7c66253fe8/Documentation/admin-guide/mm/hugetlbpage.rst. We cannot take assumption of the size of large pages, this is not specific to any arch, x64, aarch64, or else. Users are able to configure any choice to kernel they want, if architecturally supported. So x64 can face to 512MB large page, while aarch64 can work with 2MB large page too. > I can't see how setting chunk size to 4k using the shows "the same problem" on x64 as it does not show with 4M (default) chunk size and 1g (huge) pages. E.g. 
chunk size = 4M Please see the testing results I attached: https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt Tests 2), 3), and 4) were run on x86 servers with various -XX:PreTouchParallelChunkSize=xxk settings. > Even without THP using 4M chunks (and still using 1g pages for the Java heap) still seems to be consistently faster. Again, I agree it is faster under some conditions, but not all. > I would suggest that in this case the correct fix would be to do the same testing as done in JDK-8254699 and add an aarch64 specific default for `-XX:PreTouchParallelChunkSize`. I do not agree; it hurts startup time on most systems configured by default, e.g., CentOS 8 Stream aarch64. > The suggested change (to increase chunk size based on page size, particularly with THP enabled) seems to not fix the issue (suboptimal default chunk size) and also regress performance on x64 which I would prefer to avoid. No, it does not hurt a default system on x64, since the size of large pages there is 2M, which means 4M can still work very well. > (There is still the issue whether it makes sense to have a smaller chunk size than page size _without_ THP, but that is not the issue here afaict) I assume this change does not change anything if not LINUX, or not THP. Please double check.
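[Editor's sketch, not part of the original message.] The chunk-size arithmetic debated in this thread can be checked with a small standalone C++ model of the snippet quoted above (the THP page-size override followed by MAX2). This is only an illustration under stated assumptions: `effective_chunk_size` and `num_chunks` are hypothetical helper names, not HotSpot functions, and the sketch does not reproduce the proposed patch, whose diff is not shown here.

```cpp
#include <algorithm>
#include <cstdint>

// Size constants used for readability (hypothetical names, not HotSpot's).
constexpr std::uint64_t K = 1024;
constexpr std::uint64_t M = 1024 * K;
constexpr std::uint64_t G = 1024 * M;

// Models the quoted snippet: with THP the page size used for touching falls
// back to the small page size, and the chunk size is the larger of the
// configured PreTouchParallelChunkSize and that page size.
constexpr std::uint64_t effective_chunk_size(bool use_thp,
                                             std::uint64_t page_size,
                                             std::uint64_t small_page_size,
                                             std::uint64_t configured_chunk_size) {
  const std::uint64_t touch_page_size = use_thp ? small_page_size : page_size;
  return std::max(configured_chunk_size, touch_page_size);
}

// Number of chunks handed out to workers for a region of total_bytes,
// rounding up so a trailing partial chunk is counted.
constexpr std::uint64_t num_chunks(std::uint64_t total_bytes,
                                   std::uint64_t chunk_size) {
  return (total_bytes + chunk_size - 1) / chunk_size;
}
```

With the aarch64 numbers from the thread (512M large page, 64k small page, 4M configured chunk), `effective_chunk_size(true, 512 * M, 64 * K, 4 * M)` gives 4M, while the same call with `use_thp == false` gives 512M. For the 20g heap in the x64 logs, `num_chunks(20 * G, 4 * M)` is 5120 and `num_chunks(20 * G, 1 * G)` is 20, which is the difference in work granularity the two timing runs compare.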
------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Sat Jan 9 11:38:58 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Sat, 9 Jan 2021 11:38:58 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: <6wfrNGRB7L5KuDUIlaDsm0D9DIrYEVtC-itXY7wauuo=.03575400-d391-495c-a9d4-32c885d8e065@github.com> References: <5F61KXg-uv-UJGrcJWaLEK2Xc-hYxcXQp8pT_NKROA8=.06d83fe2-5ada-4c23-8c3b-a8893d31742f@github.com> <6wfrNGRB7L5KuDUIlaDsm0D9DIrYEVtC-itXY7wauuo=.03575400-d391-495c-a9d4-32c885d8e065@github.com> Message-ID: On Fri, 8 Jan 2021 16:08:43 GMT, Thomas Schatzl wrote: > Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later. This cannot solve the problem completely, e.g., [HugeTLB Pages](https://github.com/torvalds/linux/blob/a09b1d78505eb9fe27597a5174c61a7c66253fe8/Documentation/admin-guide/mm/hugetlbpage.rst): "_x86 CPUs normally support 4K and 2M (1G if architecturally supported)_". Should there be a x64 system configured with 1GB large page, using current 4MB chunk size, the regression slowdown would show too, I believe. This was probably the reason why `-XX:PreTouchParallelChunkSize` has default 1GB settings, which could cover all kinds of large pages in modern kernels/architectures. ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From tgoldstein at outbrain.com Sat Jan 9 20:15:54 2021 From: tgoldstein at outbrain.com (Tal Goldstein) Date: Sat, 9 Jan 2021 22:15:54 +0200 Subject: Unexpected results when enabling +UseNUMA for G1GC Message-ID: Hi Guys, We're exploring the use of the flag -XX:+UseNUMA and its effect on G1 GC in JDK 14. For that, we've created a test that consists of 2 k8s deployments of some service, where deployment A has the UseNUMA flag enabled, and deployment B doesn't have it. 
In order for NUMA to actually work inside the docker container, we also needed to add numactl lib to the container (apk add numactl), and in order to measure the local/remote memory access we've used pcm-numa ( https://github.com/opcm/pcm), the docker is based on an image of Alpine Linux v3.11. Each deployment handles around 150 requests per second and all of the deployment's pods are running on the same kube machine. When running the test, we expected to see that the (local memory access) / (total memory access) ratio on the UseNUMA deployment, is much higher than the non-numa deployment, and as a result that the deployment itself handles a higher throughput of requests than the non-numa deployment. Surprisingly this isn't the case: On the kube running deployment A which uses NUMA, we measured 20M/ 13M/ 33M (local/remote/total) memory accesses, and for the kube running deployment B which doesn't use NUMA, we measured (23M/10M/33M) on the same time. Can you help to understand if we're doing anything wrong? or maybe our expectations are wrong ? The 2 deployments are identical (except for the UseNUMA flag): Each deployment contains 2 pods running on k8s. Each pod has 10GB memory, 8GB heap, requires 2 CPUs (but not limited to 2). Each deployment runs on a separate but identical kube machine with this spec: Hardware............: Supermicro SYS-2027TR-HTRF+ CPU.................: Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz CPUs................: 2 CPU Cores...........: 12 Memory..............: 63627 MB We've also written to a file all NUMA related logs (using -Xlog:os*,gc*=trace:file=/outbrain/heapdumps/fulllog.log:hostname,time,level,tags) - log file could be found here: https://drive.google.com/file/d/1eZqYDtBDWKXaEakh_DoYv0P6V9bcLs6Z/view?usp=sharing so we know that NUMA is indeed working, but again, it doesn't give the desired results we expected to see. Any Ideas why ? Is it a matter of workload ? 
Are there any workloads you can suggest that will benefit from G1 NUMA awareness ? Do you happen to have a link to code that runs such a workload? Thanks, Tal From github.com+13688759+lgxbslgx at openjdk.java.net Mon Jan 11 11:42:57 2021 From: github.com+13688759+lgxbslgx at openjdk.java.net (Guoxiong Li) Date: Mon, 11 Jan 2021 11:42:57 GMT Subject: Withdrawn: 8227106: InitiatingHeapOccupancyPercent is G1-specific but defined in shared In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 08:21:01 GMT, Guoxiong Li wrote: > Hi all, > > Please review this little fix of G1. > The command line option InitiatingHeapOccupancyPercent is G1-specific. But it is defined in shared/gc_globals.hpp rather than g1/g1_globals.hpp. This patch moves it to the proper location. > Thank you for taking the time to review. > > Best Regards. This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.java.net/jdk/pull/1217 From zgu at openjdk.java.net Mon Jan 11 13:54:08 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 11 Jan 2021 13:54:08 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> References: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> Message-ID: On Tue, 5 Jan 2021 13:49:12 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: >> >> - Merge >> - Update copyright years >> - Merge >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Concurrent mark does not expect forwarded objects >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Silent valgrind on potential memory leak >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Removed ShenandoahConcurrentMark parameter from concurrent GC entry/op, etc. >> - ... and 21 more: https://git.openjdk.java.net/jdk/compare/a6c08813...b7390c08 > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 161: > >> 159: // threads, and performance-wise it doesn't really matter. Adds about 1ms to >> 160: // full-gc. >> 161: { > > This seems to revert JDK-8258490? No. After splitting, full-gc never gets here. > src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 168: > >> 166: while (satb_mq_set.apply_closure_to_completed_buffer(&cl)); >> 167: bool do_nmethods = heap->unload_classes() && !ShenandoahConcurrentRoots::can_do_concurrent_class_unloading(); >> 168: assert(!heap->has_forwarded_objects(), "Not expected"); > > Do you need to move this assert? No, fixed. 
> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 335: > >> 333: ShenandoahReferenceProcessor* rp = _heap->ref_processor(); >> 334: task_queues()->reserve(workers->active_workers()); >> 335: ShenandoahMarkConcurrentRootsTask task(task_queues(), rp, ShenandoahPhaseTimings::conc_mark_roots, workers->active_workers()); > > Excess space: `rp, ShenandoahPhaseTimings`. Fixed > src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp line 249: > >> 247: rp->set_soft_reference_policy(true); // forcefully purge all soft references >> 248: >> 249: > > Excess newline? Fixed > src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.inline.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2019, 2019, Red Hat, Inc. All rights reserved. > > Odd change: 2020 -> 2019. Another merge error. Fixed. > src/hotspot/share/gc/shenandoah/shenandoahSTWMark.cpp line 96: > >> 94: TASKQUEUE_STATS_ONLY(task_queues()->print_taskqueue_stats()); >> 95: TASKQUEUE_STATS_ONLY(task_queues()->reset_taskqueue_stats()); >> 96: > > Excess newline Fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From tschatzl at openjdk.java.net Mon Jan 11 14:49:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 11 Jan 2021 14:49:00 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: <5F61KXg-uv-UJGrcJWaLEK2Xc-hYxcXQp8pT_NKROA8=.06d83fe2-5ada-4c23-8c3b-a8893d31742f@github.com> <6wfrNGRB7L5KuDUIlaDsm0D9DIrYEVtC-itXY7wauuo=.03575400-d391-495c-a9d4-32c885d8e065@github.com> Message-ID: On Sat, 9 Jan 2021 11:36:31 GMT, Patrick Zhang wrote: >> Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later. > >> Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later. 
> > This cannot solve the problem completely, e.g., [HugeTLB Pages](https://github.com/torvalds/linux/blob/a09b1d78505eb9fe27597a5174c61a7c66253fe8/Documentation/admin-guide/mm/hugetlbpage.rst): "_x86 CPUs normally support 4K and 2M (1G if architecturally supported)_". Should there be a x64 system configured with 1GB large page, using current 4MB chunk size, the regression slowdown would show too, I believe. > This was probably the reason why `-XX:PreTouchParallelChunkSize` has default 1GB settings, which could cover all kinds of large pages in modern kernels/architectures. Hi, you are right about the initialization order change. As for the expected regression with 1g pages with 4m chunk size vs. 1g chunk size: interestingly, on Linux, without THP, 4m chunk size is faster for a simple "Hello World" app. I noticed that already yesterday, and re-verified on different machines and heap sizes up to 2TB today. However this seems to be an artifact of the test, as when comparing log message times (the `Running G1 PreTouch with X workers for ...` ones shown with gc+heap=debug, they are the same. So I think the approach is good. Thomas ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From tschatzl at openjdk.java.net Mon Jan 11 14:59:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 11 Jan 2021 14:59:00 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: <5F61KXg-uv-UJGrcJWaLEK2Xc-hYxcXQp8pT_NKROA8=.06d83fe2-5ada-4c23-8c3b-a8893d31742f@github.com> <6wfrNGRB7L5KuDUIlaDsm0D9DIrYEVtC-itXY7wauuo=.03575400-d391-495c-a9d4-32c885d8e065@github.com> Message-ID: On Sat, 9 Jan 2021 11:36:31 GMT, Patrick Zhang wrote: >> Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later. > >> Another option is to just set the default chunk size for aarch64 to e.g. 512M and defer searching for the "best" later. 
> > This cannot solve the problem completely, e.g., [HugeTLB Pages](https://github.com/torvalds/linux/blob/a09b1d78505eb9fe27597a5174c61a7c66253fe8/Documentation/admin-guide/mm/hugetlbpage.rst): "_x86 CPUs normally support 4K and 2M (1G if architecturally supported)_". Should there be a x64 system configured with 1GB large page, using current 4MB chunk size, the regression slowdown would show too, I believe. > This was probably the reason why `-XX:PreTouchParallelChunkSize` has default 1GB settings, which could cover all kinds of large pages in modern kernels/architectures. As for the expected regression with 1g pages with 4m chunk size vs. 1g chunk size: interestingly, on Linux, without THP, 4m chunk size is faster for a simple "Hello World" app. I noticed that already yesterday, and re-verified on different machines and heap sizes up to 2TB today. However this seems to be an artifact of the test, as when comparing log message times (the `Running G1 PreTouch with X workers for ...` ones shown with gc+heap=debug, they are the same. ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From tschatzl at openjdk.java.net Mon Jan 11 15:17:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 11 Jan 2021 15:17:59 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: Message-ID: <7s1sxeNe7w2F2OlPdZC4tnQFmjPQpHvnRqQVwljHbNc=.e0a30da1-22ef-43f6-bf70-518ed2de4bac@github.com> On Fri, 8 Jan 2021 13:41:06 GMT, Patrick Zhang wrote: > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). > > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. 
> > This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. > > In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. > > Tests: > https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt > The 4 before-after comparisons show the JVM startup time go back to normal. > 1). 33.381s to 0.870s > 2). 20.333s to 2.740s > 3). 15.090s to 6.268s > 4). 38.983s to 6.709s > (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/pretouchTask.cpp line 70: > 68: // large pages size if UseLargePages was set, otherwise processing chunks with > 69: // much smaller size inside large size pages would hurt performance. > 70: // Revising page_size should be placed after having decided the proper chuck_size. Something like `// Chunk size should be at least (unmodified) page size as using multiple threads pretouch on a single chunk can decrease performance.` is sufficient here.
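For readers following the thread, the rule under discussion (the pretouch chunk size should never be smaller than the page size backing the region) can be sketched roughly as below. This is a minimal illustration only, not the code in pretouchTask.cpp: the function names `pretouch_chunk_size` and `pretouch_num_chunks` and their parameters are hypothetical and were chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the HotSpot implementation).
// requested_chunk: the configured chunk size in bytes, e.g. the value of
//                  -XX:PreTouchParallelChunkSize.
// page_size:       the page size backing the region in bytes; this is the
//                  large page size when large pages are in use.
// Returns a chunk size that is at least one page, so that several worker
// threads never end up touching the same page.
inline std::size_t pretouch_chunk_size(std::size_t requested_chunk,
                                       std::size_t page_size) {
  assert(page_size > 0 && "page size must be non-zero");
  return std::max(requested_chunk, page_size);
}

// Hypothetical helper: number of chunks (work items) needed to cover a
// region of region_bytes bytes, rounding up.
inline std::size_t pretouch_num_chunks(std::size_t region_bytes,
                                       std::size_t chunk_size) {
  assert(chunk_size > 0 && "chunk size must be non-zero");
  return (region_bytes + chunk_size - 1) / chunk_size;
}
```

Under these assumptions, a 4MB requested chunk with 512MB large pages (the aarch64 case described above) yields a 512MB chunk, while the same 4MB request with 2MB pages stays at 4MB.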
------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Mon Jan 11 15:24:59 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Mon, 11 Jan 2021 15:24:59 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: <7s1sxeNe7w2F2OlPdZC4tnQFmjPQpHvnRqQVwljHbNc=.e0a30da1-22ef-43f6-bf70-518ed2de4bac@github.com> References: <7s1sxeNe7w2F2OlPdZC4tnQFmjPQpHvnRqQVwljHbNc=.e0a30da1-22ef-43f6-bf70-518ed2de4bac@github.com> Message-ID: On Mon, 11 Jan 2021 15:15:02 GMT, Thomas Schatzl wrote: >> This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). >> >> The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. >> >> This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. >> >> In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. 
>> >> Tests: >> https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt >> The 4 before-after comparisons show the JVM startup time go back to normal. >> 1). 33.381s to 0.870s >> 2). 20.333s to 2.740s >> 3). 15.090s to 6.268s >> 4). 38.983s to 6.709s >> (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) > > src/hotspot/share/gc/shared/pretouchTask.cpp line 70: > >> 68: // large pages size if UseLargePages was set, otherwise processing chunks with >> 69: // much smaller size inside large size pages would hurt performance. >> 70: // Revising page_size should be placed after having decided the proper chuck_size. > > Something like > > `// Chunk size should be at least (unmodified) page size as using multiple threads pretouch on a single chunk can decrease performance.` > > is sufficient here. Sure I will update this accordingly, thanks ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Mon Jan 11 15:40:20 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Mon, 11 Jan 2021 15:40:20 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v2] In-Reply-To: References: Message-ID: > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). > > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. > > This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations.
Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. > > In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. > > Tests: > https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt > The 4 before-after comparisons show the JVM startup time go back to normal. > 1). 33.381s to 0.870s > 2). 20.333s to 2.740s > 3). 15.090s to 6.268s > 4). 38.983s to 6.709s > (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision: 8259380: Update the comments for chunk_size calculation to pretouch ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/97/files - new: https://git.openjdk.java.net/jdk16/pull/97/files/cc059770..7415184b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=97&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=97&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk16/pull/97.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/97/head:pull/97 PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Mon Jan 11 15:44:17 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Mon, 11 Jan 2021 15:44:17 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v3] In-Reply-To: References: Message-ID: 
<6ycfcUwyr91m-WmqBNYgKUgPUxXVz67LuyLQh8d0Cb0=.10f9319c-47f2-4b34-a313-89f550bda121@github.com> > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). > > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. > > This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. > > In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. > > Tests: > https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt > The 4 before-after comparisons show the JVM startup time go back to normal. > 1). 33.381s to 0.870s > 2). 20.333s to 2.740s > 3). 15.090s to 6.268s > 4). 
38.983s to 6.709s > (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision: 8259380: Remove the trailing whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/97/files - new: https://git.openjdk.java.net/jdk16/pull/97/files/7415184b..f9aecda1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=97&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=97&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/97.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/97/head:pull/97 PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Mon Jan 11 15:44:18 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Mon, 11 Jan 2021 15:44:18 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v3] In-Reply-To: References: <7s1sxeNe7w2F2OlPdZC4tnQFmjPQpHvnRqQVwljHbNc=.e0a30da1-22ef-43f6-bf70-518ed2de4bac@github.com> Message-ID: On Mon, 11 Jan 2021 15:22:20 GMT, Patrick Zhang wrote: >> src/hotspot/share/gc/shared/pretouchTask.cpp line 70: >> >>> 68: // large pages size if UseLargePages was set, otherwise processing chunks with >>> 69: // much smaller size inside large size pages would hurt performance. >>> 70: // Revising page_size should be placed after having decided the proper chuck_size. >> >> Something like >> >> `// Chunk size should be at least (unmodified) page size as using multiple threads pretouch on a single chunk can decrease performance.` >> >> is sufficient here.
> > Sure I will update this accordingly, thanks Done ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From tschatzl at openjdk.java.net Mon Jan 11 16:12:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 11 Jan 2021 16:12:59 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v3] In-Reply-To: <6ycfcUwyr91m-WmqBNYgKUgPUxXVz67LuyLQh8d0Cb0=.10f9319c-47f2-4b34-a313-89f550bda121@github.com> References: <6ycfcUwyr91m-WmqBNYgKUgPUxXVz67LuyLQh8d0Cb0=.10f9319c-47f2-4b34-a313-89f550bda121@github.com> Message-ID: On Mon, 11 Jan 2021 15:44:17 GMT, Patrick Zhang wrote: >> This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). >> >> The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. >> >> This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. >> >> In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. 
>> >> Tests: >> https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt >> The 4 before-after comparisons show the JVM startup time go back to normal. >> 1). 33.381s to 0.870s >> 2). 20.333s to 2.740s >> 3). 15.090s to 6.268s >> 4). 38.983s to 6.709s >> (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) > > Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision: > > 8259380: Remove the trailing whitespace Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/shared/pretouchTask.cpp line 68: > 66: size_t page_size, WorkGang* pretouch_gang) { > 67: // Chunk size should be at least (unmodified) page size as using multiple threads > 68: // pretouch on a single chunk can decrease performance. it should actually read "... pretouch on a single page can ..." not chunk :( Sorry, my fault. ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From tschatzl at openjdk.java.net Mon Jan 11 16:38:56 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 11 Jan 2021 16:38:56 GMT Subject: RFR: 8258254: Move PtrQueue flush to PtrQueueSet subclasses In-Reply-To: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> References: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> Message-ID: On Sun, 20 Dec 2020 10:03:15 GMT, Kim Barrett wrote: > Please review this change to the PtrQueue hierarchy, changing queue flushing > from an intrinsic operation of the queue to an operation the qset performs on > a queue. This is a piece of the refactoring being done under JDK-8258251, > separated out for easier review. > > This change also removes a couple of no longer used internal helper functions > from PtrQueue. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Lgtm. 
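As a rough illustration of the refactoring direction described in the 8258254 request above (flushing becomes something the queue set does to a queue, rather than something the queue does to itself), one possible shape is sketched below. All names here (`SketchQueue`, `SketchQueueSet`, `CompletedBufferSet`, `flush_queue`, `completed_buffers`) are hypothetical and are not the HotSpot PtrQueue or PtrQueueSet API; the sketch only assumes that a flush hands a non-empty buffer over to the set.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-thread queue: it only holds entries and
// has no flush operation of its own.
struct SketchQueue {
  std::vector<void*> buffer;  // entries recorded by the owning thread
};

// Hypothetical base class: each kind of queue set decides what flushing a
// queue means for it.
class SketchQueueSet {
public:
  virtual ~SketchQueueSet() = default;
  virtual void flush_queue(SketchQueue& queue) = 0;
};

// One possible subclass: flushing hands the queue's buffer over to a list
// of completed buffers for later processing.
class CompletedBufferSet : public SketchQueueSet {
  std::vector<std::vector<void*>> _completed;

public:
  void flush_queue(SketchQueue& queue) override {
    if (queue.buffer.empty()) {
      return;  // nothing to hand over
    }
    _completed.push_back(std::move(queue.buffer));
    queue.buffer.clear();  // put the moved-from buffer into a known empty state
    assert(queue.buffer.empty());
  }

  std::size_t completed_buffers() const { return _completed.size(); }
};
```

With this shape a caller would write something like `set.flush_queue(queue)` instead of `queue.flush()`, which keeps the policy for completed buffers in the set subclass.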
------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1851 From shade at openjdk.java.net Mon Jan 11 16:53:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 11 Jan 2021 16:53:56 GMT Subject: RFR: 8258254: Move PtrQueue flush to PtrQueueSet subclasses In-Reply-To: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> References: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> Message-ID: On Sun, 20 Dec 2020 10:03:15 GMT, Kim Barrett wrote: > Please review this change to the PtrQueue hierarchy, changing queue flushing > from an intrinsic operation of the queue to an operation the qset performs on > a queue. This is a piece of the refactoring being done under JDK-8258251, > separated out for easier review. > > This change also removes a couple of no longer used internal helper functions > from PtrQueue. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Shenandoah and shared parts look fine. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1851 From shade at openjdk.java.net Mon Jan 11 17:45:02 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 11 Jan 2021 17:45:02 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v27] In-Reply-To: References: Message-ID: On Tue, 5 Jan 2021 20:36:08 GMT, Zhengyu Gu wrote: >> This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). >> >> Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. 
>> >> It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. >> >> First step, I would like to split STW and concurrent mark, so that: >> 1) Code has to special case for STW and concurrent mark. >> 2) STW mark does not need to rendezvous workers between root mark and the rest of mark >> 3) STW mark does not need to activate SATB barrier and drain SATB buffers. >> 4) STW mark does not need to remark some of roots. >> >> The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. >> >> A few changes: >> 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. >> 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner >> 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. >> 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge branch 'master' into JDK-8255019-sh-mark > - Silent MacOSX build > - @shade's comments > - Merge > - Update copyright years > - Merge > - Merge branch 'master' into JDK-8255019-sh-mark > - Concurrent mark does not expect forwarded objects > - Merge branch 'master' into JDK-8255019-sh-mark > - Merge branch 'master' into JDK-8255019-sh-mark > - ... 
and 24 more: https://git.openjdk.java.net/jdk/compare/4d3d5991...a6540b99 Changes requested by shade (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 251: > 249: } > 250: > 251: _marking_context = new ShenandoahMarkingContext(_heap_region, _bitmap_region, _num_regions, MAX2(_max_workers, 1U)); So, `MAX2` protects from `_max_workers == 0`? Is that even plausible? If not, it should be an assert inside `ShenandoahMarkingContext` constructor? src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1610: > 1608: // Make above changes visible to worker threads > 1609: OrderAccess::fence(); > 1610: ShenandoahConcurrentMark mark; Add (or rather, retain), a newline before this statement. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2212: > 2210: if (point == _degenerated_mark) { > 2211: finish_mark(); > 2212: } So if we don't call `finish_mark`, do we ever call `set_concurrent_mark_in_progress(false);` and `mark_complete_marking_context();`? ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From shade at openjdk.java.net Mon Jan 11 17:45:04 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 11 Jan 2021 17:45:04 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: References: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> Message-ID: On Mon, 11 Jan 2021 13:49:16 GMT, Zhengyu Gu wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 161: >> >>> 159: // threads, and performance-wise it doesn't really matter. Adds about 1ms to >>> 160: // full-gc. >>> 161: { >> >> This seems to revert JDK-8258490? > > No. After splitting, full-gc never gets here. Why the comment that mentions mark-compact and full-gc then? 
>> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 168: >> >>> 166: while (satb_mq_set.apply_closure_to_completed_buffer(&cl)); >>> 167: bool do_nmethods = heap->unload_classes() && !ShenandoahConcurrentRoots::can_do_concurrent_class_unloading(); >>> 168: assert(!heap->has_forwarded_objects(), "Not expected"); >> >> Do you need to move this assert? > > No, fixed. Note it still removes the new line. ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From shade at openjdk.java.net Mon Jan 11 17:45:05 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 11 Jan 2021 17:45:05 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> References: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> Message-ID: On Tue, 5 Jan 2021 13:55:17 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: >> >> - Merge >> - Update copyright years >> - Merge >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Concurrent mark does not expect forwarded objects >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Silent valgrind on potential memory leak >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Removed ShenandoahConcurrentMark parameter from concurrent GC entry/op, etc. >> - ... and 21 more: https://git.openjdk.java.net/jdk/compare/a6c08813...b7390c08 > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.hpp line 58: > >> 56: // TODO: where to put them >> 57: static void update_roots(ShenandoahPhaseTimings::Phase root_phase); >> 58: static void update_thread_roots(ShenandoahPhaseTimings::Phase root_phase); > > Sounds like these better to be shared in `ShenandoahMark`? 
This is still unanswered. ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Mon Jan 11 18:26:04 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 11 Jan 2021 18:26:04 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v27] In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 16:42:30 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: >> >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Silent MacOSX build >> - @shade's comments >> - Merge >> - Update copyright years >> - Merge >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Concurrent mark does not expect forwarded objects >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Merge branch 'master' into JDK-8255019-sh-mark >> - ... and 24 more: https://git.openjdk.java.net/jdk/compare/4d3d5991...a6540b99 > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1610: > >> 1608: // Make above changes visible to worker threads >> 1609: OrderAccess::fence(); >> 1610: ShenandoahConcurrentMark mark; > > Add (or rather, retain), a newline before this statement. Fixed > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 251: > >> 249: } >> 250: >> 251: _marking_context = new ShenandoahMarkingContext(_heap_region, _bitmap_region, _num_regions, MAX2(_max_workers, 1U)); > > So, `MAX2` protects from `_max_workers == 0`? Is that even plausible? If not, it should be an assert inside `ShenandoahMarkingContext` constructor? 
Fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Mon Jan 11 18:41:09 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 11 Jan 2021 18:41:09 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> References: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> Message-ID: On Tue, 5 Jan 2021 13:59:35 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 31 commits: >> >> - Merge >> - Update copyright years >> - Merge >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Concurrent mark does not expect forwarded objects >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Silent valgrind on potential memory leak >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Removed ShenandoahConcurrentMark parameter from concurrent GC entry/op, etc. >> - ... and 21 more: https://git.openjdk.java.net/jdk/compare/a6c08813...b7390c08 > > src/hotspot/share/gc/shenandoah/shenandoahMark.cpp line 38: > >> 36: #include "gc/shenandoah/shenandoahUtils.hpp" >> 37: #include "gc/shenandoah/shenandoahVerifier.hpp" >> 38: > > Excess newline? 
Fixed ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Mon Jan 11 18:41:06 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 11 Jan 2021 18:41:06 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v24] In-Reply-To: References: <4C2MtK5jhW_yHyzDp5k5cAaguy3QlmI0T0LTKf6n978=.8599d416-5d32-40e5-ae7f-9bf5ca5b330d@github.com> Message-ID: On Mon, 11 Jan 2021 17:42:02 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.hpp line 58: >> >>> 56: // TODO: where to put them >>> 57: static void update_roots(ShenandoahPhaseTimings::Phase root_phase); >>> 58: static void update_thread_roots(ShenandoahPhaseTimings::Phase root_phase); >> >> Sounds like these better to be shared in `ShenandoahMark`? > > This is still unanswered. They are not really marking functions, they do not belong to there neither ... I left them there, and moved them to ShenandoahGC in next PR (isolate GCs). Okay? ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Mon Jan 11 18:58:17 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 11 Jan 2021 18:58:17 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v28] In-Reply-To: References: Message-ID: <5ly9_tDk3Ja-pmAdkYU4BjREIHtqsJB2NhkSnIQFbI8=.63aeec1e-2e79-42f5-baeb-055e8740029b@github.com> > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. 
Since Shenandoah gradually moved root processing into the concurrent phase, the code started to diverge, which made it hard to reason about and maintain. > > As a first step, I would like to split STW and concurrent mark, so that: > 1) Code no longer has to special-case STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark. > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of the roots. > > The patch mainly just shuffles code. It creates a base class ShenandoahMark and moves shared code (from the current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp', but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark into finish_mark and prepare_evacuation helper functions. 
> 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Fixes based on shade's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1009/files - new: https://git.openjdk.java.net/jdk/pull/1009/files/a6540b99..37cda8b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=27 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=26-27 Stats: 10 lines in 4 files changed: 3 ins; 6 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From shade at openjdk.java.net Mon Jan 11 19:08:03 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 11 Jan 2021 19:08:03 GMT Subject: RFR: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState Message-ID: "label" is passed, but never hooked into the field. So instead of reporting a GC bug, Verifier would probably crash itself trying to read garbage memory. 
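A minimal sketch of the pattern described above, not the actual code from shenandoahVerifier.cpp: the class name `ThreadStateCheck` and its accessors are hypothetical. It shows a constructor that really stores the `label` argument in the field. Declaring the members `const` has a useful side effect in C++: a constructor that forgets to initialize one of them is rejected at compile time instead of leaving garbage in the field.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a verifier closure; names are illustrative only.
class ThreadStateCheck {
private:
  const char* const _label;  // description used when reporting a failure
  char const _expected;      // expected state value

public:
  // Both constructor arguments are hooked into their fields. Dropping
  // `_label(label)` from this list would not compile, because a const
  // member without an initializer is ill-formed.
  ThreadStateCheck(const char* label, char expected)
    : _label(label), _expected(expected) {}

  const char* label() const { return _label; }
  bool matches(char actual) const { return actual == _expected; }
};
```

With this shape, a failure report can safely read `label()` because the pointer is guaranteed to have been set by the constructor.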
------------- Commit messages: - 8259580: Shenandoah: uninitialized label in VerifyThreadGCState Changes: https://git.openjdk.java.net/jdk/pull/2033/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2033&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259580 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2033.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2033/head:pull/2033 PR: https://git.openjdk.java.net/jdk/pull/2033 From zgu at openjdk.java.net Mon Jan 11 19:34:57 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 11 Jan 2021 19:34:57 GMT Subject: RFR: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 19:02:55 GMT, Aleksey Shipilev wrote: > "label" is passed, but never hooked into the field. So instead of reporting a GC bug, Verifier would probably crash itself trying to read garbage memory. Looks good ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2033 From sangheon.kim at oracle.com Mon Jan 11 20:05:10 2021 From: sangheon.kim at oracle.com (sangheon.kim at oracle.com) Date: Mon, 11 Jan 2021 12:05:10 -0800 Subject: Unexpected results when enabling +UseNUMA for G1GC In-Reply-To: References: Message-ID: <9c42a365-db78-4699-c138-cc06d0c4708f@oracle.com> Hi Tal, I added in-line comments. On 1/9/21 12:15 PM, Tal Goldstein wrote: > Hi Guys, > We're exploring the use of the flag -XX:+UseNUMA and its effect on G1 GC in > JDK 14. > For that, we've created a test that consists of 2 k8s deployments of some > service, > where deployment A has the UseNUMA flag enabled, and deployment B doesn't > have it. 
> > In order for NUMA to actually work inside the docker container, we also > needed to add numactl lib to the container (apk add numactl), > and in order to measure the local/remote memory access we've used pcm-numa ( > https://github.com/opcm/pcm), > the docker is based on an image of Alpine Linux v3.11. > > Each deployment handles around 150 requests per second and all of the > deployment's pods are running on the same kube machine. > When running the test, we expected to see that the (local memory access) / > (total memory access) ratio on the UseNUMA deployment, is much higher than > the non-numa deployment, > and as a result that the deployment itself handles a higher throughput of > requests than the non-numa deployment. > > Surprisingly this isn't the case: > On the kube running deployment A which uses NUMA, we measured 20M/ 13M/ 33M > (local/remote/total) memory accesses, > and for the kube running deployment B which doesn't use NUMA, we measured > (23M/10M/33M) on the same time. Just curious, did you see any performance difference(other than pcm-numa) between those two? Does it mean you ran 2 pods in parallel(at the same time) on one physical machine? > Can you help to understand if we're doing anything wrong? or maybe our > expectations are wrong ? > > The 2 deployments are identical (except for the UseNUMA flag): > Each deployment contains 2 pods running on k8s. > Each pod has 10GB memory, 8GB heap, requires 2 CPUs (but not limited to 2). 
> Each deployment runs on a separate but identical kube machine with this > spec: > Hardware............: Supermicro SYS-2027TR-HTRF+ > CPU.................: Intel(R) Xeon(R) CPU E5-2630L v2 @ > 2.40GHz > CPUs................: 2 > CPU Cores...........: 12 > Memory..............: 63627 MB > > > We've also written to a file all NUMA related logs (using > -Xlog:os*,gc*=trace:file=/outbrain/heapdumps/fulllog.log:hostname,time,level,tags) > - log file could be found here: > https://drive.google.com/file/d/1eZqYDtBDWKXaEakh_DoYv0P6V9bcLs6Z/view?usp=sharing > so we know that NUMA is indeed working, but again, it doesn't give the > desired results we expected to see. From the shared log file, I see only 1 GC (GC id 6761), and the NUMA stat shows 53% local memory allocation (gc,heap,numa), which seems okay. Could you share your full VM options? > > Any Ideas why ? > Is it a matter of workload ? Can you increase your Java heap on the testing machine? Your test machine has almost 64GB of memory on 2 NUMA nodes, so I assume each NUMA node has almost 32GB of memory. But you are using only 8GB of Java heap, which fits on one node, so I wouldn't expect any benefit from enabling NUMA. As the JVM is running on Kubernetes, there could be something else that affects the test. For example, the topology manager may make a pod allocate from a single NUMA node. https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/ > Are there any workloads you can suggest that > will benefit from G1 NUMA awareness ? I measured some performance improvements on SpecJBB2015 and SpecJBB2005. > Do you happen to have a link to code that runs such a workload? No, I don't have such a link for the above runs. 
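For reference, the pcm-numa counts quoted earlier in this message can be turned into local-access ratios. This is only a worked-arithmetic sketch; the helper name `local_ratio` is made up for illustration, and the inputs are the approximate counts (in millions) reported above.

```cpp
#include <cassert>
#include <cmath>

// Fraction of memory accesses that were served from the local NUMA node.
// Arguments are access counts in any common unit (here: millions).
double local_ratio(double local, double remote) {
  return local / (local + remote);
}

// Deployment A (+UseNUMA):   20M local, 13M remote -> 20/33, about 0.61
// Deployment B (no UseNUMA): 23M local, 10M remote -> 23/33, about 0.70
```

By these numbers the NUMA-enabled deployment shows a slightly lower local ratio than the other one, which is the surprise described in the original question.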
Thanks, Sangheon > Thanks, > Tal > From rkennke at openjdk.java.net Mon Jan 11 20:57:06 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 11 Jan 2021 20:57:06 GMT Subject: RFR: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 19:02:55 GMT, Aleksey Shipilev wrote: > "label" is passed, but never hooked into the field. So instead of reporting a GC bug, Verifier would probably crash itself trying to read garbage memory. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 606: > 604: private: > 605: const char* const _label; > 606: char const _expected; Why can't _expected not be const char* const too? Are we really messing with the string on the way? ------------- PR: https://git.openjdk.java.net/jdk/pull/2033 From qpzhang at openjdk.java.net Tue Jan 12 02:39:12 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Tue, 12 Jan 2021 02:39:12 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v4] In-Reply-To: References: Message-ID: > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). > > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. > > This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. 
In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. > > In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. > > Tests: > https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt > The 4 before-after comparisons show the JVM startup time go back to normal. > 1). 33.381s to 0.870s > 2). 20.333s to 2.740s > 3). 15.090s to 6.268s > 4). 38.983s to 6.709s > (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision: 8259380: Updated comments regarding pretouching on a single page ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/97/files - new: https://git.openjdk.java.net/jdk16/pull/97/files/f9aecda1..efb58730 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=97&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=97&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/97.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/97/head:pull/97 PR: https://git.openjdk.java.net/jdk16/pull/97 From qpzhang at openjdk.java.net Tue Jan 12 02:39:13 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Tue, 12 Jan 2021 02:39:13 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v3] In-Reply-To: References: <6ycfcUwyr91m-WmqBNYgKUgPUxXVz67LuyLQh8d0Cb0=.10f9319c-47f2-4b34-a313-89f550bda121@github.com> Message-ID: On Mon, 11 Jan 2021 16:10:08 GMT, Thomas Schatzl wrote: >> Patrick Zhang has updated the pull request incrementally with one 
additional commit since the last revision: >> >> 8259380: Remove the trailing whitespace > > src/hotspot/share/gc/shared/pretouchTask.cpp line 68: > >> 66: size_t page_size, WorkGang* pretouch_gang) { >> 67: // Chunk size should be at least (unmodified) page size as using multiple threads >> 68: // pretouch on a single chunk can decrease performance. > > it should actually read "... pretouch on a single page can ..." not chunk :( Sorry, my fault. fixed ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From kbarrett at openjdk.java.net Tue Jan 12 03:15:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 12 Jan 2021 03:15:02 GMT Subject: RFR: 8258254: Move PtrQueue flush to PtrQueueSet subclasses In-Reply-To: References: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> Message-ID: <90ADrhvR1wYnK9Kk3aYW7oseegRFEM_Mnb1WqBQNUFg=.8525b43e-99d0-4136-8bed-6255294188cc@github.com> On Mon, 11 Jan 2021 16:51:05 GMT, Aleksey Shipilev wrote: >> Please review this change to the PtrQueue hierarchy, changing queue flushing >> from an intrinsic operation of the queue to an operation the qset performs on >> a queue. This is a piece of the refactoring being done under JDK-8258251, >> separated out for easier review. >> >> This change also removes a couple of no longer used internal helper functions >> from PtrQueue. >> >> Testing: >> mach5 tier1 >> local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC > > Shenandoah and shared parts look fine. Thanks @shipilev and @tschatzl for reviewing. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1851 From kbarrett at openjdk.java.net Tue Jan 12 04:15:20 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 12 Jan 2021 04:15:20 GMT Subject: RFR: 8258254: Move PtrQueue flush to PtrQueueSet subclasses [v2] In-Reply-To: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> References: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> Message-ID: > Please review this change to the PtrQueue hierarchy, changing queue flushing > from an intrinsic operation of the queue to an operation the qset performs on > a queue. This is a piece of the refactoring being done under JDK-8258251, > separated out for easier review. > > This change also removes a couple of no longer used internal helper functions > from PtrQueue. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into move_flush - update copyrights - move flush ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1851/files - new: https://git.openjdk.java.net/jdk/pull/1851/files/9b51552f..07d17089 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1851&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1851&range=00-01 Stats: 38704 lines in 1385 files changed: 14517 ins; 11573 del; 12614 mod Patch: https://git.openjdk.java.net/jdk/pull/1851.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1851/head:pull/1851 PR: https://git.openjdk.java.net/jdk/pull/1851 From kbarrett at openjdk.java.net Tue Jan 12 04:15:21 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 12 Jan 2021 04:15:21 GMT Subject: Integrated: 8258254: Move PtrQueue flush to PtrQueueSet subclasses In-Reply-To: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> References: <-QQP60eAp1I0kSia-QngOIk1qZ09fGriqo2A2866xv4=.60d6cb4d-1a13-4443-8cb6-389568113c9c@github.com> Message-ID: On Sun, 20 Dec 2020 10:03:15 GMT, Kim Barrett wrote: > Please review this change to the PtrQueue hierarchy, changing queue flushing > from an intrinsic operation of the queue to an operation the qset performs on > a queue. This is a piece of the refactoring being done under JDK-8258251, > separated out for easier review. > > This change also removes a couple of no longer used internal helper functions > from PtrQueue. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC This pull request has now been integrated. 
Changeset: 77f62909 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/77f62909 Stats: 147 lines in 12 files changed: 55 ins; 69 del; 23 mod 8258254: Move PtrQueue flush to PtrQueueSet subclasses Reviewed-by: tschatzl, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1851 From tschatzl at openjdk.java.net Tue Jan 12 08:23:01 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 12 Jan 2021 08:23:01 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v4] In-Reply-To: <9xXvQ9dpCY5tLgLIIf_TBwkEgZQyGd_7rvsmmufSSJ8=.fb4cde97-038d-437d-9f60-2c3d524309c2@github.com> References: <9xXvQ9dpCY5tLgLIIf_TBwkEgZQyGd_7rvsmmufSSJ8=.fb4cde97-038d-437d-9f60-2c3d524309c2@github.com> Message-ID: <5ZsDml8C1FuZPO-s6oJ5GWgXStn25TCuP2A0QRafnQk=.41d7a07c-c213-4901-b33a-85b781cde3d7@github.com> On Tue, 12 Jan 2021 08:19:32 GMT, Thomas Schatzl wrote: >> Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> 8259380: Updated comments regarding pretouching on a single page > > Lgtm, thanks. Please wait for a second reviewer to approve before integrating. ------------- PR: https://git.openjdk.java.net/jdk16/pull/97 From tschatzl at openjdk.java.net Tue Jan 12 08:23:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 12 Jan 2021 08:23:00 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v4] In-Reply-To: References: Message-ID: <9xXvQ9dpCY5tLgLIIf_TBwkEgZQyGd_7rvsmmufSSJ8=.fb4cde97-038d-437d-9f60-2c3d524309c2@github.com> On Tue, 12 Jan 2021 02:39:12 GMT, Patrick Zhang wrote: >> This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). 
>> >> The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. >> >> This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. >> >> In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. >> >> Tests: >> https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt >> The 4 before-after comparisons show the JVM startup time go back to normal. >> 1). 33.381s to 0.870s >> 2). 20.333s to 2.740s >> 3). 15.090s to 6.268s >> 4). 38.983s to 6.709s >> (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) > > Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision: > > 8259380: Updated comments regarding pretouching on a single page Lgtm, thanks. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk16/pull/97 From sjohanss at openjdk.java.net Tue Jan 12 08:36:02 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 12 Jan 2021 08:36:02 GMT Subject: [jdk16] RFR: 8259380: Correct pretouch chunk size to cap with actual page size [v4] In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 02:39:12 GMT, Patrick Zhang wrote: >> This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). >> >> The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. >> >> This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. >> >> In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well. >> >> Tests: >> https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt >> The 4 before-after comparisons show the JVM startup time go back to normal. >> 1). 33.381s to 0.870s >> 2). 20.333s to 2.740s >> 3). 15.090s to 6.268s >> 4). 
38.983s to 6.709s >> (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results) > > Patrick Zhang has updated the pull request incrementally with one additional commit since the last revision: > > 8259380: Updated comments regarding pretouching on a single page Looks good to me too. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/97 From tschatzl at openjdk.java.net Tue Jan 12 09:31:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 12 Jan 2021 09:31:00 GMT Subject: RFR: 8227106: InitiatingHeapOccupancyPercent is G1-specific but defined in shared In-Reply-To: References: Message-ID: On Mon, 16 Nov 2020 08:21:01 GMT, Guoxiong Li wrote: > Hi all, > > Please review this little fix of G1. > The command line option InitiatingHeapOccupancyPercent is G1-specific. But it is defined in shared/gc_globals.hpp rather than g1/g1_globals.hpp. This patch moves it to the proper location. > Thank you for taking the time to review. > > Best Regards. Hi, first, apologies for the late reply and actually having this PR get automatically closed due to inactivity. Unfortunately, simply moving the `InitiatingHeapOccupancyPercent` flag to the other file is not sufficient to complete this problem properly. The move to that file implies that the flag will not be available for VM builds without G1 any more, possibly breaking lots of existing deployments - and there are lots of them using that flag. Further, flags in that file should all have a `G1` prefix clearly indicating that this is a G1 specific flag. So a proper path to this "simple" move would probably be: 1) Deprecate `InitiatingHeapOccupancyPercent` so that people can still use it the next few releases. Basically add it at the proper location in `special_jvm_flags` (best grep it). 
2) Create a new flag "G1InitiatingHeapOccupancyPercent" and alias the other to this one (and add code handling if both are specified etc). 3) Change the code including the tests to use the new flag. 4) Fix the documentation (which we would need to do as the documentation sources are not public) These two steps require a CSR - I think a single one is sufficient renaming the flag. 5) At some point in the future a few releases out, obsolete and another release ahead, remove the old flag. (I will also add this information to the CR in JIRA) Tbh it's lots of work for not a lot of gain and a certain degree of disruption for end users so it might not be worth it. However I will support you (with improved turnaround time) if you want to continue on that path, as in the end it's work that should be done. Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/1217 From stefank at openjdk.java.net Tue Jan 12 10:12:59 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 12 Jan 2021 10:12:59 GMT Subject: RFR: 8256814: WeakProcessorPhases may be redundant In-Reply-To: References: Message-ID: On Tue, 22 Dec 2020 04:59:28 GMT, Kim Barrett wrote: > Please review this change which eliminates the WeakProcessorPhase class. > > The OopStorageSet class is changed to provide scoped enums for the different > categories: StrongId, WeakId, and Id (for the union of strong and weak). > An accessor is provided for obtaining the storage corresponding to a > category value. > > Various other enumerator ranges, array sizes and indices, and iterations are > derived directly from the corresponding OopStorageSet category's enum range. > > Iteration over a category of enumerators can be done via EnumIterator. The > iteration over storage objects makes use of that enum iteration, rather than > having a bespoke implementation. Some use-cases need iteration of the > enumerators, with storage lookup from the enumerator; other use-cases just > need the storage objects. 
> > Testing: > mach5 tier1-6 > Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC I think this looks good. I have a few comments that I would like to get addressed, but they are not blockers if you want to proceed with what you have. src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 35: > 33: > 34: // Base class for OopStorageSet{Strong,Weak}ParState. > 35: template While reviewing this, it was not immediately obvious what T represent. EnumRange uses the name StorageId, maybe use the same here? src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 52: > 50: > 51: NONCOPYABLE(OopStorageSetParState); > 52: }; We tend to put the member variables at the top of classes. I don't think ParState needs to be public, and this could be changed to: template class OopStorageSetParState { using ParState = OopStorage::ParState; ValueObjArray().size()> _par_states; public: ParState* par_state(T id) const { return _par_states.at(checked_cast(EnumRange().index(id))); } protected: OopStorageSetParState() : _par_states(OopStorageSet::Range().begin()) {} ~OopStorageSetParState() = default; private: NONCOPYABLE(OopStorageSetParState); }; src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 58: > 56: class OopStorageSetStrongParState > 57: : public OopStorageSetParState > 58: { We usually keep the `{` on the same line. src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 68: > 66: class OopStorageSetWeakParState > 67: : public OopStorageSetParState > 68: { Same comment as above. src/hotspot/share/gc/shared/oopStorageSetParState.inline.hpp line 36: > 34: > 35: template > 36: template Other places in the file uses `template <` so the usage of `template<` makes the code inconsistent. src/hotspot/share/gc/shared/weakProcessorTimes.hpp line 37: > 35: class WeakProcessorTimes { > 36: public: > 37: using StorageId = OopStorageSet::WeakId; Could be private. 
test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 48: > 46: > 47: template > 48: static void check_iterator(OopStorageSet::Iterator it, All the functions you changed are named `_iterator` and tested OopStorageSet::Iterator. Now the name is the same, but instead they test the Range facility. I think these functions should be renamed. Alternatively, we keep the tests for the OopStorageSet::Iterator and create a new set for the Range? ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1862 From qpzhang at openjdk.java.net Tue Jan 12 10:14:05 2021 From: qpzhang at openjdk.java.net (Patrick Zhang) Date: Tue, 12 Jan 2021 10:14:05 GMT Subject: [jdk16] Integrated: 8259380: Correct pretouch chunk size to cap with actual page size In-Reply-To: References: Message-ID: On Fri, 8 Jan 2021 13:41:06 GMT, Patrick Zhang wrote: > This is actually a regression, with regards to JVM startup time extreme slowdown, initially found at an aarch64 platform (Ampere Altra core). > > The chunk size of pretouching should cap with the input page size which probably stands for large pages size if UseLargePages was set, otherwise processing chunks with much smaller size inside large size pages would hurt performance. > > This issue was introduced during a refactor on chunk calculations JDK-8254972 (2c7fc85) but did not cause any problem immediately since the default PreTouchParallelChunkSize for all platforms are 1GB which can cover all popular sizes of large pages in use by most kernel variations. Later on, JDK-8254699 (805d058) set default 4MB for Linux platform, which is helpful to speed up startup time for some platforms. For example, most x64, since the popular default large page size (e.g. CentOS) is 2MB. In contrast, most default large page size with aarch64 platforms/kernels (e.g. CentOS) are 512MB, so using the 4MB chunk size to do page walk through the pages inside 512MB large page hurt performance of startup time. 
> 
> In addition, there will be a similar problem if we set -XX:PreTouchParallelChunkSize=4k at a x64 Linux platform, the startup slowdown will show as well.
> 
> Tests:
> https://bugs.openjdk.java.net/secure/attachment/92623/pretouch_chunk_size_fix_testing.txt
> The 4 before-after comparisons show the JVM startup time go back to normal.
> 1). 33.381s to 0.870s
> 2). 20.333s to 2.740s
> 3). 15.090s to 6.268s
> 4). 38.983s to 6.709s
> (Use the start time of pretouching the first Survivor space as a rough measurement, while \time, or GCTraceTime can generate similar results)

This pull request has now been integrated.

Changeset: 67e1b639
Author:    Patrick Zhang
Committer: Thomas Schatzl
URL:       https://git.openjdk.java.net/jdk16/commit/67e1b639
Stats:     4 lines in 1 file changed: 2 ins; 1 del; 1 mod

8259380: Correct pretouch chunk size to cap with actual page size

Reviewed-by: tschatzl, sjohanss

-------------

PR: https://git.openjdk.java.net/jdk16/pull/97

From shade at openjdk.java.net  Tue Jan 12 10:55:03 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Tue, 12 Jan 2021 10:55:03 GMT
Subject: RFR: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState
In-Reply-To: 
References: 
Message-ID: 

On Mon, 11 Jan 2021 20:54:28 GMT, Roman Kennke wrote:

>> "label" is passed, but never hooked into the field. So instead of reporting a GC bug, Verifier would probably crash itself trying to read garbage memory.
> 
> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 606:
> 
>> 604: private:
>> 605:   const char* const _label;
>> 606:   char const _expected;
> 
> Why can't _expected not be const char* const too? Are we really messing with the string on the way?

`_expected` is not `const char*`, it is `const char`, where `const` is placed near the field identifier for consistency.

------------- PR: https://git.openjdk.java.net/jdk/pull/2033 From ayang at openjdk.java.net Tue Jan 12 11:20:06 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 12 Jan 2021 11:20:06 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing Message-ID: The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. Test: hotspot_gc ------------- Commit messages: - add claim verification - remove unused enum items Changes: https://git.openjdk.java.net/jdk/pull/2046/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8074101 Stats: 59 lines in 8 files changed: 29 ins; 20 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2046/head:pull/2046 PR: https://git.openjdk.java.net/jdk/pull/2046 From github.com+13688759+lgxbslgx at openjdk.java.net Tue Jan 12 12:10:59 2021 From: github.com+13688759+lgxbslgx at openjdk.java.net (Guoxiong Li) Date: Tue, 12 Jan 2021 12:10:59 GMT Subject: RFR: 8227106: InitiatingHeapOccupancyPercent is G1-specific but defined in shared In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 09:27:55 GMT, Thomas Schatzl wrote: >> Hi all, >> >> Please review this little fix of G1. >> The command line option InitiatingHeapOccupancyPercent is G1-specific. But it is defined in shared/gc_globals.hpp rather than g1/g1_globals.hpp. This patch moves it to the proper location. >> Thank you for taking the time to review. >> >> Best Regards. > > Hi, > > first, apologies for the late reply and actually having this PR get automatically closed due to inactivity. > > Unfortunately, simply moving the `InitiatingHeapOccupancyPercent` flag to the other file is not sufficient to complete this problem properly. 
The move to that file implies that the flag will not be available for VM builds without G1 any more, possibly breaking lots of existing deployments - and there are lots of them using that flag. > Further, flags in that file should all have a `G1` prefix clearly indicating that this is a G1 specific flag. > > So a proper path to this "simple" move would probably be: > > 1) Deprecate `InitiatingHeapOccupancyPercent` so that people can still use it the next few releases. Basically add it at the proper location in `special_jvm_flags` (best grep it). > 2) Create a new flag "G1InitiatingHeapOccupancyPercent" and alias the other to this one (and add code handling if both are specified etc). > 3) Change the code including the tests to use the new flag. > 4) Fix the documentation (which we would need to do as the documentation sources are not public) > > These two steps require a CSR - I think a single one is sufficient renaming the flag. > > 5) At some point in the future a few releases out, obsolete and another release ahead, remove the old flag. > > (I will also add this information to the CR in JIRA) > > Tbh it's lots of work for not a lot of gain and a certain degree of disruption for end users so it might not be worth it. > However I will support you (with improved turnaround time) if you want to continue on that path, as in the end it's work that should be done. > > Thanks, > Thomas @tschatzl Thank you for your reply. Maybe the designer want other GC to reuse `InitiatingHeapOccupancyPercent` in the future. We should do more investigations about the possible useful situations before we restart this work. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1217 From tschatzl at openjdk.java.net Tue Jan 12 12:37:58 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 12 Jan 2021 12:37:58 GMT Subject: RFR: 8227106: InitiatingHeapOccupancyPercent is G1-specific but defined in shared In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 12:08:31 GMT, Guoxiong Li wrote: >> Hi, >> >> first, apologies for the late reply and actually having this PR get automatically closed due to inactivity. >> >> Unfortunately, simply moving the `InitiatingHeapOccupancyPercent` flag to the other file is not sufficient to complete this problem properly. The move to that file implies that the flag will not be available for VM builds without G1 any more, possibly breaking lots of existing deployments - and there are lots of them using that flag. >> Further, flags in that file should all have a `G1` prefix clearly indicating that this is a G1 specific flag. >> >> So a proper path to this "simple" move would probably be: >> >> 1) Deprecate `InitiatingHeapOccupancyPercent` so that people can still use it the next few releases. Basically add it at the proper location in `special_jvm_flags` (best grep it). >> 2) Create a new flag "G1InitiatingHeapOccupancyPercent" and alias the other to this one (and add code handling if both are specified etc). >> 3) Change the code including the tests to use the new flag. >> 4) Fix the documentation (which we would need to do as the documentation sources are not public) >> >> These two steps require a CSR - I think a single one is sufficient renaming the flag. >> >> 5) At some point in the future a few releases out, obsolete and another release ahead, remove the old flag. >> >> (I will also add this information to the CR in JIRA) >> >> Tbh it's lots of work for not a lot of gain and a certain degree of disruption for end users so it might not be worth it. 
>> However I will support you (with improved turnaround time) if you want to continue on that path, as in the end it's work that should be done. >> >> Thanks, >> Thomas > > @tschatzl Thank you for your reply. > > Maybe the designer want other GC to reuse `InitiatingHeapOccupancyPercent` in the future. > We should do more investigations about the possible useful situations before we restart this work. It seems unlikely that this flag will be reused soon: both of the other concurrent collectors, ZGC and Shenandoah, have their own heuristics with their own set of flags that do not exactly match the semantics of `InitiatingHeapOccupancyPercent`. Feel free to keep this PR closed though for now. ------------- PR: https://git.openjdk.java.net/jdk/pull/1217 From kbarrett at openjdk.java.net Tue Jan 12 15:05:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 12 Jan 2021 15:05:02 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 11:13:29 GMT, Albert Mingkun Yang wrote: > The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. > > Test: hotspot_gc src/hotspot/share/gc/g1/g1RootProcessor.cpp line 66: > 64: // already processed in java roots. > 65: _process_strong_tasks.try_claim_task(G1RP_PS_CodeCache_oops_do); > 66: #endif Rather than these fake claims, consider something like this: template void all_tasks_completed(uint nworkers, Ts... tags) { // Type-check more_skipped are all of the same type as first_skipped. T0 typed_skipped[] = { first_skipped, more_skipped... }; uint skipped[] = { static_cast(tags)... 
}; all_tasks_completed_impl(nworkers, skipped, ARRAY_SIZE(skipped)); } void all_tasks_completed(uint nworkers) { all_tasks_completed_impl(nworkers, nullptr, 0); } Usage: all_tasks_completed(n_workers(), G1RP_PS_CodeCache_oops_do, G1RP_PS_refProcessor_oops_do) all_tasks_completed_impl can check that all tasks have been claimed except the skipped ones, which have not been claimed. There might be better ways to write the variadic all_tasks_completed. It's been a while since I've done anything with variadic templates. ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From sjohanss at openjdk.java.net Tue Jan 12 15:16:04 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 12 Jan 2021 15:16:04 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2] In-Reply-To: References: Message-ID: On Fri, 8 Jan 2021 12:48:14 GMT, Kim Barrett wrote: >> Please review this fix to the parallel WeakProcessor's computation of the >> number of worker threads to use. It was previously limited by the current >> value of active_workers(), whatever that happens to be. It should be >> limited by total_workers(), just as with the parallel ReferenceProcessor. >> (Both are subject to ReferencesPerThread.) >> >> Testing >> mach5 tier1 >> Some hand testing (Linux-x64) to verify the expected number of threads are >> being used. >> >> Note: That hand testing suggests some further tuning of ReferencesPerThread >> might be in order. With the current default of 1000, I often saw in testing >> that some threads were started late enough that no work was left for them. >> I'll file a separate RFE for that. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > fix doc comments about number of threads used Marked as reviewed by sjohanss (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk16/pull/75 From kbarrett at openjdk.java.net Tue Jan 12 17:16:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 12 Jan 2021 17:16:55 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 15:02:27 GMT, Kim Barrett wrote: >> The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. >> >> Test: hotspot_gc > > src/hotspot/share/gc/g1/g1RootProcessor.cpp line 66: > >> 64: // already processed in java roots. >> 65: _process_strong_tasks.try_claim_task(G1RP_PS_CodeCache_oops_do); >> 66: #endif > > Rather than these fake claims, consider something like this: > > template > void all_tasks_completed(uint nworkers, Ts... tags) { > // Type-check more_skipped are all of the same type as first_skipped. > T0 typed_skipped[] = { first_skipped, more_skipped... }; > uint skipped[] = { static_cast(tags)... }; > all_tasks_completed_impl(nworkers, skipped, ARRAY_SIZE(skipped)); > } > > void all_tasks_completed(uint nworkers) { > all_tasks_completed_impl(nworkers, nullptr, 0); > } > > Usage: > > all_tasks_completed(n_workers(), > G1RP_PS_CodeCache_oops_do, > G1RP_PS_refProcessor_oops_do) > > all_tasks_completed_impl can check that all tasks have been claimed except > the skipped ones, which have not been claimed. > > There might be better ways to write the variadic all_tasks_completed. It's > been a while since I've done anything with variadic templates. Here's a better version of the variadic `all_tasks_completed` template...>::value)> void all_tasks_completed(uint n_threads, T0 first_skipped, Ts... more_skipped) { static_assert(std::is_convertible::value, "not convertible"); uint skipped[] = { static_cast(first_skipped), static_cast(more_skipped)... 
}; all_tasks_completed_impl(n_threads, skipped, ARRAY_SIZE(skipped)); } `Conjunction` is in metaprogramming/logical.hpp. ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Tue Jan 12 18:46:15 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 12 Jan 2021 18:46:15 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v2] In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 17:14:10 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/g1/g1RootProcessor.cpp line 66: >> >>> 64: // already processed in java roots. >>> 65: _process_strong_tasks.try_claim_task(G1RP_PS_CodeCache_oops_do); >>> 66: #endif >> >> Rather than these fake claims, consider something like this: >> >> template >> void all_tasks_completed(uint nworkers, Ts... tags) { >> // Type-check more_skipped are all of the same type as first_skipped. >> T0 typed_skipped[] = { first_skipped, more_skipped... }; >> uint skipped[] = { static_cast(tags)... }; >> all_tasks_completed_impl(nworkers, skipped, ARRAY_SIZE(skipped)); >> } >> >> void all_tasks_completed(uint nworkers) { >> all_tasks_completed_impl(nworkers, nullptr, 0); >> } >> >> Usage: >> >> all_tasks_completed(n_workers(), >> G1RP_PS_CodeCache_oops_do, >> G1RP_PS_refProcessor_oops_do) >> >> all_tasks_completed_impl can check that all tasks have been claimed except >> the skipped ones, which have not been claimed. >> >> There might be better ways to write the variadic all_tasks_completed. It's >> been a while since I've done anything with variadic templates. > > Here's a better version of the variadic `all_tasks_completed` > > template ENABLE_IF(Conjunction...>::value)> > void all_tasks_completed(uint n_threads, T0 first_skipped, Ts... more_skipped) { > static_assert(std::is_convertible::value, "not convertible"); > uint skipped[] = { static_cast(first_skipped), static_cast(more_skipped)... 
}; > all_tasks_completed_impl(n_threads, skipped, ARRAY_SIZE(skipped)); > } > `Conjunction` is in metaprogramming/logical.hpp. Thank you; updated as suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Tue Jan 12 18:46:15 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 12 Jan 2021 18:46:15 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v2] In-Reply-To: References: Message-ID: > The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. > > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2046/files - new: https://git.openjdk.java.net/jdk/pull/2046/files/4fc22b48..938dce1d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=00-01 Stats: 71 lines in 3 files changed: 35 ins; 25 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/2046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2046/head:pull/2046 PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Tue Jan 12 22:19:21 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 12 Jan 2021 22:19:21 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v3] In-Reply-To: References: Message-ID: > The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. 
> > Test: hotspot_gc Albert Mingkun Yang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2046/files - new: https://git.openjdk.java.net/jdk/pull/2046/files/938dce1d..aa2d853c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2046/head:pull/2046 PR: https://git.openjdk.java.net/jdk/pull/2046 From kbarrett at openjdk.java.net Wed Jan 13 07:17:57 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 13 Jan 2021 07:17:57 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v2] In-Reply-To: References: Message-ID: On Fri, 8 Jan 2021 11:26:11 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> fix doc comments about number of threads used > > Lgtm although the comments in the documentation for both `static void weak_oops_do` method declarations mentions: > > // Parallel version. Uses ergo_workers(), active workers, and > // phase_time's max_threads to determine the number of threads to use. > which should be fixed. > > One option is to put something like "Uses max workers and the total number of weak references to determine the number of threads to use" as description for `ergo_workers` and remove the details in the description of the `weak_oops_do` descriptions, but just removing the mention of `active_workers` there could be fine too. Thanks @tschatzl , @albertnetymk , @kstefanj for your reviews. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/75 From shade at openjdk.java.net Wed Jan 13 07:41:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 13 Jan 2021 07:41:59 GMT Subject: RFR: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 20:54:28 GMT, Roman Kennke wrote: >> "label" is passed, but never hooked into the field. So instead of reporting a GC bug, Verifier would probably crash itself trying to read garbage memory. > > src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 606: > >> 604: private: >> 605: const char* const _label; >> 606: char const _expected; > > Why can't _expected not be const char* const too? Are we really messing with the string on the way? @rkennke Still good with this explanation? ------------- PR: https://git.openjdk.java.net/jdk/pull/2033 From kbarrett at openjdk.java.net Wed Jan 13 08:26:21 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 13 Jan 2021 08:26:21 GMT Subject: [jdk16] RFR: 8258985: Parallel WeakProcessor may use too few threads [v3] In-Reply-To: References: Message-ID: > Please review this fix to the parallel WeakProcessor's computation of the > number of worker threads to use. It was previously limited by the current > value of active_workers(), whatever that happens to be. It should be > limited by total_workers(), just as with the parallel ReferenceProcessor. > (Both are subject to ReferencesPerThread.) > > Testing > mach5 tier1 > Some hand testing (Linux-x64) to verify the expected number of threads are > being used. > > Note: That hand testing suggests some further tuning of ReferencesPerThread > might be in order. With the current default of 1000, I often saw in testing > that some threads were started late enough that no work was left for them. > I'll file a separate RFE for that. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into weak_ergo_workers - fix doc comments about number of threads used - Use total workers rather than active ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/75/files - new: https://git.openjdk.java.net/jdk16/pull/75/files/0ed088d8..e4bfea5c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=75&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=75&range=01-02 Stats: 4041 lines in 124 files changed: 1417 ins; 2424 del; 200 mod Patch: https://git.openjdk.java.net/jdk16/pull/75.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/75/head:pull/75 PR: https://git.openjdk.java.net/jdk16/pull/75 From kbarrett at openjdk.java.net Wed Jan 13 08:26:21 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 13 Jan 2021 08:26:21 GMT Subject: [jdk16] Integrated: 8258985: Parallel WeakProcessor may use too few threads In-Reply-To: References: Message-ID: On Fri, 1 Jan 2021 10:02:10 GMT, Kim Barrett wrote: > Please review this fix to the parallel WeakProcessor's computation of the > number of worker threads to use. It was previously limited by the current > value of active_workers(), whatever that happens to be. It should be > limited by total_workers(), just as with the parallel ReferenceProcessor. > (Both are subject to ReferencesPerThread.) > > Testing > mach5 tier1 > Some hand testing (Linux-x64) to verify the expected number of threads are > being used. > > Note: That hand testing suggests some further tuning of ReferencesPerThread > might be in order. With the current default of 1000, I often saw in testing > that some threads were started late enough that no work was left for them. > I'll file a separate RFE for that. This pull request has now been integrated. 
Changeset: efc36be5 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk16/commit/efc36be5 Stats: 11 lines in 2 files changed: 5 ins; 0 del; 6 mod 8258985: Parallel WeakProcessor may use too few threads Use total workers rather than active. Reviewed-by: tschatzl, ayang, sjohanss ------------- PR: https://git.openjdk.java.net/jdk16/pull/75 From kbarrett at openjdk.java.net Wed Jan 13 08:40:59 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 13 Jan 2021 08:40:59 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v3] In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 22:19:21 GMT, Albert Mingkun Yang wrote: >> The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. >> >> Test: hotspot_gc > > Albert Mingkun Yang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/shared/workgroup.cpp line 384: > 382: } > 383: assert(is_skipped, "%d not claimed.", i); > 384: } Can also do a separate loop over `skipped` to verify `_tasks[skipped[i]] == 0` (with appropriate bounds checking). src/hotspot/share/gc/shared/workgroup.hpp line 336: > 334: // > 335: // n_threads - Number of threads executing the sub-tasks. > 336: // followed by vararg skipped tasks The description comment should be augmented to describe the optional skipped values. 
src/hotspot/share/gc/g1/g1RootProcessor.cpp line 109: > 107: // refProcessor is not needed since we are inside a safe point > 108: _process_strong_tasks.all_tasks_completed(n_workers(), > 109: G1RP_PS_CodeCache_oops_do, G1RP_PS_refProcessor_oops_do); When parameters or arguments are on multiple lines, we usually align all the arguments, and usually one per line. There are some other similar cases elsewhere in this change that I didn't redundantly comment. src/hotspot/share/gc/shared/workgroup.hpp line 29: > 27: > 28: #include "memory/allocation.hpp" > 29: #include "metaprogramming/logical.hpp" Should also add metaprogramming/enableIf.hpp, to avoid implicit include dependency. ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Wed Jan 13 09:11:59 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 13 Jan 2021 09:11:59 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v3] In-Reply-To: References: Message-ID: <705T3CG8S9Wo9-CGBRAKXwjWOzp33r0VRlCsajnm9B8=.b9335e75-7827-4284-bcbb-382b28301604@github.com> On Wed, 13 Jan 2021 08:37:24 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > src/hotspot/share/gc/shared/workgroup.hpp line 29: > >> 27: >> 28: #include "memory/allocation.hpp" >> 29: #include "metaprogramming/logical.hpp" > > Should also add metaprogramming/enableIf.hpp, to avoid implicit include dependency. Should I include both `enableIf.hpp` and `logical.hpp`, or only `enableIf.hpp` since it includes `logical.hpp` already? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From kbarrett at openjdk.java.net Wed Jan 13 09:21:57 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 13 Jan 2021 09:21:57 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v3] In-Reply-To: <705T3CG8S9Wo9-CGBRAKXwjWOzp33r0VRlCsajnm9B8=.b9335e75-7827-4284-bcbb-382b28301604@github.com> References: <705T3CG8S9Wo9-CGBRAKXwjWOzp33r0VRlCsajnm9B8=.b9335e75-7827-4284-bcbb-382b28301604@github.com> Message-ID: On Wed, 13 Jan 2021 09:09:25 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/shared/workgroup.hpp line 29: >> >>> 27: >>> 28: #include "memory/allocation.hpp" >>> 29: #include "metaprogramming/logical.hpp" >> >> Should also add metaprogramming/enableIf.hpp, to avoid implicit include dependency. > > Should I include both `enableIf.hpp` and `logical.hpp`, or only `enableIf.hpp` since it includes `logical.hpp` already? We don't have any tooling support, but there has been a trend away from relying on implicit includes, because they lead to breakages far away from refactorings that change includes. ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Wed Jan 13 10:20:12 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 13 Jan 2021 10:20:12 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v3] In-Reply-To: References: Message-ID: On Wed, 13 Jan 2021 08:38:25 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. > > Changes requested by kbarrett (Reviewer). Addressed all suggestions in the revision. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Wed Jan 13 10:20:12 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 13 Jan 2021 10:20:12 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v4] In-Reply-To: References: Message-ID: > The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. > > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2046/files - new: https://git.openjdk.java.net/jdk/pull/2046/files/aa2d853c..4d147817 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=02-03 Stats: 21 lines in 3 files changed: 12 ins; 2 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2046/head:pull/2046 PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Wed Jan 13 10:20:13 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 13 Jan 2021 10:20:13 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v3] In-Reply-To: References: <705T3CG8S9Wo9-CGBRAKXwjWOzp33r0VRlCsajnm9B8=.b9335e75-7827-4284-bcbb-382b28301604@github.com> Message-ID: On Wed, 13 Jan 2021 09:19:17 GMT, Kim Barrett wrote: >> Should I include both `enableIf.hpp` and `logical.hpp`, or only `enableIf.hpp` since it includes `logical.hpp` already? > > We don't have any tooling support, but there has been a trend away from relying on implicit includes, because they lead to breakages far away from refactorings that change includes. 
Got it; including both then. ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From kbarrett at openjdk.java.net Wed Jan 13 10:35:00 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 13 Jan 2021 10:35:00 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v4] In-Reply-To: References: Message-ID: On Wed, 13 Jan 2021 10:20:12 GMT, Albert Mingkun Yang wrote: >> The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by kbarrett (Reviewer). src/hotspot/share/gc/shared/workgroup.hpp line 331: > 329: > 330: // The calling thread asserts that it has attempted to claim all the tasks > 331: // that it will try to claim. Tasks that is meant to be skipped must be s/is meant/are meant/ src/hotspot/share/gc/shared/workgroup.hpp line 332: > 330: // The calling thread asserts that it has attempted to claim all the tasks > 331: // that it will try to claim. Tasks that is meant to be skipped must be > 332: // explicitly passed as extra arguments using the variadic version below. I would drop "using the variadic version below". I think of this as a function that takes some optional "these are expected to be skipped" task designators, and the non-variadic overload is just an implementation detail to handle the base case of there being none of those. 
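
The shape being discussed (a function taking optional "expected to be skipped" task designators, with a zero-argument overload as the base case) might look roughly like the sketch below. Everything here is an assumption made for illustration: `SubTasksDoneSketch`, `SketchTask`, and `AllOf` are hypothetical names, `AllOf` stands in for HotSpot's `Conjunction`, `std::enable_if` stands in for `ENABLE_IF`, the same-type constraint is a guess at the template parameters that did not survive in the quoted mail, and the check returns `bool` instead of asserting so that it can be exercised outside the VM.

```cpp
#include <cstddef>
#include <type_traits>

typedef unsigned int uint;

// Hypothetical stand-in for Conjunction from metaprogramming/logical.hpp.
template <typename... Ts> struct AllOf : std::true_type {};
template <typename T, typename... Ts>
struct AllOf<T, Ts...>
  : std::integral_constant<bool, T::value && AllOf<Ts...>::value> {};

// Hypothetical task identifiers.
enum SketchTask : uint { TaskA, TaskB, TaskC, TaskCount };

class SubTasksDoneSketch {
  uint _tasks[TaskCount];  // 1 if claimed, 0 otherwise (single-threaded sketch)

  // True if every task is claimed except the skipped ones, and every
  // skipped task is unclaimed.
  bool all_tasks_completed_impl(const uint* skipped, size_t skipped_size) const {
    for (uint i = 0; i < TaskCount; i++) {
      bool is_skipped = false;
      for (size_t j = 0; j < skipped_size; j++) {
        if (skipped[j] == i) { is_skipped = true; break; }
      }
      if ((_tasks[i] != 0) == is_skipped) return false;
    }
    return true;
  }

public:
  SubTasksDoneSketch() : _tasks() {}

  bool try_claim_task(uint t) {
    if (_tasks[t] != 0) return false;
    _tasks[t] = 1;
    return true;
  }

  // Base case: no tasks are expected to be skipped.
  bool all_tasks_completed() const {
    return all_tasks_completed_impl(nullptr, 0);
  }

  // Variadic case: all skipped designators must share one type.
  template <typename T0, typename... Ts,
            typename std::enable_if<
              AllOf<std::is_same<T0, Ts>...>::value, int>::type = 0>
  bool all_tasks_completed(T0 first_skipped, Ts... more_skipped) const {
    static_assert(std::is_convertible<T0, uint>::value, "not convertible");
    uint skipped[] = { static_cast<uint>(first_skipped),
                       static_cast<uint>(more_skipped)... };
    return all_tasks_completed_impl(skipped, sizeof(skipped) / sizeof(skipped[0]));
  }
};
```

A caller that claims `TaskA` and `TaskC` and deliberately leaves `TaskB` alone would then check `all_tasks_completed(TaskB)`, while a caller that claims everything uses the zero-argument form.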
------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Wed Jan 13 11:19:12 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 13 Jan 2021 11:19:12 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v5] In-Reply-To: References: Message-ID: > The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. > > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2046/files - new: https://git.openjdk.java.net/jdk/pull/2046/files/4d147817..15892bd4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2046&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2046.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2046/head:pull/2046 PR: https://git.openjdk.java.net/jdk/pull/2046 From ayang at openjdk.java.net Wed Jan 13 11:19:12 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 13 Jan 2021 11:19:12 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v4] In-Reply-To: References: Message-ID: On Wed, 13 Jan 2021 10:32:12 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Marked as reviewed by kbarrett (Reviewer). Addressed suggestions on the comments. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From rkennke at openjdk.java.net Wed Jan 13 11:52:56 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 13 Jan 2021 11:52:56 GMT Subject: RFR: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 19:02:55 GMT, Aleksey Shipilev wrote: > "label" is passed, but never hooked into the field. So instead of reporting a GC bug, Verifier would probably crash itself trying to read garbage memory. Looks good to me! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2033 From rkennke at openjdk.java.net Wed Jan 13 11:52:58 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 13 Jan 2021 11:52:58 GMT Subject: RFR: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState In-Reply-To: References: Message-ID: <4DOMCo7tkldCumfiRMEgQj6fuIgvjN6MiyzlHU3SDGw=.a2b3a652-9ff2-4830-adab-2b19c00d77fe@github.com> On Wed, 13 Jan 2021 07:39:06 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 606: >> >>> 604: private: >>> 605: const char* const _label; >>> 606: char const _expected; >> >> Why can't _expected not be const char* const too? Are we really messing with the string on the way? > > @rkennke Still good with this explanation? Yes, thank you! Stupid me overlooked the missing * :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/2033 From shade at openjdk.java.net Wed Jan 13 11:58:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 13 Jan 2021 11:58:56 GMT Subject: Integrated: 8259580: Shenandoah: uninitialized label in VerifyThreadGCState In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 19:02:55 GMT, Aleksey Shipilev wrote: > "label" is passed, but never hooked into the field. So instead of reporting a GC bug, Verifier would probably crash itself trying to read garbage memory. 
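A reduced sketch of this class of bug, under the assumption that the problem is a constructor parameter that never reaches its field. `StateCheck` and its members are hypothetical and are not taken from shenandoahVerifier.cpp.

```cpp
#include <string>

// Hypothetical reduction; class and member names are illustrative only.
class StateCheck {
public:
  // The bug class in question: a constructor such as
  //   StateCheck(const char* label, char expected) : _expected(expected) {}
  // accepts `label` but never stores it. With a non-const member that
  // compiles and leaves the pointer indeterminate; with the const member
  // below, the compiler rejects it because a const member must be initialized.
  StateCheck(const char* label, char expected)
    : _label(label), _expected(expected) {}

  // Returns an empty string on success, or a message naming the failed check.
  std::string check(char actual) const {
    if (actual == _expected) return std::string();
    return std::string(_label) + ": unexpected state";
  }

private:
  const char* const _label;  // const pointer to const text
  char const _expected;      // a single expected state value, not a string
};
```

Declaring both fields const, as in the quoted hunk, has the side benefit that a forgotten initializer becomes a compile error rather than a read of garbage memory at the moment a failure needs reporting.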
This pull request has now been integrated. Changeset: 2e124544 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/2e124544 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8259580: Shenandoah: uninitialized label in VerifyThreadGCState Reviewed-by: zgu, rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2033 From zgu at openjdk.java.net Wed Jan 13 13:09:18 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 13 Jan 2021 13:09:18 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v29] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. 
ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. > 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into JDK-8255019-sh-mark - Fixes based on shade's comments - Merge branch 'master' into JDK-8255019-sh-mark - Silent MacOSX build - @shade's comments - Merge - Update copyright years - Merge - Merge branch 'master' into JDK-8255019-sh-mark - Concurrent mark does not expect forwarded objects - ... and 26 more: https://git.openjdk.java.net/jdk/compare/ce945120...4fbbfd27 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1009/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=28 Stats: 1971 lines in 21 files changed: 1072 ins; 756 del; 143 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Wed Jan 13 13:09:19 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 13 Jan 2021 13:09:19 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v27] In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 17:42:08 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 34 commits: >> >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Silent MacOSX build >> - @shade's comments >> - Merge >> - Update copyright years >> - Merge >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Concurrent mark does not expect forwarded objects >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Merge branch 'master' into JDK-8255019-sh-mark >> - ... and 24 more: https://git.openjdk.java.net/jdk/compare/4d3d5991...a6540b99 > > Changes requested by shade (Reviewer). @shade, did I address all your concerns? Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From shade at openjdk.java.net Wed Jan 13 15:06:11 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 13 Jan 2021 15:06:11 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v27] In-Reply-To: References: Message-ID: On Mon, 11 Jan 2021 17:38:41 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: >> >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Silent MacOSX build >> - @shade's comments >> - Merge >> - Update copyright years >> - Merge >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Concurrent mark does not expect forwarded objects >> - Merge branch 'master' into JDK-8255019-sh-mark >> - Merge branch 'master' into JDK-8255019-sh-mark >> - ... and 24 more: https://git.openjdk.java.net/jdk/compare/4d3d5991...a6540b99 > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2212: > >> 2210: if (point == _degenerated_mark) { >> 2211: finish_mark(); >> 2212: } > > So if we don't call `finish_mark`, do we ever call `set_concurrent_mark_in_progress(false);` and `mark_complete_marking_context();`? This was not answered, and I don't see relevant follow-up changes. Is this a bug? 
------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From rkennke at openjdk.java.net Wed Jan 13 15:09:05 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 13 Jan 2021 15:09:05 GMT Subject: RFR: 8259377: Shenandoah: Enhance weak reference processing timing tracking [v2] In-Reply-To: References: Message-ID: On Thu, 7 Jan 2021 19:56:16 GMT, Zhengyu Gu wrote: >> Please review this enhancement for tracking weak references processing. >> >> Test: >> - [x] hotspot_gc_shenandoah > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentations Looks good to me! GH title and Jira title disagree. ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1979 From zgu at openjdk.java.net Wed Jan 13 15:22:04 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 13 Jan 2021 15:22:04 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v27] In-Reply-To: References: Message-ID: On Wed, 13 Jan 2021 15:03:24 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 2212: >> >>> 2210: if (point == _degenerated_mark) { >>> 2211: finish_mark(); >>> 2212: } >> >> So if we don't call `finish_mark`, do we ever call `set_concurrent_mark_in_progress(false);` and `mark_complete_marking_context();`? > > This was not answered, and I don't see relevant follow-up changes. Is this a bug? No, I don't think so. In the fall-through case, STWMark sets both flags. In the degenerated case, if we pass the `_degenerated_mark` point, op_final_mark calls finish_mark to set them.
------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Wed Jan 13 19:04:01 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 13 Jan 2021 19:04:01 GMT Subject: Integrated: 8259377: Shenandoah: Enhance weak reference processing time tracking In-Reply-To: References: Message-ID: On Thu, 7 Jan 2021 17:58:19 GMT, Zhengyu Gu wrote: > Please review this enhancement for tracking weak references processing. > > Test: > - [x] hotspot_gc_shenandoah This pull request has now been integrated. Changeset: ccdf171d Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/ccdf171d Stats: 41 lines in 5 files changed: 12 ins; 2 del; 27 mod 8259377: Shenandoah: Enhance weak reference processing time tracking Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/1979 From shade at openjdk.java.net Thu Jan 14 09:26:09 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 14 Jan 2021 09:26:09 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v29] In-Reply-To: References: Message-ID: On Wed, 13 Jan 2021 13:09:18 GMT, Zhengyu Gu wrote: >> This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). >> >> Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. >> >> It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. >> >> First step, I would like to split STW and concurrent mark, so that: >> 1) Code has to special case for STW and concurrent mark. 
>> 2) STW mark does not need to rendezvous workers between root mark and the rest of mark >> 3) STW mark does not need to activate SATB barrier and drain SATB buffers. >> 4) STW mark does not need to remark some of roots. >> >> The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. >> >> A few changes: >> 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. >> 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner >> 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. >> 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: > > - Merge branch 'master' into JDK-8255019-sh-mark > - Fixes based on shade's comments > - Merge branch 'master' into JDK-8255019-sh-mark > - Silent MacOSX build > - @shade's comments > - Merge > - Update copyright years > - Merge > - Merge branch 'master' into JDK-8255019-sh-mark > - Concurrent mark does not expect forwarded objects > - ... and 26 more: https://git.openjdk.java.net/jdk/compare/ce945120...4fbbfd27 All right, fine, let's do it. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Thu Jan 14 14:43:15 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 14 Jan 2021 14:43:15 GMT Subject: RFR: 8255019: Shenandoah: Split STW and concurrent mark into separate classes [v30] In-Reply-To: References: Message-ID: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. > > It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. 
> 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. > 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge - Merge branch 'master' into JDK-8255019-sh-mark - Fixes based on shade's comments - Merge branch 'master' into JDK-8255019-sh-mark - Silent MacOSX build - @shade's comments - Merge - Update copyright years - Merge - Merge branch 'master' into JDK-8255019-sh-mark - ... and 27 more: https://git.openjdk.java.net/jdk/compare/ff3e6e46...8a6d94b8 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1009/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1009&range=29 Stats: 1965 lines in 21 files changed: 1069 ins; 757 del; 139 mod Patch: https://git.openjdk.java.net/jdk/pull/1009.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1009/head:pull/1009 PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Thu Jan 14 17:46:04 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 14 Jan 2021 17:46:04 GMT Subject: Integrated: 8255019: Shenandoah: Split STW and concurrent mark into separate classes In-Reply-To: References: Message-ID: On Mon, 2 Nov 2020 14:45:03 GMT, Zhengyu Gu wrote: > This is the first part of refactoring, that aims to isolate three Shenandoah GC modes (concurrent, degenerated and full gc). > > Shenandoah started with two GC modes, concurrent and full gc, with minimal shared code, mainly in mark phase. After introducing degenerated GC, it shared quite large portion of code with concurrent GC, with the concept that degenerated GC can simply pick up remaining work of concurrent GC in STW mode. 
> > It was not a big problem at that time, since concurrent GC also processed roots STW. Since Shenandoah gradually moved root processing into concurrent phase, code started to diverge, that made code hard to reason and maintain. > > First step, I would like to split STW and concurrent mark, so that: > 1) Code has to special case for STW and concurrent mark. > 2) STW mark does not need to rendezvous workers between root mark and the rest of mark > 3) STW mark does not need to activate SATB barrier and drain SATB buffers. > 4) STW mark does not need to remark some of roots. > > The patch mainly just shuffles code. Creates a base class ShenandoahMark, and moved shared code (from current shenandoahConcurrentMark) into this base class. I did 'git mv shenandoahConcurrentMark.inline.hpp shenandoahMark.inline.hpp, but git does not seem to reflect that. > > A few changes: > 1) Moved task queue set from ShenandoahConcurrentMark to ShenandoahHeap. ShenandoahMark and its subclasses are stateless. Instead, mark states are maintained in task queue, mark bitmap and SATB buffers, so that they can be created on demand. > 2) Split ShenandoahConcurrentRootScanner template to ShenandoahConcurrentRootScanner and ShenandoahSTWRootScanner > 3) Split code inside op_final_mark code into finish_mark and prepare_evacuation helper functions. > 4) Made ShenandoahMarkCompact stack allocated (as well as ShenandoahConcurrentGC and ShenandoahDegeneratedGC in upcoming refactoring) This pull request has now been integrated. 
Changeset: da6bcf96 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/da6bcf96 Stats: 1965 lines in 21 files changed: 1069 ins; 757 del; 139 mod 8255019: Shenandoah: Split STW and concurrent mark into separate classes Reviewed-by: rkennke, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1009 From zgu at openjdk.java.net Thu Jan 14 21:19:11 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 14 Jan 2021 21:19:11 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC Message-ID: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> The purpose of this patch is to isolate the concurrent, degenerated and full GC implementations, which makes each GC implementation more straightforward and clean, and improves readability. The current implementation emphasizes code sharing, e.g. degenerated GC reuses concurrent GC's ops. That was not a problem in the beginning, when they actually behaved similarly. Since concurrent GC moved root processing into concurrent phases, the code started to diverge, and we started to put bandages on the shared ops to make them work for both concurrent and degenerated GC, which made the code hard to read and error prone. The patch breaks up the GCs into 3 (mainly just concurrent and degenerated GC, as full GC is already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of the ShenandoahGC class. The three GCs still keep the vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. As a result, concurrent and degenerated GC no longer share ops, and op implementations no longer need to consider other GC modes, which simplifies the implementation and improves readability. Code sharing is achieved via helper methods provided by ShenandoahHeap.
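The shape described above can be sketched roughly as follows. This is only an illustration of "each GC owns its control flow, shared steps live on the heap"; every name here (`Heap`, `GC`, `ConcurrentGC`, `DegeneratedGC`, `DegenPoint`) is hypothetical, and the step ordering is an assumption, not the actual Shenandoah code.

```cpp
#include <string>
#include <vector>

// All names are hypothetical; they only mirror the structure described above.
struct Heap {
  std::vector<std::string> log;
  // Shared helper steps live on the heap, so the GC classes share no ops.
  void mark_roots()  { log.push_back("mark_roots"); }
  void finish_mark() { log.push_back("finish_mark"); }
  void evacuate()    { log.push_back("evacuate"); }
};

// Common base: each subclass encapsulates its own control flow.
class GC {
public:
  virtual ~GC() = default;
  virtual void collect(Heap& heap) = 0;
};

class ConcurrentGC : public GC {
public:
  void collect(Heap& heap) override {
    heap.mark_roots();
    heap.finish_mark();
    heap.evacuate();
  }
};

// Point at which an interrupted concurrent cycle is picked up.
enum class DegenPoint { OutsideCycle, Mark, Evacuation };

class DegeneratedGC : public GC {
public:
  explicit DegeneratedGC(DegenPoint point) : _point(point) {}

  void collect(Heap& heap) override {
    switch (_point) {
      case DegenPoint::OutsideCycle:
        heap.mark_roots();
        // fall through
      case DegenPoint::Mark:
        heap.finish_mark();
        // fall through
      case DegenPoint::Evacuation:
        heap.evacuate();
    }
  }

private:
  DegenPoint _point;
};
```

With this layout, a change to the degenerated control flow cannot silently alter the concurrent one, since the only shared code is the helper steps on `Heap`.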
------------- Commit messages: - Remove cached heap in ShenandoahGC - Merge - Merge branch 'fix_phase_timings' into JDK-8255765-isolate-gcs - Merge - Merge - Merge - Merge - Removed trailing whitespaces - Merge - Merge branch 'JDK-8255019-sh-mark' into fix_phase_timings - ... and 89 more: https://git.openjdk.java.net/jdk/compare/a6b2162f...d9805040 Changes: https://git.openjdk.java.net/jdk/pull/1964/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255765 Stats: 3232 lines in 21 files changed: 1815 ins; 1281 del; 136 mod Patch: https://git.openjdk.java.net/jdk/pull/1964.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1964/head:pull/1964 PR: https://git.openjdk.java.net/jdk/pull/1964 From ddong at openjdk.java.net Fri Jan 15 03:17:03 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 15 Jan 2021 03:17:03 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 02:42:20 GMT, Denghui Dong wrote: > The GC locker can stall GC operations, so that some Java threads cannot continue to run until the GC locker is released, which affects the response time of the application. Adding a JFR event that reports this information helps to detect the issue. > > For testing purposes, I added two WhiteBox methods to lock/unlock critical. The GC locker can stall GC operations, so that some Java threads cannot continue to run until the GC locker is released, which affects the response time of the application. Adding a JFR event that reports this information helps to detect the issue. For testing purposes, I added two WhiteBox methods to lock/unlock critical.
------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From sjohanss at openjdk.java.net Fri Jan 15 11:24:09 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 15 Jan 2021 11:24:09 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Tue, 15 Dec 2020 18:48:05 GMT, Marcus G K Williams wrote: >> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using >> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). >> >> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). >> >> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. > > Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Remove extraneous ' from warning > > Signed-off-by: Marcus G K Williams > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Fix os::large_page_size() in last update > > Signed-off-by: Marcus G K Williams > - Ivan W. Requested Changes > > Removed os::Linux::select_large_page_size and > use os::page_size_for_region instead > > Removed Linux::find_large_page_size and use > register_large_page_sizes. 
Streamlined > Linux::setup_large_page_size > > Signed-off-by: Marcus G K Williams > - Fix space format, use Linux:: for local func. > > Signed-off-by: Marcus G K Williams > - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp > - ... and 13 more: https://git.openjdk.java.net/jdk/compare/da2415fe...d73e7a4c Back from the holidays and actually looking at our use of large pages from another perspective as well. I think the approach here has been simplified a lot from the first suggestion and I like it. Just a few small additional comments. src/hotspot/os/linux/os_linux.cpp line 3750: > 3748: } > 3749: } > 3750: closedir(dir); It would be nice to add some logging here using the `pagesize` tag. The new PageSizes class has a `print_on()` that we could use. I'm thinking something like: LogTarget(Info, pagesize) lt; if (lt.is_enabled()) { LogStream ls(lt); ls.print("Available page sizes: "); _page_sizes.print_on(&ls); } src/hotspot/os/linux/os_linux.cpp line 4013: > 4011: assert(UseLargePages && UseHugeTLBFS, "only for Huge TLBFS large pages"); > 4012: assert(is_aligned(bytes, large_page_size), "Unaligned size"); > 4013: assert(is_aligned(req_addr, large_page_size), "Unaligned address"); Please add an assert here that `large_page_size` is larger than os::vm_page_size (the small page size), to ensure we actually get a large page size from `page_size_for_region_aligned()`. Otherwise the passed-in size wasn't correctly aligned. src/hotspot/os/linux/os_linux.cpp line 4047: > 4045: // that is smaller than size_t bytes > 4046: size_t large_page_size = os::page_size_for_region_aligned(bytes, 1); > 4047: Please add the same assert as suggested above here. ------------- Changes requested by sjohanss (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/1153 From kbarrett at openjdk.java.net Fri Jan 15 14:22:09 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 15 Jan 2021 14:22:09 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation Message-ID: Please review this change to ParallelGC oldgen allocation. There were two variants, one using CAS on the _top member of the mutable space, the other requiring locking or other forms of mutual exclusion. We don't need both variants. The non-CAS variant is only used in a few places, where the cost of an extra CAS doesn't matter. What does matter is that having two variants, which must not be used concurrently, makes the code larger, more complex, and harder to verify. (This change came out of analyzing JDK-8259271. No problems were found (or expected), so this change is not expected to impact that bug. But because of the two variants, the possibility of unexpected interaction needed to be examined.) The non-CAS allocation support has been removed, with PSOldGen::allocate now implemented using the CAS-based allocation. The cas_ prefix naming convention is retained for the internals for clarity. While looking at this, I noticed and removed a couple of lingering references to the class AdjoiningGenerations, which no longer exists after JDK-8243146.
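For readers less familiar with the technique, a minimal sketch of CAS-based bump-pointer allocation on a `_top` pointer, using std::atomic rather than HotSpot's Atomic class. `BumpSpace` and its members are hypothetical names and this is not the MutableSpace or PSOldGen code; it only illustrates why a single CAS-based variant is safe to call from any context, locked or not.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical bump-pointer space; not the HotSpot MutableSpace.
class BumpSpace {
public:
  BumpSpace(char* bottom, std::size_t capacity)
    : _top(bottom), _end(bottom + capacity) {}

  // Lock-free allocation: retry until the CAS on _top succeeds.
  // Returns nullptr when the remaining space is too small.
  char* cas_allocate(std::size_t size) {
    char* old_top = _top.load(std::memory_order_relaxed);
    do {
      // Compare against the remaining space instead of computing
      // old_top + size first, which could run past the end of the space.
      if (static_cast<std::size_t>(_end - old_top) < size) return nullptr;
      // On failure compare_exchange_weak reloads old_top, so the bounds
      // check above is repeated against the current value.
    } while (!_top.compare_exchange_weak(old_top, old_top + size,
                                         std::memory_order_relaxed));
    return old_top;
  }

  // Single public entry point; callers holding a lock simply pay for
  // one uncontended CAS.
  char* allocate(std::size_t size) { return cas_allocate(size); }

private:
  std::atomic<char*> _top;
  char* const _end;
};
```

The design trade-off matches the one stated above: the serial callers pay for one extra atomic operation, and in exchange there is only one allocation path to reason about.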
Testing: mach5 tier1-5 ------------- Commit messages: - remove non-CAS allocate Changes: https://git.openjdk.java.net/jdk/pull/2101/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2101&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259776 Stats: 133 lines in 9 files changed: 1 ins; 119 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/2101.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2101/head:pull/2101 PR: https://git.openjdk.java.net/jdk/pull/2101 From pliden at openjdk.java.net Fri Jan 15 14:22:19 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 15 Jan 2021 14:22:19 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system Message-ID: Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can work around the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment. This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. Just to clarify. In a Virtuozzo/OpenVZ environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a too low number. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker.
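To make the failure mode concrete, a small sketch of a CPU-local structure indexed by processor id, with a range check that maps any out-of-range id to slot 0. `PerCpuCounters` is a hypothetical class, the id is passed in as a plain parameter instead of being read from the OS, and this is not the os_linux.cpp change itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-local counter array; one slot per configured processor.
class PerCpuCounters {
public:
  explicit PerCpuCounters(int processor_count)
    : _slots(static_cast<std::size_t>(processor_count), 0L) {}

  // Maps an OS-reported id to a valid slot index. Any id outside
  // [0, processor_count) is treated as processor 0: always safe to index,
  // though all such callers then contend on one slot.
  int normalize(int reported_id) const {
    const int count = static_cast<int>(_slots.size());
    return (reported_id >= 0 && reported_id < count) ? reported_id : 0;
  }

  void increment(int reported_id) {
    ++_slots[static_cast<std::size_t>(normalize(reported_id))];
  }

  long value(int slot) const {
    return _slots[static_cast<std::size_t>(slot)];
  }

private:
  std::vector<long> _slots;
};
```

Without the `normalize` step, an id of 7 on a structure sized for 2 processors would index past the end of the array, which is the kind of crash reported here.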
This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happens. Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). ------------- Commit messages: - 8259765: ZGC: Handle incorrect processor id reported by the operating system Changes: https://git.openjdk.java.net/jdk16/pull/124/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=124&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259765 Stats: 37 lines in 1 file changed: 29 ins; 2 del; 6 mod Patch: https://git.openjdk.java.net/jdk16/pull/124.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/124/head:pull/124 PR: https://git.openjdk.java.net/jdk16/pull/124 From ayang at openjdk.java.net Fri Jan 15 14:26:10 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 15 Jan 2021 14:26:10 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 13:48:26 GMT, Per Liden wrote: > Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). > > We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can work around the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment.
> > This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. > > Just to clarify. In a Virtuozzo/OpenVZ environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a too low number. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker. > > This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happens. > > Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). Marked as reviewed by ayang (Author). src/hotspot/os/linux/os_linux.cpp line 4749: > 4747: } > 4748: > 4749: static volatile int warn_invalid_processor_id = 1; Maybe move this var into the function, since it's only used inside it. ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From tschatzl at openjdk.java.net Fri Jan 15 14:42:08 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 15 Jan 2021 14:42:08 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v5] In-Reply-To: References: Message-ID: On Wed, 13 Jan 2021 11:19:12 GMT, Albert Mingkun Yang wrote: >> The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments.
>> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Lgmt. src/hotspot/share/gc/shared/workgroup.cpp line 377: > 375: // all non-skipped tasks are claimed > 376: for (uint i = 0; i < _n_tasks; ++i) { > 377: if (_tasks[i] == 0) { pre-existing: This could be fixed in a separate CR: _tasks could be an array of bool instead of (u)int. Using an int is a historic artifact of not having a good Atomics library. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2046 From eosterlund at openjdk.java.net Fri Jan 15 15:12:14 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 15 Jan 2021 15:12:14 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 13:48:26 GMT, Per Liden wrote: > Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). > > We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can workaround the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment. > > This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. > > Just to clarify. In a Virtuozzo/OpenZV environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a too low number. 
Either way, sched_getcpu() and syconf(_SC_NPROCESSORS_CONF) seems to have different views of the world. This is not an issue in container environments such as Docker. > > This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happen. > > Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). Changes requested by eosterlund (Reviewer). src/hotspot/os/linux/os_linux.cpp line 4784: > 4782: "(got processor id %d, valid processor id range is 0-%d)", > 4783: id, processor_count() - 1); > 4784: log_warning(os)("Falling back so assuming processor id is 0. " s/so/to/ src/hotspot/os/linux/os_linux.cpp line 4769: > 4767: const int id = Linux::sched_getcpu(); > 4768: > 4769: if (id >= 0 && id < processor_count()) { Do we really need to check if the returned processor ID is negative? That seems a whole new level of environment screwup to me. ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From tschatzl at openjdk.java.net Fri Jan 15 15:09:03 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 15 Jan 2021 15:09:03 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation In-Reply-To: References: Message-ID: <7x5IeSN0ALisnW5QXO2Jxyqbcv6Ihe1yf8utqWXsPHA=.ca49404f-ad51-49f0-bfb8-546eb818eab7@github.com> On Fri, 15 Jan 2021 14:16:50 GMT, Kim Barrett wrote: > Please review this change to ParallelGC oldgen allocation. There were two > variants, one using CAS on the _top member of the mutable space, the other > requiring locking or other forms of mutual exclusion. > > We don't need both variants. The non-CAS variant is only used in a few > places, where the cost of an extra CAS doesn't matter. 
What does matter is > that having two variants, which must not be used concurrently, makes the > code larger, more complex, and harder to verify. (This change came out of > analyzing JDK-8259271. No problems were found (or expected), so this change > is not expected to impact that bug. But because of the two variants, the > possibility of unexpected interaction needed to be examined.) > > The non-CAS allocation support has been removed, with PSOldGen::allocate now > implemented using the CAS-based allocation. The cas_ prefix naming > convention is retained for the internals for clarity. > > While looking at this, noticed and removed a couple of lingering references > to the class AdjoiningGenerations, which no longer exists after JDK-8243146. > > Testing: > mach5 tier1-5 Changes requested by tschatzl (Reviewer). src/hotspot/share/gc/parallel/psOldGen.hpp line 137: > 135: void resize(size_t desired_free_space); > 136: > 137: HeapWord* allocate(size_t word_size) { Before this change there was a small semantic difference between `allocate()` and `par_allocate()`. The former also updated the size policy, which seems to be missing now in `ParallelScavengeHeap::mem_allocate_old_gen()` and `ParallelScavengeHeap::failed_mem_allocate()`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2101 From github.com+168222+mgkwill at openjdk.java.net Fri Jan 15 18:32:07 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Fri, 15 Jan 2021 18:32:07 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 11:21:11 GMT, Stefan Johansson wrote: >> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 23 commits: >> >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Remove extraneous ' from warning >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Fix os::large_page_size() in last update >> >> Signed-off-by: Marcus G K Williams >> - Ivan W. Requested Changes >> >> Removed os::Linux::select_large_page_size and >> use os::page_size_for_region instead >> >> Removed Linux::find_large_page_size and use >> register_large_page_sizes. Streamlined >> Linux::setup_large_page_size >> >> Signed-off-by: Marcus G K Williams >> - Fix space format, use Linux:: for local func. >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp >> - ... and 13 more: https://git.openjdk.java.net/jdk/compare/da2415fe...d73e7a4c > > Back from the holidays and actually looking at our use of large pages from another perspective as well. I think the approach here has been simplified a lot from the first suggestion and I like it. Just a few small additional comments. Thanks @kstefanj. I'm taking a look at your suggestions and will have an update soon. I'm also working through the testing needs suggested by @tstuefe.
It appears he's added gtest runs with largepages option in https://github.com/openjdk/jdk/pull/1763 - Thanks @tstuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From ayang at openjdk.java.net Fri Jan 15 19:32:06 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 15 Jan 2021 19:32:06 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v5] In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 14:37:45 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/shared/workgroup.cpp line 377: > >> 375: // all non-skipped tasks are claimed >> 376: for (uint i = 0; i < _n_tasks; ++i) { >> 377: if (_tasks[i] == 0) { > > pre-existing: This could be fixed in a separate CR: _tasks could be an array of bool instead of (u)int. Using an int is a historic artifact of not having a good Atomics library. Created a ticket (JDK-8259851) for it; will start working on that after this is merged. Thanks for the suggestion. ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From sjohanss at openjdk.java.net Fri Jan 15 20:16:12 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 15 Jan 2021 20:16:12 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: <3BHnoVKWC7hs9ftPYJGiILcSiYDC58nneNLIrwmKYA4=.d5c63333-9b77-4ec5-aab6-719d288e3f2f@github.com> On Tue, 15 Dec 2020 18:48:05 GMT, Marcus G K Williams wrote: >> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. 
Code heap usually uses less than 1G, so currently the code precludes code heap from using >> large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). >> >> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). >> >> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. > > Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Remove extraneous ' from warning > > Signed-off-by: Marcus G K Williams > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Fix os::large_page_size() in last update > > Signed-off-by: Marcus G K Williams > - Ivan W. Requested Changes > > Removed os::Linux::select_large_page_size and > use os::page_size_for_region instead > > Removed Linux::find_large_page_size and use > register_large_page_sizes. Streamlined > Linux::setup_large_page_size > > Signed-off-by: Marcus G K Williams > - Fix space format, use Linux:: for local func. > > Signed-off-by: Marcus G K Williams > - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp > - ... and 13 more: https://git.openjdk.java.net/jdk/compare/da2415fe...d73e7a4c Did some more testing with the code. I'm using Parallel for testing because G1 does a better job aligning sizes and avoiding some problems. I found that this change has a problem with mapping using both small and large pages (`reserve_memory_special_huge_tlbfs_mixed()`).
I'm currently investigating if we can remove this type of mixed mapping, and instead make sure we only use large pages when properly aligned, so in the future we might be able to get rid of some code in this area. For now, see my comments below. src/hotspot/os/linux/os_linux.cpp line 4134: > 4132: // Select large_page_size from _page_sizes > 4133: // that is smaller than size_t bytes > 4134: size_t large_page_size = os::page_size_for_region_aligned(bytes, 1); This is partly what I'm looking at from a slightly different direction. And my current thinking is that we should get rid of all mappings that are not properly aligned when using large pages. But that is something for a different PR. I need to look more at this next week, but for this to work as before this call needs to use the unaligned version, `os::page_size_for_region_unaligned(...)`; otherwise we will get a small page size here in many cases, and that cannot be used with the code doing the reservations below. src/hotspot/os/linux/os_linux.cpp line 4046: > 4044: // Select large_page_size from _page_sizes > 4045: // that is smaller than size_t bytes > 4046: size_t large_page_size = os::page_size_for_region_aligned(bytes, 1); This one also needs to use `os::page_size_for_region_unaligned(...)` since we know we have a size that needs both small and large pages. ------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1153 From pliden at openjdk.java.net Fri Jan 15 22:05:12 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 15 Jan 2021 22:05:12 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 15:07:30 GMT, Erik Österlund wrote: >> Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available.
This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). >> >> We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can work around the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment. >> >> This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. >> >> Just to clarify. In a Virtuozzo/OpenVZ environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a number that is too low. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker. >> >> This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happens. >> >> Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). > > src/hotspot/os/linux/os_linux.cpp line 4784: > >> 4782: "(got processor id %d, valid processor id range is 0-%d)", >> 4783: id, processor_count() - 1); >> 4784: log_warning(os)("Falling back so assuming processor id is 0. " > > s/so/to/ Will fix! > src/hotspot/os/linux/os_linux.cpp line 4769: > >> 4767: const int id = Linux::sched_getcpu(); >> 4768: >> 4769: if (id >= 0 && id < processor_count()) { > > Do we really need to check if the returned processor ID is negative? That seems a whole new level of environment screwup to me.
I'm thinking we should make this safe to call in all cases. God knows what a broken environment might return. ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From pliden at openjdk.java.net Fri Jan 15 22:05:11 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 15 Jan 2021 22:05:11 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: References: Message-ID: <4y_7I4sbLbG4BHe7xZi0Jzf4eykTF1ae7ngTukj1JkM=.d7fd2659-9c69-40fa-9e3d-8c5e0fe268b7@github.com> On Fri, 15 Jan 2021 14:22:42 GMT, Albert Mingkun Yang wrote: >> Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). >> >> We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can workaround the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment. >> >> This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. >> >> Just to clarify. In a Virtuozzo/OpenZV environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a too low number. Either way, sched_getcpu() and syconf(_SC_NPROCESSORS_CONF) seems to have different views of the world. This is not an issue in container environments such as Docker. >> >> This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. 
As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happen. >> >> Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). > > src/hotspot/os/linux/os_linux.cpp line 4749: > >> 4747: } >> 4748: >> 4749: static volatile int warn_invalid_processor_id = 1; > > Maybe moving this var into the function, since it's only used inside it. Doing so will come with the cost of always having to run a pthread_once() in function entry. ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From stuefe at openjdk.java.net Sat Jan 16 06:02:15 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 16 Jan 2021 06:02:15 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: <3BHnoVKWC7hs9ftPYJGiILcSiYDC58nneNLIrwmKYA4=.d5c63333-9b77-4ec5-aab6-719d288e3f2f@github.com> References: <3BHnoVKWC7hs9ftPYJGiILcSiYDC58nneNLIrwmKYA4=.d5c63333-9b77-4ec5-aab6-719d288e3f2f@github.com> Message-ID: On Fri, 15 Jan 2021 20:13:42 GMT, Stefan Johansson wrote: >> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Remove extraneous ' from warning >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Fix os::large_page_size() in last update >> >> Signed-off-by: Marcus G K Williams >> - Ivan W. Requested Changes >> >> Removed os::Linux::select_large_page_size and >> use os::page_size_for_region instead >> >> Removed Linux::find_large_page_size and use >> register_large_page_sizes. 
Streamlined >> Linux::setup_large_page_size >> >> Signed-off-by: Marcus G K Williams >> - Fix space format, use Linux:: for local func. >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp >> - ... and 13 more: https://git.openjdk.java.net/jdk/compare/da2415fe...d73e7a4c > > Did some more testing with the code. I'm using Parallel for testing because G1 does a better job aligning sizes and avoiding some problems. > > I found that this change has a problem with mapping using both small and large pages (`reserve_memory_special_huge_tlbfs_mixed()`). I'm currently investigating if we can remove this type of mixed mapping, and instead make sure we only use large pages when properly aligned, so in the future we might be able to get rid of some code in this area. For now, see my comments below. Since we are not shipping this with JDK16, I'm more relaxed now. This will have time to cook before JDK17 is shipped, which takes care of my third point (doing more tests). About the jtreg test. I originally wrote: >> one jtreg test to test that the VM comes up with -XX:+UseLargePages -XX:LargePageSizeInBytes=1G and allocates small-large-pages as expected. This is not only needed as a function proof but to prevent regressions when we reform the code (which will happen) Not sure if that was too vague. An easy way would be to add some tracing to the VM in the allocation path, e.g. with `log_info(os)(...)`, then in the test start a VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1G -Xlog:os`, and scan its output. There are many tests which do this; for an easy example see e.g. runtime/os/TestUseCpuAllocPath.java. I'll take a closer look next week but will wait until Stefan has had his go.
------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From kbarrett at openjdk.java.net Sat Jan 16 11:59:30 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 16 Jan 2021 11:59:30 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v2] In-Reply-To: References: Message-ID: > Please review this change to ParallelGC oldgen allocation. There were two > variants, one using CAS on the _top member of the mutable space, the other > requiring locking or other forms of mutual exclusion. > > We don't need both variants. The non-CAS variant is only used in a few > places, where the cost of an extra CAS doesn't matter. What does matter is > that having two variants, which must not be used concurrently, makes the > code larger, more complex, and harder to verify. (This change came out of > analyzing JDK-8259271. No problems were found (or expected), so this change > is not expected to impact that bug. But because of the two variants, the > possibility of unexpected interaction needed to be examined.) > > The non-CAS allocation support has been removed, with PSOldGen::allocate now > implemented using the CAS-based allocation. The cas_ prefix naming > convention is retained for the internals for clarity. > > While looking at this, noticed and removed a couple of lingering references > to the class AdjoiningGenerations, which no longer exists after JDK-8243146.
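The aligned versus unaligned page-size selection that Stefan raises in the review quoted above can be illustrated with a small standalone sketch. This is not the HotSpot code: `select_page_size` and its parameters are hypothetical names, and the list of page sizes is assumed to be sorted from largest to smallest with the base page size last.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only; not os::page_size_for_region_*.
// `page_sizes` is assumed to be sorted from largest to smallest and to end
// with the base (small) page size.
inline std::size_t select_page_size(const std::vector<std::size_t>& page_sizes,
                                    std::size_t region_bytes,
                                    bool must_be_aligned) {
  assert(!page_sizes.empty());
  for (std::size_t page : page_sizes) {
    if (page > region_bytes) {
      continue;  // page is larger than the whole region
    }
    if (must_be_aligned && region_bytes % page != 0) {
      continue;  // region is not a multiple of this page size
    }
    return page;  // largest page size that satisfies the constraints
  }
  return page_sizes.back();  // fall back to the base page size
}
```

With page sizes of 1G, 2M and 4K, a 5M region selects 4K when alignment is required but 2M when it is not, which is the kind of difference that matters for a mapping that mixes small and large pages.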
> > Testing: > mach5 tier1-5 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: record oldgen mutator allocations in size policy ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2101/files - new: https://git.openjdk.java.net/jdk/pull/2101/files/e93ea3d7..ed7e2b26 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2101&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2101&range=00-01 Stats: 18 lines in 3 files changed: 14 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2101.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2101/head:pull/2101 PR: https://git.openjdk.java.net/jdk/pull/2101 From kbarrett at openjdk.java.net Sat Jan 16 11:59:31 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 16 Jan 2021 11:59:31 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v2] In-Reply-To: <7x5IeSN0ALisnW5QXO2Jxyqbcv6Ihe1yf8utqWXsPHA=.ca49404f-ad51-49f0-bfb8-546eb818eab7@github.com> References: <7x5IeSN0ALisnW5QXO2Jxyqbcv6Ihe1yf8utqWXsPHA=.ca49404f-ad51-49f0-bfb8-546eb818eab7@github.com> Message-ID: On Fri, 15 Jan 2021 15:04:17 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> record oldgen mutator allocations in size policy > > src/hotspot/share/gc/parallel/psOldGen.hpp line 137: > >> 135: void resize(size_t desired_free_space); >> 136: >> 137: HeapWord* allocate(size_t word_size) { > > Before this change there has been a small semantical difference between `allocate()` and `par_allocate()`. The former also updated the size policy, which seem to be missing now in `ParallelScavengeHeap::mem_allocate_old_gen()` and `ParallelScavengeHeap::failed_mem_allocate()`. Oops, I forgot about that and got overly enthusiastic about code deletion. Thanks for spotting it. 
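For readers less familiar with the CAS-based allocation this thread refers to, the following is a minimal standalone sketch of bump-pointer allocation using compare-and-swap. It is not the ParallelGC code: `BumpSpace` and `cas_allocate` are illustrative names, addresses are plain integers, and the size-policy accounting discussed above is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump-pointer space, for illustration only; not MutableSpace.
class BumpSpace {
  std::atomic<std::uintptr_t> _top;  // next free address
  const std::uintptr_t _end;         // one past the last usable address

public:
  BumpSpace(std::uintptr_t bottom, std::uintptr_t end) : _top(bottom), _end(end) {
    assert(bottom <= end);
  }

  // Try to claim `bytes` by advancing _top with compare-and-swap.
  // Returns the start of the claimed block, or 0 if there is not enough room.
  std::uintptr_t cas_allocate(std::size_t bytes) {
    std::uintptr_t old_top = _top.load(std::memory_order_relaxed);
    do {
      if (_end - old_top < bytes) {
        return 0;  // space exhausted
      }
      // On failure compare_exchange_weak reloads old_top, and we retry.
    } while (!_top.compare_exchange_weak(old_top, old_top + bytes,
                                         std::memory_order_relaxed));
    return old_top;
  }

  std::uintptr_t top() const { return _top.load(std::memory_order_relaxed); }
};
```

Because every caller goes through the same CAS loop, callers that already hold a lock pay for one extra atomic operation but need no separate allocation path.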
------------- PR: https://git.openjdk.java.net/jdk/pull/2101 From dholmes at openjdk.java.net Sat Jan 16 13:03:12 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Sat, 16 Jan 2021 13:03:12 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 13:48:26 GMT, Per Liden wrote: > Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). > > We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can workaround the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment. > > This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. > > Just to clarify. In a Virtuozzo/OpenZV environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a too low number. Either way, sched_getcpu() and syconf(_SC_NPROCESSORS_CONF) seems to have different views of the world. This is not an issue in container environments such as Docker. > > This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happen. > > Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). 
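The check described in the quoted text can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the patch: `sanitize_processor_id` is a hypothetical name, the raw id and processor count are passed in rather than read from the operating system, and the warning goes to stderr instead of the VM log.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the range check under discussion; not the patch.
// `raw_id` is what the OS reported; `processor_count` is the number of
// processors the process believes it has.
inline int sanitize_processor_id(int raw_id, int processor_count) {
  assert(processor_count > 0);
  if (raw_id >= 0 && raw_id < processor_count) {
    return raw_id;  // common case: the id can index an array of that length
  }
  // Warn only once, even if several threads reach this path concurrently.
  static std::atomic<bool> warned{false};
  if (!warned.exchange(true)) {
    std::fprintf(stderr,
                 "Invalid processor id %d (valid range is 0-%d), assuming 0\n",
                 raw_id, processor_count - 1);
  }
  return 0;  // safe fallback, at the cost of contention on slot 0
}
```

In the common case the cost is two integer comparisons on the value that was already fetched.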
So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then? Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment? Cheers, David ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From kbarrett at openjdk.java.net Sat Jan 16 15:40:31 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 16 Jan 2021 15:40:31 GMT Subject: RFR: 8256814: WeakProcessorPhases may be redundant [v2] In-Reply-To: References: Message-ID: > Please review this change which eliminates the WeakProcessorPhase class. > > The OopStorageSet class is changed to provide scoped enums for the different > categories: StrongId, WeakId, and Id (for the union of strong and weak). > An accessor is provided for obtaining the storage corresponding to a > category value. > > Various other enumerator ranges, array sizes and indices, and iterations are > derived directly from the corresponding OopStorageSet category's enum range. > > Iteration over a category of enumerators can be done via EnumIterator. The > iteration over storage objects makes use of that enum iteration, rather than > having a bespoke implementation. Some use-cases need iteration of the > enumerators, with storage lookup from the enumerator; other use-cases just > need the storage objects. > > Testing: > mach5 tier1-6 > Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into wpp4 - stefank review - Remove WeakProcessorPhase, adding scoped enum categories to OopStorageSet. 
------------- Changes: https://git.openjdk.java.net/jdk/pull/1862/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1862&range=01 Stats: 1042 lines in 25 files changed: 400 ins; 465 del; 177 mod Patch: https://git.openjdk.java.net/jdk/pull/1862.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1862/head:pull/1862 PR: https://git.openjdk.java.net/jdk/pull/1862 From kim.barrett at oracle.com Sat Jan 16 15:42:44 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Sat, 16 Jan 2021 10:42:44 -0500 Subject: RFR: 8256814: WeakProcessorPhases may be redundant In-Reply-To: References: Message-ID: <33C0F891-C26A-4B41-B85A-84627676071D@oracle.com> > On Jan 12, 2021, at 5:12 AM, Stefan Karlsson wrote: > > On Tue, 22 Dec 2020 04:59:28 GMT, Kim Barrett wrote: > >> Please review this change which eliminates the WeakProcessorPhase class. >> >> The OopStorageSet class is changed to provide scoped enums for the different >> categories: StrongId, WeakId, and Id (for the union of strong and weak). >> An accessor is provided for obtaining the storage corresponding to a >> category value. >> >> Various other enumerator ranges, array sizes and indices, and iterations are >> derived directly from the corresponding OopStorageSet category's enum range. >> >> Iteration over a category of enumerators can be done via EnumIterator. The >> iteration over storage objects makes use of that enum iteration, rather than >> having a bespoke implementation. Some use-cases need iteration of the >> enumerators, with storage lookup from the enumerator; other use-cases just >> need the storage objects. >> >> Testing: >> mach5 tier1-6 >> Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC > > I think this looks good. I have a few comments that I would like to get addressed, but they are not blockers if you want to proceed with what you have. Thanks. 
> src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 35: > >> 33: >> 34: // Base class for OopStorageSet{Strong,Weak}ParState. >> 35: template > > While reviewing this, it was not immediately obvious what T represent. EnumRange uses the name StorageId, maybe use the same here? T -> StorageId -- done. > src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 52: > >> 50: >> 51: NONCOPYABLE(OopStorageSetParState); >> 52: }; > > We tend to put the member variables at the top of classes. I don't think ParState needs to be public, and this could be changed to: > [?] Having a public function whose type signature involves identifiers that can't be used by clients, particularly for the return type, is problematic. Personally, I intensely dislike the typical HotSpot ordering, but go along anyway if there's not a direct reason not to, as there is here. (The HotSpot ordering also happens to be contrary to every style guide I've ever seen discuss the subject, and those style guides give good reasons that are the basis of my dislike.) Options are (1) drop the type alias and write out the type, (2) have multiple public sections, (3) put the public stuff first. I prefer (3). Though I went with (2) in weakProcessorTimes.hpp, to reduce the code churn for this PR. > src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 58: > >> 56: class OopStorageSetStrongParState >> 57: : public OopStorageSetParState >> 58: { > > We usually keep the `{` on the same line. We also usually put the base class on the same line as the class, but that made the lines longer than I like, hence the line break there. Having the open brace at the end of the base class line then makes the distinction between the base class part and the members more subtle than I liked; I think the brace placement I used helps with that. (All this is a long-winded way of saying that the formatting here is intentional, attempting to make the code easier to parse by eye.) 
> src/hotspot/share/gc/shared/oopStorageSetParState.hpp line 68: > >> 66: class OopStorageSetWeakParState >> 67: : public OopStorageSetParState >> 68: { > > Same comment as above. > > src/hotspot/share/gc/shared/oopStorageSetParState.inline.hpp line 36: > >> 34: >> 35: template >> 36: template > > Other places in the file uses `template <` so the usage of `template<` makes the code inconsistent. Yeah, sorry, I try to be consistent with nearby code but failed here; fixed. FWIW, HotSpot uses both, and there doesn't seem to be a consensus in the wider C++ community. The C++ Standard uses both! My default has always been no-space, and having a space there bugs me a little, but not enough to argue over. > src/hotspot/share/gc/shared/weakProcessorTimes.hpp line 37: > >> 35: class WeakProcessorTimes { >> 36: public: >> 37: using StorageId = OopStorageSet::WeakId; > > Could be private. Here too I think public functions whose type signatures involve identifiers that can't be used by clients is problematic. But I renamed it from StorageId to WeakId; looking at it again, the more generic name seems counterproductive here. > test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 48: > >> 46: >> 47: template >> 48: static void check_iterator(OopStorageSet::Iterator it, > > All the functions you changed are named `_iterator` and tested OopStorageSet::Iterator. Now the name is the same, but instead they test the Range facility. I think these functions should be renamed. Alternatively, we keep the tests for the OopStorageSet::Iterator and create a new set for the Range? What's being tested is iteration, so "iterator" => "iteration? throughout seems better. > Marked as reviewed by stefank (Reviewer). Thanks for reviewing. 
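As a rough illustration of iterating over a category of scoped enumerators, as discussed in this review, here is a standalone sketch. It does not reproduce HotSpot's EnumIterator or EnumRange interface: `WeakId`, `EnumRangeSketch` and `count_weak_ids` are hypothetical names, and the enumerators are assumed to be contiguous with `End` as a one-past-the-last sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical scoped enum with contiguous values and an End sentinel.
enum class WeakId : unsigned { First = 0, Second, Third, End };

// Minimal half-open range [first, end) over a scoped enum, usable in a
// range-based for loop. Illustration only.
template <typename E>
class EnumRangeSketch {
  using U = typename std::underlying_type<E>::type;
  U _begin;
  U _end;

public:
  class Iterator {
    U _value;
  public:
    explicit Iterator(U value) : _value(value) {}
    E operator*() const { return static_cast<E>(_value); }
    Iterator& operator++() { ++_value; return *this; }
    bool operator!=(const Iterator& other) const { return _value != other._value; }
  };

  EnumRangeSketch(E first, E end)
    : _begin(static_cast<U>(first)), _end(static_cast<U>(end)) {
    assert(_begin <= _end);
  }

  Iterator begin() const { return Iterator(_begin); }
  Iterator end() const { return Iterator(_end); }
  std::size_t size() const { return static_cast<std::size_t>(_end - _begin); }
};

// Visits every enumerator in the range; a real caller would look up the
// per-id state (for example a storage object) from each id.
inline std::size_t count_weak_ids() {
  std::size_t visited = 0;
  for (WeakId id : EnumRangeSketch<WeakId>(WeakId::First, WeakId::End)) {
    (void)id;
    ++visited;
  }
  return visited;
}
```

Array sizes and loop bounds can then be derived from the range itself instead of being maintained by hand alongside the enum.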
> PR: https://git.openjdk.java.net/jdk/pull/1862 From kbarrett at openjdk.java.net Sun Jan 17 13:21:57 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 17 Jan 2021 13:21:57 GMT Subject: RFR: 8258742: Move PtrQueue reset to PtrQueueSet subclasses Message-ID: Please review this change to the PtrQueue hierarchy, changing queue reset from an intrinsic operation of the queue to an operation the qset performs on a queue. This is another piece of the refactoring being done under JDK-8258251, separated out for easier review. After the refactoring of queue reset, PtrQueue::is_empty and PtrQueue::size are no longer used, so are removed. Further, PtrQueue::{set_}index_in_bytes are removed, directly using _index instead. A less obvious part of the change is in the G1 remark task and Shenandoah final marking task. The threads walk performed by these no longer directly process the partial per-thread SATB buffers. Instead they just flush the queues for later completed buffer processing.
Testing: mach5 tier1 local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC ------------- Commit messages: - update shenandoah - remove pq index_in_bytes - remove pq size - remove pq is_empty - move reset Changes: https://git.openjdk.java.net/jdk/pull/2115/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2115&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258742 Stats: 89 lines in 9 files changed: 17 ins; 45 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/2115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2115/head:pull/2115 PR: https://git.openjdk.java.net/jdk/pull/2115 From ddong at openjdk.java.net Mon Jan 18 02:53:37 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 18 Jan 2021 02:53:37 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 03:14:10 GMT, Denghui Dong wrote: >> GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. >> >> For the test purpose, I add two Whitebox methods to lock/unlock critical. > > GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. > > For the test purpose, I add two Whitebox methods to lock/unlock critical. 
Greetings, please help review this patch:) Thanks ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From iklam at openjdk.java.net Mon Jan 18 05:56:48 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Mon, 18 Jan 2021 05:56:48 GMT Subject: RFR: 8259870: zBarrier.inline.hpp should not include javaClasses.hpp Message-ID: zBarrier.inline.hpp is a popular header file (it's included by about 430 out of ~1000 hotspot .o files). It includes javaClasses.hpp only for the inline function verify_on_weak(), which is used only for assert purposes in debug builds. javaClasses.hpp is large and in turn pulls in other large header files. If we move verify_on_weak() into zBarrier.cpp and stop including javaClasses.hpp in zBarrier.inline.hpp, building hotspot is about 0.5% faster. The number of .o files that include javaClasses.hpp is reduced from 459 to 175. Testing: Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. ------------- Commit messages: - 8259870: zBarrier.inline.hpp should not include javaClasses.hpp Changes: https://git.openjdk.java.net/jdk/pull/2120/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2120&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259870 Stats: 33 lines in 3 files changed: 17 ins; 13 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2120.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2120/head:pull/2120 PR: https://git.openjdk.java.net/jdk/pull/2120 From stefank at openjdk.java.net Mon Jan 18 07:55:39 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 18 Jan 2021 07:55:39 GMT Subject: RFR: 8259870: zBarrier.inline.hpp should not include javaClasses.hpp In-Reply-To: References: Message-ID: On Sun, 17 Jan 2021 23:53:53 GMT, Ioi Lam wrote: > zBarrier.inline.hpp is a popular header file (it's included by about 430 out of ~1000 hotspot .o files). 
It includes javaClasses.hpp only for the inline function verify_on_weak(), which is used only for assert purposes in debug builds. > > javaClasses.hpp is large and in turn pulls in other large header files. If we move verify_on_weak() into zBarrier.cpp and stop including javaClasses.hpp in zBarrier.inline.hpp, building hotspot is about 0.5% faster. The number of .o files that include javaClasses.hpp is reduced from 459 to 175. > > Testing: > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2120 From eosterlund at openjdk.java.net Mon Jan 18 08:08:45 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 18 Jan 2021 08:08:45 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 13:48:26 GMT, Per Liden wrote: > Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). > > We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can workaround the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment. > > This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. > > Just to clarify. 
In a Virtuozzo/OpenVZ environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns too low a number. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker. > > This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happens. > > Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). Marked as reviewed by eosterlund (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From stefank at openjdk.java.net Mon Jan 18 08:12:47 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 18 Jan 2021 08:12:47 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 02:42:20 GMT, Denghui Dong wrote: > GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. > > For the test purpose, I add two Whitebox methods to lock/unlock critical. Not a review, but a few comments about what probably needs to be cleaned up before a proper review starts. src/hotspot/share/gc/shared/gcLocker.cpp line 186: > 184: _stall_count = 0; > 185: } > 186: #endif This adds a fair amount of noise and hides the actual GCLocker logic, IMHO. Could you somehow encapsulate this code and the other INCLUDE_JFR above into a class and make single-line calls perform these functions?
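One possible shape for the encapsulation suggested above, as a rough sketch only: the class name `StallTracer`, its methods, and the `INCLUDE_TRACING` switch are hypothetical stand-ins (the latter for `INCLUDE_JFR`), and `std::chrono` stands in for HotSpot's tick types. The idea is that callers make unconditional single-line calls while the conditional compilation stays inside the helper.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a build switch such as INCLUDE_JFR.
#ifndef INCLUDE_TRACING
#define INCLUDE_TRACING 1
#endif

// Hypothetical helper: all conditional code lives here, so the locker
// logic itself only contains single-line calls.
class StallTracer {
#if INCLUDE_TRACING
  using Clock = std::chrono::steady_clock;
  Clock::time_point _start{};
  bool _started = false;
  std::uint32_t _stall_count = 0;
#endif

public:
  // Begin a measurement window; a no-op if one is already open.
  void start() {
#if INCLUDE_TRACING
    if (!_started) {
      _start = Clock::now();
      _started = true;
      _stall_count = 0;
    }
#endif
  }

  // Count one stalled thread, but only inside an open window.
  void inc_stall_count() {
#if INCLUDE_TRACING
    if (_started) {
      ++_stall_count;
    }
#endif
  }

  // Close the window and return the number of stalls seen in it.
  // A real implementation would send an event here instead.
  std::uint32_t report() {
#if INCLUDE_TRACING
    if (_started) {
      _started = false;
      // Duration of the window, which a real event would carry.
      auto elapsed = Clock::now() - _start;
      (void)elapsed;
      return _stall_count;
    }
#endif
    return 0;
  }
};
```

With a helper like this, the locker code would contain only calls such as `tracer.start()` and `tracer.report()`, with no `#if` blocks at the call sites.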
src/hotspot/share/jfr/metadata/metadata.xml line 1080: > 1078: > 1079: > 1080: You add this between two Shenandoah events. Could you put it somewhere where it's not splitting up a group of events? src/hotspot/share/prims/whitebox.cpp line 44: > 42: #include "gc/shared/genArguments.hpp" > 43: #include "gc/shared/genCollectedHeap.hpp" > 44: #include "gc/shared/gcLocker.inline.hpp" Sort includes. src/hotspot/share/gc/shared/gcLocker.cpp line 112: > 110: #if INCLUDE_JFR > 111: if (EventGCLocker::is_enabled()) { > 112: _needs_gc_start_timestamp = JfrTicks::now(); Do you really need to use JfrTicks instead of Ticks here? If not, could you remove all references and includes of JfrTicks? We usually pass in Ticks when we send JFR events. src/hotspot/share/utilities/ticks.hpp line 241: > 239: friend class GCTimerTest; > 240: friend class CompilerEvent; > 241: friend class GCLocker; I don't think this should be needed. ------------- Changes requested by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2088 From ayang at openjdk.java.net Mon Jan 18 08:31:47 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 18 Jan 2021 08:31:47 GMT Subject: RFR: 8074101: Add verification that all tasks are actually claimed during roots processing [v5] In-Reply-To: References: Message-ID: <1EvfoP-M7eGHkfH79oSV6DBMGtunMjUNTSoFEYn5ysI=.33cb0699-6647-478d-b34d-c729c9da3e30@github.com> On Fri, 15 Jan 2021 14:39:06 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Lgtm. Thanks for the review.
------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From tschatzl at openjdk.java.net Mon Jan 18 08:32:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 18 Jan 2021 08:32:43 GMT Subject: RFR: 8259870: zBarrier.inline.hpp should not include javaClasses.hpp In-Reply-To: References: Message-ID: On Sun, 17 Jan 2021 23:53:53 GMT, Ioi Lam wrote: > zBarrier.inline.hpp is a popular header file (it's included by about 430 out of ~1000 hotspot .o files). It includes javaClasses.hpp only for the inline function verify_on_weak(), which is used only for assert purposes in debug builds. > > javaClasses.hpp is large and in turn pulls in other large header files. If we move verify_on_weak() into zBarrier.cpp and stop including javaClasses.hpp in zBarrier.inline.hpp, building hotspot is about 0.5% faster. The number of .o files that include javaClasses.hpp is reduced from 459 to 175. > > Testing: > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2120 From ayang at openjdk.java.net Mon Jan 18 08:36:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 18 Jan 2021 08:36:38 GMT Subject: Integrated: 8074101: Add verification that all tasks are actually claimed during roots processing In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 11:13:29 GMT, Albert Mingkun Yang wrote: > The first commit removes some obsolete enum items, while the second commit adds the verification logic. Commit 2 introduces some "empty" task claims for the verification logic, explicitly marked in the comments. > > Test: hotspot_gc This pull request has now been integrated. 
Changeset: e93f08e2 Author: Albert Mingkun Yang Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/e93f08e2 Stats: 95 lines in 8 files changed: 51 ins; 22 del; 22 mod 8074101: Add verification that all tasks are actually claimed during roots processing Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2046 From tschatzl at openjdk.java.net Mon Jan 18 10:23:49 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 18 Jan 2021 10:23:49 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v2] In-Reply-To: References: Message-ID: On Sat, 16 Jan 2021 11:59:30 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC oldgen allocation. There were two >> variants, one using CAS on the _top member of the mutable space, the other >> requiring locking or other forms of mutual exclusion. >> >> We don't need both variants. The non-CAS variant is only used in a few >> places, where the cost of an extra CAS doesn't matter. What does matter is >> that having two variants, which must not be used concurrently, makes the >> code larger, more complex, and harder to verify. (This change came out of >> analyzing JDK-8259271. No problems were found (or expected), so this change >> is not expected to impact that bug. But because of the two variants, the >> possibility of unexpected interaction needed to be examined.) >> >> The non-CAS allocation support has been removed, with PSOldGen::allocate now >> implemented using the CAS-based allocation. The cas_ prefix naming >> convention is retained for the internals for clarity. >> >> While looking at this, noticed and removed a couple of lingering references >> to the class AdjoiningGenerations, which no longer exists after JDK-8243146.
>> >> Testing: >> mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > record oldgen mutator allocations in size policy Marked as reviewed by tschatzl (Reviewer). src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 404: > 402: if (!should_alloc_in_eden(size) || GCLocker::is_active_and_needs_gc()) { > 403: // Size is too big for eden, or gc is locked out. > 404: return old_gen()->allocate_and_record(size); I would have kind of preferred if `allocate_and_record` were a helper method here in `ParallelScavengeHeap` since the recording seems to be entirely a thing of the PSH and not of the old gen, and the implementation of that method just calls back in here, but I am good with this too. ------------- PR: https://git.openjdk.java.net/jdk/pull/2101 From tschatzl at openjdk.java.net Mon Jan 18 10:34:51 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 18 Jan 2021 10:34:51 GMT Subject: RFR: 8256814: WeakProcessorPhases may be redundant [v2] In-Reply-To: References: Message-ID: On Sat, 16 Jan 2021 15:40:31 GMT, Kim Barrett wrote: >> Please review this change which eliminates the WeakProcessorPhase class. >> >> The OopStorageSet class is changed to provide scoped enums for the different >> categories: StrongId, WeakId, and Id (for the union of strong and weak). >> An accessor is provided for obtaining the storage corresponding to a >> category value. >> >> Various other enumerator ranges, array sizes and indices, and iterations are >> derived directly from the corresponding OopStorageSet category's enum range. >> >> Iteration over a category of enumerators can be done via EnumIterator. The >> iteration over storage objects makes use of that enum iteration, rather than >> having a bespoke implementation. Some use-cases need iteration of the >> enumerators, with storage lookup from the enumerator; other use-cases just >> need the storage objects. 
>> >> Testing: >> mach5 tier1-6 >> Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge branch 'master' into wpp4 > - stefank review > - Remove WeakProcessorPhase, adding scoped enum categories to OopStorageSet. Marked as reviewed by tschatzl (Reviewer). src/hotspot/share/gc/shared/oopStorageSet.cpp line 2: > 1: /* > 2: * Copyright (c) 2019, 2020, Oracle and/or its affiliates. All rights reserved. It's 2021 by now, I suggest updating these before pushing. ------------- PR: https://git.openjdk.java.net/jdk/pull/1862 From tschatzl at openjdk.java.net Mon Jan 18 10:37:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 18 Jan 2021 10:37:46 GMT Subject: RFR: 8258742: Move PtrQueue reset to PtrQueueSet subclasses In-Reply-To: References: Message-ID: <1dgRRDTcWGLGFijLRvA01DMsmGoQMoFZM8-1Z5VrWQ4=.96f80cf3-136b-4a86-bf9e-f9db626f3979@github.com> On Sun, 17 Jan 2021 13:17:20 GMT, Kim Barrett wrote: > Please review this change to the PtrQueue hierarchy, changing queue reset > from an intrinsic operation of the queue to an operation the qset performs > on a queue. This is another piece of the refactoring being done under > JDK-8258251, separated out for easier review. > > After the refactoring of queue reset, PtrQueue::is_empty and PtrQueue::size > are no longer used, so are removed. Further, PtrQueue::{set_}index_in_bytes > are removed, directly using _index instead. > > A less obvious part of the change is in the G1 remark task and Shenandoah > final marking task. The threads walk performed by these no longer directly > process the partial per-thread SATB buffers. Instead they just flush the > queues for later completed buffer processing. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2115 From kim.barrett at oracle.com Mon Jan 18 10:41:38 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 18 Jan 2021 05:41:38 -0500 Subject: RFR: 8256814: WeakProcessorPhases may be redundant In-Reply-To: <33C0F891-C26A-4B41-B85A-84627676071D@oracle.com> References: <33C0F891-C26A-4B41-B85A-84627676071D@oracle.com> Message-ID: <39C4F5B9-FA6C-4E5C-95A3-2A1F7C1299B3@oracle.com> > On Jan 16, 2021, at 10:42 AM, Kim Barrett wrote: > >> On Jan 12, 2021, at 5:12 AM, Stefan Karlsson wrote: >> src/hotspot/share/gc/shared/weakProcessorTimes.hpp line 37: >> >>> 35: class WeakProcessorTimes { >>> 36: public: >>> 37: using StorageId = OopStorageSet::WeakId; >> >> Could be private. > > Here too I think public functions whose type signatures involve identifiers > that can't be used by clients is problematic. But I renamed it from > StorageId to WeakId; looking at it again, the more generic name seems > counterproductive here. After thinking about this some more, I'm going to see what it looks like to just eliminate the type alias entirely. From kim.barrett at oracle.com Mon Jan 18 10:43:25 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 18 Jan 2021 05:43:25 -0500 Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v2] In-Reply-To: References: Message-ID: <341D935A-DD2A-4194-B96C-F87A5483DF4C@oracle.com> > On Jan 18, 2021, at 5:23 AM, Thomas Schatzl wrote: > src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 404: > >> 402: if (!should_alloc_in_eden(size) || GCLocker::is_active_and_needs_gc()) { >> 403: // Size is too big for eden, or gc is locked out. >> 404: return old_gen()->allocate_and_record(size); > > I would have kind of preferred if `allocate_and_record` were a helper method here in `ParallelScavengeHeap` since the recording seems to be entirely a thing of the PSH and not of the old gen, and the implementation of that method just calls back in here, but I am good with this too.
I like this suggestion. From sjohanss at openjdk.java.net Mon Jan 18 10:45:48 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 18 Jan 2021 10:45:48 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: <3BHnoVKWC7hs9ftPYJGiILcSiYDC58nneNLIrwmKYA4=.d5c63333-9b77-4ec5-aab6-719d288e3f2f@github.com> Message-ID: On Sat, 16 Jan 2021 05:58:56 GMT, Thomas Stuefe wrote: >> Did some more testing with the code. I'm using Parallel for testing because G1 does a better job aligning sizes and avoiding some problems. >> >> I found that this change has a problem with mapping using both small and large pages (`reserve_memory_special_huge_tlbfs_mixed()`). I'm currently investigating if we can remove this type of mixed mapping, and instead make sure we only use large pages when properly aligned, so in the future we might be able to get rid of some code in this area. For now, see my comments below. > > Since we are not shipping this with JDK16, I'm more relaxed now. This will have time to cook before JDK17 is shipped, which takes care of my third point (doing more tests). > > About the jtreg test. I originally wrote: > >>> one jtreg test to test that the VM comes up with -XX:+UseLargePages -XX:LargePageSizeInBytes=1G and allocates small-large-pages as expected. This is not only needed as a function proof but to prevent regressions when we reform the code (which will happen) > > Not sure if that was too vague. An easy way would be to add some tracing to the VM in the allocation path, eg with `log_info(os)(...)`, then in the test start a VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1G -Xlog:os`, and scan its output. There are many tests which do this, for an easy example see e.g. runtime/os/TestUseCpuAllocPath.java. > > I'll take a closer look next week but will wait until Stefan had his go.
Found a couple of additional issues: * The `page_size_for_region_*()` helpers were previously only used in higher level code to help figure out if large pages should/could be used for a given size. Now that they are used at the actual site of reservation, they break the cases where someone at a higher level has requested that there should be at least a certain number of pages for the given size. We can take the heap using Parallel as an example: const size_t min_pages = 4; // 1 for eden + 1 for each survivor + 1 for old const size_t page_sz = os::page_size_for_region_aligned(MinHeapSize, min_pages); If both 2M and 1G pages are enabled this will settle for 2M in the code setting up Parallel GC but then end up allocating just one 1G page if we run with `-Xmx1g`. * There is also an issue when, for example, there are too few pages to allocate the heap using 1G pages: we then fall straight back to 4k pages instead of trying 2M pages first. My preferred way of handling this would be that the higher level code sets an upper bound on the page size to be used and the mapping layer satisfies the mapping using the largest possible page size with enough pages free. Such a change might be a bit big for this PR, but we need to make sure this change doesn't break anything like what I describe above. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From aph at redhat.com Mon Jan 18 11:43:47 2021 From: aph at redhat.com (Andrew Haley) Date: Mon, 18 Jan 2021 11:43:47 +0000 Subject: Help with 8166811: Missing memory fences between memory allocation and refinement Message-ID: We're looking at backporting this to 8u. However, the resulting patch was rather extensive, and I'm concerned that backporting it requires a great deal of care. Kim Barrett's initial comments suggest that, while not optimally efficient, a couple of memory fences might do the job. From my point of view this would be better because it cannot break anything.
However, I'm having some difficulty understanding Kim's comments. My best guess follows. Is this what was intended? All suggestions are very welcome. index e787719fb1b..ccdd16c7390 100644 --- a/hotspot/src/share/vm/gc/g1/heapRegion.cpp +++ b/hotspot/src/share/vm/gc/g1/heapRegion.cpp @@ -413,6 +413,8 @@ bool HeapRegion::oops_on_card_seq_iterate_careful(MemRegion mr, if (g1h->is_gc_active()) { mr = mr.intersection(MemRegion(bottom(), scan_top())); } else { mr = mr.intersection(used_region()); } if (mr.is_empty()) { return true; } + // LoadLoad/Acquire here? + // The intersection of the incoming mr (for the card) and the // allocated part of the region is non-empty. This implies that // we have actually allocated into this region. The code in diff --git a/hotspot/src/share/vm/gc/g1/heapRegion.inline.hpp b/hotspot/src/share/vm/gc/g1/heapRegion.inline.hpp index 01c53283579..72f9e8781cd 100644 --- a/hotspot/src/share/vm/gc/g1/heapRegion.inline.hpp +++ b/hotspot/src/share/vm/gc/g1/heapRegion.inline.hpp @@ -40,6 +40,9 @@ inline HeapWord* G1ContiguousSpace::allocate_impl(size_t min_word_size, size_t want_to_allocate = MIN2(available, desired_word_size); if (want_to_allocate >= min_word_size) { HeapWord* new_top = obj + want_to_allocate; + + // StoreStore/Release here? + set_top(new_top); assert(is_aligned(obj) && is_aligned(new_top), "checking alignment"); *actual_size = want_to_allocate; -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sjohanss at openjdk.java.net Mon Jan 18 11:46:49 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 18 Jan 2021 11:46:49 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Tue, 15 Dec 2020 18:48:05 GMT, Marcus G K Williams wrote: >> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using >> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). >> >> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). >> >> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. > > Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Remove extraneous ' from warning > > Signed-off-by: Marcus G K Williams > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Fix os::large_page_size() in last update > > Signed-off-by: Marcus G K Williams > - Ivan W. Requested Changes > > Removed os::Linux::select_large_page_size and > use os::page_size_for_region instead > > Removed Linux::find_large_page_size and use > register_large_page_sizes. 
Streamlined > Linux::setup_large_page_size > > Signed-off-by: Marcus G K Williams > - Fix space format, use Linux:: for local func. > > Signed-off-by: Marcus G K Williams > - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp > - ... and 13 more: https://git.openjdk.java.net/jdk/compare/da2415fe...d73e7a4c src/hotspot/os/linux/os_linux.cpp line 3746: > 3744: if (page_size * K > (size_t)Linux::page_size()) { > 3745: // Add each found Large Page Size to _page_sizes > 3746: _page_sizes.add(page_size * K); Just realized one more thing: with this code we will enable all page sizes configured even if there are no pages "allocated" for the given size. Is that what we want or should we read the file nr_hugepages in the given directory and only add it if the size is > 0? ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From iwalulya at openjdk.java.net Mon Jan 18 12:20:52 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 18 Jan 2021 12:20:52 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 11:44:13 GMT, Stefan Johansson wrote: >> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Remove extraneous ' from warning >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Fix os::large_page_size() in last update >> >> Signed-off-by: Marcus G K Williams >> - Ivan W. Requested Changes >> >> Removed os::Linux::select_large_page_size and >> use os::page_size_for_region instead >> >> Removed Linux::find_large_page_size and use >> register_large_page_sizes.
Streamlined >> Linux::setup_large_page_size >> >> Signed-off-by: Marcus G K Williams >> - Fix space format, use Linux:: for local func. >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp >> - ... and 13 more: https://git.openjdk.java.net/jdk/compare/da2415fe...d73e7a4c > > src/hotspot/os/linux/os_linux.cpp line 3746: > >> 3744: if (page_size * K > (size_t)Linux::page_size()) { >> 3745: // Add each found Large Page Size to _page_sizes >> 3746: _page_sizes.add(page_size * K); > > Just realized one more thing, with this code we will enable all page sizes configured even if there are no pages "allocated" for the given size. > > Is that what we want or should we read the file nr_hugepages in the given director and only add it if the size is > 0? I think a more complete solution is to check the nr_hugepages. Additionally, this will be required by the solution you propose above that would consider "largest possible page size with enough pages free". ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From thartmann at openjdk.java.net Mon Jan 18 12:48:59 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Mon, 18 Jan 2021 12:48:59 GMT Subject: RFR: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation Message-ID: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> The test gets stuck while waiting for a compilation to succeed, because the corresponding compilation level is not available since Tiered Compilation is disabled (or `TieredStopAtLevel` is set). The tests should not be executed without Tiered Compilation (or if the requested compilation level is not available) and also check the output of `enqueueMethodForCompilation` for sanity. 
Thanks, Tobias ------------- Commit messages: - 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1,2,3] time out without TieredCompilation Changes: https://git.openjdk.java.net/jdk/pull/2125/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2125&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258383 Stats: 75 lines in 25 files changed: 50 ins; 0 del; 25 mod Patch: https://git.openjdk.java.net/jdk/pull/2125.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2125/head:pull/2125 PR: https://git.openjdk.java.net/jdk/pull/2125 From sjohanss at openjdk.java.net Mon Jan 18 13:17:52 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 18 Jan 2021 13:17:52 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 12:17:26 GMT, Ivan Walulya wrote: >> src/hotspot/os/linux/os_linux.cpp line 3746: >> >>> 3744: if (page_size * K > (size_t)Linux::page_size()) { >>> 3745: // Add each found Large Page Size to _page_sizes >>> 3746: _page_sizes.add(page_size * K); >> >> Just realized one more thing, with this code we will enable all page sizes configured even if there are no pages "allocated" for the given size. >> >> Is that what we want or should we read the file nr_hugepages in the given director and only add it if the size is > 0? > > I think a more complete solution is to check the nr_hugepages. Additionally, this will be required by the solution you propose above that would consider "largest possible page size with enough pages free". I think so too. The "largest possible page size..." could be solved anyway, by just retrying all configured pages sizes until we find one that works. But it would be much more efficient to just try the page size that actually could work. 
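The policy discussed in this thread (an upper bound set by the higher-level code, then the largest page size with enough free pages) could look roughly like the following. This is a sketch under stated assumptions, not the patch: `PageSizeInfo`, `select_page_size`, and the idea that a free-page count is available per size are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one configured huge page size.
struct PageSizeInfo {
  std::size_t size;        // page size in bytes
  std::size_t free_pages;  // pages believed to be free for this size
};

// Returns the largest page size that does not exceed max_page_size or
// the request itself, and for which enough free pages exist to cover
// `bytes`. Falls back to base_page_size if no large page size fits.
std::size_t select_page_size(const std::vector<PageSizeInfo>& sizes,
                             std::size_t bytes,
                             std::size_t max_page_size,
                             std::size_t base_page_size) {
  assert(bytes > 0 && base_page_size > 0);
  std::size_t best = base_page_size;
  for (const PageSizeInfo& p : sizes) {
    // Skip sizes that are no improvement, above the caller's bound,
    // or larger than the request itself.
    if (p.size <= best || p.size > max_page_size || p.size > bytes) {
      continue;
    }
    std::size_t needed = (bytes + p.size - 1) / p.size;  // round up
    if (needed <= p.free_pages) {
      best = p.size;
    }
  }
  return best;
}
```

With 2M pages free and no free 1G pages, a 1G request would get 2M pages here rather than dropping straight to the base page size. Free-page counts can change at any time, so the actual reservation would still have to handle failure.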
------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Mon Jan 18 13:50:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Mon, 18 Jan 2021 13:50:39 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 13:14:28 GMT, Stefan Johansson wrote: >> I think a more complete solution is to check the nr_hugepages. Additionally, this will be required by the solution you propose above that would consider "largest possible page size with enough pages free". > > I think so too. The "largest possible page size..." could be solved anyway, by just retrying all configured page sizes until we find one that works. But it would be much more efficient to just try the page size that actually could work. One also can set nr_overcommit_hugepages>0 and have a "dynamic" large page pool this way, even with nr_hugepages=0. Moreover, these settings can change during the lifetime of the VM. I would not bother adding too much logic here. Allocating huge pages may or may not fail anyway and the VM has to be prepared to deal with failure. Just my 5c. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From ddong at openjdk.java.net Mon Jan 18 13:55:06 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 18 Jan 2021 13:55:06 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: > The GC locker stalls the operation of the GC, so some Java threads cannot continue to run until the GC locker is released, which affects the response time of the application. Adding a JFR event that reports this information helps detect this issue. > > For testing purposes, I added two Whitebox methods to lock/unlock critical.
Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: Refactor based on comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2088/files - new: https://git.openjdk.java.net/jdk/pull/2088/files/b68814f3..c36d4f96 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=00-01 Stats: 94 lines in 6 files changed: 55 ins; 36 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2088.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2088/head:pull/2088 PR: https://git.openjdk.java.net/jdk/pull/2088 From ddong at openjdk.java.net Mon Jan 18 13:55:07 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 18 Jan 2021 13:55:07 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 08:10:10 GMT, Stefan Karlsson wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor based on comments > > Not a review, but a few comments about what probably needs to be cleaned up before a proper review starts. Refactored. Testing: jdk/jfr all passed. > src/hotspot/share/gc/shared/gcLocker.cpp line 186: > >> 184: _stall_count = 0; >> 185: } >> 186: #endif > > This adds a fair amount of noise and hides the actual GCLocker logic, IMHO. Could you somehow encapsulate this code and the other INCLUDE_JFR above into a class and make single-line calls perform these functions? good idea. updated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From sjohanss at openjdk.java.net Mon Jan 18 14:01:41 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 18 Jan 2021 14:01:41 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 13:48:12 GMT, Thomas Stuefe wrote: >> I think so too. The "largest possible page size..." could be solved anyway, by just retrying all configured page sizes until we find one that works. But it would be much more efficient to just try the page size that actually could work. > > One also can set nr_overcommit_hugepages>0 and have a "dynamic" large page pool this way, even with nr_hugepages=0. Moreover, these settings can change during the lifetime of the VM. I would not bother adding too much logic here. Allocating huge pages may or may not fail anyway and the VM has to be prepared to deal with failure. Just my 5c. The "dynamic part" might actually make this "too much logic"; otherwise it feels like a pretty reasonable check. The fact that huge pages can be added during the runtime of the JVM doesn't feel like a big problem, since most large reservations are done at startup. But you might be right: since we have to handle the case of a failed mapping, it might not be too big of a problem to try all possible page sizes.
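The "try all possible page sizes" fallback mentioned above could look roughly like the following sketch. The function name, the `sizes_descending` list and the `try_reserve` callback are hypothetical stand-ins for the configured page sizes and the actual mapping call; this is not HotSpot code.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Try each candidate page size from largest to smallest and return the first
// one for which the reservation callback succeeds, or 0 if every attempt
// fails. The caller is expected to pass the sizes already sorted descending.
static size_t reserve_with_fallback(
    const std::vector<size_t>& sizes_descending,
    const std::function<bool(size_t)>& try_reserve) {
  for (size_t page_size : sizes_descending) {
    if (try_reserve(page_size)) {
      return page_size;
    }
  }
  return 0;
}
```

The cost of this approach is one failed mapping attempt per unusable page size, which is the inefficiency weighed in the discussion.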
------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From ddong at openjdk.java.net Mon Jan 18 14:00:42 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Mon, 18 Jan 2021 14:00:42 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: <0GQ4f8N7Ddv0bpiazsnfPydzLYN1gZdhHu42U3ZpMMc=.3dd4d732-c24a-4166-9d00-0b8e7e4b03e0@github.com> On Mon, 18 Jan 2021 08:04:38 GMT, Stefan Karlsson wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor based on comments > > src/hotspot/share/utilities/ticks.hpp line 241: > >> 239: friend class GCTimerTest; >> 240: friend class CompilerEvent; >> 241: friend class GCLocker; > > I don't think this should be needed. Fixed. > src/hotspot/share/gc/shared/gcLocker.cpp line 112: > >> 110: #if INCLUDE_JFR >> 111: if (EventGCLocker::is_enabled()) { >> 112: _needs_gc_start_timestamp = JfrTicks::now(); > > Do you really need to use JfrTicks instead of Ticks here? If not, could you remove all references and includes of JfrTicks? We usually pass in Ticks when we send JFR events. Fixed. > src/hotspot/share/prims/whitebox.cpp line 44: > >> 42: #include "gc/shared/genArguments.hpp" >> 43: #include "gc/shared/genCollectedHeap.hpp" >> 44: #include "gc/shared/gcLocker.inline.hpp" > > Sort includes. Fixed. > src/hotspot/share/jfr/metadata/metadata.xml line 1080: > >> 1078: >> 1079: >> 1080: > > You add this between two Shenandoah events. Could you put it somewhere where it's not splitting up a group of events? Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From ayang at openjdk.java.net Mon Jan 18 14:23:57 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 18 Jan 2021 14:23:57 GMT Subject: RFR: 8259851: Using boolean type for tasks in SubTasksDone Message-ID: Changing `uint` to `bool` in `SubTasksDone`, since atomic operations on `bool` are well supported.
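To illustrate the idea of claiming one-shot tasks through atomic operations on `bool`, here is a minimal sketch in standard C++. `SubTasksSketch` and `try_claim` are hypothetical names, and `std::atomic` is used in place of HotSpot's own Atomic API, so this is an assumption-laden illustration rather than the change under review.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a set of one-shot subtasks; not the HotSpot class.
class SubTasksSketch {
  std::unique_ptr<std::atomic<bool>[]> _claimed;
  size_t _n;

 public:
  explicit SubTasksSketch(size_t n)
      : _claimed(new std::atomic<bool>[n]), _n(n) {
    for (size_t i = 0; i < n; i++) {
      _claimed[i].store(false);
    }
  }

  // Returns true for exactly one caller per valid task index, even when
  // several threads race on the same index. Out-of-range indices are
  // never claimable.
  bool try_claim(size_t t) {
    if (t >= _n) {
      return false;
    }
    bool expected = false;
    return _claimed[t].compare_exchange_strong(expected, true);
  }
};
```

Each flag occupies a single byte on common platforms, so the compare-and-exchange here is a byte-sized atomic operation.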
Tested: hotspot_gc ------------- Commit messages: - bool Changes: https://git.openjdk.java.net/jdk/pull/2131/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2131&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259851 Stats: 13 lines in 2 files changed: 0 ins; 6 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2131.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2131/head:pull/2131 PR: https://git.openjdk.java.net/jdk/pull/2131 From kbarrett at openjdk.java.net Mon Jan 18 15:08:12 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 18 Jan 2021 15:08:12 GMT Subject: RFR: 8256814: WeakProcessorPhases may be redundant [v3] In-Reply-To: References: Message-ID: > Please review this change which eliminates the WeakProcessorPhase class. > > The OopStorageSet class is changed to provide scoped enums for the different > categories: StrongId, WeakId, and Id (for the union of strong and weak). > An accessor is provided for obtaining the storage corresponding to a > category value. > > Various other enumerator ranges, array sizes and indices, and iterations are > derived directly from the corresponding OopStorageSet category's enum range. > > Iteration over a category of enumerators can be done via EnumIterator. The > iteration over storage objects makes use of that enum iteration, rather than > having a bespoke implementation. Some use-cases need iteration of the > enumerators, with storage lookup from the enumerator; other use-cases just > need the storage objects. 
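As a rough illustration of deriving counts and iteration directly from a scoped enum's range, consider the sketch below. It uses plain standard C++ with hypothetical names (`WeakId` enumerators, `EnumRangeSketch`, `storage_name`); it is not HotSpot's `EnumIterator` or `OopStorageSet`.

```cpp
#include <cstddef>

// Hypothetical scoped enum standing in for a storage category. Count is a
// sentinel marking the end of the range, not a real category.
enum class WeakId : unsigned { A, B, C, Count };

// Number of enumerators, derived from the enum range rather than hard-coded.
constexpr size_t weak_count = static_cast<size_t>(WeakId::Count);

// Minimal range over [0, Count) usable in a range-based for loop.
template <typename E>
struct EnumRangeSketch {
  struct Iter {
    unsigned value;
    E operator*() const { return static_cast<E>(value); }
    Iter& operator++() { ++value; return *this; }
    bool operator!=(const Iter& other) const { return value != other.value; }
  };
  Iter begin() const { return Iter{0u}; }
  Iter end() const { return Iter{static_cast<unsigned>(E::Count)}; }
};

// Hypothetical accessor: look up per-category data from the enumerator.
static const char* storage_name(WeakId id) {
  static const char* const names[weak_count] = {"a", "b", "c"};
  return names[static_cast<size_t>(id)];
}
```

With this shape, `for (WeakId id : EnumRangeSketch<WeakId>()) { ... storage_name(id) ... }` covers the "iterate enumerators, then look up storage" use-case, and array sizes follow `weak_count` automatically when an enumerator is added.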
> > Testing: > mach5 tier1-6 > Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: - update copyrights - remove type aliases for OopStorageSet::WeakId ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1862/files - new: https://git.openjdk.java.net/jdk/pull/1862/files/3a4d5b78..ebe50e35 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1862&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1862&range=01-02 Stats: 48 lines in 17 files changed: 6 ins; 8 del; 34 mod Patch: https://git.openjdk.java.net/jdk/pull/1862.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1862/head:pull/1862 PR: https://git.openjdk.java.net/jdk/pull/1862 From kim.barrett at oracle.com Mon Jan 18 15:10:22 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 18 Jan 2021 10:10:22 -0500 Subject: RFR: 8256814: WeakProcessorPhases may be redundant In-Reply-To: <39C4F5B9-FA6C-4E5C-95A3-2A1F7C1299B3@oracle.com> References: <33C0F891-C26A-4B41-B85A-84627676071D@oracle.com> <39C4F5B9-FA6C-4E5C-95A3-2A1F7C1299B3@oracle.com> Message-ID: <9D910C33-7007-46B3-8638-93CC65E267FC@oracle.com> > On Jan 18, 2021, at 5:41 AM, Kim Barrett wrote: > >> On Jan 16, 2021, at 10:42 AM, Kim Barrett wrote: >> >>> On Jan 12, 2021, at 5:12 AM, Stefan Karlsson wrote: >>> src/hotspot/share/gc/shared/weakProcessorTimes.hpp line 37: >>> >>>> 35: class WeakProcessorTimes { >>>> 36: public: >>>> 37: using StorageId = OopStorageSet::WeakId; >>> >>> Could be private. >> >> Here too I think public functions whose type signatures involve identifiers >> that can't be used by clients is problematic. But I renamed it from >> StorageId to WeakId; looking at it again, the more generic name seems >> counterproductive here. > > After thinking about this some more, I'm going to see what it looks like to just > eliminate the type alias entirely. Done.
The type alias didn't really add much, just shortening a few uses and signatures, not really enough to justify adding it. I also updated the copyrights for 2021. From kbarrett at openjdk.java.net Mon Jan 18 15:33:49 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 18 Jan 2021 15:33:49 GMT Subject: RFR: 8259851: Using boolean type for tasks in SubTasksDone In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 14:17:21 GMT, Albert Mingkun Yang wrote: > Changing `uint` to `bool` in `SubTasksDone`, since atomic operations on `bool` are well supported. > > Tested: hotspot_gc Should this change be made? I understand the intent is to use the semantically intended type, and agree with that intent. But there is a hidden cost; some platforms don't directly support cmpxchg on byte-sized values, and use CmpxchgByteUsingInt. Maybe that cost is in the noise, but the question should be considered. For Zero I don't care. But there are affected platforms. ------------- PR: https://git.openjdk.java.net/jdk/pull/2131 From amith.pawar at gmail.com Mon Jan 18 15:46:20 2021 From: amith.pawar at gmail.com (Amit Pawar) Date: Mon, 18 Jan 2021 21:16:20 +0530 Subject: RFR: Convert old-gen single threaded pretouch to multi-threaded during In-Reply-To: References: Message-ID: On Fri, Jan 8, 2021 at 6:38 PM Amit Pawar wrote: > Hi > > I am trying to improve the pre-touch time taken during old-gen resizing. > I need your suggestions on whether the following change would be accepted. > > What is happening? > Every GC thread resizes the old-gen during object promotion if there is not > enough room for the object. After expanding, the GC thread will pre-touch the > pages alone and can't pre-touch in parallel using PretouchTask, as it is > already executing a GC task. The total GC pause time depends upon the resize > size and the number of resizes. > > What is the fix?
> Create another WorkGang and then GC thread can execute pre-touch task with > this new WorkGang to reduce the pre-touch time taken. The code change is > given below. > > Improvement: > 1. Pre-touch improved by 50-70% for SPECjbb composite test. > 2. This depends upon number of resize request and resize size. SPECJbb > composite testing shows old-gen resized with sizes like 2MB-32MB with G1GC > and up-to 64MB with ParallelGC. Also number of resizes are more than > 100-200. > 3. PretouchTask class uses PreTouchParallelChunkSize and current default > is 4MB for x86 to split the pre-touch task. So time taken depends upon > old-gen resize and this change wont help if it lesser than > PreTouchParallelChunkSize value. > 4. Please refer excel file from bug report for more details on improvement > for different sizes. https://bugs.openjdk.java.net/browse/JDK-8254699 > > Though it helps to reduce the pre-touch time taken but not sure whether > adding another WorkGang is allowed. Please suggest. > > diff --git a/src/hotspot/share/gc/shared/gc_globals.hpp > b/src/hotspot/share/gc/shared/gc_globals.hpp > index aca8d6b6c34..b5d40b47480 100644 > --- a/src/hotspot/share/gc/shared/gc_globals.hpp > +++ b/src/hotspot/share/gc/shared/gc_globals.hpp > @@ -200,6 +200,12 @@ > product(bool, AlwaysPreTouch, false, > \ > "Force all freshly committed pages to be pre-touched") > \ > > \ > + product(size_t, OldGenPreTouchWorkers, 1, > \ > + "During object promotion old-gen can be expanded as required > by" \ > + "ParallelGCThreads. 
OldGenPreTouchWorkers can be used to " > \ > + "pre-touch the pages by ParallelGCThreads") > \ > + range(1, 1024) > \ > + > \ > product_pd(size_t, PreTouchParallelChunkSize, > \ > "Per-thread chunk size for parallel memory pre-touch.") > \ > range(4*K, SIZE_MAX / 2) > \ > diff --git a/src/hotspot/share/gc/shared/pretouchTask.cpp > b/src/hotspot/share/gc/shared/pretouchTask.cpp > index 4398d3924cc..435ec2ee76f 100644 > --- a/src/hotspot/share/gc/shared/pretouchTask.cpp > +++ b/src/hotspot/share/gc/shared/pretouchTask.cpp > @@ -27,6 +27,7 @@ > #include "runtime/atomic.hpp" > #include "runtime/globals.hpp" > #include "runtime/os.hpp" > +#include "utilities/ticks.hpp" > > PretouchTask::PretouchTask(const char* task_name, > char* start_address, > @@ -62,6 +63,8 @@ void PretouchTask::work(uint worker_id) { > } > } > > +#define TIME_FORMAT "%0.3lfms" > + > void PretouchTask::pretouch(const char* task_name, char* start_address, > char* end_address, > size_t page_size, WorkGang* pretouch_gang) { > > @@ -83,14 +86,30 @@ void PretouchTask::pretouch(const char* task_name, > char* start_address, char* en > size_t num_chunks = (total_bytes + chunk_size - 1) / chunk_size; > > uint num_workers = (uint)MIN2(num_chunks, > (size_t)pretouch_gang->total_workers()); > - log_debug(gc, heap)("Running %s with %u workers for " SIZE_FORMAT " > work units pre-touching " SIZE_FORMAT "B.", > - task.name(), num_workers, num_chunks, > total_bytes); > - > + Ticks mark_start = Ticks::now(); > pretouch_gang->run_task(&task, num_workers); > + Ticks mark_end = Ticks::now(); > + log_debug(gc, heap)("Running %s with %u workers for " SIZE_FORMAT " > work units pre-touching " SIZE_FORMAT "B. 
" TIME_FORMAT , > + task.name(), num_workers, num_chunks, > total_bytes, (mark_end-mark_start).seconds()); > + > } else { > - log_debug(gc, heap)("Running %s pre-touching " SIZE_FORMAT "B.", > - task.name(), total_bytes); > - task.work(0); > + if(OldGenPreTouchWorkers > 1) { > + const char *oldgen_workers="Old-gen Pre-touch workers"; > + static WorkGang *pretouch_workers= NULL ; > + if (! pretouch_workers) { > + // pretouch_workers are used when pretouch_gang is null. This usually > happens during old-gen > + // resizing due to object promotion. > + pretouch_workers = new WorkGang(oldgen_workers, > OldGenPreTouchWorkers, true, false); > + pretouch_workers->initialize_workers(); > + } > + pretouch(oldgen_workers, start_address, end_address, page_size, > pretouch_workers); > + } else { > + Ticks mark_start = Ticks::now(); > + task.work(0); > + Ticks mark_end = Ticks::now(); > + log_debug(gc, heap)("Running %s pre-touching " SIZE_FORMAT "B. " > TIME_FORMAT, > + task.name(), total_bytes, > (mark_end-mark_start).seconds()); > + } > } > } > > > > Ping! From rkennke at openjdk.java.net Mon Jan 18 18:23:47 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 18 Jan 2021 18:23:47 GMT Subject: RFR: 8256814: WeakProcessorPhases may be redundant [v3] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 15:08:12 GMT, Kim Barrett wrote: >> Please review this change which eliminates the WeakProcessorPhase class. >> >> The OopStorageSet class is changed to provide scoped enums for the different >> categories: StrongId, WeakId, and Id (for the union of strong and weak). >> An accessor is provided for obtaining the storage corresponding to a >> category value. >> >> Various other enumerator ranges, array sizes and indices, and iterations are >> derived directly from the corresponding OopStorageSet category's enum range. >> >> Iteration over a category of enumerators can be done via EnumIterator. 
The >> iteration over storage objects makes use of that enum iteration, rather than >> having a bespoke implementation. Some use-cases need iteration of the >> enumerators, with storage lookup from the enumerator; other use-cases just >> need the storage objects. >> >> Testing: >> mach5 tier1-6 >> Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC > > Kim Barrett has updated the pull request incrementally with two additional commits since the last revision: > > - update copyrights > - remove type aliases for OopStorageSet::WeakId Changes look good to me! I also ran some tests with Shenandoah and they look good too! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1862 From rkennke at openjdk.java.net Mon Jan 18 20:32:51 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 18 Jan 2021 20:32:51 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC In-Reply-To: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Wed, 6 Jan 2021 16:45:03 GMT, Zhengyu Gu wrote: > The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. > > Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. 
> > The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. > > The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. > > Test: > - [x] hotspot_gc_shenandoah > - [x] nightly tests Nice! I really like how that moves some burden out from ShHeap (and ShControlThread) to more appropriate places. Just the following two remarks. src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp line 45: > 43: _cycle_counter(0) { > 44: > 45: Copy::zero_to_bytes(_degen_points, sizeof(size_t) * ShenandoahGC::_DEGENERATED_LIMIT); I wonder if those statistics all belong in ShenandoahDegenGC now? Might be worth considering as a follow-up. src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 30: > 28: #include "gc/shared/gcTrace.hpp" > 29: #include "gc/shenandoah/shenandoahGC.hpp" > 30: #include "gc/shenandoah/shenandoahSharedVariables.hpp" What is the include of shenandoahSharedVariables.hpp needed for?
------------- PR: https://git.openjdk.java.net/jdk/pull/1964 From rkennke at openjdk.java.net Mon Jan 18 20:32:51 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 18 Jan 2021 20:32:51 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC In-Reply-To: References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Mon, 18 Jan 2021 20:28:14 GMT, Roman Kennke wrote: >> The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. >> >> Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. >> >> The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. >> >> The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. >> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] nightly tests > > Nice! I really like how that moves some burden out from ShHeap (and ShControlThread) to more appropriate places. Just the following two remarks.
@shipilev should also look at this. ------------- PR: https://git.openjdk.java.net/jdk/pull/1964 From kvn at openjdk.java.net Mon Jan 18 20:33:47 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Mon, 18 Jan 2021 20:33:47 GMT Subject: RFR: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation In-Reply-To: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> References: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> Message-ID: On Mon, 18 Jan 2021 12:44:17 GMT, Tobias Hartmann wrote: > The test gets stuck while waiting for a compilation to succeed, because the corresponding compilation level is not available since Tiered Compilation is disabled (or `TieredStopAtLevel` is set). The tests should not be executed without Tiered Compilation (or if the requested compilation level is not available) and also check the output of `enqueueMethodForCompilation` for sanity. > > Thanks, > Tobias I see this `requires` pattern in other tests too. But what will happen if the server VM is built without C1 (no tiered)? Such tests may need an additional requires `vm.compiler1.enabled`. Also, when level 4 compilation is requested (*_compilation_level4_* tests), you don't need to force -XX:+TieredCompilation. ------------- PR: https://git.openjdk.java.net/jdk/pull/2125 From ayang at openjdk.java.net Mon Jan 18 20:39:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 18 Jan 2021 20:39:41 GMT Subject: RFR: 8259851: Using boolean type for tasks in SubTasksDone In-Reply-To: References: Message-ID: <7r174Y4PWKIhxVpiDLhhmc4ay9FfsxeZuCHPzow2iUA=.d7e732b7-65da-441b-83e1-3f6d98cc9a5c@github.com> On Mon, 18 Jan 2021 15:30:48 GMT, Kim Barrett wrote: > For Zero I don't care. But there are affected platforms.
I see arm and s390 with grepping `CmpxchgByteUsingInt`; I will test specjbb2015 and dacapo on arm to see if there is any perf diff. ------------- PR: https://git.openjdk.java.net/jdk/pull/2131 From kbarrett at openjdk.java.net Mon Jan 18 23:51:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 18 Jan 2021 23:51:55 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v3] In-Reply-To: References: Message-ID: > Please review this change to ParallelGC oldgen allocation. There were two > variants, one using CAS on the _top member of the mutable space, the other > requiring locking or other forms of mutual exclusion. > > We don't need both variants. The non-CAS variant is only used in a few > places, where the cost of an extra CAS doesn't matter. What does matter is > that having two variants, which must not be used concurrently, makes the > code larger, more complex, and harder to verify. (This change came out of > analyzing JDK-8259271. No problems were found (or expected), so this change > is not expected to impact that bug. But because of the two variants, the > possibility of unexpected interact needed to be examined.) > > The non-CAS allocation support has been removed, with PSOldGen::allocate now > implemented using the CAS-based allocation. The cas_ prefix naming > convention is retained for the internals for clarity. > > While looking at this, noticed and removed a couple of lingering references > to the class AdjoiningGenerations, which no longer exists after JDK-8243146. 
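For readers unfamiliar with the CAS-based variant, the following is a minimal sketch of bump-pointer allocation that advances a `_top` pointer with compare-and-exchange. `BumpSpaceSketch` is a hypothetical class built on `std::atomic`; it is not the ParallelGC mutable space and omits alignment and object initialization.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical bump-pointer space; not the HotSpot implementation.
class BumpSpaceSketch {
  std::atomic<char*> _top;
  char* const _end;

 public:
  BumpSpaceSketch(char* start, char* end) : _top(start), _end(end) {}

  // Allocate `bytes` by advancing _top with a CAS loop. Returns nullptr when
  // the remaining space cannot satisfy the request. Safe for concurrent
  // callers without any external lock.
  char* cas_allocate(size_t bytes) {
    char* old_top = _top.load();
    do {
      // Compare against the remaining size instead of computing
      // old_top + bytes first, which could go past the end of the space.
      if (static_cast<size_t>(_end - old_top) < bytes) {
        return nullptr;
      }
      // On failure, compare_exchange_weak reloads old_top with the current
      // value of _top and the loop re-checks the remaining space.
    } while (!_top.compare_exchange_weak(old_top, old_top + bytes));
    return old_top;
  }
};
```

Because every caller goes through the same CAS loop, there is no second code path whose interaction with the first needs to be verified, which is the simplification argued for above.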
> > Testing: > mach5 tier1-5 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: move oldgen alloc with size policy recording to heap object ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2101/files - new: https://git.openjdk.java.net/jdk/pull/2101/files/ed7e2b26..994c0eb6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2101&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2101&range=01-02 Stats: 31 lines in 4 files changed: 12 ins; 14 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2101.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2101/head:pull/2101 PR: https://git.openjdk.java.net/jdk/pull/2101 From iklam at openjdk.java.net Tue Jan 19 06:43:07 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 19 Jan 2021 06:43:07 GMT Subject: RFR: 8259870: zBarrier.inline.hpp should not include javaClasses.hpp [v2] In-Reply-To: References: Message-ID: > zBarrier.inline.hpp is a popular header file (it's included by about 430 out of ~1000 hotspot .o files). It includes javaClasses.hpp only for the inline function verify_on_weak(), which is used only for assert purposes in debug builds. > > javaClasses.hpp is large and in turn pulls in other large header files. If we move verify_on_weak() into zBarrier.cpp and stop including javaClasses.hpp in zBarrier.inline.hpp, building hotspot is about 0.5% faster. The number of .o files that include javaClasses.hpp is reduced from 459 to 175. > > Testing: > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8259870-zBarrier-not-include-javaClasses - 8259870: zBarrier.inline.hpp should not include javaClasses.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2120/files - new: https://git.openjdk.java.net/jdk/pull/2120/files/451e9a58..2221d845 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2120&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2120&range=00-01 Stats: 12140 lines in 138 files changed: 1409 ins; 9273 del; 1458 mod Patch: https://git.openjdk.java.net/jdk/pull/2120.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2120/head:pull/2120 PR: https://git.openjdk.java.net/jdk/pull/2120 From iklam at openjdk.java.net Tue Jan 19 06:47:52 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 19 Jan 2021 06:47:52 GMT Subject: RFR: 8259870: zBarrier.inline.hpp should not include javaClasses.hpp [v2] In-Reply-To: References: Message-ID: <-ptwgdW7j_72Ke0wKpFCTc5Upm2jglxkAHoT85lk8sY=.821910c4-aa68-4745-9427-c3d0b75b48cc@github.com> On Mon, 18 Jan 2021 07:52:43 GMT, Stefan Karlsson wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8259870-zBarrier-not-include-javaClasses >> - 8259870: zBarrier.inline.hpp should not include javaClasses.hpp > > Marked as reviewed by stefank (Reviewer). Thanks @stefank and @tschatzl for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2120 From iklam at openjdk.java.net Tue Jan 19 06:47:53 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 19 Jan 2021 06:47:53 GMT Subject: Integrated: 8259870: zBarrier.inline.hpp should not include javaClasses.hpp In-Reply-To: References: Message-ID: On Sun, 17 Jan 2021 23:53:53 GMT, Ioi Lam wrote: > zBarrier.inline.hpp is a popular header file (it's included by about 430 out of ~1000 hotspot .o files). It includes javaClasses.hpp only for the inline function verify_on_weak(), which is used only for assert purposes in debug builds. > > javaClasses.hpp is large and in turn pulls in other large header files. If we move verify_on_weak() into zBarrier.cpp and stop including javaClasses.hpp in zBarrier.inline.hpp, building hotspot is about 0.5% faster. The number of .o files that include javaClasses.hpp is reduced from 459 to 175. > > Testing: > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. This pull request has now been integrated. Changeset: 14ce8f1a Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/14ce8f1a Stats: 33 lines in 3 files changed: 17 ins; 13 del; 3 mod 8259870: zBarrier.inline.hpp should not include javaClasses.hpp Reviewed-by: stefank, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2120 From stuefe at openjdk.java.net Tue Jan 19 07:09:52 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 19 Jan 2021 07:09:52 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 13:59:02 GMT, Stefan Johansson wrote: >> One also can set nr_overcommit_hugepages>0 and have a "dynamic" large page pool this way, even with nr_hugepages=0. Moreover, these settings can change during the lifetime of the VM. 
I would not bother adding too much logic here. Allocating huge pages may or may not fail anyway and the VM has to be prepared to deal with failure. Just my 5c. > > The "dynamic part" might actually make this "too much logic"; otherwise it feels like a pretty reasonable check. The fact that huge pages can be added during the runtime of the JVM doesn't feel like a big problem, since most large reservations are done at startup. > > But you might be right: since we have to handle the case of a failed mapping, it might not be too big of a problem to try all possible page sizes. I'm actually using nr_overcommit_hugepages a lot (it was a tip from Per Liden) since it's so convenient. As for allocation at startup, I plan on making Metaspace large-page-able again at some point in the future; that would mean larger LP allocations may happen later in VM life too. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From pliden at openjdk.java.net Tue Jan 19 08:31:04 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 19 Jan 2021 08:31:04 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: <5eyQW6avZzkhazf9YWyqLUcu7IBDAX5BAGx2z7gtPeo=.f84445ed-62e9-4a7e-9704-c00bee3214a7@github.com> > Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). > > We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can work around the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment.
> > This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. > > Just to clarify. In a Virtuozzo/OpenVZ environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a number that is too low. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker. > > This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happens. > > Testing: Manual testing with various fake/incorrect values returned from sched_getcpu().
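A minimal sketch of the clamping idea described above, for illustration only. It is not the code from the patch: the function name `sanitize_processor_id` and the warning text are hypothetical, and the raw id is passed in as a parameter (where real code would obtain it from sched_getcpu()) so that the sketch stays portable. It also maps negative ids to 0, which is a choice made for this sketch and not necessarily what the patch does.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical helper (not the HotSpot implementation): map an
// out-of-range processor id to 0 so that the result can always be used
// to index an array of length processor_count.
static int sanitize_processor_id(int id, int processor_count) {
  assert(processor_count > 0);

  if (id >= 0 && id < processor_count) {
    return id;  // Common case: the reported id is usable as-is.
  }

  // Invalid id: warn only the first time this is seen. The flag is
  // constant-initialized, so it is set up before any call can run.
  static std::atomic<bool> warned{false};
  if (!warned.exchange(true)) {
    std::fprintf(stderr,
                 "warning: invalid processor id %d (processor count %d), "
                 "using processor id 0\n",
                 id, processor_count);
  }

  // Safe for indexing, but all callers with a bad id share slot 0.
  return 0;
}
```

For example, with a processor count of 4, an id of 2 is returned unchanged, while ids of 4 or 7 are mapped to 0. The cost in the common case is a single compare-and-branch.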
Per Liden has updated the pull request incrementally with one additional commit since the last revision: Review ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/124/files - new: https://git.openjdk.java.net/jdk16/pull/124/files/dbe8bd89..48d9b68a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=124&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=124&range=00-01 Stats: 8 lines in 1 file changed: 2 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk16/pull/124.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/124/head:pull/124 PR: https://git.openjdk.java.net/jdk16/pull/124 From thartmann at openjdk.java.net Tue Jan 19 08:33:04 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 19 Jan 2021 08:33:04 GMT Subject: RFR: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation [v2] In-Reply-To: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> References: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> Message-ID: > The test gets stuck while waiting for a compilation to succeed, because the corresponding compilation level is not available since Tiered Compilation is disabled (or `TieredStopAtLevel` is set). The tests should not be executed without Tiered Compilation (or if the requested compilation level is not available) and also check the output of `enqueueMethodForCompilation` for sanity. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Removed TieredCompilation flag from compilation level 4 tests ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2125/files - new: https://git.openjdk.java.net/jdk/pull/2125/files/0a31cbd7..c89cb5d6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2125&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2125&range=00-01 Stats: 6 lines in 6 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2125.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2125/head:pull/2125 PR: https://git.openjdk.java.net/jdk/pull/2125 From pliden at openjdk.java.net Tue Jan 19 08:35:45 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 19 Jan 2021 08:35:45 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 22:02:44 GMT, Per Liden wrote: >> src/hotspot/os/linux/os_linux.cpp line 4769: >> >>> 4767: const int id = Linux::sched_getcpu(); >>> 4768: >>> 4769: if (id >= 0 && id < processor_count()) { >> >> Do we really need to check if the returned processor ID is negative? That seems a whole new level of environment screwup to me. > > I'm thinking we should make this safe to call in all cases. God knows what a broken environment might return. After some discussions, we agreed not to check for negative processor ids.
------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From pliden at openjdk.java.net Tue Jan 19 08:35:44 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 19 Jan 2021 08:35:44 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: On Sat, 16 Jan 2021 13:00:04 GMT, David Holmes wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Review > > So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then? > > Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment? > > Cheers, > David @dholmes-ora > So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then? Not sure what you have in mind here? Having an indirect function call would not result in a lower overhead than the test/branch I've introduced. It's also not necessarily trivial to detect this error at startup, as you would need a reliable way to enumerate all processors (something that seems semi-broken in this environment, which is the root of the problem), bind the current thread to each of them and then check the processor id. > Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment? That's of course always a judgement call/trade-off. I can't say I have a super good understanding of how common this environment is, but there's at least one "Java cloud provider" that uses this environment.
------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From thartmann at openjdk.java.net Tue Jan 19 08:35:51 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Tue, 19 Jan 2021 08:35:51 GMT Subject: RFR: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation [v2] In-Reply-To: References: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> Message-ID: On Mon, 18 Jan 2021 20:30:43 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed TieredCompilation flag from compilation level 4 tests > > I see such `requires` pattern in other tests too. > But what will happen if server VM is built without C1 - no tiered? > Such tests may need an additional requires `vm.compiler1.enabled` > > Also, when requesting level 4 compilation (*_compilation_level4_* tests) you don't need to force -XX:+TieredCompilation. Thanks for the review, Vladimir. > I see such requires pattern in other tests too. > But what will happen if server VM is built without C1 - no tiered? > Such tests may need an additional requires vm.compiler1.enabled Yes, I've taken that pattern from other tests. The problem with `requires vm.compiler1.enabled` is that the test will be skipped if `-XX:-TieredCompilation` is set (because then C1 is not available). Since this is a general problem that affects other tests as well, I think it should be addressed separately if necessary. What do you think? > Also, when requesting level 4 compilation (compilation_level4 tests) you don't need to force -XX:+TieredCompilation. Right, I've updated the corresponding tests.
------------- PR: https://git.openjdk.java.net/jdk/pull/2125 From pliden at openjdk.java.net Tue Jan 19 08:35:46 2021 From: pliden at openjdk.java.net (Per Liden) Date: Tue, 19 Jan 2021 08:35:46 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: <4y_7I4sbLbG4BHe7xZi0Jzf4eykTF1ae7ngTukj1JkM=.d7fd2659-9c69-40fa-9e3d-8c5e0fe268b7@github.com> References: <4y_7I4sbLbG4BHe7xZi0Jzf4eykTF1ae7ngTukj1JkM=.d7fd2659-9c69-40fa-9e3d-8c5e0fe268b7@github.com> Message-ID: On Fri, 15 Jan 2021 21:59:56 GMT, Per Liden wrote: >> src/hotspot/os/linux/os_linux.cpp line 4749: >> >>> 4747: } >>> 4748: >>> 4749: static volatile int warn_invalid_processor_id = 1; >> >> Maybe move this var into the function, since it's only used inside it. > > Doing so will come with the cost of always having to run a pthread_once() on function entry. I was wrong here. Since the initialization is effectively const/constexpr, there will not be any "pthread_once" overhead here. Moved the static variable inside the function in the latest commit. ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From sjohanss at openjdk.java.net Tue Jan 19 09:00:45 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 19 Jan 2021 09:00:45 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: Message-ID: <-XDDOuZ587CjzTAMiKepvJ-siKe6kSCUiNJwQ_VZS6I=.19e9fc03-e93d-4ccd-ace1-5b844ad7caa7@github.com> On Tue, 19 Jan 2021 07:06:48 GMT, Thomas Stuefe wrote: >> The "dynamic part" might actually make this "too much logic", otherwise it feels like a pretty reasonable check. The fact that huge pages can be added during the runtime of the JVM doesn't feel like a big problem, since most large reservations are done at startup.
>> >> But you might be right, since we have to handle the case of a failed mapping, it might not be too big of a problem trying all possible page sizes. > > I'm actually using nr_overcommit_hugepages a lot (it was a tip from Per Liden) since it's so convenient. As for allocation at startup, I plan on making Metaspace large-page-able again at some point in the future; that would mean larger LP allocations may happen later in VM life too. Ok, maybe I should try it out as well :) Regarding allocation at startup vs later, is the plan to make new reservations during the run or to support uncommit of large pages? Currently if a `ReservedSpace` is special (uses large pages), uncommit is disabled and all pages are committed up front. Is your plan to change this or will it work by adding and removing `ReservedSpace`s? I have not had time to look at the new `Metaspace` implementation in detail yet. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Tue Jan 19 09:27:50 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 19 Jan 2021 09:27:50 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: <-XDDOuZ587CjzTAMiKepvJ-siKe6kSCUiNJwQ_VZS6I=.19e9fc03-e93d-4ccd-ace1-5b844ad7caa7@github.com> References: <-XDDOuZ587CjzTAMiKepvJ-siKe6kSCUiNJwQ_VZS6I=.19e9fc03-e93d-4ccd-ace1-5b844ad7caa7@github.com> Message-ID: On Tue, 19 Jan 2021 08:57:42 GMT, Stefan Johansson wrote: >> I'm actually using nr_overcommit_hugepages a lot (it was a tip from Per Liden) since it's so convenient. As for allocation at startup, I plan on making Metaspace large-page-able again at some point in the future; that would mean larger LP allocations may happen later in VM life too. > > Ok, maybe I should try it out as well :) > > Regarding allocation at startup vs later, is the plan to make new reservations during the run or to support uncommit of large pages?
Currently if a `ReservedSpace` is special (uses large pages), uncommit is disabled and all pages are committed up front. Is your plan to change this or will it work by adding and removing `ReservedSpace`s. I have not had time to look at the new `Metaspace` implementation in detail yet. As it is now in my head, using LP on Metaspace would disable on-demand uncommitting (there is a second stage release of memory unaffected by this, where ReservedSpace segments get unmapped, but that is rare due to fragmentation. Due to the large page size uncommit on demand would be much less effective anyway than with normal pages. I am vaguely aware however of someones (yours?) experiments with "soft uncommit" - madvise(MADV_FREE) - and was planning on playing around with this too. Depending on how that plays out it may be a way to get uncommit-like behavior for large pages. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From fweimer at redhat.com Tue Jan 19 09:31:03 2021 From: fweimer at redhat.com (Florian Weimer) Date: Tue, 19 Jan 2021 10:31:03 +0100 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: (Per Liden's message of "Fri, 15 Jan 2021 14:22:19 GMT") References: Message-ID: <87v9btjmso.fsf@oldenburg.str.redhat.com> I tried to comment on the Github pull request, but Skara overwrite my comment. You could perhaps look at the highest CPU in the affinity mask, add 1, and take the maximum of that and the return value of sysconf(_SC_NPROCESSORS_CONF). As long as OpenVZ is not altering CPU masks dynamically, this should work around this particular issue, in a way that doesn't increase overhead for everyone. It would be good to have someone from Virtuozzo comment to indicate whether the affinity mask is actually reliable for this. But they will see test failures in low-level test suites if the affinity mask and sched_getcpu are incompatible (I actually wrote a glibc test case for this). 
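The suggestion above could be sketched roughly as follows, assuming Linux with glibc. This is only an illustration of the idea, not code from the pull request: the helper names `highest_cpu_in_affinity_mask` and `estimated_processor_count` are hypothetical, while `sched_getaffinity`, `CPU_ISSET`, `CPU_SETSIZE` and `sysconf` are the standard glibc interfaces. Note that a fixed-size `cpu_set_t` covers at most `CPU_SETSIZE` CPUs; on larger systems the call can fail, in which case this sketch falls back to the sysconf value alone.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // Needed for the CPU_* macros and sched_getaffinity.
#endif

#include <sched.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>

// Hypothetical helper: highest CPU index present in the calling thread's
// affinity mask, or -1 if the mask could not be read or is empty.
static int highest_cpu_in_affinity_mask() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
    return -1;
  }
  for (int cpu = CPU_SETSIZE - 1; cpu >= 0; cpu--) {
    if (CPU_ISSET(cpu, &mask)) {
      return cpu;
    }
  }
  return -1;
}

// Hypothetical helper: one-time estimate of the processor count, taken as
// the maximum of sysconf(_SC_NPROCESSORS_CONF) and (highest CPU in the
// affinity mask + 1). Intended to be computed once at startup.
static int estimated_processor_count() {
  const long configured = sysconf(_SC_NPROCESSORS_CONF);
  const int from_sysconf = configured > 0 ? static_cast<int>(configured) : 1;
  const int from_affinity = highest_cpu_in_affinity_mask() + 1;
  assert(from_affinity >= 0);
  return std::max(from_sysconf, from_affinity);
}
```

Since the value is computed once, the per-call path needs no extra check. As noted in the message, this only helps if the affinity mask is reliable and is not altered after startup.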
Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill From per.liden at oracle.com Tue Jan 19 09:42:03 2021 From: per.liden at oracle.com (Per Liden) Date: Tue, 19 Jan 2021 10:42:03 +0100 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: <87v9btjmso.fsf@oldenburg.str.redhat.com> References: <87v9btjmso.fsf@oldenburg.str.redhat.com> Message-ID: <7bfd6fbc-77f7-fa00-870c-32ca99410383@oracle.com> On 1/19/21 10:31 AM, Florian Weimer wrote: > I tried to comment on the Github pull request, but Skara overwrite my > comment. > > You could perhaps look at the highest CPU in the affinity mask, add 1, > and take the maximum of that and the return value of > sysconf(_SC_NPROCESSORS_CONF). As long as OpenVZ is not altering CPU > masks dynamically, this should work around this particular issue, in a > way that doesn't increase overhead for everyone. > > It would be good to have someone from Virtuozzo comment to indicate > whether the affinity mask is actually reliable for this. But they will > see test failures in low-level test suites if the affinity mask and > sched_getcpu are incompatible (I actually wrote a glibc test case for > this). Glibc's tst-getcpu.c (which I assume is the test you are referring to?) fails in their environment, so it seems like the affinity mask isn't reliable either. 
cheers, Per From david.holmes at oracle.com Tue Jan 19 10:00:30 2021 From: david.holmes at oracle.com (David Holmes) Date: Tue, 19 Jan 2021 20:00:30 +1000 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: <7bfd6fbc-77f7-fa00-870c-32ca99410383@oracle.com> References: <87v9btjmso.fsf@oldenburg.str.redhat.com> <7bfd6fbc-77f7-fa00-870c-32ca99410383@oracle.com> Message-ID: <5dda5692-1664-c6c7-9697-b34cddd96799@oracle.com> On 19/01/2021 7:42 pm, Per Liden wrote: > On 1/19/21 10:31 AM, Florian Weimer wrote: >> I tried to comment on the Github pull request, but Skara overwrite my >> comment. >> >> You could perhaps look at the highest CPU in the affinity mask, add 1, >> and take the maximum of that and the return value of >> sysconf(_SC_NPROCESSORS_CONF). As long as OpenVZ is not altering CPU >> masks dynamically, this should work around this particular issue, in a >> way that doesn't increase overhead for everyone. >> >> It would be good to have someone from Virtuozzo comment to indicate >> whether the affinity mask is actually reliable for this. But they will >> see test failures in low-level test suites if the affinity mask and >> sched_getcpu are incompatible (I actually wrote a glibc test case for >> this). > > Glibc's tst-getcpu.c (which I assume is the test you are referring to?) > fails in their environment, so it seems like the affinity mask isn't > reliable either. Then there will potentially be a number of other problems because the active processor count may not be correct either.
David ----- > cheers, > Per From fweimer at redhat.com Tue Jan 19 10:23:34 2021 From: fweimer at redhat.com (Florian Weimer) Date: Tue, 19 Jan 2021 11:23:34 +0100 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: <7bfd6fbc-77f7-fa00-870c-32ca99410383@oracle.com> (Per Liden's message of "Tue, 19 Jan 2021 10:42:03 +0100") References: <87v9btjmso.fsf@oldenburg.str.redhat.com> <7bfd6fbc-77f7-fa00-870c-32ca99410383@oracle.com> Message-ID: <87czy1jkd5.fsf@oldenburg.str.redhat.com> * Per Liden: > Glibc's tst-getcpu.c (which I assume is the test you are referring > to?) fails in their environment, so it seems like the affinity mask > isn't reliable either. What's the nature of the failure? If it's due to a non-changing affinity mask, then using sched_getaffinity data would still be okay. Do you have any guidance from Virtuozzo what should be done here? Incorrect handling of affinities is a bit concerning because it breaks some (not entirely unreasonable) concurrency algorithms. Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill From fweimer at openjdk.java.net Tue Jan 19 10:33:46 2021 From: fweimer at openjdk.java.net (Florian Weimer) Date: Tue, 19 Jan 2021 10:33:46 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: <0gWEoIU_SHeOtZx4B5Avi8PaXGPXoKI4oIY5_g8mvIM=.3f13d53d-c1a4-4390-a370-2ac1509f1112@github.com> On Mon, 18 Jan 2021 08:05:27 GMT, Erik Österlund wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Review > > Marked as reviewed by eosterlund (Reviewer). What does the affinity mask look like at process startup?
It should be possible to look at that and take the maximum CPU ID (plus 1) and `sysconf(_SC_NPROCESSORS_CONF)`. This would be a one-time overhead. It will not work with container deployments that dynamically alter affinity masks. Are there any? ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From shade at openjdk.java.net Tue Jan 19 10:36:03 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 19 Jan 2021 10:36:03 GMT Subject: RFR: 8259962: Shenandoah: task queue statistics is inconsistent after JDK-8255019 Message-ID: I believe `ShenandoahSTWMark` misses the TQ stats reset after the takeover from concurrent cycle. See the stack trace and events info in the bug. Additional testing: - [x] Linux x86_64, failing tests now pass - [x] Linux x86_64 `hotspot_gc_shenandoah` - [ ] Linux x86_64 `tier1` with Shenandoah ------------- Commit messages: - 8259962: Shenandoah: task queue statistics is inconsistent after JDK-8255019 Changes: https://git.openjdk.java.net/jdk/pull/2141/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2141&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259962 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2141.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2141/head:pull/2141 PR: https://git.openjdk.java.net/jdk/pull/2141 From stuefe at openjdk.java.net Tue Jan 19 10:38:55 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Tue, 19 Jan 2021 10:38:55 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: <-XDDOuZ587CjzTAMiKepvJ-siKe6kSCUiNJwQ_VZS6I=.19e9fc03-e93d-4ccd-ace1-5b844ad7caa7@github.com> Message-ID: On Tue, 19 Jan 2021 09:31:43 GMT, Stefan Johansson wrote: >> As it is now in my head, using LP on Metaspace would disable on-demand uncommitting (there is a second stage release of memory unaffected by this, where 
ReservedSpace segments get unmapped, but that is rare due to fragmentation. Due to the large page size uncommit on demand would be much less effective anyway than with normal pages. >> >> I am vaguely aware however of someones (yours?) experiments with "soft uncommit" - madvise(MADV_FREE) - and was planning on playing around with this too. Depending on how that plays out it may be a way to get uncommit-like behavior for large pages. > > Yes, we've done some experiments using `madvise`. Some results looked promising and others a bit surprising, but I didn't actually looked at how it would affect `HUGETLB` large pages. But yes, it might be a way to get better behavior for large pages. Maybe we also could take another look at the "never-remap-hugepages" rule added with JDK-8007074. I understand why stefank did that, but maybe if one added safety measures (eg before remapping making sure that we have enough huge pages in the pool with a large margin) and combined with a switch it would be safe enough. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From sjohanss at openjdk.java.net Tue Jan 19 10:38:55 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 19 Jan 2021 10:38:55 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: <-XDDOuZ587CjzTAMiKepvJ-siKe6kSCUiNJwQ_VZS6I=.19e9fc03-e93d-4ccd-ace1-5b844ad7caa7@github.com> Message-ID: On Tue, 19 Jan 2021 09:25:19 GMT, Thomas Stuefe wrote: >> Ok, maybe I should try it out as well :) >> >> Regarding allocation at startup vs later, is the plan to make new reservations during the run or supporting uncommit of large pages. Currently if a `ReservedSpace` is special (uses large pages), uncommit is disabled and all pages are committed up front. Is your plan to change this or will it work by adding and removing `ReservedSpace`s. 
I have not had time to look at the new `Metaspace` implementation in detail yet. > > As it is now in my head, using LP on Metaspace would disable on-demand uncommitting (there is a second stage release of memory unaffected by this, where ReservedSpace segments get unmapped, but that is rare due to fragmentation. Due to the large page size uncommit on demand would be much less effective anyway than with normal pages. > > I am vaguely aware however of someones (yours?) experiments with "soft uncommit" - madvise(MADV_FREE) - and was planning on playing around with this too. Depending on how that plays out it may be a way to get uncommit-like behavior for large pages. Yes, we've done some experiments using `madvise`. Some results looked promising and others a bit surprising, but I didn't actually looked at how it would affect `HUGETLB` large pages. But yes, it might be a way to get better behavior for large pages. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153
From sjohanss at openjdk.java.net Tue Jan 19 10:59:52 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 19 Jan 2021 10:59:52 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v15] In-Reply-To: References: <-XDDOuZ587CjzTAMiKepvJ-siKe6kSCUiNJwQ_VZS6I=.19e9fc03-e93d-4ccd-ace1-5b844ad7caa7@github.com> Message-ID: On Tue, 19 Jan 2021 10:35:47 GMT, Thomas Stuefe wrote: >> Yes, we've done some experiments using `madvise`. Some results looked promising and others a bit surprising, but I didn't actually looked at how it would affect `HUGETLB` large pages. But yes, it might be a way to get better behavior for large pages. > > Maybe we also could take another look at the "never-remap-hugepages" rule added with JDK-8007074. I understand why stefank did that, but maybe if one added safety measures (eg before remapping making sure that we have enough huge pages in the pool with a large margin) and combined with a switch it would be safe enough.
Yes, and maybe take another look at if +UseLargePages could mean use THP, currently THP is only used if explicitly set and I'm not sure that is true with newer Linux kernels. We even have a comment about this: // Don't try UseTransparentHugePages since there are known // performance issues with it turned on. This might change in the future. UseTransparentHugePages = false; ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From shade at openjdk.java.net Tue Jan 19 12:09:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 19 Jan 2021 12:09:54 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC In-Reply-To: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Wed, 6 Jan 2021 16:45:03 GMT, Zhengyu Gu wrote: > The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. > > Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. > > The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. > > The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. 
So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. > > Test: > - [x] hotspot_gc_shenandoah > - [x] nightly tests Some minor nits from the initial review. Please make sure `tier1` and `tier2` with Shenandoah still pass. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.hpp line 53: > 51: ShenandoahDegenPoint degen_point() const; > 52: > 53: // Cancel on going concurrent GC "ongoing"? src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 28: > 26: > 27: #include "gc/shared/collectorCounters.hpp" > 28: Do we really need these newlines? src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 44: > 42: #include "gc/shenandoah/shenandoahWorkerPolicy.hpp" > 43: #include "gc/shenandoah/shenandoahVMOperations.hpp" > 44: ...and this newline? src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.hpp line 54: > 52: // Prepare STW evacuation > 53: void op_prepare_evacuation(); > 54: // Empty comment. Actually, do we even need comments here? I.e. does `ShenandoahConcurrentGC` have them? src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.hpp line 25: > 23: */ > 24: > 25: #ifndef SHARE_GC_SHENANDOAH_SHENANDOAHDEGENDGC_HPP The include guard should match the file name, maybe? `SHARE_GC_SHENANDOAH_SHENANDOAHDEGENERATEDGC_HPP`? src/hotspot/share/gc/shenandoah/shenandoahGC.cpp line 28: > 26: > 27: #include "gc/shared/workgroup.hpp" > 28: Excess newlines? src/hotspot/share/gc/shenandoah/shenandoahGC.hpp line 43: > 41: * | | upgrade from degenerated GC | > 42: * Full GC---------------------------v---------------------------->| > 43: */ This diagram is confusing to me. 
("normal" mode) ----> Concurrent GC ----> (finish) | | v ("passive" mode) ---> Degenerated GC ---> (finish) | | v Full GC --------> (finish) src/hotspot/share/gc/shenandoah/shenandoahGC.hpp line 32: > 30: > 31: /* > 32: * Base class of three Shenandoah GC modes Is it a "mode", though? There is `ShenandoahGCMode`, which means a different thing. Maybe "flavor"? src/hotspot/share/gc/shenandoah/shenandoahGC.hpp line 66: > 64: }; > 65: > 66: #endif No newline at the end of the file. src/hotspot/share/gc/shenandoah/shenandoahGC.hpp line 45: > 43: */ > 44: > 45: class ShenandoahHeap; Cannot see why this forward declaration is needed. src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 128: > 126: friend class ShenandoahConcurrentGC; > 127: friend class ShenandoahDegenGC; > 128: friend class ShenandoahMarkCompact; At some point, it would make sense to rename `ShenandoahMarkCompact` to `ShenandoahFullGC`? src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 369: > 367: > 368: public: > 369: void notify_gc_progress() { _progress_last_gc.set();} Align the closing brace with the next line? src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp line 196: > 194: > 195: if (!heap->unload_classes()) { > 196: _cld_roots.cld_do(&clds_cl, worker_id); This change seems unrelated? ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1964 From tschatzl at openjdk.java.net Tue Jan 19 12:37:51 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 19 Jan 2021 12:37:51 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v3] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 23:51:55 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC oldgen allocation. There were two >> variants, one using CAS on the _top member of the mutable space, the other >> requiring locking or other forms of mutual exclusion. >> >> We don't need both variants. 
The non-CAS variant is only used in a few >> places, where the cost of an extra CAS doesn't matter. What does matter is >> that having two variants, which must not be used concurrently, makes the >> code larger, more complex, and harder to verify. (This change came out of >> analyzing JDK-8259271. No problems were found (or expected), so this change >> is not expected to impact that bug. But because of the two variants, the >> possibility of unexpected interact needed to be examined.) >> >> The non-CAS allocation support has been removed, with PSOldGen::allocate now >> implemented using the CAS-based allocation. The cas_ prefix naming >> convention is retained for the internals for clarity. >> >> While looking at this, noticed and removed a couple of lingering references >> to the class AdjoiningGenerations, which no longer exists after JDK-8243146. >> >> Testing: >> mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > move oldgen alloc with size policy recording to heap object Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2101 From zgu at openjdk.java.net Tue Jan 19 13:13:37 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 19 Jan 2021 13:13:37 GMT Subject: RFR: 8259962: Shenandoah: task queue statistics is inconsistent after JDK-8255019 In-Reply-To: References: Message-ID: On Tue, 19 Jan 2021 10:27:26 GMT, Aleksey Shipilev wrote: > I believe `ShenandoahSTWMark` misses the TQ stats reset after the takeover from concurrent cycle. See the stack trace and events info in the bug. > > Additional testing: > - [x] Linux x86_64, failing tests now pass > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah Thanks for catching this. Looks good. Concurrent mark resets qstats in ShenandoahConcurrentMark::mark_stw_roots(), so it should be fine. ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2141 From shade at openjdk.java.net Tue Jan 19 14:42:48 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 19 Jan 2021 14:42:48 GMT Subject: Integrated: 8259962: Shenandoah: task queue statistics is inconsistent after JDK-8255019 In-Reply-To: References: Message-ID: On Tue, 19 Jan 2021 10:27:26 GMT, Aleksey Shipilev wrote: > I believe `ShenandoahSTWMark` misses the TQ stats reset after the takeover from concurrent cycle. See the stack trace and events info in the bug. > > Additional testing: > - [x] Linux x86_64, failing tests now pass > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah This pull request has now been integrated. Changeset: c0e9c446 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/c0e9c446 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8259962: Shenandoah: task queue statistics is inconsistent after JDK-8255019 Reviewed-by: zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2141 From shade at openjdk.java.net Tue Jan 19 14:42:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 19 Jan 2021 14:42:47 GMT Subject: RFR: 8259962: Shenandoah: task queue statistics is inconsistent after JDK-8255019 In-Reply-To: References: Message-ID: On Tue, 19 Jan 2021 13:11:11 GMT, Zhengyu Gu wrote: >> I believe `ShenandoahSTWMark` misses the TQ stats reset after the takeover from concurrent cycle. See the stack trace and events info in the bug. >> >> Additional testing: >> - [x] Linux x86_64, failing tests now pass >> - [x] Linux x86_64 `hotspot_gc_shenandoah` >> - [x] Linux x86_64 `tier1` with Shenandoah > > Thanks for catching this. Looks good. > > Concurrent mark resets qstats in ShenandoahConcurrentMark::mark_stw_roots(), so it should be fine. Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2141 From mbaesken at openjdk.java.net Tue Jan 19 15:42:58 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Tue, 19 Jan 2021 15:42:58 GMT Subject: RFR: JDK-8259983: do not use uninitialized expand_ms value in G1CollectedHeap::expand_heap_after_young_collection Message-ID: <86FHj3iOq3pmxZzqpcyFxT3ipdWha8RRERHK833RKSI=.a630dec5-d362-4308-8ba5-3cb4b31f565b@github.com> Currently we could run into an uninitialized value of expand_ms in G1CollectedHeap::expand_heap_after_young_collection() . This would happen in case of an early return of bool G1CollectedHeap::expand(size_t expand_bytes, WorkGang* pretouch_workers, double* expand_time_ms) . See the special case in expand if (is_maximal_no_gc()) { log_debug(gc, ergo, heap)("Did not expand the heap (heap already fully expanded)"); return false; } ------------- Commit messages: - JDK-8259983 Changes: https://git.openjdk.java.net/jdk/pull/2148/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2148&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259983 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2148.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2148/head:pull/2148 PR: https://git.openjdk.java.net/jdk/pull/2148 From zgu at openjdk.java.net Tue Jan 19 18:09:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 19 Jan 2021 18:09:53 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC In-Reply-To: References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Tue, 19 Jan 2021 12:05:19 GMT, Aleksey Shipilev wrote: >> The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. >> >> Current implementation emphasis code sharing, e.g. 
degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. >> >> The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. >> >> The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. >> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] nightly tests > > src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp line 196: > >> 194: >> 195: if (!heap->unload_classes()) { >> 196: _cld_roots.cld_do(&clds_cl, worker_id); > > This change seems unrelated? Right. Restored and will file a separate bug to clean it up. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1964 From kvn at openjdk.java.net Tue Jan 19 18:11:49 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 19 Jan 2021 18:11:49 GMT Subject: RFR: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation [v2] In-Reply-To: References: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> Message-ID: <3XRiDgynGZg2hNJhPKdZ29I9SJIlEsemKXHsIz5lo4c=.20dfafd6-4aa6-4cc6-ad48-ea86754c31fd@github.com> On Tue, 19 Jan 2021 08:33:04 GMT, Tobias Hartmann wrote: >> The test gets stuck while waiting for a compilation to succeed, because the corresponding compilation level is not available since Tiered Compilation is disabled (or `TieredStopAtLevel` is set). The tests should not be executed without Tiered Compilation (or if the requested compilation level is not available) and also check the output of `enqueueMethodForCompilation` for sanity. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Removed TieredCompilation flag from compilation level 4 tests Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2125 From kvn at openjdk.java.net Tue Jan 19 18:17:47 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Tue, 19 Jan 2021 18:17:47 GMT Subject: RFR: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation [v2] In-Reply-To: <3XRiDgynGZg2hNJhPKdZ29I9SJIlEsemKXHsIz5lo4c=.20dfafd6-4aa6-4cc6-ad48-ea86754c31fd@github.com> References: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> <3XRiDgynGZg2hNJhPKdZ29I9SJIlEsemKXHsIz5lo4c=.20dfafd6-4aa6-4cc6-ad48-ea86754c31fd@github.com> Message-ID: On Tue, 19 Jan 2021 18:08:57 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed TieredCompilation flag from compilation level 4 tests > > Good. > Thanks for the review Vladimir. > > > I see such requires patter in other tests too. > > But what will happen if server VM is built without C1 - no tiered? > > Such tests may need additions requires vm.compiler1.enabled > > Yes, I've took that pattern from other tests. The problem with `requires vm.compiler1.enabled` is that the test will be skipped if `-XX:-TieredCompilation` is set (because then C1 is not available). Since this is a general problem that affects other tests as well, I think it should be addressed separately if necessary. What do you think? Yes, you are right about switching off C1 with TieredCompilation flag. Current changes are fine. I agree with addressing C1 absence from VM build in separate changes. I think we need additional `requires` feature to check if C1/C2 are included in VM build. `vm.server` is not enough. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2125 From zgu at openjdk.java.net Tue Jan 19 18:24:04 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 19 Jan 2021 18:24:04 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC [v2] In-Reply-To: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: > The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. > > Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. > > The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. > > The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. 
> > Test: > - [x] hotspot_gc_shenandoah > - [x] nightly tests Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Aleksey's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1964/files - new: https://git.openjdk.java.net/jdk/pull/1964/files/d9805040..93b2ceed Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=00-01 Stats: 32 lines in 7 files changed: 3 ins; 15 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/1964.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1964/head:pull/1964 PR: https://git.openjdk.java.net/jdk/pull/1964 From kbarrett at openjdk.java.net Tue Jan 19 18:57:44 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 19 Jan 2021 18:57:44 GMT Subject: RFR: JDK-8259983: do not use uninitialized expand_ms value in G1CollectedHeap::expand_heap_after_young_collection In-Reply-To: <86FHj3iOq3pmxZzqpcyFxT3ipdWha8RRERHK833RKSI=.a630dec5-d362-4308-8ba5-3cb4b31f565b@github.com> References: <86FHj3iOq3pmxZzqpcyFxT3ipdWha8RRERHK833RKSI=.a630dec5-d362-4308-8ba5-3cb4b31f565b@github.com> Message-ID: On Tue, 19 Jan 2021 15:37:10 GMT, Matthias Baesken wrote: > Currently we could run into an uninitialized value of expand_ms in G1CollectedHeap::expand_heap_after_young_collection() . > This would happen in case of an early return of bool G1CollectedHeap::expand(size_t expand_bytes, WorkGang* pretouch_workers, double* expand_time_ms) . See the special case in expand > > if (is_maximal_no_gc()) { > log_debug(gc, ergo, heap)("Did not expand the heap (heap already fully expanded)"); > return false; > } I considered suggesting instead only calling record_expand_heap_time if expand succeeds. The underlying value is reset to 0 as part of G1GCPhaseTimes::reset. But that assumes there aren't any time-consuming reasons for expand to fail. So this change looks good to me. 
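
The pattern under review can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `expand_sketch`, `time_to_record` and the `1.5` timing value are hypothetical stand-ins, not the G1 code, and initializing at the declaration is only one plausible shape for a one-line fix.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an expand()-style routine (not the HotSpot code).
// It reports elapsed time through an out-parameter, but only on the path
// that actually performs the expansion.
static bool expand_sketch(std::size_t expand_bytes, bool already_maximal,
                          double* expand_time_ms) {
  assert(expand_time_ms != nullptr);
  if (already_maximal) {
    // Early return: *expand_time_ms is not written on this path.
    return false;
  }
  *expand_time_ms = 1.5;  // made-up timing value for illustration
  return expand_bytes > 0;
}

// Caller side. Initializing expand_ms at its declaration means the value
// that is later recorded is well defined even when expand_sketch() returns early.
static double time_to_record(std::size_t expand_bytes, bool already_maximal) {
  double expand_ms = 0.0;
  expand_sketch(expand_bytes, already_maximal, &expand_ms);
  return expand_ms;
}
```

Recording the time only when the call succeeds would also avoid reading an unwritten value, but, as noted above, it would drop the time spent in any expansion that fails after doing real work.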
------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2148 From zgu at openjdk.java.net Tue Jan 19 19:34:02 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 19 Jan 2021 19:34:02 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC [v3] In-Reply-To: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: > The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. > > Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. > > The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. > > The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. > > Test: > - [x] hotspot_gc_shenandoah > - [x] nightly tests Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 101 commits: - Merge branch 'master' into JDK-8255765-isolate-gcs - Aleksey's comments - Remove cached heap in ShenandoahGC - Merge - Merge branch 'fix_phase_timings' into JDK-8255765-isolate-gcs - Merge - Merge - Merge - Merge - Removed trailing whitespaces - ... and 91 more: https://git.openjdk.java.net/jdk/compare/3edf393d...4e54d38d ------------- Changes: https://git.openjdk.java.net/jdk/pull/1964/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=02 Stats: 3218 lines in 20 files changed: 1802 ins; 1280 del; 136 mod Patch: https://git.openjdk.java.net/jdk/pull/1964.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1964/head:pull/1964 PR: https://git.openjdk.java.net/jdk/pull/1964 From zgu at openjdk.java.net Tue Jan 19 22:14:56 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 19 Jan 2021 22:14:56 GMT Subject: RFR: 8260005: Shenandoah: Remove unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() Message-ID: Please review this trivial cleanup that removes unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() Test: - [x] hotspot_gc_shenandoah ------------- Commit messages: - Init update Changes: https://git.openjdk.java.net/jdk/pull/2152/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2152&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260005 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2152.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2152/head:pull/2152 PR: https://git.openjdk.java.net/jdk/pull/2152 From shade at openjdk.java.net Tue Jan 19 22:19:48 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 19 Jan 2021 22:19:48 GMT Subject: Withdrawn: 8256949: Shenandoah: ditch allocation spike and GC penalties handling In-Reply-To: References: Message-ID: On Tue, 24 Nov 2020 12:53:29 GMT, Aleksey Shipilev wrote: > Following the improvements in JDK-8255984, I 
think we can dispense with old-style allocation spike and GC penalties handling. JDK-8255984 is supposed to accommodate both cases now. This issue is to have a base for performance evaluation. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/1409 From lucy at openjdk.java.net Tue Jan 19 22:28:38 2021 From: lucy at openjdk.java.net (Lutz Schmidt) Date: Tue, 19 Jan 2021 22:28:38 GMT Subject: RFR: JDK-8259983: do not use uninitialized expand_ms value in G1CollectedHeap::expand_heap_after_young_collection In-Reply-To: <86FHj3iOq3pmxZzqpcyFxT3ipdWha8RRERHK833RKSI=.a630dec5-d362-4308-8ba5-3cb4b31f565b@github.com> References: <86FHj3iOq3pmxZzqpcyFxT3ipdWha8RRERHK833RKSI=.a630dec5-d362-4308-8ba5-3cb4b31f565b@github.com> Message-ID: On Tue, 19 Jan 2021 15:37:10 GMT, Matthias Baesken wrote: > Currently we could run into an uninitialized value of expand_ms in G1CollectedHeap::expand_heap_after_young_collection() . > This would happen in case of an early return of bool G1CollectedHeap::expand(size_t expand_bytes, WorkGang* pretouch_workers, double* expand_time_ms) . See the special case in expand > > if (is_maximal_no_gc()) { > log_debug(gc, ergo, heap)("Did not expand the heap (heap already fully expanded)"); > return false; > } Looks good to me. Isn't that complicated either. :-) ------------- Marked as reviewed by lucy (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2148 From cjplummer at openjdk.java.net Tue Jan 19 22:48:01 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Tue, 19 Jan 2021 22:48:01 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes Message-ID: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> See the bug for most details. 
A few notes here about some implementation details:

In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to do so for some address types, but not others. Now it always does. And if you see a comment something like the following:

    getTLAB().printOn(tty); // includes "\n"

That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand.

I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to.

The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in JavaScript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following:

    var dso = loadObjectContainingPC(addr);
    if (dso == null) {
      return ptrLoc.toString();
    }
    var sym = dso.closestSymbolToPC(addr);
    if (sym != null) {
      return sym.name + '+' + sym.offset;
    }

And now you'll see something similar in the PointerFinder code:

    loc.loadObject = cdbg.loadObjectContainingPC(a);
    if (loc.loadObject != null) {
      loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a);
      return loc;
    }

Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a Java version of `whatis`, but I'll probably do so anyway just to help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670)

-------------

Commit messages:
 - Improvements for PointerFinder and findpc command.
Changes: https://git.openjdk.java.net/jdk/pull/2111/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2111&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8247514 Stats: 292 lines in 5 files changed: 237 ins; 8 del; 47 mod Patch: https://git.openjdk.java.net/jdk/pull/2111.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2111/head:pull/2111 PR: https://git.openjdk.java.net/jdk/pull/2111 From mbaesken at openjdk.java.net Wed Jan 20 07:52:48 2021 From: mbaesken at openjdk.java.net (Matthias Baesken) Date: Wed, 20 Jan 2021 07:52:48 GMT Subject: Integrated: JDK-8259983: do not use uninitialized expand_ms value in G1CollectedHeap::expand_heap_after_young_collection In-Reply-To: <86FHj3iOq3pmxZzqpcyFxT3ipdWha8RRERHK833RKSI=.a630dec5-d362-4308-8ba5-3cb4b31f565b@github.com> References: <86FHj3iOq3pmxZzqpcyFxT3ipdWha8RRERHK833RKSI=.a630dec5-d362-4308-8ba5-3cb4b31f565b@github.com> Message-ID: On Tue, 19 Jan 2021 15:37:10 GMT, Matthias Baesken wrote: > Currently we could run into an uninitialized value of expand_ms in G1CollectedHeap::expand_heap_after_young_collection() . > This would happen in case of an early return of bool G1CollectedHeap::expand(size_t expand_bytes, WorkGang* pretouch_workers, double* expand_time_ms) . See the special case in expand > > if (is_maximal_no_gc()) { > log_debug(gc, ergo, heap)("Did not expand the heap (heap already fully expanded)"); > return false; > } This pull request has now been integrated. 
Changeset: 9f21bb6a Author: Matthias Baesken URL: https://git.openjdk.java.net/jdk/commit/9f21bb6a Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8259983: do not use uninitialized expand_ms value in G1CollectedHeap::expand_heap_after_young_collection Reviewed-by: kbarrett, lucy ------------- PR: https://git.openjdk.java.net/jdk/pull/2148 From shade at openjdk.java.net Wed Jan 20 07:53:57 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 20 Jan 2021 07:53:57 GMT Subject: RFR: 8260005: Shenandoah: Remove unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() In-Reply-To: References: Message-ID: On Tue, 19 Jan 2021 22:09:49 GMT, Zhengyu Gu wrote: > Please review this trivial cleanup that removes unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() > > Test: > - [x] hotspot_gc_shenandoah Looks fine and trivial. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2152 From thartmann at openjdk.java.net Wed Jan 20 08:13:51 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 20 Jan 2021 08:13:51 GMT Subject: RFR: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation [v2] In-Reply-To: References: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> <3XRiDgynGZg2hNJhPKdZ29I9SJIlEsemKXHsIz5lo4c=.20dfafd6-4aa6-4cc6-ad48-ea86754c31fd@github.com> Message-ID: On Tue, 19 Jan 2021 18:15:05 GMT, Vladimir Kozlov wrote: >> Good. > >> Thanks for the review Vladimir. >> >> > I see such requires patter in other tests too. >> > But what will happen if server VM is built without C1 - no tiered? >> > Such tests may need additions requires vm.compiler1.enabled >> >> Yes, I've took that pattern from other tests. The problem with `requires vm.compiler1.enabled` is that the test will be skipped if `-XX:-TieredCompilation` is set (because then C1 is not available). 
Since this is a general problem that affects other tests as well, I think it should be addressed separately if necessary. What do you think? > > Yes, you are right about switching off C1 with TieredCompilation flag. Current changes are fine. > I agree with addressing C1 absence from VM build in separate changes. I think we need additional `requires` feature to check if C1/C2 are included in VM build. `vm.server` is not enough. Thanks for the review, Vladimir! ------------- PR: https://git.openjdk.java.net/jdk/pull/2125 From rkennke at openjdk.java.net Wed Jan 20 10:01:38 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 20 Jan 2021 10:01:38 GMT Subject: RFR: 8260005: Shenandoah: Remove unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() In-Reply-To: References: Message-ID: <3ywjav1EJRHYLQIPPF73PHJ2hUfTr4eOe7TmTd_ljS0=.d78d98eb-6dbd-443b-bf3a-9be824a50769@github.com> On Tue, 19 Jan 2021 22:09:49 GMT, Zhengyu Gu wrote: > Please review this trivial cleanup that removes unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() > > Test: > - [x] hotspot_gc_shenandoah Looks good and trivial. ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2152 From thartmann at openjdk.java.net Wed Jan 20 11:51:54 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Wed, 20 Jan 2021 11:51:54 GMT Subject: Integrated: 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1, 2, 3] time out without TieredCompilation In-Reply-To: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> References: <9GYSfaiylUZMije7ILsk6TZpg4_dADdCjzCJ4epMKhw=.0b99d872-bb6d-467f-809b-e2eb423f8c05@github.com> Message-ID: On Mon, 18 Jan 2021 12:44:17 GMT, Tobias Hartmann wrote: > The test gets stuck while waiting for a compilation to succeed, because the corresponding compilation level is not available since Tiered Compilation is disabled (or `TieredStopAtLevel` is set). The tests should not be executed without Tiered Compilation (or if the requested compilation level is not available) and also check the output of `enqueueMethodForCompilation` for sanity. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 7c32ffea Author: Tobias Hartmann URL: https://git.openjdk.java.net/jdk/commit/7c32ffea Stats: 69 lines in 25 files changed: 44 ins; 0 del; 25 mod 8258383: vmTestbase/gc/g1/unloading/tests/unloading_compilation_level[1,2,3] time out without TieredCompilation Reviewed-by: kvn ------------- PR: https://git.openjdk.java.net/jdk/pull/2125 From sjohanss at openjdk.java.net Wed Jan 20 12:58:49 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 20 Jan 2021 12:58:49 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v3] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 23:51:55 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC oldgen allocation. There were two >> variants, one using CAS on the _top member of the mutable space, the other >> requiring locking or other forms of mutual exclusion. >> >> We don't need both variants. 
The non-CAS variant is only used in a few >> places, where the cost of an extra CAS doesn't matter. What does matter is >> that having two variants, which must not be used concurrently, makes the >> code larger, more complex, and harder to verify. (This change came out of >> analyzing JDK-8259271. No problems were found (or expected), so this change >> is not expected to impact that bug. But because of the two variants, the >> possibility of unexpected interact needed to be examined.) >> >> The non-CAS allocation support has been removed, with PSOldGen::allocate now >> implemented using the CAS-based allocation. The cas_ prefix naming >> convention is retained for the internals for clarity. >> >> While looking at this, noticed and removed a couple of lingering references >> to the class AdjoiningGenerations, which no longer exists after JDK-8243146. >> >> Testing: >> mach5 tier1-5 > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > move oldgen alloc with size policy recording to heap object Nice cleanup. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2101 From zgu at openjdk.java.net Wed Jan 20 13:14:56 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 20 Jan 2021 13:14:56 GMT Subject: Integrated: 8260005: Shenandoah: Remove unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() In-Reply-To: References: Message-ID: On Tue, 19 Jan 2021 22:09:49 GMT, Zhengyu Gu wrote: > Please review this trivial cleanup that removes unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do() > > Test: > - [x] hotspot_gc_shenandoah This pull request has now been integrated. 
Changeset: 0b01d692
Author: Zhengyu Gu
URL: https://git.openjdk.java.net/jdk/commit/0b01d692
Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod

8260005: Shenandoah: Remove unused AlwaysTrueClosure in ShenandoahConcurrentRootScanner::roots_do()

Reviewed-by: shade, rkennke

-------------

PR: https://git.openjdk.java.net/jdk/pull/2152

From pliden at openjdk.java.net Wed Jan 20 13:16:45 2021
From: pliden at openjdk.java.net (Per Liden)
Date: Wed, 20 Jan 2021 13:16:45 GMT
Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 19 Jan 2021 08:30:53 GMT, Per Liden wrote:

>> So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then?
>>
>> Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment?
>>
>> Cheers,
>> David
>
> @dholmes-ora
>
>> So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then?
>
> Not sure what you have in mind here? Having an indirect function call would not result in a lower overhead than the test/branch I've introduced. It's also not necessarily trivial to detect this error at startup, as you would need a reliable way to enumerate all processors (something that seems semi-broken in this environment, which is the root of the problem), bind the current thread to each of them and then check the processor id.
>
>> Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment?
>
> That's of course always a judgement call/trade-off. I can't say I have a super good understanding of how common this environment is, but there's at least one "Java cloud provider" that uses this environment.
It seems there have been e-mails sent that didn't show up here, so I'm answering on GitHub to hopefully re-attach the discussion to this PR.

From the mailing list:
>> Glibc's tst-getcpu.c (which I assume is the test you are referring
>> to?) fails in their environment, so it seems like the affinity mask
>> isn't reliable either.
>
> What's the nature of the failure? If it's due to a non-changing
> affinity mask, then using sched_getaffinity data would still be okay.

Glibc's tst-getcpu fails with some version of "getcpu results X should be Y".

There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns.

Example (container with 1 CPU):

1. sysconf(_SC_NPROCESSORS_CONF) returns 1
2. sysconf(_SC_NPROCESSORS_ONLN) returns 1
3. sched_getaffinity() returns the mask 00000001
4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 7(!). Should have returned 0.

Another example (container with 2 CPUs):

1. sysconf(_SC_NPROCESSORS_CONF) returns 2
2. sysconf(_SC_NPROCESSORS_ONLN) returns 2
3. sched_getaffinity() returns the mask 00000011
4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0.
5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1.

It looks like CPUs are virtualized on some level, but not in sched_getcpu(). I'm guessing sched_getcpu() is returning the CPU id of the physical CPU, and not the virtual CPU, or something. So in the last example, maybe both virtual CPUs were scheduled on the same physical CPU.
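
For illustration, the mismatch described above could be checked along the following lines. This is only a minimal sketch and not code from the PR: `id_matches_mask` and `probe_pinned_cpu` are hypothetical names, the affinity mask is modeled as a 64-bit integer, and the probe assumes Linux with glibc's `sched_getaffinity()`, `sched_setaffinity()` and `sched_getcpu()`.

```cpp
#include <sched.h>   // Linux/glibc: sched_getcpu, sched_{get,set}affinity, CPU_* macros
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the processor id reported by the OS is one
// of the CPUs permitted by the (at most 64-bit) affinity mask.
bool id_matches_mask(int reported_id, std::uint64_t allowed_mask) {
  if (reported_id < 0 || reported_id >= 64) {
    return false;
  }
  return ((allowed_mask >> reported_id) & 1u) != 0;
}

// Hypothetical probe: pin the calling thread to `cpu`, ask the OS where it
// is running, then restore the previous affinity mask.
// Returns 1 if sched_getcpu() agrees, 0 if it disagrees, -1 if pinning failed.
int probe_pinned_cpu(int cpu) {
  cpu_set_t saved;
  if (sched_getaffinity(0, sizeof(saved), &saved) != 0) {
    return -1;
  }
  cpu_set_t pinned;
  CPU_ZERO(&pinned);
  CPU_SET(cpu, &pinned);
  if (sched_setaffinity(0, sizeof(pinned), &pinned) != 0) {
    return -1;
  }
  const int reported = sched_getcpu();
  sched_setaffinity(0, sizeof(saved), &saved);  // best-effort restore
  return reported == cpu ? 1 : 0;
}
```

With the numbers from the first example, `id_matches_mask(7, 0x1)` is false, which is the kind of inconsistency tst-getcpu complains about. As noted earlier in the thread, probing every processor like this at startup is not necessarily reliable in such an environment, so this is a diagnostic aid rather than a proposed workaround.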
------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From shade at openjdk.java.net Wed Jan 20 13:28:08 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 20 Jan 2021 13:28:08 GMT Subject: RFR: 8260048: Shenandoah: ShenandoahMarkingContext asserts are unnecessary Message-ID: There are two `shenandoah_assert_not_forwarded` asserts that are not necessary in `mark_{strong,weak}`, because the only caller [already asserts](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp#L272) this higher-level invariant. There is no need to check it in `ShenandoahMarkingContext` once again. This simplifies the fastpath in fastdebug builds. Additional testing: - [x] `hotspot_gc_shenandoah` - [x] `tier1`, `tier2` with Shenandoah ------------- Commit messages: - Drop a few more parentheses - 8260048: Shenandoah: ShenandoahMarkingContext asserts are unnecessary Changes: https://git.openjdk.java.net/jdk/pull/2164/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2164&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260048 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2164.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2164/head:pull/2164 PR: https://git.openjdk.java.net/jdk/pull/2164 From zgu at openjdk.java.net Wed Jan 20 13:53:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 20 Jan 2021 13:53:38 GMT Subject: RFR: 8260048: Shenandoah: ShenandoahMarkingContext asserts are unnecessary In-Reply-To: References: Message-ID: On Wed, 20 Jan 2021 13:20:41 GMT, Aleksey Shipilev wrote: > There are two `shenandoah_assert_not_forwarded` asserts that are not necessary in `mark_{strong,weak}`, because the only caller [already asserts](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp#L272) this higher-level invariant. 
There is no need to check it in `ShenandoahMarkingContext` once again. This simplifies the fastpath in fastdebug builds. > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1`, `tier2` with Shenandoah Looks good and trivial. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2164 From zgu at openjdk.java.net Wed Jan 20 15:12:00 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 20 Jan 2021 15:12:00 GMT Subject: RFR: 8259488: Shenandoah: Missing timing tracking for STW CLD root processing Message-ID: <5Ym9NRL-jXY5JFNIfjdsxSO_W25fV-uP8uQryx2CcUw=.082669c1-3063-4412-b578-f1b86f3c59e9@github.com> Please review this trivial patch that adds missing timing tracking for STW CLD root processing. - [x] hotspot_gc_shenandoah ------------- Commit messages: - JDK-8259488 Changes: https://git.openjdk.java.net/jdk/pull/2165/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2165&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259488 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2165.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2165/head:pull/2165 PR: https://git.openjdk.java.net/jdk/pull/2165 From shade at openjdk.java.net Wed Jan 20 15:12:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 20 Jan 2021 15:12:59 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code Message-ID: We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). 
Additional testing: - [x] `hotspot_gc_shenandoah` - [ ] `tier1` with Shenandoah - [ ] `tier2` with Shenandoah ------------- Commit messages: - 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code Changes: https://git.openjdk.java.net/jdk/pull/2166/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260106 Stats: 147 lines in 4 files changed: 19 ins; 84 del; 44 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Wed Jan 20 18:55:53 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 20 Jan 2021 18:55:53 GMT Subject: RFR: 8259488: Shenandoah: Missing timing tracking for STW CLD root processing In-Reply-To: <5Ym9NRL-jXY5JFNIfjdsxSO_W25fV-uP8uQryx2CcUw=.082669c1-3063-4412-b578-f1b86f3c59e9@github.com> References: <5Ym9NRL-jXY5JFNIfjdsxSO_W25fV-uP8uQryx2CcUw=.082669c1-3063-4412-b578-f1b86f3c59e9@github.com> Message-ID: On Wed, 20 Jan 2021 15:04:55 GMT, Zhengyu Gu wrote: > Please review this trivial patch that adds missing timing tracking for STW CLD root processing. > > - [x] hotspot_gc_shenandoah Looks good. Please wait for tests ("Checks" tab) to complete anyway, in case tracking runs into any kind of assert. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2165 From zgu at openjdk.java.net Wed Jan 20 19:13:51 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 20 Jan 2021 19:13:51 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code In-Reply-To: References: Message-ID: <_wIuAaUBJdBGdD4lrDUdS5f2dv4aZJqcAvqWA22qOeQ=.d73a77a1-80a2-46ba-b728-b461b67b738e@github.com> On Wed, 20 Jan 2021 15:06:20 GMT, Aleksey Shipilev wrote: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [ ] `tier1` with Shenandoah > - [ ] `tier2` with Shenandoah Looks good. Please update copyright years of shenandoahHeap.inline.hpp and shenandoahOopClosures.hpp ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2166 From rkennke at openjdk.java.net Wed Jan 20 19:49:49 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 20 Jan 2021 19:49:49 GMT Subject: RFR: 8260048: Shenandoah: ShenandoahMarkingContext asserts are unnecessary In-Reply-To: References: Message-ID: <_V3GudSNRag1N31V9DupSgnsRmLBMQdyLVuFAv9e3HE=.bd0c5ead-948d-4c62-b2a9-4c01e36453cd@github.com> On Wed, 20 Jan 2021 13:20:41 GMT, Aleksey Shipilev wrote: > There are two `shenandoah_assert_not_forwarded` asserts that are not necessary in `mark_{strong,weak}`, because the only caller [already asserts](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp#L272) this higher-level invariant. There is no need to check it in `ShenandoahMarkingContext` once again. This simplifies the fastpath in fastdebug builds. 
> > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1`, `tier2` with Shenandoah Ok. ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2164 From rkennke at openjdk.java.net Wed Jan 20 19:56:48 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 20 Jan 2021 19:56:48 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code In-Reply-To: References: Message-ID: <7qUIYyY5EsilZrrh2KWDmw2zU9O_J56xsLQ1_ztVzqE=.c36234af-f3cf-4ce8-a299-be8213b70f4b@github.com> On Wed, 20 Jan 2021 15:06:20 GMT, Aleksey Shipilev wrote: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [ ] `tier1` with Shenandoah > - [ ] `tier2` with Shenandoah Nice! Few nits below. src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 120: > 118: shenandoah_assert_not_in_cset_except(p, fwd, cancelled_gc()); > 119: > 120: // Sanity check: we are should not be updating the cset regions themselves, Typo: excess 'are' (I think) ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2166 From zgu at openjdk.java.net Wed Jan 20 21:45:58 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 20 Jan 2021 21:45:58 GMT Subject: Integrated: 8259488: Shenandoah: Missing timing tracking for STW CLD root processing In-Reply-To: <5Ym9NRL-jXY5JFNIfjdsxSO_W25fV-uP8uQryx2CcUw=.082669c1-3063-4412-b578-f1b86f3c59e9@github.com> References: <5Ym9NRL-jXY5JFNIfjdsxSO_W25fV-uP8uQryx2CcUw=.082669c1-3063-4412-b578-f1b86f3c59e9@github.com> Message-ID: On Wed, 20 Jan 2021 15:04:55 GMT, Zhengyu Gu wrote: > Please review this trivial patch that adds missing timing tracking for STW CLD root processing. > > - [x] hotspot_gc_shenandoah This pull request has now been integrated. Changeset: 4f11ff32 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/4f11ff32 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8259488: Shenandoah: Missing timing tracking for STW CLD root processing Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/2165 From kbarrett at openjdk.java.net Thu Jan 21 04:33:53 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 21 Jan 2021 04:33:53 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v3] In-Reply-To: References: Message-ID: On Tue, 19 Jan 2021 12:34:42 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> move oldgen alloc with size policy recording to heap object > > Lgtm. Thanks @tschatzl and @kstefanj for reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2101 From david.holmes at oracle.com Thu Jan 21 06:43:12 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 21 Jan 2021 16:43:12 +1000 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: <4f29dc95-d87f-dcd2-c7a0-fdef6c01d3d4@oracle.com> Hi Per, On 20/01/2021 11:16 pm, Per Liden wrote: > It seems there have been e-mails sent that didn't show up here, so I'm answering on GitHub to hopefully re-attach the discussion to this PR. > > From the mailing list: >>> Glibc's tst-getcpu.c (which I assume is the test you are referring >>> to?) fails in their environment, so it seems like the affinity mask >>> isn't reliable either. >> >> What's the nature of the failure? If it's due to a non-changing >> affinity mask, then using sched_getaffinity data would still be okay. > > Glibc's tst-getcpu fails with some version of "getcpu results X should be Y". > > There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns. > > Example (container with 1 CPU): > > 1. sysconf(_SC_NPROCESSORS_CONF) returns 1 > 2. sysconf(_SC_NPROCESSORS_ONLN) returns 1 > 3. sched_getaffinity() returns the mask 00000001 > 4. sched_setaffinity(00000001) return success, but then sched_getcpu() returns 7(!) Should have returned 0. > > Another example (container with 2 CPUs): > > 1. sysconf(_SC_NPROCESSORS_CONF) returns 2 > 2. sysconf(_SC_NPROCESSORS_ONLN) returns 2 > 3. sched_getaffinity() returns the mask 00000011 > 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0. > 5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1. > > It looks like CPUs are virtualized on some level, but not in sched_getcpu(). 
I'm guessing sched_getcpu() is returning the CPU id of the physical CPU, and not the virtual CPU, or something. So in the last example, maybe both virtual CPUs were scheduled on the same physical CPU. > So it isn't that sysconf(_SC_NPROCESSORS_CONF) returns a too low number as stated in the PR but rather that after calling sched_setaffinity, sched_getcpu is broken? Either way won't that breakage also potentially affect the NUMA code as well? Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk16/pull/124 > From shade at openjdk.java.net Thu Jan 21 07:24:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 07:24:56 GMT Subject: Integrated: 8260048: Shenandoah: ShenandoahMarkingContext asserts are unnecessary In-Reply-To: References: Message-ID: On Wed, 20 Jan 2021 13:20:41 GMT, Aleksey Shipilev wrote: > There are two `shenandoah_assert_not_forwarded` asserts that are not necessary in `mark_{strong,weak}`, because the only caller [already asserts](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shenandoah/shenandoahMark.inline.hpp#L272) this higher-level invariant. There is no need to check it in `ShenandoahMarkingContext` once again. This simplifies the fastpath in fastdebug builds. > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1`, `tier2` with Shenandoah This pull request has now been integrated. 
Changeset: 5940287b Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/5940287b Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod 8260048: Shenandoah: ShenandoahMarkingContext asserts are unnecessary Reviewed-by: zgu, rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2164 From fweimer at redhat.com Thu Jan 21 07:42:21 2021 From: fweimer at redhat.com (Florian Weimer) Date: Thu, 21 Jan 2021 08:42:21 +0100 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: (Per Liden's message of "Wed, 20 Jan 2021 13:16:45 GMT") References: Message-ID: <874kja3fdu.fsf@oldenburg.str.redhat.com> * Per Liden: > There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns. > > Example (container with 1 CPU): > > 1. sysconf(_SC_NPROCESSORS_CONF) returns 1 > 2. sysconf(_SC_NPROCESSORS_ONLN) returns 1 > 3. sched_getaffinity() returns the mask 00000001 > 4. sched_setaffinity(00000001) return success, but then sched_getcpu() returns 7(!) Should have returned 0. > > Another example (container with 2 CPUs): > > 1. sysconf(_SC_NPROCESSORS_CONF) returns 2 > 2. sysconf(_SC_NPROCESSORS_ONLN) returns 2 > 3. sched_getaffinity() returns the mask 00000011 > 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0. > 5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1. Does sched_getaffinity actually change the affinity mask? I wonder if it just reports a 2**N - 1 unconditionally, with N being the number of configured vCPUs for the container. It probably does that so that the population count of the affinity mask matches the vCPU count. Likewise for the CPU entries under /sys (currently ignored by glibc because of a parser bug) and /proc/stat (the fallback actually used by glibc). 
There is no virtualization of CPU IDs whatsoever, it looks like it's all done to communicate the vCPU count, without taking into account how badly this interacts with sched_getcpu. Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill From shade at openjdk.java.net Thu Jan 21 08:06:11 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 08:06:11 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v2] In-Reply-To: <7qUIYyY5EsilZrrh2KWDmw2zU9O_J56xsLQ1_ztVzqE=.c36234af-f3cf-4ce8-a299-be8213b70f4b@github.com> References: <7qUIYyY5EsilZrrh2KWDmw2zU9O_J56xsLQ1_ztVzqE=.c36234af-f3cf-4ce8-a299-be8213b70f4b@github.com> Message-ID: On Wed, 20 Jan 2021 19:50:41 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: >> >> - Rename maybe to atomic >> - Touch up comments > > src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 120: > >> 118: shenandoah_assert_not_in_cset_except(p, fwd, cancelled_gc()); >> 119: >> 120: // Sanity check: we are should not be updating the cset regions themselves, > > Typo: excess 'are' (I think) Resolved, along with a few other touchups. ------------- PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Thu Jan 21 08:06:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 08:06:10 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v2] In-Reply-To: References: Message-ID: <6kABGv_phymIASILTHYVqGaQf6Lu7tgqj4wQYibNYaA=.12ef2e99-9d24-4a15-be74-aaf468cd0ca5@github.com> > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. 
Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [ ] `tier1` with Shenandoah > - [ ] `tier2` with Shenandoah Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Rename maybe to atomic - Touch up comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2166/files - new: https://git.openjdk.java.net/jdk/pull/2166/files/88c15df1..764ec461 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=00-01 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Thu Jan 21 08:32:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 08:32:10 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC [v3] In-Reply-To: References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Tue, 19 Jan 2021 19:34:02 GMT, Zhengyu Gu wrote: >> The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. >> >> Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. 
Since concurrent GC moved root processing into concurrent phases, the code started to diverge, and we started to put bandages on it to make the shared ops work for both concurrent and degenerated GC, which made the code hard to read and error prone.
>>
>> The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC is already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of the ShenandoahGC class.
>>
>> The three GCs still keep the vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. As a result, concurrent and degenerated GC no longer share ops, and op implementations no longer need to consider other GC modes, which simplifies the implementation and improves readability. Code sharing is achieved via helper methods provided by ShenandoahHeap.
>>
>> Test:
>> - [x] hotspot_gc_shenandoah
>> - [x] nightly tests
>
> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 101 commits:
>
> - Merge branch 'master' into JDK-8255765-isolate-gcs
> - Aleksey's comments
> - Remove cached heap in ShenandoahGC
> - Merge
> - Merge branch 'fix_phase_timings' into JDK-8255765-isolate-gcs
> - Merge
> - Merge
> - Merge
> - Merge
> - Removed trailing whitespaces
> - ... and 91 more: https://git.openjdk.java.net/jdk/compare/3edf393d...4e54d38d

I have only minor nits left.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 412:

> 410: }
> 411:
> 412: // Actual work for the phases

I think we can drop this comment.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 659:

> 657: }
> 658:
> 659: _dedup_roots.prologue();

It looks to me like this line is misindented: one stray space?

src/hotspot/share/gc/shenandoah/shenandoahDegeneratedGC.cpp line 115:

> 113: // STW mark
> 114: op_mark();
> 115: case _degenerated_mark:

New line before this `case`?
src/hotspot/share/gc/shenandoah/shenandoahGC.hpp line 66: > 64: }; > 65: > 66: #endif Should be `#endif // SHARE_GC_SHENANDOAH_SHENANDOAHGC_HPP`? ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1964 From shade at openjdk.java.net Thu Jan 21 08:37:05 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 08:37:05 GMT Subject: RFR: 8260212: Shenandoah: resolve-only UpdateRefsMode is not used Message-ID: The only "use" is `ShenandoahMarkResolveRefsClosure`, which is unused itself. ------------- Commit messages: - 8260212: Shenandoah: resolve-only UpdateRefsMode is not used Changes: https://git.openjdk.java.net/jdk/pull/2177/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2177&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260212 Stats: 18 lines in 2 files changed: 0 ins; 18 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2177.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2177/head:pull/2177 PR: https://git.openjdk.java.net/jdk/pull/2177 From pliden at openjdk.java.net Thu Jan 21 08:40:50 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 21 Jan 2021 08:40:50 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: On Wed, 20 Jan 2021 13:12:53 GMT, Per Liden wrote: >> @dholmes-ora >> >>> So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then? >> >> Not sure what you have in mind here? Having an indirect function call would not result in a lower overhead than the test/branch I've introduced. 
It's also not necessarily trivial to detect this error at startup, as you would need a reliable way to enumerate all processors (something that seems semi-broken in this environment, which is the root of the problem), bind the current thread to each of them and then check the processor id. >> >>> Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment? >> >> That's of course always judgement call/trade-off. I can't say I have a super good understanding of how common this environment it, but there's at least one "Java cloud provider" that uses this environment. > > It seems there have been e-mails sent that didn't show up here, so I'm answering on GitHub to hopefully re-attach the discussion to this PR. > > From the mailing list: >>> Glibc's tst-getcpu.c (which I assume is the test you are referring >>> to?) fails in their environment, so it seems like the affinity mask >>> isn't reliable either. >> >> What's the nature of the failure? If it's due to a non-changing >> affinity mask, then using sched_getaffinity data would still be okay. > > Glibc's tst-getcpu fails with some version of "getcpu results X should be Y". > > There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns. > > Example (container with 1 CPU): > > 1. sysconf(_SC_NPROCESSORS_CONF) returns 1 > 2. sysconf(_SC_NPROCESSORS_ONLN) returns 1 > 3. sched_getaffinity() returns the mask 00000001 > 4. sched_setaffinity(00000001) return success, but then sched_getcpu() returns 7(!) Should have returned 0. > > Another example (container with 2 CPUs): > > 1. sysconf(_SC_NPROCESSORS_CONF) returns 2 > 2. sysconf(_SC_NPROCESSORS_ONLN) returns 2 > 3. sched_getaffinity() returns the mask 00000011 > 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0. > 5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1. 
>
> It looks like CPUs are virtualized on some level, but not in sched_getcpu(). I'm guessing sched_getcpu() is returning the CPU id of the physical CPU, and not the virtual CPU, or something. So in the last example, maybe both virtual CPUs were scheduled on the same physical CPU.

> Does sched_getaffinity actually change the affinity mask?

(assuming you meant sched_setaffinity here...)

You seem to be right. sched_setaffinity() returns success, but a following call to sched_getaffinity() shows it had no effect.

> I wonder if it just reports a 2**N - 1 unconditionally, with N being the
> number of configured vCPUs for the container. It probably does that so
> that the population count of the affinity mask matches the vCPU count.
> Likewise for the CPU entries under /sys (currently ignored by glibc
> because of a parser bug) and /proc/stat (the fallback actually used by
> glibc). There is no virtualization of CPU IDs whatsoever, it looks like
> it's all done to communicate the vCPU count, without taking into account
> how badly this interacts with sched_getcpu.

Yep, that's what it looks like.

-------------

PR: https://git.openjdk.java.net/jdk16/pull/124

From tschatzl at openjdk.java.net Thu Jan 21 08:54:05 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Thu, 21 Jan 2021 08:54:05 GMT
Subject: RFR: 8260042: G1 Post-cleanup liveness printing occurs too early
Message-ID: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com>

Hi all,

can I have reviews for this small change that fixes the position of the Post-cleanup liveness printing, which currently causes wrong gc efficiencies to be printed?

I.e. due to some older changes, the calculation of gc efficiencies got moved below the printing of the Post-cleanup liveness, which should be about these values. This change corrects that.

Note that there is a sister issue about not printing the gc efficiencies in the "Post-Marking" phase. This is not in the scope of this change.
Testing: manual testing that values are correct, hs-tier1+2 ------------- Commit messages: - Initial version Changes: https://git.openjdk.java.net/jdk/pull/2168/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2168&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260042 Stats: 10 lines in 2 files changed: 5 ins; 5 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2168.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2168/head:pull/2168 PR: https://git.openjdk.java.net/jdk/pull/2168 From rkennke at openjdk.java.net Thu Jan 21 09:01:52 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 21 Jan 2021 09:01:52 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v2] In-Reply-To: <6kABGv_phymIASILTHYVqGaQf6Lu7tgqj4wQYibNYaA=.12ef2e99-9d24-4a15-be74-aaf468cd0ca5@github.com> References: <6kABGv_phymIASILTHYVqGaQf6Lu7tgqj4wQYibNYaA=.12ef2e99-9d24-4a15-be74-aaf468cd0ca5@github.com> Message-ID: On Thu, 21 Jan 2021 08:06:10 GMT, Aleksey Shipilev wrote: >> We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). >> >> Additional testing: >> - [x] `hotspot_gc_shenandoah` >> - [ ] `tier1` with Shenandoah >> - [ ] `tier2` with Shenandoah > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Rename maybe to atomic > - Touch up comments Looks good! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2166 From rkennke at openjdk.java.net Thu Jan 21 09:04:01 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 21 Jan 2021 09:04:01 GMT Subject: RFR: 8260212: Shenandoah: resolve-only UpdateRefsMode is not used In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 08:31:36 GMT, Aleksey Shipilev wrote: > The only "use" is `ShenandoahMarkResolveRefsClosure`, which is unused itself. Looks good! (Wow, this really goes up in smoke, doesn't it?) ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2177 From shade at openjdk.java.net Thu Jan 21 09:15:09 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 09:15:09 GMT Subject: RFR: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp Message-ID: See the bug report for initial observation. The key thing is that the asynchronous GC notifications can arrive late, and they do arrive late with `-Xcomp`, because all that code is now waiting for compilation. The answer is to wait a bit smarter. 
Additional testing: - [x] `gc/shenandoah/mxbeans` default mode - [x] `gc/shenandoah/mxbeans` with `-Xcomp` - [x] `gc/shenandoah/mxbeans` with `-Xint` ------------- Commit messages: - 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp Changes: https://git.openjdk.java.net/jdk/pull/2179/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2179&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259954 Stats: 12 lines in 2 files changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2179.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2179/head:pull/2179 PR: https://git.openjdk.java.net/jdk/pull/2179 From pliden at openjdk.java.net Thu Jan 21 09:16:47 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 21 Jan 2021 09:16:47 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 08:37:26 GMT, Per Liden wrote: >> It seems there have been e-mails sent that didn't show up here, so I'm answering on GitHub to hopefully re-attach the discussion to this PR. >> >> From the mailing list: >>>> Glibc's tst-getcpu.c (which I assume is the test you are referring >>>> to?) fails in their environment, so it seems like the affinity mask >>>> isn't reliable either. >>> >>> What's the nature of the failure? If it's due to a non-changing >>> affinity mask, then using sched_getaffinity data would still be okay. >> >> Glibc's tst-getcpu fails with some version of "getcpu results X should be Y". >> >> There seems to be a disconnect between CPU masks/affinity and what sched_getcpu() returns. >> >> Example (container with 1 CPU): >> >> 1. sysconf(_SC_NPROCESSORS_CONF) returns 1 >> 2. sysconf(_SC_NPROCESSORS_ONLN) returns 1 >> 3. sched_getaffinity() returns the mask 00000001 >> 4. sched_setaffinity(00000001) return success, but then sched_getcpu() returns 7(!) Should have returned 0. 
>> >> Another example (container with 2 CPUs): >> >> 1. sysconf(_SC_NPROCESSORS_CONF) returns 2 >> 2. sysconf(_SC_NPROCESSORS_ONLN) returns 2 >> 3. sched_getaffinity() returns the mask 00000011 >> 4. sched_setaffinity(00000001) returns success, but then sched_getcpu() returns 2(!). Should have returned 0. >> 5. sched_setaffinity(00000010) returns success, but then sched_getcpu() also returns 2(!). Should have returned 1. >> >> It looks like CPUs are virtualized on some level, but not in sched_getcpu(). I'm guessing sched_getcpu() is returning the CPU id of the physical CPU, and not the virtual CPU, or something. So in the last example, maybe both virtual CPUs were scheduled on the same physical CPU. > >> Does sched_getaffinity actually change the affinity mask? > > (assuming you meant sched_setaffinity here...) > > You seem to be right. sched_setaffinity() returns success, but a following call to sched_getaffinity() shows it had no effect. > >> I wonder if it just reports a 2**N - 1 unconditionally, with N being the >> number of configured vCPUs for the container. It probably does that so >> that the population count of the affinity mask matches the vCPU count. >> Likewise for the CPU entries under /sys (currently ignored by glibc >> because of a parser bug) and /proc/stat (the fallback actually used by >> glibc). There is no virtualization of CPU IDs whatsoever, it looks like >> it's all done to communicate the vCPU count, without taking into account >> how badly this interacts with sched_getcpu. > > Yep, that's what it looks like. > So it isn't that sysconf(_SC_NPROCESSORS_CONF) returns a too low number as stated in the PR but rather that after calling sched_setaffinity, sched_getcpu is broken? It wasn't my intention to claim that sysconf() _is_ the problem here. I just wanted to mention that it _might_ be sysconf() that is the problem. The reason I mentioned that is because of how Docker behaves.
If you give a Docker container 2 CPUs, sysconf() will still return the number of CPUs available on the host system, e.g. 8, and sched_getcpu() will in that case return numbers in the 0-7 range. Of course, this was just an observation, Docker and OpenVZ could do things differently here. > Either way won't that breakage also potentially affect the NUMA code as well? We should be good, because libnuma will report that NUMA is not available, so we automatically disable UseNUMA if it's set. ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From kbarrett at openjdk.java.net Thu Jan 21 09:32:00 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 21 Jan 2021 09:32:00 GMT Subject: [jdk16] RFR: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" Message-ID: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> Please review this fix for an intermittent crash when using ParallelGC on aarch64. The problem is a mis-ordered pair of reads that permit an algorithmic invariant to be violated. The mis-ordering is due to the lack any explicit ordering request (a missing barrier). In MutableSpace::cas_allocate, we had HeapWord* obj = top(); if (pointer_delta(end(), obj) >= size) { ... space available, attempt the CAS to claim it ... } If end is read before top, other threads may advance top and end between those reads. If, when top is read, current top > old end and current top + size > current end, the range check will unexpectedly pass because of underflow in pointer_delta. This will allow top to be advanced to a value which is > current end, violating the algorithmic invariant, and likely leading to crashes or memory corruption. gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is permitted to do so. Even if it didn't, there is nothing to prevent the hardware from doing so. 
The solution is to use a load_acquire for top, to ensure the needed order. This may have been the actual root cause of JDK-8257999. However, the change made there was and still is needed for the reasons described. Testing: mach5 tier1-3 Even with knowledge of the failure mode it's very hard to reproduce. I was unable to catch the underflow case in over 1K attempts using machines in our test farm, though StefanK caught it a few times on a personal machine. ------------- Commit messages: - Use load_acquire to order reads of top and end. Changes: https://git.openjdk.java.net/jdk16/pull/127/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=127&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259271 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/127.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/127/head:pull/127 PR: https://git.openjdk.java.net/jdk16/pull/127 From fweimer at redhat.com Thu Jan 21 09:59:18 2021 From: fweimer at redhat.com (Florian Weimer) Date: Thu, 21 Jan 2021 10:59:18 +0100 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: (Per Liden's message of "Thu, 21 Jan 2021 09:16:47 GMT") References: Message-ID: <87bldi1uh5.fsf@oldenburg.str.redhat.com> * Per Liden: > It wasn't my intention to claim that sysconf() _is_ the problem > here. I just wanted to mention that it _might_ be sysconf() that is > the problem. The reason I mentioned that is the because of how Docker > behaves. If you give a Docker container 2 CPUs, sysconf() will still > return the number of CPUs available on the host system, e.g. 8, and > sched_getcpu() will in that case return numbers in the 0-7 range. Of > course, this was just an observation, Docker and OpenVZ could do > things differently here. Other container run-times leave the affinity mask and /sys and /proc unchanged. 
The downside is that applications that scale with available system resources need to be adapted individually. The existing replacement interfaces also appear to have poor performance in some cases. There is also no backwards compatibility for them. [linux] Experimental support for cgroup memory limits in container (ie Docker) environments Linux os::available_memory re-reads cgroup configuration on every invocation Cgroups v2: Container awareness So it's fair to say that this is an area where there are no simple solutions. I would like to enhance glibc's sysconf so that it provides the right results even in container environments, but it may not be the right thing to do because it could be easier to adapt applications to cgroups v3, v4, … than to upgrade glibc on old operating system versions to add future cgroups version support (because that's feature development, and that is supposed to cease at a certain point in the maintenance cycle, right when conservative users start deploying that version …). Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill From github.com+10482586+therealeliu at openjdk.java.net Thu Jan 21 09:59:54 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Thu, 21 Jan 2021 09:59:54 GMT Subject: RFR: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 09:09:22 GMT, Aleksey Shipilev wrote: > See the bug report for initial observation. The key thing is that the asynchronous GC notifications can arrive late, and they do arrive late with `-Xcomp`, because all that code is now waiting for compilation. The answer is to wait a bit smarter.
> > Additional testing: > - [x] `gc/shenandoah/mxbeans` default mode > - [x] `gc/shenandoah/mxbeans` with `-Xcomp` > - [x] `gc/shenandoah/mxbeans` with `-Xint` test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java line 167: > 165: Thread.sleep(1000); > 166: } > 167: Thread.sleep(5000); I was wandering if it's necessary to handle the timeout by the code itself instead of delegating to jtreg? In the worst case, that's a really long time about 960000ms. ------------- PR: https://git.openjdk.java.net/jdk/pull/2179 From shade at openjdk.java.net Thu Jan 21 10:04:02 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 10:04:02 GMT Subject: RFR: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 09:56:29 GMT, Eric Liu wrote: >> See the bug report for initial observation. The key thing is that the asynchronous GC notifications can arrive late, and they do arrive late with `-Xcomp`, because all that code is now waiting for compilation. The answer is to wait a bit smarter. >> >> Additional testing: >> - [x] `gc/shenandoah/mxbeans` default mode >> - [x] `gc/shenandoah/mxbeans` with `-Xcomp` >> - [x] `gc/shenandoah/mxbeans` with `-Xint` > > test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java line 167: > >> 165: Thread.sleep(1000); >> 166: } >> 167: Thread.sleep(5000); > > I was wandering if it's necessary to handle the timeout by the code itself instead of delegating to jtreg? In the worst case, that's a really long time about 960000ms. Yeah, I was wondering about the same when doing that chunk, but then argued to myself that timeout might be as well handled by jtreg. Mostly because in many cases JTREG_TIMEOUT_FACTOR is passed to control the machine-dependent behavior: slower/overloaded machines get larger timeout factors configured. Hardcoding the timeouts in the test would deprive us of this "feature". 
------------- PR: https://git.openjdk.java.net/jdk/pull/2179 From tschatzl at openjdk.java.net Thu Jan 21 10:05:07 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 21 Jan 2021 10:05:07 GMT Subject: [jdk16] RFR: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation Message-ID: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> Hi all, can I have reviews for this change that fixes an assert that checks whether there is no "trimming" action to be stable. We found that only on Windows Server 2012 and 2016 (not 2019) on many AMD Epyc machines sometimes `pss->trim_ticks().seconds() == 0.0` fails on random tests. The `seconds()` method is `return (double)value * ((double)unit / (double)TimeSource::frequency());` where `value` is always zero, and `unit` and `TimeSource::frequency()` are some constant integers, i.e. `(double) 0 * ((double) 1 / (double) 1000...000)` does not equal `0.0`. Code like this: double tt = pss->trim_ticks().seconds(); assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt)); gives something like: `assert(tt == 0.0," .... 0.0 0x00000....0000"` so somehow the bit pattern 0x00...000 does not compare equal to FP 0.0. I've investigated this quite a bit (littering the code with this assert) with no particular result except that it somehow seems to have something to do with the `QueryPerformanceCounter()` call as most of the time the assert happens right after taking time. Dumping FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not yield anything (to me) obviously wrong (still `val1` and `val2` of this `Tickspan` are zero). There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI.
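For readers following the `seconds()` arithmetic in this mail, here is a minimal sketch of the two kinds of check being discussed: comparing the converted floating-point seconds against `0.0`, and testing the underlying integer tick count directly. `TickSpanSketch`, its `value` field and `is_zero()` are hypothetical names made up for this illustration; this is not HotSpot's `Tickspan` class and not the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a tick span; NOT HotSpot's Tickspan class.
struct TickSpanSketch {
  std::int64_t value;  // elapsed ticks, stored as an integer

  // Floating-point conversion in the shape quoted in the mail:
  // value * (unit / frequency). The parameter names are assumptions.
  double seconds(std::int64_t unit, std::int64_t frequency) const {
    return (double)value * ((double)unit / (double)frequency);
  }

  // Integer check: no floating-point state is involved at all.
  bool is_zero() const { return value == 0; }
};
```

On a well-behaved machine both checks agree for a zero tick count; the point of the integer form is only that it does not depend on floating-point behavior.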
The fix changes the FP comparison to an integer comparison which should have been done initially (there is also precedent in the code that does exactly this integer comparison for the same reason which never failed so far), but I/we could not explain why binary 0x00..00 is not always FP "0.0". Testing: After in total 8k iterations of two tests that seemed to cause this issue more than usual there has been no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place). hs-tier1-5 ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk16/pull/128/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=128&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8227695 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/128.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/128/head:pull/128 PR: https://git.openjdk.java.net/jdk16/pull/128 From kbarrett at openjdk.java.net Thu Jan 21 10:15:52 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 21 Jan 2021 10:15:52 GMT Subject: RFR: 8259851: Using boolean type for tasks in SubTasksDone In-Reply-To: References: Message-ID: <-K7nQ7EsGNpB8JgsCPdQVgmkblD5Apk1bfr4XTWK8wg=.11e61efe-ab43-4d63-8b8a-71ae811e46e7@github.com> On Mon, 18 Jan 2021 14:17:21 GMT, Albert Mingkun Yang wrote: > Changing `uint` to `bool` in `SubTasksDone`, since atomic operations on `bool` are well supported. > > Tested: hotspot_gc Marked as reviewed by kbarrett (Reviewer). 
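As a rough illustration of the `SubTasksDone` change quoted above (claiming a task by atomically flipping a `bool` instead of a `uint`), here is a minimal sketch. It uses the standard library's `std::atomic<bool>` as a stand-in for HotSpot's own atomic primitives, and `SubTaskClaims` and `try_claim` are hypothetical names, not the real class or its API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch; NOT HotSpot's SubTasksDone.
class SubTaskClaims {
  std::vector<std::atomic<bool>> _claimed;  // one flag per task

public:
  explicit SubTaskClaims(std::size_t n) : _claimed(n) {
    // Start with every task unclaimed.
    for (auto& c : _claimed) c.store(false, std::memory_order_relaxed);
  }

  // Returns true for exactly one caller per task: the one whose
  // compare-exchange flips the flag from false to true.
  bool try_claim(std::size_t task) {
    bool expected = false;
    return _claimed[task].compare_exchange_strong(expected, true);
  }
};
```

On platforms without a native byte-sized compare-exchange, an operation like this has to be emulated with a wider one, which is the cost raised later in this thread.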
------------- PR: https://git.openjdk.java.net/jdk/pull/2131 From kbarrett at openjdk.java.net Thu Jan 21 10:15:53 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 21 Jan 2021 10:15:53 GMT Subject: RFR: 8259851: Using boolean type for tasks in SubTasksDone In-Reply-To: <7r174Y4PWKIhxVpiDLhhmc4ay9FfsxeZuCHPzow2iUA=.d7e732b7-65da-441b-83e1-3f6d98cc9a5c@github.com> References: <7r174Y4PWKIhxVpiDLhhmc4ay9FfsxeZuCHPzow2iUA=.d7e732b7-65da-441b-83e1-3f6d98cc9a5c@github.com> Message-ID: On Mon, 18 Jan 2021 20:36:43 GMT, Albert Mingkun Yang wrote: >> Should this change be made? I understand the intent is to use the >> semantically intended type, and agree with that intent. But there is a >> hidden cost; some platforms don't directly support cmpxchg on byte sized >> values, and use CmpxchgByteUsingInt. Maybe that cost is in the noise, but >> the question should be considered. For Zero I don't care. But there are >> affected platforms. > >> For Zero I don't care. But there are affected platforms. > > I see arm and s390 with grepping `CmpxchgByteUsingInt`; I will test specjbb2015 and dacapo on arm to see if there is any perf diff. Discussed with @albertnetymk and agreed the impact of this is not going to be measurable, so withdrawing the question. ------------- PR: https://git.openjdk.java.net/jdk/pull/2131 From github.com+10482586+therealeliu at openjdk.java.net Thu Jan 21 10:15:55 2021 From: github.com+10482586+therealeliu at openjdk.java.net (Eric Liu) Date: Thu, 21 Jan 2021 10:15:55 GMT Subject: RFR: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 10:00:30 GMT, Aleksey Shipilev wrote: >> test/hotspot/jtreg/gc/shenandoah/mxbeans/TestChurnNotifications.java line 167: >> >>> 165: Thread.sleep(1000); >>> 166: } >>> 167: Thread.sleep(5000); >> >> I was wandering if it's necessary to handle the timeout by the code itself instead of delegating to jtreg? 
In the worst case, that's a really long time about 960000ms. > > Yeah, I was wondering about the same when doing that chunk, but then argued to myself that timeout might be as well handled by jtreg. Mostly because in many cases JTREG_TIMEOUT_FACTOR is passed to control the machine-dependent behavior: slower/overloaded machines get larger timeout factors configured. Hardcoding the timeouts in the test would deprive us of this "feature". Thanks, that's make sense to me. ------------- PR: https://git.openjdk.java.net/jdk/pull/2179 From tschatzl at openjdk.java.net Thu Jan 21 10:48:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 21 Jan 2021 10:48:59 GMT Subject: [jdk16] RFR: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" In-Reply-To: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> References: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> Message-ID: On Thu, 21 Jan 2021 09:26:19 GMT, Kim Barrett wrote: > Please review this fix for an intermittent crash when using ParallelGC on > aarch64. The problem is a mis-ordered pair of reads that permit an > algorithmic invariant to be violated. The mis-ordering is due to the lack > any explicit ordering request (a missing barrier). > > In MutableSpace::cas_allocate, we had > > HeapWord* obj = top(); > if (pointer_delta(end(), obj) >= size) { > ... space available, attempt the CAS to claim it ... > } > > If end is read before top, other threads may advance top and end between > those reads. If, when top is read, current top > old end and current top + > size > current end, the range check will unexpectedly pass because of > underflow in pointer_delta. This will allow top to be advanced to a value > which is > current end, violating the algorithmic invariant, and likely > leading to crashes or memory corruption. 
> > gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is > permitted to do so. Even if it didn't, there is nothing to prevent the > hardware from doing so. The solution is to use a load_acquire for top, to > ensure the needed order. > > This may have been the actual root cause of JDK-8257999. However, the > change made there was and still is needed for the reasons described. > > Testing: > mach5 tier1-3 > > Even with knowledge of the failure mode it's very hard to reproduce. I was > unable to catch the underflow case in over 1K attempts using machines in our > test farm, though StefanK caught it a few times on a personal machine. Lgtm. Thanks. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk16/pull/127 From mli at openjdk.java.net Thu Jan 21 11:17:08 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 21 Jan 2021 11:17:08 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing u… Message-ID: Optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting. FreeRegionList::remove_starting_at(...) traverses from a node and removes the subsequent N nodes from the free list. But as it traverses the free list, it removes the nodes one by one, setting the prev and next pointers of the neighboring nodes each time. It is not necessary to do this for every node, as we can remove the target nodes all at once and set the prev and next pointers of just 2 nodes.
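The idea described above, unlinking a run of N consecutive nodes with a single splice rather than N individual unlinks, can be sketched on a plain doubly linked list. This is only an illustration under assumed names: `Node`, `List`, `push_back` and this `remove_starting_at` are hypothetical and do not reflect the actual `FreeRegionList` code or the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list node; NOT HotSpot's HeapRegion.
struct Node {
  Node* prev = nullptr;
  Node* next = nullptr;
};

// Hypothetical list; NOT HotSpot's FreeRegionList.
struct List {
  Node* head = nullptr;
  Node* tail = nullptr;
  std::size_t length = 0;
};

// Append a node at the tail (helper for building test lists).
void push_back(List& list, Node* n) {
  n->prev = list.tail;
  n->next = nullptr;
  if (list.tail != nullptr) list.tail->next = n; else list.head = n;
  list.tail = n;
  ++list.length;
}

// Remove `count` consecutive nodes starting at `first`.
// The walk only locates the last node of the run; the list itself is
// fixed up once, by relinking the two neighbors of the run.
void remove_starting_at(List& list, Node* first, std::size_t count) {
  assert(first != nullptr && count > 0 && count <= list.length);
  Node* last = first;
  for (std::size_t i = 1; i < count; ++i) {
    last = last->next;
    assert(last != nullptr);
  }
  Node* before = first->prev;  // neighbor preceding the run (may be null)
  Node* after = last->next;    // neighbor following the run (may be null)
  if (before != nullptr) before->next = after; else list.head = after;
  if (after != nullptr) after->prev = before; else list.tail = before;
  // Detach the run's outer ends; nodes inside the run stay linked to
  // each other in this sketch.
  first->prev = nullptr;
  last->next = nullptr;
  list.length -= count;
}
```

Whether the removed nodes also need their own links cleared one by one is a separate question that this sketch does not try to answer.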
------------- Commit messages: - JDK-8260200 optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting Changes: https://git.openjdk.java.net/jdk/pull/2181/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260200 Stats: 44 lines in 1 file changed: 28 ins; 15 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2181/head:pull/2181 PR: https://git.openjdk.java.net/jdk/pull/2181 From rkennke at openjdk.java.net Thu Jan 21 11:18:54 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 21 Jan 2021 11:18:54 GMT Subject: RFR: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 09:09:22 GMT, Aleksey Shipilev wrote: > See the bug report for initial observation. The key thing is that the asynchronous GC notifications can arrive late, and they do arrive late with `-Xcomp`, because all that code is now waiting for compilation. The answer is to wait a bit smarter. > > Additional testing: > - [x] `gc/shenandoah/mxbeans` default mode > - [x] `gc/shenandoah/mxbeans` with `-Xcomp` > - [x] `gc/shenandoah/mxbeans` with `-Xint` Ok. ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2179 From mli at openjdk.java.net Thu Jan 21 11:22:10 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 21 Jan 2021 11:22:10 GMT Subject: RFR: JDK-8260208: fix dummy object filling condition in G1CollectedHeap::f… Message-ID: It's a minor fix/enhancement in CDS: it fixes the dummy object filling condition in G1CollectedHeap::fill_archive_regions. ------------- Commit messages: - JDK-8260208: fix dummy object filling condition in G1CollectedHeap::fill_archive_regions in cds Changes: https://git.openjdk.java.net/jdk/pull/2183/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2183&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260208 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2183.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2183/head:pull/2183 PR: https://git.openjdk.java.net/jdk/pull/2183 From kbarrett at openjdk.java.net Thu Jan 21 11:31:46 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 21 Jan 2021 11:31:46 GMT Subject: [jdk16] RFR: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation In-Reply-To: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> References: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> Message-ID: <6jrViq2GhIU8X-RqA6uEzKxx97S8Ni_erhLUtjTLBf0=.a4ed134d-0750-4dc6-9761-49f6d221a5f4@github.com> On Thu, 21 Jan 2021 09:59:17 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that fixes an assert that checks whether there is no "trimming" action to be stable. > > We found that only on Windows Server 2012 and 2016 (not 2019) on many AMD Epyc machines sometimes > > `pss->trim_ticks().seconds() == 0.0` > > fails on random tests.
The `seconds()` methods is > > return (double)value * ((double)unit / (double)TimeSource::frequency());``` > > where value is always zero, and `unit` and `TimeSource::frequency()` some constant integers, i.e. > > `(double) 0 * ((double) 1 / (double) 1000...000)` > > does not equal `0.0`. > > Code like this: > > double tt = pss->trim_ticks().seconds(); > assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt)); > gives something like: > > `assert(tt == 0.0," .... 0.0 0x00000....0000"` > > so somehow the bit pattern 0x00...000 does not compare to FP 0.0. > > I've investigated this quite a bit (littering the code with this assert) with no particular result except that it somehow seems to have something to do with the `QueryPerformanceCounter()` call as most of the time the assert happens right after taking time. > Dumping FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not yield anything (to me) obviously wrong (still `val1` and `val2` of this `Tickspan` are zero). > > There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI. > > The fix changes the FP comparison to an integer comparison which should have been done initially (there is also precedent in the code that does exactly this integer comparison for the same reason which never failed so far), but I/we could not explain why binary 0x00..00 is not always FP "0.0". > > Testing: After in total 8k iterations of two tests that seemed to cause this issue more than usual there has been no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place). hs-tier1-5 I've been following Thomas's long investigation of this, and this change looks good to me. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk16/pull/128 From tschatzl at openjdk.java.net Thu Jan 21 11:41:54 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 21 Jan 2021 11:41:54 GMT Subject: RFR: 8259851: Use boolean type for tasks in SubTasksDone In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 14:17:21 GMT, Albert Mingkun Yang wrote: > Changing `uint` to `bool` in `SubTasksDone`, since atomic operations on `bool` are well supported. > > Tested: hotspot_gc Lgtm. Not sure what the arguments were, but: The low number of invocations/cmpxchg (<10 iirc) is dwarfed by the number of other (regular) cmpxchg evacuation needs to do (for every evacuated object at least once, if not more), not talking about other barriers and actual code to be executed per object or reference. Further, at least on 32-bit ARM the number of threads is typically very small, so the contention is expected to be very low too, i.e. the amount of retries induced by this change. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2131 From ayang at openjdk.java.net Thu Jan 21 12:06:54 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 21 Jan 2021 12:06:54 GMT Subject: RFR: 8259851: Use boolean type for tasks in SubTasksDone In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 11:38:41 GMT, Thomas Schatzl wrote: >> Changing `uint` to `bool` in `SubTasksDone`, since atomic operations on `bool` are well supported. >> >> Tested: hotspot_gc > > Lgtm. > > Not sure what the arguments were, but: > > The low number of invocations/cmpxchg (<10 iirc) is dwarfed by the number of other (regular) cmpxchg evacuation needs to do (for every evacuated object at least once, if not more), not talking about other barriers and actual code to be executed per object or reference. > > Further, at least on 32-bit ARM the number of threads is typically very small, so the contention is expected to be very low too, i.e. 
the amount of retries induced by this change. Thank you for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2131 From ayang at openjdk.java.net Thu Jan 21 12:13:58 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 21 Jan 2021 12:13:58 GMT Subject: Integrated: 8259851: Use boolean type for tasks in SubTasksDone In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 14:17:21 GMT, Albert Mingkun Yang wrote: > Changing `uint` to `bool` in `SubTasksDone`, since atomic operations on `bool` are well supported. > > Tested: hotspot_gc This pull request has now been integrated. Changeset: 6ce0799b Author: Albert Mingkun Yang Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/6ce0799b Stats: 13 lines in 2 files changed: 0 ins; 6 del; 7 mod 8259851: Use boolean type for tasks in SubTasksDone Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2131 From zgu at openjdk.java.net Thu Jan 21 13:27:17 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 21 Jan 2021 13:27:17 GMT Subject: RFR: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 09:09:22 GMT, Aleksey Shipilev wrote: > See the bug report for initial observation. The key thing is that the asynchronous GC notifications can arrive late, and they do arrive late with `-Xcomp`, because all that code is now waiting for compilation. The answer is to wait a bit smarter. > > Additional testing: > - [x] `gc/shenandoah/mxbeans` default mode > - [x] `gc/shenandoah/mxbeans` with `-Xcomp` > - [x] `gc/shenandoah/mxbeans` with `-Xint` Okay. ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2179 From shade at openjdk.java.net Thu Jan 21 13:36:27 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 13:36:27 GMT Subject: Integrated: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 09:09:22 GMT, Aleksey Shipilev wrote: > See the bug report for initial observation. The key thing is that the asynchronous GC notifications can arrive late, and they do arrive late with `-Xcomp`, because all that code is now waiting for compilation. The answer is to wait a bit smarter. > > Additional testing: > - [x] `gc/shenandoah/mxbeans` default mode > - [x] `gc/shenandoah/mxbeans` with `-Xcomp` > - [x] `gc/shenandoah/mxbeans` with `-Xint` This pull request has now been integrated. Changeset: c3c66625 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/c3c66625 Stats: 12 lines in 2 files changed: 10 ins; 0 del; 2 mod 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp Reviewed-by: rkennke, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2179 From zgu at openjdk.java.net Thu Jan 21 13:38:36 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 21 Jan 2021 13:38:36 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC [v4] In-Reply-To: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: <09bd9-Ql3GxB5ELAGppBGDqXuTJeKN2QDMhYlk6f8RU=.2389637d-0d0a-4d7b-9002-4ebc0c551007@github.com> > The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. > > Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. 
It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. > > The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. > > The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. > > Test: > - [x] hotspot_gc_shenandoah > - [x] nightly tests Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: More from Aleksey's review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1964/files - new: https://git.openjdk.java.net/jdk/pull/1964/files/4e54d38d..514aac66 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=02-03 Stats: 3 lines in 3 files changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1964.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1964/head:pull/1964 PR: https://git.openjdk.java.net/jdk/pull/1964 From zgu at openjdk.java.net Thu Jan 21 13:44:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 21 Jan 2021 13:44:53 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC [v5] In-Reply-To: 
<9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: > The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. > > Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. > > The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. > > The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. 
> > Test: > - [x] hotspot_gc_shenandoah > - [x] nightly tests Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Fixed indentation ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1964/files - new: https://git.openjdk.java.net/jdk/pull/1964/files/514aac66..7119186c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1964&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/1964.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1964/head:pull/1964 PR: https://git.openjdk.java.net/jdk/pull/1964 From iwalulya at openjdk.java.net Thu Jan 21 13:49:11 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 21 Jan 2021 13:49:11 GMT Subject: [jdk16] RFR: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" In-Reply-To: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> References: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> Message-ID: On Thu, 21 Jan 2021 09:26:19 GMT, Kim Barrett wrote: > Please review this fix for an intermittent crash when using ParallelGC on > aarch64. The problem is a mis-ordered pair of reads that permit an > algorithmic invariant to be violated. The mis-ordering is due to the lack > any explicit ordering request (a missing barrier). > > In MutableSpace::cas_allocate, we had > > HeapWord* obj = top(); > if (pointer_delta(end(), obj) >= size) { > ... space available, attempt the CAS to claim it ... > } > > If end is read before top, other threads may advance top and end between > those reads. 
If, when top is read, current top > old end and current top + > size > current end, the range check will unexpectedly pass because of > underflow in pointer_delta. This will allow top to be advanced to a value > which is > current end, violating the algorithmic invariant, and likely > leading to crashes or memory corruption. > > gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is > permitted to do so. Even if it didn't, there is nothing to prevent the > hardware from doing so. The solution is to use a load_acquire for top, to > ensure the needed order. > > This may have been the actual root cause of JDK-8257999. However, the > change made there was and still is needed for the reasons described. > > Testing: > mach5 tier1-3 > > Even with knowledge of the failure mode it's very hard to reproduce. I was > unable to catch the underflow case in over 1K attempts using machines in our > test farm, though StefanK caught it a few times on a personal machine. looks good ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk16/pull/127 From iwalulya at openjdk.java.net Thu Jan 21 13:54:28 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Thu, 21 Jan 2021 13:54:28 GMT Subject: [jdk16] RFR: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation In-Reply-To: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> References: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> Message-ID: On Thu, 21 Jan 2021 09:59:17 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that fixes an assert that checks whether there is no "trimming" action to be stable. > > We found that only on Windows Server 2012 and 2016 (not 2019) on many AMD Epyc machines sometimes > > pss->trim_ticks().seconds() == 0.0``` > > fails on random tests. 
The `seconds()` methods is > > return (double)value * ((double)unit / (double)TimeSource::frequency());``` > > where value is always zero, and `unit` and `TimeSource::frequency()` some constant integers, i.e. > > `(double) 0 * ((double) 1 / (double) 1000...000)` > > does not equal `0.0`. > > Code like this: > > double tt = pss->trim_ticks().seconds(); > assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt)); > gives something like: > > `assert(tt == 0.0," .... 0.0 0x00000....0000"` > > so somehow the bit pattern 0x00...000 does not compare to FP 0.0. > > I've investigated this quite a bit (littering the code with this assert) with no particular result except that it somehow seems to have something to do with the `QueryPerformanceCounter()` call as most of the time the assert happens right after taking time. > Dumping FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not yield anything (to me) obviously wrong (still `val1` and `val2` of this `Tickspan` are zero). > > There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI. > > The fix changes the FP comparison to an integer comparison which should have been done initially (there is also precedent in the code that does exactly this integer comparison for the same reason which never failed so far), but I/we could not explain why binary 0x00..00 is not always FP "0.0". > > Testing: After in total 8k iterations of two tests that seemed to cause this issue more than usual there has been no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place). hs-tier1-5 lgtm ------------- Marked as reviewed by iwalulya (Committer). 
PR: https://git.openjdk.java.net/jdk16/pull/128 From zgu at openjdk.java.net Thu Jan 21 13:56:29 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 21 Jan 2021 13:56:29 GMT Subject: RFR: 8260212: Shenandoah: resolve-only UpdateRefsMode is not used In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 08:31:36 GMT, Aleksey Shipilev wrote: > The only "use" is `ShenandoahMarkResolveRefsClosure`, which is unused itself. Looks good. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2177 From rkennke at openjdk.java.net Thu Jan 21 13:58:47 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 21 Jan 2021 13:58:47 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC [v5] In-Reply-To: References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Thu, 21 Jan 2021 13:44:53 GMT, Zhengyu Gu wrote: >> The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. >> >> Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. >> >> The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. >> >> The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. 
So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. >> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] nightly tests > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fixed indentation Looks good! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1964 From eosterlund at openjdk.java.net Thu Jan 21 14:00:14 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 21 Jan 2021 14:00:14 GMT Subject: [jdk16] RFR: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" In-Reply-To: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> References: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> Message-ID: On Thu, 21 Jan 2021 09:26:19 GMT, Kim Barrett wrote: > Please review this fix for an intermittent crash when using ParallelGC on > aarch64. The problem is a mis-ordered pair of reads that permit an > algorithmic invariant to be violated. The mis-ordering is due to the lack > any explicit ordering request (a missing barrier). > > In MutableSpace::cas_allocate, we had > > HeapWord* obj = top(); > if (pointer_delta(end(), obj) >= size) { > ... space available, attempt the CAS to claim it ... > } > > If end is read before top, other threads may advance top and end between > those reads. If, when top is read, current top > old end and current top + > size > current end, the range check will unexpectedly pass because of > underflow in pointer_delta.
This will allow top to be advanced to a value > which is > current end, violating the algorithmic invariant, and likely > leading to crashes or memory corruption. > > gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is > permitted to do so. Even if it didn't, there is nothing to prevent the > hardware from doing so. The solution is to use a load_acquire for top, to > ensure the needed order. > > This may have been the actual root cause of JDK-8257999. However, the > change made there was and still is needed for the reasons described. > > Testing: > mach5 tier1-3 > > Even with knowledge of the failure mode it's very hard to reproduce. I was > unable to catch the underflow case in over 1K attempts using machines in our > test farm, though StefanK caught it a few times on a personal machine. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/127 From eosterlund at openjdk.java.net Thu Jan 21 14:05:36 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Thu, 21 Jan 2021 14:05:36 GMT Subject: [jdk16] RFR: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation In-Reply-To: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> References: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> Message-ID: On Thu, 21 Jan 2021 09:59:17 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that fixes an assert that checks whether there is no "trimming" action to be stable. > > We found that only on Windows Server 2012 and 2016 (not 2019) on many AMD Epyc machines sometimes > > pss->trim_ticks().seconds() == 0.0``` > > fails on random tests.
The `seconds()` methods is > > return (double)value * ((double)unit / (double)TimeSource::frequency());``` > > where value is always zero, and `unit` and `TimeSource::frequency()` some constant integers, i.e. > > `(double) 0 * ((double) 1 / (double) 1000...000)` > > does not equal `0.0`. > > Code like this: > > double tt = pss->trim_ticks().seconds(); > assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt)); > gives something like: > > `assert(tt == 0.0," .... 0.0 0x00000....0000"` > > so somehow the bit pattern 0x00...000 does not compare to FP 0.0. > > I've investigated this quite a bit (littering the code with this assert) with no particular result except that it somehow seems to have something to do with the `QueryPerformanceCounter()` call as most of the time the assert happens right after taking time. > Dumping FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not yield anything (to me) obviously wrong (still `val1` and `val2` of this `Tickspan` are zero). > > There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI. > > The fix changes the FP comparison to an integer comparison which should have been done initially (there is also precedent in the code that does exactly this integer comparison for the same reason which never failed so far), but I/we could not explain why binary 0x00..00 is not always FP "0.0". > > Testing: After in total 8k iterations of two tests that seemed to cause this issue more than usual there has been no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place). hs-tier1-5 Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
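On the 8259271 review (MutableSpace::cas_allocate), which the next message returns to: the ordering requirement described there can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch. It uses std::atomic instead of HotSpot's Atomic API, the `Space` type is hypothetical, and the explicit `e < obj` guard is an addition of this sketch rather than something taken from the thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical bump-pointer space; a stand-in, not HotSpot's MutableSpace.
struct Space {
  std::atomic<char*> top;
  std::atomic<char*> end;

  Space(char* bottom, char* limit) : top(bottom), end(limit) {}

  // Try to claim `size` bytes; returns nullptr if there is no room.
  char* cas_allocate(size_t size) {
    for (;;) {
      // Acquire load: the load of end below cannot be reordered before
      // this load of top, by the compiler or by the hardware.
      char* obj = top.load(std::memory_order_acquire);
      char* e = end.load(std::memory_order_relaxed);
      // Sketch-only guard against the underflow case (top beyond a stale end).
      if (e < obj || static_cast<size_t>(e - obj) < size) {
        return nullptr;
      }
      char* new_top = obj + size;
      if (top.compare_exchange_weak(obj, new_top, std::memory_order_relaxed)) {
        return obj;  // claimed [obj, new_top)
      }
      // CAS failed: another thread moved top; reload and retry.
    }
  }
};
```

The sketch assumes the writer side publishes a grown end before advancing top past the old end; without that, the acquire load alone would not be enough.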
PR: https://git.openjdk.java.net/jdk16/pull/128 From kbarrett at openjdk.java.net Thu Jan 21 15:40:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 21 Jan 2021 15:40:40 GMT Subject: [jdk16] RFR: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" In-Reply-To: References: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> Message-ID: On Thu, 21 Jan 2021 10:45:52 GMT, Thomas Schatzl wrote: >> Please review this fix for an intermittent crash when using ParallelGC on >> aarch64. The problem is a mis-ordered pair of reads that permit an >> algorithmic invariant to be violated. The mis-ordering is due to the lack >> any explicit ordering request (a missing barrier). >> >> In MutableSpace::cas_allocate, we had >> >> HeapWord* obj = top(); >> if (pointer_delta(end(), obj) >= size) { >> ... space available, attempt the CAS to claim it ... >> } >> >> If end is read before top, other threads may advance top and end between >> those reads. If, when top is read, current top > old end and current top + >> size > current end, the range check will unexpectedly pass because of >> underflow in pointer_delta. This will allow top to be advanced to a value >> which is > current end, violating the algorithmic invariant, and likely >> leading to crashes or memory corruption. >> >> gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is >> permitted to do so. Even if it didn't, there is nothing to prevent the >> hardware from doing so. The solution is to use a load_acquire for top, to >> ensure the needed order. >> >> This may have been the actual root cause of JDK-8257999. However, the >> change made there was and still is needed for the reasons described. >> >> Testing: >> mach5 tier1-3 >> >> Even with knowledge of the failure mode it's very hard to reproduce. 
I was >> unable to catch the underflow case in over 1K attempts using machines in our >> test farm, though StefanK caught it a few times on a personal machine. > > Marked as reviewed by tschatzl (Reviewer). Thanks @tschatzl , @walulyai , and @fisk for reviews. ------------- PR: https://git.openjdk.java.net/jdk16/pull/127 From shade at openjdk.java.net Thu Jan 21 16:48:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 21 Jan 2021 16:48:56 GMT Subject: RFR: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC [v5] In-Reply-To: References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Thu, 21 Jan 2021 13:44:53 GMT, Zhengyu Gu wrote: >> The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. >> >> Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. >> >> The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. >> >> The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. 
Code sharing is achieved via helper methods provided by ShenandoahHeap. >> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] nightly tests > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Fixed indentation Looks good. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1964 From zgu at openjdk.java.net Thu Jan 21 16:59:09 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 21 Jan 2021 16:59:09 GMT Subject: Integrated: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC In-Reply-To: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> References: <9sUK197ZPOSEPLEDl8zKSqby-KMi9FryXnoctOdWmMc=.7b678df1-1c51-4bbf-b1a3-d8a8167a927d@github.com> Message-ID: On Wed, 6 Jan 2021 16:45:03 GMT, Zhengyu Gu wrote: > The purpose of this patch is to isolate concurrent, degenerated and full gc implementation, so that, makes each GC implementation more straightforward and clean, and improves readability. > > Current implementation emphasis code sharing, e.g. degenerated GC reuses concurrent GC's ops. It was not a problem in the beginning, when they actually behave similarly. Since concurrent GC moved root processing into concurrent phases, code started to diverge, we started to put bandages to make the shared ops work for both concurrent and degenerated GC, that made code hard to read and error prone. > > The patch breaks up GCs into 3 (mainly just concurrent and degenerated GC, as full GC already standalone) easy to identify and understand classes (ShenandoahConcurrentGC, ShenandoahDegeneratedGC and ShenandoahMarkCompactGC), subclasses of ShenandoahGC class. > > The three GCs still keep vmop/entry/op paradigm, but encapsulate GC control flow inside their own classes, as ShenandoahMarkCompact GC already does. 
So that, concurrent and degenerated GC no longer share ops and op implementations no longer need to consider other GC modes, which results in simplifying implementation and improving readability. Code sharing is achieved via helper methods provided by ShenandoahHeap. > > Test: > - [x] hotspot_gc_shenandoah > - [x] nightly tests This pull request has now been integrated. Changeset: 34eb8b34 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/34eb8b34 Stats: 3218 lines in 20 files changed: 1802 ins; 1280 del; 136 mod 8255765: Shenandoah: Isolate concurrent, degenerated and full GC Reviewed-by: rkennke, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/1964 From tgoldstein at outbrain.com Thu Jan 21 17:39:04 2021 From: tgoldstein at outbrain.com (Tal Goldstein) Date: Thu, 21 Jan 2021 19:39:04 +0200 Subject: Fwd: Unexpected results when enabling +UseNUMA for G1GC In-Reply-To: References: <9c42a365-db78-4699-c138-cc06d0c4708f@oracle.com> Message-ID: Hey Sangheon, Thanks for your suggestions. I answered your questions in-line. Regarding your suggestion to increase the heap, I've increased the heap size to 40GB and the container memory to 50GB, and ran 2 deployments (numa and non-numa), each deployment has 1 pod which runs on a dedicated physical k8s node (the same machines mentioned previously). After running it for several days I could see the following pattern: For several days, whenever comes the hours of the day when throughput is at its max, then the local memory access ratio of NUMA deployment is much better than the non-numa deployment (5%-6% diff). This can be seen in the charts below: 1. Throughput Per deployment (Numa deployment vs Non-Numa deployment): https://drive.google.com/file/d/1tG_Qm9MNHZbtmIiXryL8KGMyUk_vylVG/view?usp=sharing 2. 
Local memory ratio % (kube3-10769 is the k8s node WITH NUMA, kube3-10770 WITHOUT NUMA) https://drive.google.com/file/d/1WmjBSPiwwMpXDX3MWsjQQN6vR3BLSro1/view?usp=sharing From this I understand that the NUMA-based deployment behaves better under a higher workload, but what is still unclear to me is why the throughput of the non-NUMA deployment is higher than that of the NUMA deployment. Thanks, Tal On Mon, Jan 11, 2021 at 10:05 PM wrote: > Hi Tal, > I added in-line comments. > On 1/9/21 12:15 PM, Tal Goldstein wrote: > > Hi Guys, > > We're exploring the use of the flag -XX:+UseNUMA and its effect on G1 GC > in > > JDK 14. > > For that, we've created a test that consists of 2 k8s deployments of some > > service, > > where deployment A has the UseNUMA flag enabled, and deployment B doesn't > > have it. > > > > In order for NUMA to actually work inside the docker container, we also > > needed to add numactl lib to the container (apk add numactl), > > and in order to measure the local/remote memory access we've used > pcm-numa ( > > https://github.com/opcm/pcm), > > the docker is based on an image of Alpine Linux v3.11. > > > > Each deployment handles around 150 requests per second and all of the > > deployment's pods are running on the same kube machine. > > When running the test, we expected to see that the (local memory access) > / > > (total memory access) ratio on the UseNUMA deployment, is much higher > than > > the non-numa deployment, > > and as a result that the deployment itself handles a higher throughput of > > requests than the non-numa deployment. > > > > Surprisingly this isn't the case: > > On the kube running deployment A which uses NUMA, we measured 20M/ 13M/ > 33M > > (local/remote/total) memory accesses, > > and for the kube running deployment B which doesn't use NUMA, we measured > > (23M/10M/33M) on the same time. > Just curious, did you see any performance difference(other than > pcm-numa) between those two?
> Does it mean you ran 2 pods in parallel(at the same time) on one > physical machine? > I didn't see any other significant difference. Yes, so there were 4 pods on the original experiment: 2 On each deployment (NUMA deployment, and non-NUMA deployment), and each deployment ran on a separate k8s physical node, and those nodes didn't run anything else but the 2 k8s pods. > > > Can you help to understand if we're doing anything wrong? or maybe our > > expectations are wrong ? > > > > The 2 deployments are identical (except for the UseNUMA flag): > > Each deployment contains 2 pods running on k8s. > > Each pod has 10GB memory, 8GB heap, requires 2 CPUs (but not limited to > 2). > > Each deployment runs on a separate but identical kube machine with this > > spec: > > Hardware............: Supermicro SYS-2027TR-HTRF+ > > CPU.................: Intel(R) Xeon(R) CPU E5-2630L v2 @ > > 2.40GHz > > CPUs................: 2 > > CPU Cores...........: 12 > > Memory..............: 63627 MB > > > > > > We've also written to a file all NUMA related logs (using > > > -Xlog:os*,gc*=trace:file=/outbrain/heapdumps/fulllog.log:hostname,time,level,tags) > > - log file could be found here: > > > https://drive.google.com/file/d/1eZqYDtBDWKXaEakh_DoYv0P6V9bcLs6Z/view?usp=sharing > > so we know that NUMA is indeed working, but again, it doesn't give the > > desired results we expected to see. > From the shared log file, I see only 1 GC (GC id, 6761) and numa stat > shows 53% of local memory allocation (gc,heap,numa) which seems okay. > Could you share your full vm options? 
> These are the updated vm options: -XX:+PerfDisableSharedMem -Xmx40g -Xms40g -XX:+DisableExplicitGC -XX:-OmitStackTraceInFastThrow -XX:+AlwaysPreTouch -Duser.country=US -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:G1MaxNewSizePercent=90 -XX:InitiatingHeapOccupancyPercent=35 -XX:-G1UseAdaptiveIHOP -XX:ActiveProcessorCount=2 -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -XX:+UseNUMA -Xlog:os*,gc*=trace:file=/outbrain/heapdumps/fulllog.log:hostname,time,level,tags > > > > Any Ideas why ? > > Is it a matter of workload ? > Can you increase your Java heap on the testing machine? > Your test machine has almost 64GB of memory on 2 NUMA nodes. So I assume > each NUMA node will have almost 32GB of memory. > But you are using only 8GB on Java heap which fits on one node, so I > can't expect any benefit of enabling NUMA. > But when the jvm is started, doesn't it spreads the heap evenly across all numa nodes ? And in this case, won't each NUMA node hold half of the heap (around 4GB) ? I've increased the heap to be 40GB, and the container memory to 50GB. > As the JVM is running on Kubernetes, there could be another thing may > affect to the test. > For example, topology manager may treat a pod to allocate from a single > NUMA node. > https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/ > > That's very interesting, I will read about it and try to understand more, and to understand if we're even using the topology manager. Do you think that using k8s with toplogy manager might be the problem ? Or that actually enabling topology manager should allow better usage of the hardware and actually help in our case ? > > Are there any workloads you can suggest that > > will benefit from G1 NUMA awareness ? > I measured some performance improvements on SpecJBB2015 and SpecJBB2005. > > > Do you happen to have a link to code that runs such a workload? 
> No, I don't have such link for above runs. > > Thanks, > Sangheon > > > Thanks, > > Tal > > > From tschatzl at openjdk.java.net Thu Jan 21 18:22:16 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 21 Jan 2021 18:22:16 GMT Subject: [jdk16] RFR: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation In-Reply-To: <6jrViq2GhIU8X-RqA6uEzKxx97S8Ni_erhLUtjTLBf0=.a4ed134d-0750-4dc6-9761-49f6d221a5f4@github.com> References: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> <6jrViq2GhIU8X-RqA6uEzKxx97S8Ni_erhLUtjTLBf0=.a4ed134d-0750-4dc6-9761-49f6d221a5f4@github.com> Message-ID: On Thu, 21 Jan 2021 11:28:55 GMT, Kim Barrett wrote: >> Hi all, >> >> can I have reviews for this change that fixes an assert that checks whether there is no "trimming" action to be stable. >> >> We found that only on Windows Server 2012 and 2016 (not 2019) on many AMD Epyc machines sometimes >> >> pss->trim_ticks().seconds() == 0.0``` >> >> fails on random tests. The `seconds()` methods is >> >> return (double)value * ((double)unit / (double)TimeSource::frequency());``` >> >> where value is always zero, and `unit` and `TimeSource::frequency()` some constant integers, i.e. >> >> `(double) 0 * ((double) 1 / (double) 1000...000)` >> >> does not equal `0.0`.
>> >> Code like this: >> >> double tt = pss->trim_ticks().seconds(); >> assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt)); >> gives something like: >> >> `assert(tt == 0.0," .... 0.0 0x00000....0000"` >> >> so somehow the bit pattern 0x00...000 does not compare to FP 0.0. >> >> I've investigated this quite a bit (littering the code with this assert) with no particular result except that it somehow seems to have something to do with the `QueryPerformanceCounter()` call as most of the time the assert happens right after taking time. >> Dumping FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not yield anything (to me) obviously wrong (still `val1` and `val2` of this `Tickspan` are zero). >> >> There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI. >> >> The fix changes the FP comparison to an integer comparison which should have been done initially (there is also precedent in the code that does exactly this integer comparison for the same reason which never failed so far), but I/we could not explain why binary 0x00..00 is not always FP "0.0". >> >> Testing: After in total 8k iterations of two tests that seemed to cause this issue more than usual there has been no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place). hs-tier1-5 > > I've been following Thomas's long investigation of this, and this change looks good to me. Thanks @kimbarrett @fisk @walulyai for your reviews. 
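For readers following 8227695: the idea of checking the raw tick count instead of the converted floating-point seconds can be sketched as below. This is a minimal illustration, not the integrated change; `TickSpan`, `value()` and `is_zero()` are hypothetical names chosen for this sketch and are not HotSpot's Tickspan API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a span of timer ticks.
struct TickSpan {
  int64_t ticks;      // raw elapsed counter ticks
  int64_t frequency;  // ticks per second, assumed > 0

  // Conversion to seconds goes through floating point.
  double seconds() const {
    return static_cast<double>(ticks) * (1.0 / static_cast<double>(frequency));
  }

  // Raw integer value, no floating point involved.
  int64_t value() const { return ticks; }
};

// "No time was recorded" expressed as an integer comparison, so the
// check does not depend on any floating-point state.
inline bool is_zero(const TickSpan& t) {
  return t.value() == 0;
}
```

An assert written as `assert(is_zero(span))` states the same intent as comparing `seconds()` with `0.0`, without the conversion.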
------------- PR: https://git.openjdk.java.net/jdk16/pull/128 From tschatzl at openjdk.java.net Thu Jan 21 18:24:24 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 21 Jan 2021 18:24:24 GMT Subject: [jdk16] Integrated: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation In-Reply-To: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> References: <6CHUAswbb5ZkgsMFxekXioiksHbxV8YZsbeNPzwE9Ew=.a00ccd28-072c-4474-a4fa-8e3552d6d40e@github.com> Message-ID: <3vHSCYO8xz-AL7hFCq4mPl3Hdec1FT5uPHxCM_yLoi4=.f490533b-8f7e-4aea-b213-9c5562b34f4a@github.com> On Thu, 21 Jan 2021 09:59:17 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that fixes an assert that checks whether there is no "trimming" action to be stable. > > We found that only on Windows Server 2012 and 2016 (not 2019) on many AMD Epyc machines sometimes > > pss->trim_ticks().seconds() == 0.0``` > > fails on random tests. The `seconds()` methods is > > return (double)value * ((double)unit / (double)TimeSource::frequency());``` > > where value is always zero, and `unit` and `TimeSource::frequency()` some constant integers, i.e. > > `(double) 0 * ((double) 1 / (double) 1000...000)` > > does not equal `0.0`. > > Code like this: > > double tt = pss->trim_ticks().seconds(); > assert(tt == 0.0, ".... %2.f " PTR_FORMAT, tt, julong_cast(tt)); > gives something like: > > `assert(tt == 0.0," .... 0.0 0x00000....0000"` > > so somehow the bit pattern 0x00...000 does not compare to FP 0.0. > > I've investigated this quite a bit (littering the code with this assert) with no particular result except that it somehow seems to have something to do with the `QueryPerformanceCounter()` call as most of the time the assert happens right after taking time. 
> Dumping FP+XMM register state (via `fxsave`) right after the comparison `tt == 0.0` goes wrong did not yield anything (to me) obviously wrong (still `val1` and `val2` of this `Tickspan` are zero). > > There is no known issue with release code (crashes in this or other particular locations), just that the failures are very annoying in the CI. > > The fix changes the FP comparison to an integer comparison which should have been done initially (there is also precedent in the code that does exactly this integer comparison for the same reason which never failed so far), but I/we could not explain why binary 0x00..00 is not always FP "0.0". > > Testing: After in total 8k iterations of two tests that seemed to cause this issue more than usual there has been no assertion failure (4k runs with this patch, 4k runs with this assert duplicated all over the place). hs-tier1-5 This pull request has now been integrated. Changeset: ede1beae Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk16/commit/ede1beae Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation Change FP comparison to integer comparison. Reviewed-by: kbarrett, iwalulya, eosterlund ------------- PR: https://git.openjdk.java.net/jdk16/pull/128 From ayang at openjdk.java.net Thu Jan 21 19:15:17 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 21 Jan 2021 19:15:17 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing u… In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 11:12:18 GMT, Hamlin Li wrote: > optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node.
It's not necessary to do these settings for every node, as we can remove the target nodes at once and set the prev and next pointers for just 2 nodes. Changes requested by ayang (Author). src/hotspot/share/gc/g1/heapRegionSet.cpp line 250: > 248: } else { > 249: assert_free_region_list(_tail != curr, "invariant"); > 250: } I think it's best to use `first` instead of `curr` in this part, since `first` is const while `curr` is not as we iterate through the list. `prev` is const, right? How about enforcing it in the type? src/hotspot/share/gc/g1/heapRegionSet.cpp line 261: > 259: assert_free_region_list(_tail != curr, "invariant"); > 260: } > 261: assert(count < num_regions, Since you are touching this area, `assert(count < num_regions)` can be dropped. (We are inside the while-loop; this condition must hold.) A more useful assert is something like `length() >= num_regions`, if I understand the original code correctly. A bit surprised not to see this mentioned in the doc. src/hotspot/share/gc/g1/heapRegionSet.cpp line 281: > 279: } > 280: > 281: if (prev == NULL) { A brief doc on what `prev` and `next` point to could be nice; something like: `prev` points to the node right before `first` or null when `first == _head`, etc. ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From ysuenaga at openjdk.java.net Fri Jan 22 00:35:46 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 22 Jan 2021 00:35:46 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB In-Reply-To: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: On Wed, 30 Dec 2020 14:26:22 GMT, Yasumasa Suenaga wrote: > The G1 heap summary (G1 Heap, summaries for each space) is shown on the console even though I chose the "Heap Parameters" menu in HSDB. It should be shown in the "Heap Parameters" window in HSDB.
PING: could you review it? ------------- PR: https://git.openjdk.java.net/jdk/pull/1911 From cjplummer at openjdk.java.net Fri Jan 22 01:21:12 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Fri, 22 Jan 2021 01:21:12 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB In-Reply-To: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: On Wed, 30 Dec 2020 14:26:22 GMT, Yasumasa Suenaga wrote: > G1 heap summary (G1 Heap, summaries for each spaces) is shown on console even though I chosen "Heap Parameters" menu on HSDB. It should be shown on "Heap Parameters" window on HSDB. Copyrights need updating. Otherwise looks good. ------------- Marked as reviewed by cjplummer (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1911 From mli at openjdk.java.net Fri Jan 22 01:38:11 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 22 Jan 2021 01:38:11 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing =?UTF-8?B?deKApg==?= [v2] In-Reply-To: References: Message-ID: > optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. it's not necessary do these settings for every node, as we can remove target nodes at once and just set prev and next pointers for just 2 nodes. 
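The idea in the description above can be sketched as follows. This is a minimal illustration only, not the HotSpot code: `Node`, `List`, and this `remove_starting_at` are hypothetical stand-ins for `HeapRegion` and `FreeRegionList`. The loop only walks the sublist and clears the removed nodes' own links; the neighbours on either side are relinked exactly once, after the walk.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HeapRegion / FreeRegionList; not HotSpot code.
struct Node {
  int   id = 0;
  Node* prev = nullptr;
  Node* next = nullptr;
};

struct List {
  Node*  head = nullptr;
  Node*  tail = nullptr;
  size_t length = 0;

  void push_back(Node* n) {
    n->prev = tail;
    n->next = nullptr;
    if (tail != nullptr) { tail->next = n; } else { head = n; }
    tail = n;
    length++;
  }

  // Unlinks `count` consecutive nodes starting at `first`.
  void remove_starting_at(Node* first, size_t count) {
    assert(count >= 1 && count <= length);
    // `prev` is the node right before `first`, or null if `first` is the head.
    Node* const prev = first->prev;
    Node* curr = first;
    // After the loop, `next` is the node right after the removed sublist,
    // or null if the sublist contained the tail.
    Node* next = nullptr;
    for (size_t i = 0; i < count; i++) {
      assert(curr != nullptr);
      next = curr->next;
      curr->prev = nullptr;   // only the removed node itself is touched here
      curr->next = nullptr;
      curr = next;
    }
    // Relink the two boundary nodes (or head/tail) exactly once.
    if (prev == nullptr) { head = next; } else { prev->next = next; }
    if (next == nullptr) { tail = prev; } else { next->prev = prev; }
    length -= count;
  }
};
```

Compared with unlinking each node individually, the neighbours' pointers are written once per call rather than once per removed node.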
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2181/files - new: https://git.openjdk.java.net/jdk/pull/2181/files/bf42e208..5b0fed14 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=00-01 Stats: 14 lines in 1 file changed: 3 ins; 4 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2181/head:pull/2181 PR: https://git.openjdk.java.net/jdk/pull/2181 From mli at openjdk.java.net Fri Jan 22 01:45:46 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 22 Jan 2021 01:45:46 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing =?UTF-8?B?deKApg==?= [v2] In-Reply-To: References: Message-ID: <5q-a7BAbv-TuoGjWueYygtgrjlg4cczVzZ_Y-ohCZ5c=.fd511eb9-5078-4521-85fa-dbc51399a064@github.com> On Thu, 21 Jan 2021 19:10:26 GMT, Albert Mingkun Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8260200: optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > src/hotspot/share/gc/g1/heapRegionSet.cpp line 281: > >> 279: } >> 280: >> 281: if (prev == NULL) { > > A brief doc on what `prev` and `next` point to could be nice; sth like: `prev` points to the node right before `first` or null when `first == _head`, etc. Thanks Albert for pointing out these, I have changed as you suggested. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From ysuenaga at openjdk.java.net Fri Jan 22 06:53:24 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 22 Jan 2021 06:53:24 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB [v2] In-Reply-To: References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: On Fri, 22 Jan 2021 01:17:27 GMT, Chris Plummer wrote: >> Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Update copyright year >> - Merge remote-tracking branch 'upstream/master' into JDK-8259009 >> - G1 heap summary should be shown in "Heap Parameters" window on HSDB > > Copyrights need updating. Otherwise looks good. @plummercj Thank you for the review! I updated copyright year in new commit. ------------- PR: https://git.openjdk.java.net/jdk/pull/1911 From ysuenaga at openjdk.java.net Fri Jan 22 06:51:59 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 22 Jan 2021 06:51:59 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB [v2] In-Reply-To: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: > G1 heap summary (G1 Heap, summaries for each spaces) is shown on console even though I chosen "Heap Parameters" menu on HSDB. It should be shown on "Heap Parameters" window on HSDB. Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Update copyright year - Merge remote-tracking branch 'upstream/master' into JDK-8259009 - G1 heap summary should be shown in "Heap Parameters" window on HSDB ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1911/files - new: https://git.openjdk.java.net/jdk/pull/1911/files/4d8d484f..7fca8434 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1911&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1911&range=00-01 Stats: 74715 lines in 2162 files changed: 26456 ins; 30967 del; 17292 mod Patch: https://git.openjdk.java.net/jdk/pull/1911.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1911/head:pull/1911 PR: https://git.openjdk.java.net/jdk/pull/1911 From tschatzl at openjdk.java.net Fri Jan 22 08:26:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 22 Jan 2021 08:26:42 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB [v2] In-Reply-To: References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: On Fri, 22 Jan 2021 06:51:59 GMT, Yasumasa Suenaga wrote: >> G1 heap summary (G1 Heap, summaries for each spaces) is shown on console even though I chosen "Heap Parameters" menu on HSDB. It should be shown on "Heap Parameters" window on HSDB. > > Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Update copyright year > - Merge remote-tracking branch 'upstream/master' into JDK-8259009 > - G1 heap summary should be shown in "Heap Parameters" window on HSDB I would prefer to make the order of parameters uniform in `printG1HeapSummary(G1CollectedHeap g1h, PrintStream tty) {` too. I.e. 
there does not seem to be a reason to have only this function have `tty` as second parameter. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/1911 From tschatzl at openjdk.java.net Fri Jan 22 08:30:48 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 22 Jan 2021 08:30:48 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing =?UTF-8?B?deKApg==?= [v2] In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 19:10:43 GMT, Albert Mingkun Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8260200: optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > Changes requested by ayang (Author). Sorry, wrong PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From tschatzl at openjdk.java.net Fri Jan 22 08:32:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 22 Jan 2021 08:32:45 GMT Subject: RFR: JDK-8260208: fix dummy object filling condition in =?UTF-8?B?RzFDb2xsZWN0ZWRIZWFwOjpm4oCm?= In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 08:29:23 GMT, Thomas Schatzl wrote: >> it's a minor fix/enhancement in cds, it fixes dummy object filling condition in G1CollectedHeap::fill_archive_regions > > I think this is good, but let's have somebody from the runtime team also have a quick look over it. What's the testing for this change? Pre-submit testing is not configured, so there is no indication that this has been tested. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2183 From tschatzl at openjdk.java.net Fri Jan 22 08:32:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 22 Jan 2021 08:32:45 GMT Subject: RFR: JDK-8260208: fix dummy object filling condition in G1CollectedHeap::f… In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 11:16:42 GMT, Hamlin Li wrote: > It's a minor fix/enhancement in CDS; it fixes the dummy object filling condition in G1CollectedHeap::fill_archive_regions I think this is good, but let's have somebody from the runtime team also have a quick look over it. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2183 From tschatzl at openjdk.java.net Fri Jan 22 08:39:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 22 Jan 2021 08:39:46 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing u… [v2] In-Reply-To: References: Message-ID: <_i3PP7fAkvWPFy3iN3uJZuX8sD0XJ_fR2cJrgmFHWUs=.d968424c-6309-4bb0-8592-9dfd3da9dd07@github.com> On Fri, 22 Jan 2021 08:27:12 GMT, Thomas Schatzl wrote: >> Changes requested by ayang (Author). > > Sorry, wrong PR. Can you give details about the testing you performed? Automated pre-submit testing has not been configured. ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From sjohanss at openjdk.java.net Fri Jan 22 08:58:49 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 22 Jan 2021 08:58:49 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing u… [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 01:38:11 GMT, Hamlin Li wrote: >> optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting >> >> FreeRegionList::remove_starting_at(...) traverses from a node and removes the subsequent N nodes from the free list.
But when it traverses the free list, it removes nodes one by one by setting the prev and next pointers of the prev and next node. It's not necessary to do these settings for every node, as we can remove the target nodes at once and set the prev and next pointers for just 2 nodes. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8260200: optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting I would also prefer if you enable GitHub actions to allow testing. I've kicked off a run in our internal environment as well. src/hotspot/share/gc/g1/heapRegionSet.cpp line 252: > 250: } else { > 251: assert_free_region_list(_tail != first, "invariant"); > 252: } Since these checks no longer do anything other than assertions, I think it would be nice to hide them in a helper that for production builds will do nothing using `NOT_DEBUG_RETURN`. ------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2181 From sjohanss at openjdk.java.net Fri Jan 22 09:03:53 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 22 Jan 2021 09:03:53 GMT Subject: RFR: 8260042: G1 Post-cleanup liveness printing occurs too early In-Reply-To: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> References: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> Message-ID: On Wed, 20 Jan 2021 16:00:18 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change that fixes the position of the Post-cleanup liveness printing, causing wrong gc efficiencies to be printed? > > I.e. due to some older changes, the calculation of gc efficiencies got moved below the printing of the Post-cleanup liveness which should be about these values. This change corrects that. > > Note that there is a sister issue about not printing the gc efficiencies in the "Post-Marking" phase.
This is not scope of this change. > > Testing: manual testing that values are correct, hs-tier1+2 Looks good! ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2168 From mli at openjdk.java.net Fri Jan 22 09:03:53 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 22 Jan 2021 09:03:53 GMT Subject: RFR: JDK-8260208: fix dummy object filling condition in =?UTF-8?B?RzFDb2xsZWN0ZWRIZWFwOjpm4oCm?= In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 08:30:36 GMT, Thomas Schatzl wrote: >> I think this is good, but let's have somebody from the runtime team also have a quick look over it. > > What's the testing for this change? Pre-submit testing is not configured, so there is no indication that this has been tested. Thanks for reviewing. I tested locally with tests at test/hotspot/jtreg/gc/g1/, it has the same test result as master jdk. This is the first time I submit code on github, I will configure pre-submit testing. ------------- PR: https://git.openjdk.java.net/jdk/pull/2183 From ayang at openjdk.java.net Fri Jan 22 09:06:05 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 22 Jan 2021 09:06:05 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free Message-ID: Using for-loop to make the number of iterations more explicit. Direct backward iteration, `for (uint curr = reserved_length() - 1; curr >= 0; curr--)` doesn't work due to underflow of `uint` type. Therefore, I went for current approach. 
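The underflow mentioned above comes from `curr >= 0` always being true for an unsigned type: decrementing 0 wraps around to the maximum value instead of going negative. One common way to iterate backwards safely is sketched below; it is an illustration under assumptions and not necessarily the approach taken in the PR. `find_highest_free` here is a hypothetical helper over a plain `std::vector<bool>`, and `NOT_FOUND` is an assumed sentinel.

```cpp
#include <cassert>
#include <vector>

// Assumed sentinel for "no free slot"; not a HotSpot constant.
const unsigned int NOT_FOUND = ~0u;

// Hypothetical helper: index of the highest free slot, or NOT_FOUND.
unsigned int find_highest_free(const std::vector<bool>& is_free) {
  const unsigned int len = (unsigned int)is_free.size();
  // `i` counts down from len to 1, so the condition `i > 0` does terminate;
  // the slot actually inspected is i - 1, which never wraps around.
  for (unsigned int i = len; i > 0; i--) {
    const unsigned int curr = i - 1;
    if (is_free[curr]) {
      return curr;
    }
  }
  return NOT_FOUND;
}
```

The number of iterations is still explicit (at most `len`), which is the stated motivation for using a for-loop.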
Test: hotspot_gc ------------- Commit messages: - for Changes: https://git.openjdk.java.net/jdk/pull/2193/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2193&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8253420 Stats: 12 lines in 1 file changed: 2 ins; 6 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2193.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2193/head:pull/2193 PR: https://git.openjdk.java.net/jdk/pull/2193 From mli at openjdk.java.net Fri Jan 22 09:07:44 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 22 Jan 2021 09:07:44 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing =?UTF-8?B?deKApg==?= [v2] In-Reply-To: References: Message-ID: <65w4oVchMIei1Asbc8d4CuawYNeyBhKMIhkKTPhmR2U=.f72b851b-67a0-4a6b-9e15-50e02e2501c6@github.com> On Fri, 22 Jan 2021 08:55:19 GMT, Stefan Johansson wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8260200: optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > I would also prefer if you enable GitHub actions to allow testing. I've kicked of a run in our internal environment as well. > Can you give details about the testing you performed? Automatied pre-submit testing has not been configured. I tested locally with tests at test/hotspot/jtreg/gc/g1/, it has the same test result as master jdk. ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From ayang at openjdk.java.net Fri Jan 22 09:18:47 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 22 Jan 2021 09:18:47 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing =?UTF-8?B?deKApg==?= [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 01:38:11 GMT, Hamlin Li wrote: >> optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting >> >> FreeRegionList::remove_starting_at(...) 
traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. it's not necessary do these settings for every node, as we can remove target nodes at once and just set prev and next pointers for just 2 nodes. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8260200: optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting Thanks for the revision. I think the doc on `next` can be further improved. It's best to describe its purpose, sth like, after the while-loop, `next` should point to the next node right after the removed sublist, or null if the sublist contains tail. ------------- Marked as reviewed by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/2181 From sjohanss at openjdk.java.net Fri Jan 22 09:37:53 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 22 Jan 2021 09:37:53 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: On Mon, 18 Jan 2021 13:55:06 GMT, Denghui Dong wrote: >> GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. >> >> For the test purpose, I add two Whitebox methods to lock/unlock critical. > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > Refactor based on comments Some comments. 
src/hotspot/share/gc/shared/gcTraceSend.cpp line 354: > 352: > 353: void GCLockerTracer::start_gc_locker(const jint jni_lock_count) { > 354: assert(SafepointSynchronize::is_at_safepoint(), "sanity"); Maybe add assertions that `_jni_lock_count` and `_stall_count` is 0 to ensure this is not called multiple times for a single event. src/hotspot/share/gc/shared/gcTraceSend.cpp line 369: > 367: void GCLockerTracer::report_gc_locker() { > 368: Ticks zero; > 369: if (_needs_gc_start_timestamp != zero) { Why is this needed? src/hotspot/share/gc/shared/gcTraceSend.cpp line 372: > 370: EventGCLocker event(UNTIMED); > 371: if (event.should_commit()) { > 372: event.set_starttime(_needs_gc_start_timestamp); Shouldn't you also set the endtime using `event.set_endtime(...)`? src/hotspot/share/gc/shared/gcTraceSend.cpp line 378: > 376: } > 377: _needs_gc_start_timestamp = zero; > 378: _stall_count = 0; Any reason to not clear `_jni_lock_count`? It would be needed for the assert suggested above. ------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2088 From tschatzl at openjdk.java.net Fri Jan 22 10:39:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 22 Jan 2021 10:39:59 GMT Subject: RFR: 8260263: Remove PtrQueue::_qset Message-ID: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> Hi all, can I have reviews for this trivial(?) removal of dead code pertaining to and including `PtrQueue::_qset`? 
Testing: local compilation, tier1 Thanks, Thomas ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/2194/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2194&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260263 Stats: 25 lines in 4 files changed: 0 ins; 25 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2194.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2194/head:pull/2194 PR: https://git.openjdk.java.net/jdk/pull/2194 From ysuenaga at openjdk.java.net Fri Jan 22 10:43:12 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 22 Jan 2021 10:43:12 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB [v3] In-Reply-To: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: > G1 heap summary (G1 Heap, summaries for each spaces) is shown on console even though I chosen "Heap Parameters" menu on HSDB. It should be shown on "Heap Parameters" window on HSDB. 
Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: Fix parameter order ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/1911/files - new: https://git.openjdk.java.net/jdk/pull/1911/files/7fca8434..6fa2204a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1911&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1911&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/1911.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1911/head:pull/1911 PR: https://git.openjdk.java.net/jdk/pull/1911 From ysuenaga at openjdk.java.net Fri Jan 22 10:46:55 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 22 Jan 2021 10:46:55 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB [v2] In-Reply-To: References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: <9i6tANwPwA06D4WGldBiyw1dBehsrG6I7AqG34e6PDA=.606ba923-ef18-4416-883f-c125e833f95b@github.com> On Fri, 22 Jan 2021 08:23:17 GMT, Thomas Schatzl wrote: >> Yasumasa Suenaga has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Update copyright year >> - Merge remote-tracking branch 'upstream/master' into JDK-8259009 >> - G1 heap summary should be shown in "Heap Parameters" window on HSDB > > I would prefer to make the order of parameters uniform in `printG1HeapSummary(G1CollectedHeap g1h, PrintStream tty) {` too. I.e. there does not seem to be a reason to have only this function have `tty` as second parameter. @tschatzl Thanks for the comment! I fixed it. 
------------- PR: https://git.openjdk.java.net/jdk/pull/1911 From tschatzl at openjdk.java.net Fri Jan 22 10:52:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 22 Jan 2021 10:52:46 GMT Subject: RFR: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB [v3] In-Reply-To: References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: On Fri, 22 Jan 2021 10:43:12 GMT, Yasumasa Suenaga wrote: >> G1 heap summary (G1 Heap, summaries for each spaces) is shown on console even though I chosen "Heap Parameters" menu on HSDB. It should be shown on "Heap Parameters" window on HSDB. > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Fix parameter order Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/1911 From pliden at openjdk.java.net Fri Jan 22 11:21:54 2021 From: pliden at openjdk.java.net (Per Liden) Date: Fri, 22 Jan 2021 11:21:54 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: <75qnWumnRn-LDG_hwMAxgqeQCn0HtJWEPadgb9i2_qE=.376ff161-9d5f-4332-9719-a4a5d2beae00@github.com> On Sat, 16 Jan 2021 13:00:04 GMT, David Holmes wrote: >> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >> >> Review > > So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then? > > Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment? > > Cheers, > David @dholmes-ora Do you still have questions or concerns here, or can I go ahead and integrate this? I've gone through all uses of sysconf(_SC_NPROCESSORS_*) and sched_getaffinity() we have, and they look fine. 
I've also looked at how the OSContainer stuff behaves in this environment, and it also looks fine. In summary, the only problem I can spot is related to sched_getcpu(). ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From kbarrett at openjdk.java.net Fri Jan 22 11:23:49 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 22 Jan 2021 11:23:49 GMT Subject: [jdk16] Integrated: 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" In-Reply-To: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> References: <-cfOs4ymxTPyi5QvyU6MJAULUl9G0EyoxYmTr6Psv1k=.e95118c6-a09d-4766-86c5-a2753771e436@github.com> Message-ID: On Thu, 21 Jan 2021 09:26:19 GMT, Kim Barrett wrote: > Please review this fix for an intermittent crash when using ParallelGC on > aarch64. The problem is a mis-ordered pair of reads that permit an > algorithmic invariant to be violated. The mis-ordering is due to the lack > any explicit ordering request (a missing barrier). > > In MutableSpace::cas_allocate, we had > > HeapWord* obj = top(); > if (pointer_delta(end(), obj) >= size) { > ... space available, attempt the CAS to claim it ... > } > > If end is read before top, other threads may advance top and end between > those reads. If, when top is read, current top > old end and current top + > size > current end, the range check will unexpectedly pass because of > underflow in pointer_delta. This will allow top to be advanced to a value > which is > current end, violating the algorithmic invariant, and likely > leading to crashes or memory corruption. > > gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is > permitted to do so. Even if it didn't, there is nothing to prevent the > hardware from doing so. The solution is to use a load_acquire for top, to > ensure the needed order. 
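The failure mode described above can be sketched in portable C++. This is an illustration only: `word_delta` and `Space` are hypothetical stand-ins for HotSpot's `pointer_delta()` and `MutableSpace`, and `std::atomic` with `memory_order_acquire` stands in for HotSpot's `load_acquire`. The two points it shows are that an unsigned difference wraps to a huge value when `top` is already past a stale `end`, and that an acquire load of `top` keeps the later load of `end` from being reordered before it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Unsigned distance in words, in the spirit of pointer_delta(); like any
// unsigned subtraction it wraps around if left < right.
size_t word_delta(uintptr_t left, uintptr_t right) {
  return (left - right) / sizeof(uintptr_t);
}

// Hypothetical bump-pointer space; addresses are plain integers here.
struct Space {
  std::atomic<uintptr_t> top;
  std::atomic<uintptr_t> end;
  Space(uintptr_t t, uintptr_t e) : top(t), end(e) {}

  // Returns the start of the allocation, or 0 if it did not succeed.
  uintptr_t try_allocate(size_t words) {
    // Acquire load: the load of `end` below cannot be moved before this one.
    uintptr_t obj = top.load(std::memory_order_acquire);
    uintptr_t e = end.load(std::memory_order_relaxed);
    if (word_delta(e, obj) < words) {
      return 0;  // not enough space
    }
    uintptr_t new_top = obj + words * sizeof(uintptr_t);
    if (top.compare_exchange_strong(obj, new_top)) {
      return obj;
    }
    return 0;  // lost the race; real code would retry
  }
};
```

With a stale `end` smaller than the current `top`, `word_delta(end, top)` yields a very large number, so the `>= size` style check passes even though no space is available; that is the underflow the message refers to.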
> > This may have been the actual root cause of JDK-8257999. However, the > change made there was and still is needed for the reasons described. > > Testing: > mach5 tier1-3 > > Even with knowledge of the failure mode it's very hard to reproduce. I was > unable to catch the underflow case in over 1K attempts using machines in our > test farm, though StefanK caught it a few times on a personal machine. This pull request has now been integrated. Changeset: 685c03dc Author: Kim Barrett URL: https://git.openjdk.java.net/jdk16/commit/685c03dc Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod 8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" Use load_acquire to order reads of top and end. Reviewed-by: tschatzl, iwalulya, eosterlund ------------- PR: https://git.openjdk.java.net/jdk16/pull/127 From shade at openjdk.java.net Fri Jan 22 11:43:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 22 Jan 2021 11:43:54 GMT Subject: Integrated: 8260212: Shenandoah: resolve-only UpdateRefsMode is not used In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 08:31:36 GMT, Aleksey Shipilev wrote: > The only "use" is `ShenandoahMarkResolveRefsClosure`, which is unused itself. This pull request has now been integrated. 
Changeset: bfac3fb5 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/bfac3fb5 Stats: 18 lines in 2 files changed: 0 ins; 18 del; 0 mod 8260212: Shenandoah: resolve-only UpdateRefsMode is not used Reviewed-by: rkennke, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2177 From mli at openjdk.java.net Fri Jan 22 11:44:10 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 22 Jan 2021 11:44:10 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing =?UTF-8?B?deKApg==?= [v3] In-Reply-To: References: Message-ID: > optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. it's not necessary do these settings for every node, as we can remove target nodes at once and just set prev and next pointers for just 2 nodes. 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2181/files - new: https://git.openjdk.java.net/jdk/pull/2181/files/5b0fed14..27c85ec5 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=01-02 Stats: 8 lines in 1 file changed: 6 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2181/head:pull/2181 PR: https://git.openjdk.java.net/jdk/pull/2181 From mli at openjdk.java.net Fri Jan 22 11:44:10 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 22 Jan 2021 11:44:10 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing =?UTF-8?B?deKApg==?= [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 09:15:19 GMT, Albert Mingkun Yang wrote: > Thanks for the revision. > > I think the doc on `next` can be further improved. It's best to describe its purpose, sth like, after the while-loop, `next` should point to the next node right after the removed sublist, or null if the sublist contains tail. Just add more detailed info for "next" as you suggested. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From mli at openjdk.java.net Fri Jan 22 11:44:11 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Fri, 22 Jan 2021 11:44:11 GMT Subject: RFR: JDK-8260200: optimize FreeRegionList::remove_starting_at by removing u… [v2] In-Reply-To: References: Message-ID: <04tWoUPw_97DYXPqlJdJhQyDr0dBW1Vkmaja9rqPMEE=.ff792b59-ab27-49c7-bcaa-3b6d329b7650@github.com> On Fri, 22 Jan 2021 08:54:16 GMT, Stefan Johansson wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8260200: optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > src/hotspot/share/gc/g1/heapRegionSet.cpp line 252: > >> 250: } else { >> 251: assert_free_region_list(_tail != first, "invariant"); >> 252: } > > Since these checks no longer do anything other than assert, I think it would be nice to hide them in a helper that will do nothing in production builds, using `NOT_DEBUG_RETURN`. Thanks for reviewing, just changed as you suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From shade at openjdk.java.net Fri Jan 22 12:16:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 22 Jan 2021 12:16:10 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v2] In-Reply-To: References: <6kABGv_phymIASILTHYVqGaQf6Lu7tgqj4wQYibNYaA=.12ef2e99-9d24-4a15-be74-aaf468cd0ca5@github.com> Message-ID: On Thu, 21 Jan 2021 08:59:07 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: >> >> - Rename maybe to atomic >> - Touch up comments > > Looks good! Thanks! I updated the patch a bit after `RESOLVE` mode removal. I think `NO_UPDATE` and `STW_UPDATE` read better. This also highlights that STW closures are really doing non-CAS updates.
------------- PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Fri Jan 22 12:16:10 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 22 Jan 2021 12:16:10 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v3] In-Reply-To: References: Message-ID: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1` with Shenandoah > - [x] `tier2` with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Simplify further after RESOLVE removal - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Rename maybe to atomic - Touch up comments - 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code ------------- Changes: https://git.openjdk.java.net/jdk/pull/2166/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=02 Stats: 159 lines in 5 files changed: 19 ins; 84 del; 56 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From kbarrett at openjdk.java.net Fri Jan 22 13:17:05 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 22 Jan 2021 13:17:05 GMT Subject: RFR: 8256814: WeakProcessorPhases may be redundant [v4] In-Reply-To: References: Message-ID: On Tue, 12 Jan 2021 10:09:51 GMT, Stefan Karlsson wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Merge branch 'master' into wpp4 >> - update copyrights >> - remove type aliases for OopStorageSet::WeakId >> - Merge branch 'master' into wpp4 >> - stefank review >> - Remove WeakProcessorPhase, adding scoped enum categories to OopStorageSet. > > I think this looks good. I have a few comments that I would like to get addressed, but they are not blockers if you want to proceed with what you have. Thanks @stefank , @tschatzl , @rkennke for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/1862 From kbarrett at openjdk.java.net Fri Jan 22 13:17:05 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 22 Jan 2021 13:17:05 GMT Subject: RFR: 8256814: WeakProcessorPhases may be redundant [v4] In-Reply-To: References: Message-ID: > Please review this change which eliminates the WeakProcessorPhase class. > > The OopStorageSet class is changed to provide scoped enums for the different > categories: StrongId, WeakId, and Id (for the union of strong and weak). > An accessor is provided for obtaining the storage corresponding to a > category value. > > Various other enumerator ranges, array sizes and indices, and iterations are > derived directly from the corresponding OopStorageSet category's enum range. > > Iteration over a category of enumerators can be done via EnumIterator. The > iteration over storage objects makes use of that enum iteration, rather than > having a bespoke implementation. Some use-cases need iteration of the > enumerators, with storage lookup from the enumerator; other use-cases just > need the storage objects. > > Testing: > mach5 tier1-6 > Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Merge branch 'master' into wpp4 - update copyrights - remove type aliases for OopStorageSet::WeakId - Merge branch 'master' into wpp4 - stefank review - Remove WeakProcessorPhase, adding scoped enum categories to OopStorageSet. ------------- Changes: https://git.openjdk.java.net/jdk/pull/1862/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1862&range=03 Stats: 1052 lines in 25 files changed: 398 ins; 465 del; 189 mod Patch: https://git.openjdk.java.net/jdk/pull/1862.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1862/head:pull/1862 PR: https://git.openjdk.java.net/jdk/pull/1862 From kbarrett at openjdk.java.net Fri Jan 22 13:17:08 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 22 Jan 2021 13:17:08 GMT Subject: Integrated: 8256814: WeakProcessorPhases may be redundant In-Reply-To: References: Message-ID: On Tue, 22 Dec 2020 04:59:28 GMT, Kim Barrett wrote: > Please review this change which eliminates the WeakProcessorPhase class. > > The OopStorageSet class is changed to provide scoped enums for the different > categories: StrongId, WeakId, and Id (for the union of strong and weak). > An accessor is provided for obtaining the storage corresponding to a > category value. > > Various other enumerator ranges, array sizes and indices, and iterations are > derived directly from the corresponding OopStorageSet category's enum range. > > Iteration over a category of enumerators can be done via EnumIterator. The > iteration over storage objects makes use of that enum iteration, rather than > having a bespoke implementation. Some use-cases need iteration of the > enumerators, with storage lookup from the enumerator; other use-cases just > need the storage objects. > > Testing: > mach5 tier1-6 > Local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC This pull request has now been integrated. 
Changeset: 7ed8ba1c Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/7ed8ba1c Stats: 1052 lines in 25 files changed: 398 ins; 465 del; 189 mod 8256814: WeakProcessorPhases may be redundant Remove WeakProcessorPhase, adding scoped enum categories to OopStorageSet. Reviewed-by: stefank, tschatzl, rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/1862 From ddong at openjdk.java.net Fri Jan 22 13:31:11 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 22 Jan 2021 13:31:11 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 09:17:45 GMT, Stefan Johansson wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor based on comments > > src/hotspot/share/gc/shared/gcTraceSend.cpp line 354: > >> 352: >> 353: void GCLockerTracer::start_gc_locker(const jint jni_lock_count) { >> 354: assert(SafepointSynchronize::is_at_safepoint(), "sanity"); > > Maybe add assertions that `_jni_lock_count` and `_stall_count` are 0 to ensure this is not called multiple times for a single event. Makes sense. Added. > src/hotspot/share/gc/shared/gcTraceSend.cpp line 369: > >> 367: void GCLockerTracer::report_gc_locker() { >> 368: Ticks zero; >> 369: if (_needs_gc_start_timestamp != zero) { > > Why is this needed? Because we can't assume that EventGCLocker is enabled when the GC locker is started. In other words, whether EventGCLocker is enabled may differ between the beginning and the end of the GC locker. So we check here whether _needs_gc_start_timestamp is non-zero; if it is, _needs_gc_start_timestamp needs to be reset regardless of whether the event will be sent.
> src/hotspot/share/gc/shared/gcTraceSend.cpp line 372: > >> 370: EventGCLocker event(UNTIMED); >> 371: if (event.should_commit()) { >> 372: event.set_starttime(_needs_gc_start_timestamp); > > Shouldn't you also set the endtime using `event.set_endtime(...)`? The endtime will be set in event.commit(). > src/hotspot/share/gc/shared/gcTraceSend.cpp line 378: > >> 376: } >> 377: _needs_gc_start_timestamp = zero; >> 378: _stall_count = 0; > > Any reason to not clear `_jni_lock_count`? It would be needed for the assert suggested above. There is no special reason, just to save a memory access. It's okay for me to reset it. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From ddong at openjdk.java.net Fri Jan 22 13:31:09 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 22 Jan 2021 13:31:09 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v3] In-Reply-To: References: Message-ID: > The GC locker stalls GC operations, so some Java threads cannot continue to run until the GC locker is released, which affects the response time of the application. Adding a JFR event that reports this information helps to detect the issue. > > For testing purposes, I added two WhiteBox methods to lock/unlock critical.
Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: add assertions and reset _jni_lock_count ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2088/files - new: https://git.openjdk.java.net/jdk/pull/2088/files/c36d4f96..41641d8a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=01-02 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2088.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2088/head:pull/2088 PR: https://git.openjdk.java.net/jdk/pull/2088 From ddong at openjdk.java.net Fri Jan 22 13:33:53 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Fri, 22 Jan 2021 13:33:53 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 09:35:04 GMT, Stefan Johansson wrote: > Some comments. Thanks for the review :) ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From ysuenaga at openjdk.java.net Fri Jan 22 14:41:46 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 22 Jan 2021 14:41:46 GMT Subject: Integrated: 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB In-Reply-To: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> References: <0H5ICCOuS2CMe6xjvEKSAi_T3qGsTsamid4pzp7EV18=.b7219f71-9be3-40e0-8605-3d0b7edd9e99@github.com> Message-ID: On Wed, 30 Dec 2020 14:26:22 GMT, Yasumasa Suenaga wrote: > G1 heap summary (G1 Heap, summaries for each spaces) is shown on console even though I chosen "Heap Parameters" menu on HSDB. It should be shown on "Heap Parameters" window on HSDB. This pull request has now been integrated. 
Changeset: 154e1d63 Author: Yasumasa Suenaga URL: https://git.openjdk.java.net/jdk/commit/154e1d63 Stats: 33 lines in 2 files changed: 14 ins; 1 del; 18 mod 8259009: G1 heap summary should be shown in "Heap Parameters" window on HSDB Reviewed-by: cjplummer, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/1911 From rkennke at openjdk.java.net Fri Jan 22 15:53:47 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 22 Jan 2021 15:53:47 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v3] In-Reply-To: References: Message-ID: <9fHyPXjbBbqKJlZJxNyeeLytjC6ASp87u1i8Pf1cT4s=.eb0bee65-eca2-4513-b0d0-1ce174cf5b7e@github.com> On Fri, 22 Jan 2021 12:16:10 GMT, Aleksey Shipilev wrote: >> We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). >> >> Additional testing: >> - [x] `hotspot_gc_shenandoah` >> - [x] `tier1` with Shenandoah >> - [x] `tier2` with Shenandoah > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Simplify further after RESOLVE removal > - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates > - Rename maybe to atomic > - Touch up comments > - 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Fri Jan 22 16:27:59 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 22 Jan 2021 16:27:59 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v4] In-Reply-To: References: Message-ID: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1` with Shenandoah > - [x] `tier2` with Shenandoah Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Simplify ShenandoahUpdateHeapRefsTask - Fix up generic update references too, introduce CONC_UPDATE ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2166/files - new: https://git.openjdk.java.net/jdk/pull/2166/files/edcab593..a479e7b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=02-03 Stats: 90 lines in 12 files changed: 64 ins; 5 del; 21 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Fri Jan 22 16:28:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 22 Jan 2021 16:28:00 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v3] In-Reply-To: <9fHyPXjbBbqKJlZJxNyeeLytjC6ASp87u1i8Pf1cT4s=.eb0bee65-eca2-4513-b0d0-1ce174cf5b7e@github.com> References: 
<9fHyPXjbBbqKJlZJxNyeeLytjC6ASp87u1i8Pf1cT4s=.eb0bee65-eca2-4513-b0d0-1ce174cf5b7e@github.com> Message-ID: On Fri, 22 Jan 2021 15:51:06 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Simplify further after RESOLVE removal >> - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates >> - Rename maybe to atomic >> - Touch up comments >> - 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code > > Looks good to me! Thanks! This kinda keeps creeping: I just realized we need to sweep in the "normal" update references in the same thing. I'll make this a draft and keep working on it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2166 From iklam at openjdk.java.net Fri Jan 22 18:30:40 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 22 Jan 2021 18:30:40 GMT Subject: RFR: JDK-8260208: fix dummy object filling condition in G1CollectedHeap::f… In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 11:16:42 GMT, Hamlin Li wrote: > It's a minor fix/enhancement in CDS: it fixes the dummy object filling condition in G1CollectedHeap::fill_archive_regions The assert looks correct to me. ------------- Marked as reviewed by iklam (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2183 From kbarrett at openjdk.java.net Fri Jan 22 20:34:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 22 Jan 2021 20:34:40 GMT Subject: RFR: 8260263: Remove PtrQueue::_qset In-Reply-To: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> References: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> Message-ID: On Fri, 22 Jan 2021 10:15:27 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this trivial(?) removal of dead code pertaining to and including `PtrQueue::_qset`?
> > Testing: local compilation, tier1 > > Thanks, > Thomas I hadn't realized the `_qset` was quite this dead. And this doesn't appear to depend on JDK-8258742 (still waiting for a 2nd reviewer), or even have merge conflicts with it. Cool! ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2194 From kbarrett at openjdk.java.net Fri Jan 22 20:49:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 22 Jan 2021 20:49:41 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 08:59:51 GMT, Albert Mingkun Yang wrote: > Using for-loop to make the number of iterations more explicit. Direct backward iteration, `for (uint curr = reserved_length() - 1; curr >= 0; curr--)` doesn't work due to underflow of `uint` type. Therefore, I went for current approach. > > Test: hotspot_gc src/hotspot/share/gc/g1/heapRegionManager.cpp line 541: > 539: // committed, expand at that index. > 540: for (uint i = 0; i < reserved_length(); ++i) { > 541: uint curr = reserved_length() - 1 - i; Maybe this instead? for (uint curr = reserved_length(); curr-- > 0; ) { ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From ayang at openjdk.java.net Fri Jan 22 21:08:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 22 Jan 2021 21:08:41 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 20:46:46 GMT, Kim Barrett wrote: >> Using for-loop to make the number of iterations more explicit. Direct backward iteration, `for (uint curr = reserved_length() - 1; curr >= 0; curr--)` doesn't work due to underflow of `uint` type. Therefore, I went for current approach. >> >> Test: hotspot_gc > > src/hotspot/share/gc/g1/heapRegionManager.cpp line 541: > >> 539: // committed, expand at that index. 
>> 540: for (uint i = 0; i < reserved_length(); ++i) { >> 541: uint curr = reserved_length() - 1 - i; > > Maybe this instead? > for (uint curr = reserved_length(); curr-- > 0; ) { I would prefer not having a side effect in the condition. At first glance, it's not obvious how many iterations the loop entails, `length` or `length - 1`? ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From iwalulya at openjdk.java.net Sat Jan 23 08:40:41 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Sat, 23 Jan 2021 08:40:41 GMT Subject: RFR: 8258742: Move PtrQueue reset to PtrQueueSet subclasses In-Reply-To: References: Message-ID: <7TEY2FCK1gA0vAtL15lQChhXgsxgF7qabipKcXv-n-8=.8431f565-19eb-4528-ba16-e7b9cc1fbcdb@github.com> On Sun, 17 Jan 2021 13:17:20 GMT, Kim Barrett wrote: > Please review this change to the PtrQueue hierarchy, changing queue reset > from an intrinsic operation of the queue to an operation the qset performs > on a queue. This is another piece of the refactoring being done under > JDK-8258251, separated out for easier review. > > After the refactoring of queue reset, PtrQueue::is_empty and PtrQueue::size > are no longer used, so are removed. Further, PtrQueue::{set_}index_in_bytes > are removed, directly using _index instead. > > A less obvious part of the change is in the G1 remark task and Shenandoah > final marking task. The threads walk performed by these no longer directly > processes the partial per-thread SATB buffers. Instead they just flush the > queues for later completed buffer processing. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Looks good! ------------- Marked as reviewed by iwalulya (Committer).
PR: https://git.openjdk.java.net/jdk/pull/2115 From sjohanss at openjdk.java.net Sat Jan 23 14:05:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Sat, 23 Jan 2021 14:05:38 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 21:05:27 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/gc/g1/heapRegionManager.cpp line 541: >> >>> 539: // committed, expand at that index. >>> 540: for (uint i = 0; i < reserved_length(); ++i) { >>> 541: uint curr = reserved_length() - 1 - i; >> >> Maybe this instead? >> for (uint curr = reserved_length(); curr-- > 0; ) { > > I would prefer not having side effect in the condition. At first glance, it's not obvious how many iteration the loop entails, `length` or `length - 1`? Avoiding side effects is normally good, but in this case I think it actually makes the whole intent of the code clearer. We could add to the comment that we loop backwards through all reserved regions to make it clear what the bound is. ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From sjohanss at openjdk.java.net Sat Jan 23 14:55:44 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Sat, 23 Jan 2021 14:55:44 GMT Subject: RFR: 8260263: Remove PtrQueue::_qset In-Reply-To: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> References: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> Message-ID: On Fri, 22 Jan 2021 10:15:27 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this trivial(?) removal of dead code pertaining to and including `PtrQueue::_qset`? > > Testing: local compilation, tier1 > > Thanks, > Thomas Nice cleanup. ------------- Marked as reviewed by sjohanss (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2194 From sjohanss at openjdk.java.net Sat Jan 23 15:19:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Sat, 23 Jan 2021 15:19:40 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 13:25:12 GMT, Denghui Dong wrote: >> src/hotspot/share/gc/shared/gcTraceSend.cpp line 369: >> >>> 367: void GCLockerTracer::report_gc_locker() { >>> 368: Ticks zero; >>> 369: if (_needs_gc_start_timestamp != zero) { >> >> Why is this needed? > > Because we can't assume that EventGCLocker is enabled when GC locker is started, in another word, at the beginning and end of the GC locker, whether EventGCLocker is enabled may be inconsistent. > > So check here if _needs_gc_start_timestamp is not zero, if it is not 0, it needs to reset _needs_gc_start_timestamp regardless of whether the event will be sent. Ok, so instead of sending an incomplete event you want to skip it? That might be correct but in that case I would prefer adding a helper `GCLockerTracer::is_started()`, that checks if the timestamp is set. That would make the intention more clear I think. This helper could assert that the event is enabled and it could also be used in `inc_stall_count()` since that one won't have any effect if the event is not "started". What do you think about that? 
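To make the suggestion above concrete, here is a minimal standalone sketch of the `is_started()` idea. It is not the code from the pull request: `LockerTracerSketch`, its plain integer timestamp, and the `event_enabled` parameters are hypothetical simplifications of `GCLockerTracer`, `Ticks`, and the JFR event check, and the sketch assumes that a start timestamp of 0 means "no event in progress" (so `now` must be non-zero). The additional assert that the event is enabled, also suggested above, is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the tracer discussed in the review.
struct LockerTracerSketch {
  std::uint64_t start_ticks = 0;   // 0 means "no GC locker event started"
  unsigned jni_lock_count = 0;
  unsigned stall_count = 0;
  unsigned reported = 0;           // stands in for committed JFR events

  // The helper from the review: one place that defines "started".
  bool is_started() const { return start_ticks != 0; }

  // Called when the GC locker starts; `now` is assumed to be non-zero.
  void start(std::uint64_t now, unsigned lock_count, bool event_enabled) {
    assert(!is_started());
    assert(jni_lock_count == 0 && stall_count == 0);
    if (event_enabled) {
      start_ticks = now;
      jni_lock_count = lock_count;
    }
  }

  // Has no effect unless an event is in progress.
  void inc_stall_count() {
    if (is_started()) {
      ++stall_count;
    }
  }

  // Called when the GC locker ends. State is reset even if the event
  // was disabled in the meantime, so the next start() sees a clean slate.
  void report(bool event_enabled) {
    if (!is_started()) {
      return;
    }
    if (event_enabled) {
      ++reported;  // stand-in for filling in and committing the event
    }
    start_ticks = 0;
    jni_lock_count = 0;
    stall_count = 0;
  }
};
```

With a helper like this, both the reporting path and `inc_stall_count()` test the same condition, and the case where the event is enabled at the start but disabled at the end still leaves the tracer reset.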
------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From kbarrett at openjdk.java.net Sat Jan 23 19:32:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 23 Jan 2021 19:32:40 GMT Subject: RFR: 8258742: Move PtrQueue reset to PtrQueueSet subclasses In-Reply-To: <1dgRRDTcWGLGFijLRvA01DMsmGoQMoFZM8-1Z5VrWQ4=.96f80cf3-136b-4a86-bf9e-f9db626f3979@github.com> References: <1dgRRDTcWGLGFijLRvA01DMsmGoQMoFZM8-1Z5VrWQ4=.96f80cf3-136b-4a86-bf9e-f9db626f3979@github.com> Message-ID: On Mon, 18 Jan 2021 10:35:14 GMT, Thomas Schatzl wrote: >> Please review this change to the PtrQueue hierarchy, changing queue reset >> from an intrinsic operation of the queue to an operation the qset performs >> on a queue. This is another piece of the refactoring being done under >> JDK-8258251, separated out for easier review. >> >> After the refactoring of queue reset, PtrQueue::is_empty and PtrQueue::size >> are no longer used, so are removed. Further, PtrQueue::{set_}index_in_bytes >> are removed, directly using _index instead. >> >> A less obvious part of the change is in the G1 remark task and Shenandoah >> final marking task. The threads walk performed by these no longer directly >> processes the partial per-thread SATB buffers. Instead they just flush the >> queues for later completed buffer processing. >> >> Testing: >> mach5 tier1 >> local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC > > Lgtm. Thanks @tschatzl and @walulyai for reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2115 From kbarrett at openjdk.java.net Sat Jan 23 19:51:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 23 Jan 2021 19:51:54 GMT Subject: Integrated: 8258742: Move PtrQueue reset to PtrQueueSet subclasses In-Reply-To: References: Message-ID: On Sun, 17 Jan 2021 13:17:20 GMT, Kim Barrett wrote: > Please review this change to the PtrQueue hierarchy, changing queue reset > from an intrinsic operation of the queue to an operation the qset performs > on a queue. This is another piece of the refactoring being done under > JDK-8258251, separated out for easier review. > > After the refactoring of queue reset, PtrQueue::is_empty and PtrQueue::size > are no longer used, so are removed. Further, PtrQueue::{set_}index_in_bytes > are removed, directly using _index instead. > > A less obvious part of the change is in the G1 remark task and Shenandoah > final marking task. The threads walk performed by these no longer directly > processes the partial per-thread SATB buffers. Instead they just flush the > queues for later completed buffer processing. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC This pull request has now been integrated. Changeset: 6c4c96fa Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/6c4c96fa Stats: 89 lines in 9 files changed: 17 ins; 45 del; 27 mod 8258742: Move PtrQueue reset to PtrQueueSet subclasses Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/2115 From kbarrett at openjdk.java.net Sat Jan 23 19:51:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 23 Jan 2021 19:51:54 GMT Subject: RFR: 8258742: Move PtrQueue reset to PtrQueueSet subclasses [v2] In-Reply-To: References: Message-ID: > Please review this change to the PtrQueue hierarchy, changing queue reset > from an intrinsic operation of the queue to an operation the qset performs > on a queue. 
This is another piece of the refactoring being done under > JDK-8258251, separated out for easier review. > > After the refactoring of queue reset, PtrQueue::is_empty and PtrQueue::size > are no longer used, so are removed. Further, PtrQueue::{set_}index_in_bytes > are removed, directly using _index instead. > > A less obvious part of the change is in the G1 remark task and Shenandoah > final marking task. The threads walk performed by these no longer directly > processes the partial per-thread SATB buffers. Instead they just flush the > queues for later completed buffer processing. > > Testing: > mach5 tier1 > local (linux-x64) hotspot:tier1 with -XX:+UseShenandoahGC Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' into move_reset - update shenandoah - remove pq index_in_bytes - remove pq size - remove pq is_empty - move reset ------------- Changes: https://git.openjdk.java.net/jdk/pull/2115/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2115&range=01 Stats: 89 lines in 9 files changed: 17 ins; 45 del; 27 mod Patch: https://git.openjdk.java.net/jdk/pull/2115.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2115/head:pull/2115 PR: https://git.openjdk.java.net/jdk/pull/2115 From kbarrett at openjdk.java.net Sat Jan 23 22:51:00 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 23 Jan 2021 22:51:00 GMT Subject: RFR: 8259776: Remove ParallelGC non-CAS oldgen allocation [v4] In-Reply-To: References: Message-ID: <2BwE5mRjN6HC3QpxRjRWbAVMP8YmnFMzGhdCNXMhK6s=.85f45c7d-d910-4e0e-9a94-410072961137@github.com> > Please review this change to ParallelGC oldgen allocation. There were two > variants, one using CAS on the _top member of the mutable space, the other > requiring locking or other forms of mutual exclusion. > > We don't need both variants. 
The non-CAS variant is only used in a few > places, where the cost of an extra CAS doesn't matter. What does matter is > that having two variants, which must not be used concurrently, makes the > code larger, more complex, and harder to verify. (This change came out of > analyzing JDK-8259271. No problems were found (or expected), so this change > is not expected to impact that bug. But because of the two variants, the > possibility of unexpected interactions needed to be examined.) > > The non-CAS allocation support has been removed, with PSOldGen::allocate now > implemented using the CAS-based allocation. The cas_ prefix naming > convention is retained for the internals for clarity. > > While looking at this, noticed and removed a couple of lingering references > to the class AdjoiningGenerations, which no longer exists after JDK-8243146. > > Testing: > mach5 tier1-5 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Merge branch 'master' into nocas_alloc - move oldgen alloc with size policy recording to heap object - record oldgen mutator allocations in size policy - remove non-CAS allocate ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2101/files - new: https://git.openjdk.java.net/jdk/pull/2101/files/994c0eb6..11ca7991 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2101&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2101&range=02-03 Stats: 29337 lines in 598 files changed: 8628 ins; 16330 del; 4379 mod Patch: https://git.openjdk.java.net/jdk/pull/2101.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2101/head:pull/2101 PR: https://git.openjdk.java.net/jdk/pull/2101 From kbarrett at openjdk.java.net Sat Jan 23 22:51:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 23 Jan 2021 22:51:01 GMT Subject: Integrated: 8259776: Remove ParallelGC non-CAS oldgen allocation In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 14:16:50 GMT, Kim Barrett wrote: > Please review this change to ParallelGC oldgen allocation. There were two > variants, one using CAS on the _top member of the mutable space, the other > requiring locking or other forms of mutual exclusion. > > We don't need both variants. The non-CAS variant is only used in a few > places, where the cost of an extra CAS doesn't matter. What does matter is > that having two variants, which must not be used concurrently, makes the > code larger, more complex, and harder to verify. (This change came out of > analyzing JDK-8259271. No problems were found (or expected), so this change > is not expected to impact that bug. But because of the two variants, the > possibility of unexpected interactions needed to be examined.) > > The non-CAS allocation support has been removed, with PSOldGen::allocate now > implemented using the CAS-based allocation.
The cas_ prefix naming > convention is retained for the internals for clarity. > > While looking at this, noticed and removed a couple of lingering references > to the class AdjoiningGenerations, which no longer exists after JDK-8243146. > > Testing: > mach5 tier1-5 This pull request has now been integrated. Changeset: 06348dfc Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/06348dfc Stats: 150 lines in 10 files changed: 13 ins; 119 del; 18 mod 8259776: Remove ParallelGC non-CAS oldgen allocation Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2101 From zgu at openjdk.java.net Sun Jan 24 01:34:58 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Sun, 24 Jan 2021 01:34:58 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing Message-ID: Please review this patch that enables concurrent stack processing for Shenandoah GC. After this patch, all root processing is done concurrently for concurrent GC. Test: - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 - [x] Nightly - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 ------------- Commit messages: - Merge master - Merge branch 'JDK-8255765-isolate-gcs' into JDK-8256298-conc-stack-proc - Fixed indentation - More from Aleksey's review - cleanup - fix styles and cleanup - Removed unnecessary includes - Merge - Merge branch 'JDK-8255765-isolate-gcs' into JDK-8256298-conc-stack-proc - Merge branch 'master' into JDK-8255765-isolate-gcs - ... 
and 121 more: https://git.openjdk.java.net/jdk/compare/34eb8b34...4fd486b5 Changes: https://git.openjdk.java.net/jdk/pull/2185/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8256298 Stats: 649 lines in 19 files changed: 466 ins; 129 del; 54 mod Patch: https://git.openjdk.java.net/jdk/pull/2185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2185/head:pull/2185 PR: https://git.openjdk.java.net/jdk/pull/2185 From zgu at openjdk.java.net Sun Jan 24 01:46:56 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Sun, 24 Jan 2021 01:46:56 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v2] In-Reply-To: References: Message-ID: > Please review this patch that enables concurrent stack processing for Shenandoah GC. > > After this patch, all root processing is done concurrently for concurrent GC. > > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] Nightly > - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Minor cleanup ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2185/files - new: https://git.openjdk.java.net/jdk/pull/2185/files/4fd486b5..62170608 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2185/head:pull/2185 PR: https://git.openjdk.java.net/jdk/pull/2185 From ddong at openjdk.java.net Sun Jan 24 01:48:57 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sun, 24 Jan 2021 01:48:57 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v4] In-Reply-To: References: Message-ID: > GC locker will stall the operation of GC, resulting in 
some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. > > For the test purpose, I add two Whitebox methods to lock/unlock critical. Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: add GCLockerTracer::is_started() that makes the logic more clear ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2088/files - new: https://git.openjdk.java.net/jdk/pull/2088/files/41641d8a..85987c58 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=02-03 Stats: 13 lines in 2 files changed: 8 ins; 1 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2088.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2088/head:pull/2088 PR: https://git.openjdk.java.net/jdk/pull/2088 From ddong at openjdk.java.net Sun Jan 24 01:52:42 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Sun, 24 Jan 2021 01:52:42 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v2] In-Reply-To: References: Message-ID: On Sat, 23 Jan 2021 15:16:40 GMT, Stefan Johansson wrote: >> Because we can't assume that EventGCLocker is enabled when GC locker is started, in another word, at the beginning and end of the GC locker, whether EventGCLocker is enabled may be inconsistent. >> >> So check here if _needs_gc_start_timestamp is not zero, if it is not 0, it needs to reset _needs_gc_start_timestamp regardless of whether the event will be sent. > > Ok, so instead of sending an incomplete event you want to skip it? That might be correct but in that case I would prefer adding a helper `GCLockerTracer::is_started()`, that checks if the timestamp is set. That would make the intention more clear I think. 
This helper could assert that the event is enabled and it could also be used in `inc_stall_count()` since that one won't have any effect if the event is not "started". What do you think about that? Good idea. Updated. But I didn't add the assertion that makes sure the event is enabled, because it may be disabled between GCLockerTracer::start_gc_locker and GCLockerTracer::inc_stall_count(). ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From kim.barrett at oracle.com Sun Jan 24 14:37:32 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 24 Jan 2021 09:37:32 -0500 Subject: RFR: Convert old-gen single threaded pretouch to multi-threaded during In-Reply-To: References: Message-ID: <92A36846-F059-47A4-8AEF-086135651CED@oracle.com> > On Jan 8, 2021, at 8:08 AM, Amit Pawar wrote: > > Hi > > I am trying to improve the pre-touch time taken during old-gen resizing. > Need your suggestions whether following change will be accepted or not. > > What is happening ? > Every GC thread resizes the old-gen during object promotion if there is no > enough room for the object. After expanding GC thread will pre-touch the > pages alone and cant pre-touch in parallel using PretouchTask task as it is > already executing a GC task. The total GC pause time depends upon resize > size and number of resizes. > > What is fix? > Create another WorkGang and then GC thread can execute pre-touch task with > this new WorkGang to reduce the pre-touch time taken. The code change is > given below. I don't think adding a work gang is the right approach here. The threads in that new work gang may just end up competing for CPUs with the already in-progress work gang doing the normal GC work. A better approach would be to refactor pretouch parallization to allow threads to join the fray as needed. Then arrange for the in-progress work gang threads to join the pretouch if they would otherwise be waiting for it to complete. 
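The idea of letting threads "join the fray" could look something like the sketch below. It is only an illustration under stated assumptions: `PretouchJob`, `help()` and `completed()` are hypothetical names, the chunk size is assumed to be a multiple of the page size, and the memory is assumed to be freshly committed so that storing zero does not clobber live data. It is not how HotSpot's `PretouchTask` is written.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical pretouch job that any number of threads may help with.
class PretouchJob {
  char* const _start;
  const std::size_t _size;    // bytes to touch
  const std::size_t _chunk;   // bytes claimed per step (multiple of _page)
  const std::size_t _page;    // touch granularity
  std::atomic<std::size_t> _next{0};   // next unclaimed offset
  std::atomic<std::size_t> _done{0};   // bytes whose pages have been touched

public:
  PretouchJob(char* start, std::size_t size, std::size_t chunk, std::size_t page)
    : _start(start), _size(size), _chunk(chunk), _page(page) {
    assert(page > 0 && chunk >= page && chunk % page == 0);
  }

  // Claim chunks and touch their pages until no unclaimed chunk is left.
  // Returns immediately if all the work has already been handed out.
  void help() {
    for (;;) {
      const std::size_t begin = _next.fetch_add(_chunk);
      if (begin >= _size) {
        return;
      }
      const std::size_t end = std::min(begin + _chunk, _size);
      for (std::size_t off = begin; off < end; off += _page) {
        // Volatile store so the compiler cannot drop the touch.
        *reinterpret_cast<volatile char*>(_start + off) = 0;
      }
      _done.fetch_add(end - begin);
    }
  }

  // True once every claimed chunk has actually been touched.
  bool completed() const { return _done.load() == _size; }
};
```

A GC worker that finds an expansion in progress would call `help()` and then wait for `completed()` instead of blocking idle, so no second work gang competes for CPUs.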
I've recently been looking at the relevant parts of ParallelGC, and it looks like it shouldn't be too hard to allow threads waiting for expansion to cooperate in any ongoing pretouch, esp. after some other recent RFEs have been dealt with. I've filed JDK-8260332 for this. I haven't looked at the G1 side of things yet. From david.holmes at oracle.com Sun Jan 24 21:07:23 2021 From: david.holmes at oracle.com (David Holmes) Date: Mon, 25 Jan 2021 07:07:23 +1000 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: <75qnWumnRn-LDG_hwMAxgqeQCn0HtJWEPadgb9i2_qE=.376ff161-9d5f-4332-9719-a4a5d2beae00@github.com> References: <75qnWumnRn-LDG_hwMAxgqeQCn0HtJWEPadgb9i2_qE=.376ff161-9d5f-4332-9719-a4a5d2beae00@github.com> Message-ID: On 22/01/2021 9:21 pm, Per Liden wrote: > On Sat, 16 Jan 2021 13:00:04 GMT, David Holmes wrote: > >>> Per Liden has updated the pull request incrementally with one additional commit since the last revision: >>> >>> Review >> >> So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then? >> >> Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment? >> >> Cheers, >> David > > @dholmes-ora Do you still have questions or concerns here, or can I go ahead and integrate this? I remain concerned about the justification for putting in this workaround for a broken virtualization system. I would be happier if the bug was acknowledged and a fix was in the pipeline so we would know how long we have to carry this for. > I've gone through all uses of sysconf(_SC_NPROCESSORS_*) and sched_getaffinity() we have, and they look fine. I've also looked at how the OSContainer stuff behaves in this environment, and it also looks fine. In summary, the only problem I can spot is related to sched_getcpu(). 
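As a rough illustration of the kind of defensive handling under discussion, a reported processor id can be checked against the processor count before it is used as an index. The helper below is hypothetical (`sanitize_processor_id` is not a HotSpot function) and the fallback to 0 is an assumption of this sketch; the change in the PR may handle the out-of-range case differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: map an id as reported by the operating system
// (for example by sched_getcpu()) to a value that is always a valid index
// in [0, processor_count). Out-of-range or negative ids fall back to 0.
inline std::uint32_t sanitize_processor_id(int reported_id,
                                           std::uint32_t processor_count) {
  assert(processor_count > 0);
  if (reported_id >= 0 &&
      static_cast<std::uint32_t>(reported_id) < processor_count) {
    return static_cast<std::uint32_t>(reported_id);
  }
  // Keeps per-CPU data structures in bounds, at the price of possible
  // contention on slot 0 in the broken environment.
  return 0;
}
```

The extra comparison runs on every call, which is the cost to correctly functioning environments that the thread is debating.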
So IIUC what we suspect is that sched_getcpu is reporting physical id's rather than virtualized ones. I find it hard to imagine how only one API in this area can be affected by such a bug, but if that appears to be the case then that is reassuring. I won't "block" this, but I'm not happy about it. Thanks, David > ------------- > > PR: https://git.openjdk.java.net/jdk16/pull/124 > From mli at openjdk.java.net Mon Jan 25 01:09:44 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Mon, 25 Jan 2021 01:09:44 GMT Subject: Integrated: JDK-8260208: Improve dummy object filling condition in G1CollectedHeap::fill_archive_regions in cds In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 11:16:42 GMT, Hamlin Li wrote: > it's a minor fix/enhancement in cds, it fixes dummy object filling condition in G1CollectedHeap::fill_archive_regions This pull request has now been integrated. Changeset: 4ae39b14 Author: Hamlin Li URL: https://git.openjdk.java.net/jdk/commit/4ae39b14 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8260208: Improve dummy object filling condition in G1CollectedHeap::fill_archive_regions in cds Reviewed-by: tschatzl, iklam ------------- PR: https://git.openjdk.java.net/jdk/pull/2183 From jiefu at openjdk.java.net Mon Jan 25 04:12:53 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 25 Jan 2021 04:12:53 GMT Subject: RFR: 8260327: Shenandoah: Shenandoah may fail with -XX:UseSSE=0 on x86_32 Message-ID: Hi all, I'd like to fix this bug although UseSSE=0 won't be used in product environments. However, it will be benefit for our testing of OpenJDK. The fix just following the style of RegisterSaver::save_live_registers [1]. Thanks. 
Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp#L205 ------------- Commit messages: - 8260327: Shenandoah: Shenandoah may fail with -XX:UseSSE=0 on x86_32 Changes: https://git.openjdk.java.net/jdk/pull/2214/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2214&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260327 Stats: 48 lines in 2 files changed: 44 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2214.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2214/head:pull/2214 PR: https://git.openjdk.java.net/jdk/pull/2214 From tschatzl at openjdk.java.net Mon Jan 25 08:37:54 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 25 Jan 2021 08:37:54 GMT Subject: RFR: 8260263: Remove PtrQueue::_qset [v2] In-Reply-To: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> References: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> Message-ID: > Hi all, > > can I have reviews for this trivial(?) removal of dead code pertaining to and including `PtrQueue::_qset`? > > Testing: local compilation, tier1 > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge branch 'master' into 8260263-remove-ptrqueue-qset - Initial commit ------------- Changes: https://git.openjdk.java.net/jdk/pull/2194/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2194&range=01 Stats: 26 lines in 4 files changed: 2 ins; 23 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2194.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2194/head:pull/2194 PR: https://git.openjdk.java.net/jdk/pull/2194 From tschatzl at openjdk.java.net Mon Jan 25 08:41:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 25 Jan 2021 08:41:43 GMT Subject: RFR: 8260263: Remove PtrQueue::_qset [v2] In-Reply-To: References: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> Message-ID: On Fri, 22 Jan 2021 20:32:23 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge branch 'master' into 8260263-remove-ptrqueue-qset >> - Initial commit > > I hadn't realized the `_qset` was quite this dead. And this doesn't appear to depend on JDK-8258742 (still waiting for a 2nd reviewer), or even have merge conflicts with it. Cool! Thanks @kimbarrett @kstefanj for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2194 From tschatzl at openjdk.java.net Mon Jan 25 08:41:44 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 25 Jan 2021 08:41:44 GMT Subject: Integrated: 8260263: Remove PtrQueue::_qset In-Reply-To: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> References: <-SnSuf-MYWyw7qEE8FMVi81-2qREuzbYEHiNcTSPCuU=.e3a4f5f7-1f60-4495-8f5c-f2274488e6e2@github.com> Message-ID: On Fri, 22 Jan 2021 10:15:27 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this trivial(?) removal of dead code pertaining to and including `PtrQueue::_qset`? 
> > Testing: local compilation, tier1 > > Thanks, > Thomas This pull request has now been integrated. Changeset: d825339d Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/d825339d Stats: 26 lines in 4 files changed: 2 ins; 23 del; 1 mod 8260263: Remove PtrQueue::_qset Remove dead code related to PtrQueue::_qset and itself. Reviewed-by: kbarrett, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2194 From tschatzl at openjdk.java.net Mon Jan 25 08:56:38 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 25 Jan 2021 08:56:38 GMT Subject: RFR: 8260042: G1 Post-cleanup liveness printing occurs too early In-Reply-To: References: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> Message-ID: <76ihAHdwnPjqqdfTJijMuXGbGHqlnoIRnkElIrVpZSA=.c9a12867-3cc9-4451-9aea-b566a4ac099f@github.com> On Fri, 22 Jan 2021 09:00:43 GMT, Stefan Johansson wrote: >> Hi all, >> >> can I have reviews for this small change that fixes position of the Post-cleanup liveness printing, causing wrong gc efficiencies to be printed? >> >> I.e. due to some older changes, the calculation of gc efficiences got moved below the printing of the Post-cleanup liveness which should be about these values. This change corrects that. >> >> Note that there is a sister issue about not printing the gc efficiencies in the "Post-Marking" phase. This is not scope of this change. >> >> Testing: manual testing that values are correct, hs-tier1+2 > > Looks good! 
Thanks for your review @kstefanj ------------- PR: https://git.openjdk.java.net/jdk/pull/2168 From sjohanss at openjdk.java.net Mon Jan 25 09:00:43 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 25 Jan 2021 09:00:43 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v2] In-Reply-To: <04tWoUPw_97DYXPqlJdJhQyDr0dBW1Vkmaja9rqPMEE=.ff792b59-ab27-49c7-bcaa-3b6d329b7650@github.com> References: <04tWoUPw_97DYXPqlJdJhQyDr0dBW1Vkmaja9rqPMEE=.ff792b59-ab27-49c7-bcaa-3b6d329b7650@github.com> Message-ID: On Fri, 22 Jan 2021 11:39:33 GMT, Hamlin Li wrote: >> src/hotspot/share/gc/g1/heapRegionSet.cpp line 252: >> >>> 250: } else { >>> 251: assert_free_region_list(_tail != first, "invariant"); >>> 252: } >> >> Since these checks no longer do anything other than assertions I think it would be nice to hide them in a helper that for production builds will do nothing using `NOT_DEBUG_RETURN`. > > Thanks for reviewing. > I just realized I made the change in a different way from what you suggested exactly; it seems to have the same effect. Not sure if my current change with "if ASSERT" is OK, please kindly let me know if you don't think so. You are correct that when executing it will have the same effect, but I'm more concerned about the readability of the code in this case. So please move it to a separate function, something like `verify_region_to_remove()`. Looking a bit closer at the code I think this function should be able to be called from each round of the loop as well, I mean that's what was done before and by doing so we could get the loop a bit cleaner as well. Or have I missed anything? In that case you would not need to call it before the loop, but just inside it.
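The suggestion above can be sketched in standalone form as follows. This is not the G1 code: `Region`, `FreeList` and `SKETCH_NOT_DEBUG_RETURN` are hypothetical stand-ins, with the macro imitating the role of HotSpot's `NOT_DEBUG_RETURN` by keying off `NDEBUG` instead of HotSpot's `ASSERT`.

```cpp
#include <cassert>

// In release builds the macro supplies an empty body; in debug builds it
// expands to nothing, so the declaration needs a real definition.
#ifdef NDEBUG
#define SKETCH_NOT_DEBUG_RETURN {}
#else
#define SKETCH_NOT_DEBUG_RETURN
#endif

struct Region {
  Region* prev = nullptr;
  Region* next = nullptr;
};

class FreeList {
  Region* _head = nullptr;
  Region* _tail = nullptr;
  int _length = 0;

  // Assertion-only helper; does nothing in release builds.
  void verify_region_to_remove(const Region* curr) const SKETCH_NOT_DEBUG_RETURN;

public:
  void add_last(Region* r) {
    r->prev = _tail;
    r->next = nullptr;
    if (_tail != nullptr) { _tail->next = r; } else { _head = r; }
    _tail = r;
    ++_length;
  }

  // Unlinks `count` consecutive regions starting at `first`.
  // Precondition: at least `count` regions follow `first` in the list.
  void remove_starting_at(Region* first, int count) {
    Region* curr = first;
    for (int i = 0; i < count; ++i) {
      verify_region_to_remove(curr);   // called in every round of the loop
      Region* const prev = curr->prev;
      Region* const next = curr->next;
      if (prev != nullptr) { prev->next = next; } else { _head = next; }
      if (next != nullptr) { next->prev = prev; } else { _tail = prev; }
      curr->prev = nullptr;
      curr->next = nullptr;
      --_length;
      curr = next;
    }
  }

  int length() const { return _length; }
  Region* head() const { return _head; }
  Region* tail() const { return _tail; }
};

#ifndef NDEBUG
inline void FreeList::verify_region_to_remove(const Region* curr) const {
  assert(curr != nullptr);
  // A region has no predecessor exactly when it is the head, and no
  // successor exactly when it is the tail.
  assert((curr->prev == nullptr) == (_head == curr));
  assert((curr->next == nullptr) == (_tail == curr));
}
#endif
```

Keeping the checks behind one named call leaves the loop body free of `#ifdef` blocks, which is the readability point being made in the review.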
------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From rs at jelastic.com Mon Jan 25 09:20:06 2021 From: rs at jelastic.com (Ruslan Synytsky) Date: Mon, 25 Jan 2021 13:20:06 +0400 Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: References: Message-ID: Hi, sharing comments provided by Virtuozzo team (cc'd). *Question (**Florian**):** It would be good to have someone from Virtuozzo comment to indicate whether the affinity mask is actually reliable for this. But they will see test failures in low-level test suites if the affinity mask and sched_getcpu are incompatible (I actually wrote a glibc test case for this).* *Answer (**Denis**): Syscall sched_setaffinity is not working inside containers. On one hand we can not return error as this will immediately break a lot of software, on the other hand we could not allow to bind the process to the specific CPU as in this case we could have DoS attack vector. Thus it returns success, but actually does nothing. The rest is the consequence.* Hope it's helpful. Regards > ---------- Forwarded message ---------- > From: David Holmes > To: Per Liden , hotspot-gc-dev at openjdk.java.net, > hotspot-runtime-dev at openjdk.java.net > Cc: > Bcc: > Date: Mon, 25 Jan 2021 07:07:23 +1000 > Subject: Re: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id > reported by the operating system [v2] > On 22/01/2021 9:21 pm, Per Liden wrote: > > On Sat, 16 Jan 2021 13:00:04 GMT, David Holmes > wrote: > > > >>> Per Liden has updated the pull request incrementally with one > additional commit since the last revision: > >>> > >>> Review > >> > >> So we have to penalize all correctly functioning users because of one > broken environment? Can we not detect this broken environment at startup > and inject a workaround then? > >> > >> Why is this an environment that is important enough that OpenJDK has to > make changes to deal with a broken environment? 
> >> > >> Cheers, > >> David > > > > @dholmes-ora Do you still have questions or concerns here, or can I go > ahead and integrate this? > > I remain concerned about the justification for putting in this > workaround for a broken virtualization system. I would be happier if the > bug was acknowledged and a fix was in the pipeline so we would know how > long we have to carry this for. > > > I've gone through all uses of sysconf(_SC_NPROCESSORS_*) and > sched_getaffinity() we have, and they look fine. I've also looked at how > the OSContainer stuff behaves in this environment, and it also looks fine. > In summary, the only problem I can spot is related to sched_getcpu(). > > So IIUC what we suspect is that sched_getcpu is reporting physical id's > rather than virtualized ones. I find it hard to imagine how only one API > in this area can be affected by such a bug, but if that appears to be > the case then that is reassuring. > > I won't "block" this, but I'm not happy about it. > > Thanks, > David > > > ------------- > > > > PR: https://git.openjdk.java.net/jdk16/pull/124 > > > -- Ruslan Synytsky CEO @ Jelastic Multi-Cloud PaaS From rkennke at openjdk.java.net Mon Jan 25 09:43:46 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 25 Jan 2021 09:43:46 GMT Subject: RFR: 8260327: Shenandoah: Shenandoah may fail with -XX:UseSSE=0 on x86_32 In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 04:08:09 GMT, Jie Fu wrote: > Hi all, > > I'd like to fix this bug although UseSSE=0 won't be used in product environments. > However, it will be benefit for our testing of OpenJDK. > > The fix just following the style of RegisterSaver::save_live_registers [1]. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp#L205 Hi @DamonFool, we already have an ongoing PR about this: https://github.com/openjdk/jdk/pull/1172 Maybe coordinate with @shipilev which way to go? 
Not sure if Aleksey intends to take this any further. ------------- PR: https://git.openjdk.java.net/jdk/pull/2214 From jiefu at openjdk.java.net Mon Jan 25 10:42:40 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 25 Jan 2021 10:42:40 GMT Subject: RFR: 8260327: Shenandoah: Shenandoah may fail with -XX:UseSSE=0 on x86_32 In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 09:41:03 GMT, Roman Kennke wrote: >> Hi all, >> >> I'd like to fix this bug although UseSSE=0 won't be used in product environments. >> However, it will be benefit for our testing of OpenJDK. >> >> The fix just following the style of RegisterSaver::save_live_registers [1]. >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp#L205 > > Hi @DamonFool, > we already have an ongoing PR about this: > https://github.com/openjdk/jdk/pull/1172 > > Maybe coordinate with @shipilev which way to go? Not sure if Aleksey intends to take this any further. I didn't know that before. Since you are the experts in Shenandoah, I'll close this PR. Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2214 From jiefu at openjdk.java.net Mon Jan 25 10:42:41 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Mon, 25 Jan 2021 10:42:41 GMT Subject: Withdrawn: 8260327: Shenandoah: Shenandoah may fail with -XX:UseSSE=0 on x86_32 In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 04:08:09 GMT, Jie Fu wrote: > Hi all, > > I'd like to fix this bug although UseSSE=0 won't be used in product environments. > However, it will be benefit for our testing of OpenJDK. > > The fix just following the style of RegisterSaver::save_live_registers [1]. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp#L205 This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2214 From github.com+779991+jaokim at openjdk.java.net Mon Jan 25 11:57:52 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström) Date: Mon, 25 Jan 2021 11:57:52 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency Message-ID: **Description** This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup. The gc-efficiency is calculated at the end of the marking phase, to be logged in the post-cleanup section. It is, however, not reset, meaning that the next phase's post-marking log will show the old value. - The gc-efficiency is initialized to -1 when it hasn't been calculated. - Negative gc-efficiency is displayed as a hyphen "-" in the summary. - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) This fix has been tested together with the above-mentioned fix. **Example** This is what the logging looks like after the fix has been applied.
### PHASE Post-Marking @ 410.303
### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576
###
### type address-range               used prev-live next-live     gc-eff  remset  state code-roots
###                               (bytes)   (bytes)   (bytes) (bytes/ms) (bytes)            (bytes)
### OLD  0x0ffc00000-0x0ffd00000  1048368   1048368   1048368          -    8464  UPDAT       6096
### OLD  0x0ffd00000-0x0ffe00000   132856    132856    132856          -    2544  UPDAT         16
### SURV 0x0ffe00000-0x0fff00000    21368     21368     21368          -    2544  CMPLT         16
### FREE 0x0fff00000-0x100000000        0         0         0          -    2384  UNTRA         16
###
### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB
### PHASE Post-Cleanup @ 410.305
### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576
###
### type address-range               used prev-live next-live     gc-eff  remset  state code-roots
###                               (bytes)   (bytes)   (bytes) (bytes/ms) (bytes)            (bytes)
### OLD  0x0ffc00000-0x0ffd00000  1048368   1048368   1048368          -    8624  UNTRA       6096
### OLD  0x0ffd00000-0x0ffe00000   132856    132856    132856  1352923.9    2544  CMPLT         16
### SURV 0x0ffe00000-0x0fff00000    21368     21368     21368          -    2544  CMPLT         16
### FREE 0x0fff00000-0x100000000        0         0         0          -    2384  UNTRA         16
###
### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB
### PHASE Post-Marking @ 450.310
### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576
###
### type address-range               used prev-live next-live     gc-eff  remset  state code-roots
###                               (bytes)   (bytes)   (bytes) (bytes/ms) (bytes)            (bytes)
### OLD  0x0ffc00000-0x0ffd00000  1048368   1048368   1048368          -    8624  UPDAT       6096
### OLD  0x0ffd00000-0x0ffe00000   174456    174456    174456          -    2544  UPDAT         16
### SURV 0x0ffe00000-0x0fff00000    21368     21368     21368          -    2544  CMPLT         16
### FREE 0x0fff00000-0x100000000        0         0         0          -    2384  UNTRA         16
###
### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB
### PHASE Post-Cleanup @ 450.312
### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576
###
### type address-range               used prev-live next-live     gc-eff  remset  state code-roots
###                               (bytes)   (bytes)   (bytes) (bytes/ms) (bytes)            (bytes)
### OLD  0x0ffc00000-0x0ffd00000  1048368   1048368   1048368          -    8624  UNTRA       6096
### OLD  0x0ffd00000-0x0ffe00000   174456    174456    174456  1266519.2    2544  CMPLT         16
### SURV 0x0ffe00000-0x0fff00000    21368     21368     21368          -    2544  CMPLT         16
### FREE 0x0fff00000-0x100000000        0         0         0          -    2384  UNTRA         16
###
**Testing** - Manual testing - hs-tier1, hs-tier2 ------------- Commit messages: - Clear gc_efficiency by setting to -1. Print "-" when gc efficieny is negative. Changes: https://git.openjdk.java.net/jdk/pull/2217/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2217&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8217327 Stats: 22 lines in 3 files changed: 17 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2217.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2217/head:pull/2217 PR: https://git.openjdk.java.net/jdk/pull/2217 From rkennke at openjdk.java.net Mon Jan 25 12:50:47 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 25 Jan 2021 12:50:47 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v2] In-Reply-To: References: Message-ID: On Sun, 24 Jan 2021 01:46:56 GMT, Zhengyu Gu wrote: >> Please review this patch that enables concurrent stack processing for Shenandoah GC. >> >> After this patch, all root processing is done concurrently for concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 >> - [x] Nightly >> - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Minor cleanup Wow that looks very nice and clean! I only have a few minor comments.
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 512: > 510: // > 511: public: > 512: bool uses_stack_watermark_barrier() const { return true; } This overrides a CollectedHeap method. I wonder if we should start adding 'override' keywords to overridden methods. However, this requires >=C++11, and might disturb backporting. Not sure about it. src/hotspot/share/gc/shenandoah/shenandoahStackWatermark.cpp line 2: > 1: /* > 2: * Copyright (c) 2020, Red Hat, Inc. All rights reserved. This is structurally similar to zStackWatermark.cpp, and perhaps 'inspired' by it. Consider adding Oracle copyright there in addition to the Red Hat copyright. We did the same in e.g. shenandoahReferenceProcessor.hpp/cpp. Also, bump the copyright year to 2021 maybe? src/hotspot/share/gc/shenandoah/shenandoahStackWatermark.hpp line 2: > 1: /* > 2: * Copyright (c) 2020, Red Hat, Inc. All rights reserved. Same comment as in shenandoahStackWatermark.cpp above. ------------- Changes requested by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2185 From zgu at openjdk.java.net Mon Jan 25 13:33:06 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 25 Jan 2021 13:33:06 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v3] In-Reply-To: References: Message-ID: > Please review this patch that enables concurrent stack processing for Shenandoah GC. > > After this patch, all root processing is done concurrently for concurrent GC. 
> > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] Nightly > - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Roman's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2185/files - new: https://git.openjdk.java.net/jdk/pull/2185/files/62170608..61b3dcb8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=01-02 Stats: 5 lines in 3 files changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2185/head:pull/2185 PR: https://git.openjdk.java.net/jdk/pull/2185 From rkennke at openjdk.java.net Mon Jan 25 14:27:44 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 25 Jan 2021 14:27:44 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v3] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 13:33:06 GMT, Zhengyu Gu wrote: >> Please review this patch that enables concurrent stack processing for Shenandoah GC. >> >> After this patch, all root processing is done concurrently for concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 >> - [x] Nightly >> - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Roman's comments Better, thanks! Found a few more very minor complaints (sorry... ;-) ) src/hotspot/share/gc/shenandoah/shenandoahStackWatermark.cpp line 3: > 1: /* > 2: * Copyright (c) 2021, Red Hat, Inc. All rights reserved. > 3: * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved. Small nit: the original copyright in the corresponding z* files is dated 2020. Please preserve that. Leave the Red Hat copyright at 2021. Sorry. 
src/hotspot/share/gc/shenandoah/shenandoahStackWatermark.hpp line 45: > 43: BarrierSetNMethod* _bs_nm; > 44: > 45: virtual void do_code_blob(CodeBlob* cb); There is no need to mark this virtual, or is there? I see you put override in the other place in that change, so maybe put it here too? src/hotspot/share/gc/shenandoah/shenandoahStackWatermark.hpp line 68: > 66: OopClosure* closure_from_context(void* context); > 67: virtual uint32_t epoch_id() const; > 68: virtual void start_processing_impl(void* context); Also, similar to the other case, we might avoid virtual here, and use override instead? ------------- Changes requested by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2185 From shade at openjdk.java.net Mon Jan 25 14:37:52 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 25 Jan 2021 14:37:52 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v5] In-Reply-To: References: Message-ID: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1` with Shenandoah > - [x] `tier2` with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: - Renames - Eliminate UpdateRefsMode altogether - Simplify update_with_forwarded - Comment updates - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Simplify ShenandoahUpdateHeapRefsTask - Fix up generic update references too, introduce CONC_UPDATE - Simplify further after RESOLVE removal - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Rename maybe to atomic - ... and 2 more: https://git.openjdk.java.net/jdk/compare/ca20c63c...f2c1ecdb ------------- Changes: https://git.openjdk.java.net/jdk/pull/2166/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=04 Stats: 283 lines in 15 files changed: 86 ins; 116 del; 81 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From rkennke at openjdk.java.net Mon Jan 25 15:14:45 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 25 Jan 2021 15:14:45 GMT Subject: RFR: 8260106: Shenandoah: simplify maybe_update_with_forwarded and related code [v5] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 14:37:52 GMT, Aleksey Shipilev wrote: >> We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). >> >> Additional testing: >> - [x] `hotspot_gc_shenandoah` >> - [x] `tier1` with Shenandoah >> - [x] `tier2` with Shenandoah > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 12 commits: > > - Renames > - Eliminate UpdateRefsMode altogether > - Simplify update_with_forwarded > - Comment updates > - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates > - Simplify ShenandoahUpdateHeapRefsTask > - Fix up generic update references too, introduce CONC_UPDATE > - Simplify further after RESOLVE removal > - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates > - Rename maybe to atomic > - ... and 2 more: https://git.openjdk.java.net/jdk/compare/ca20c63c...f2c1ecdb This is indeed much better! I only have a very minor comment, src/hotspot/share/gc/shenandoah/shenandoahOopClosures.hpp line 192: > 190: class ShenandoahUpdateRefsSuperClosure : public BasicOopIterateClosure { > 191: protected: > 192: ShenandoahHeap* _heap; While moving it around, maybe make it Shenandoah* const too? ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2166 From zgu at openjdk.java.net Mon Jan 25 15:18:02 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 25 Jan 2021 15:18:02 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v4] In-Reply-To: References: Message-ID: > Please review this patch that enables concurrent stack processing for Shenandoah GC. > > After this patch, all root processing is done concurrently for concurrent GC. 
> > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] Nightly > - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: More of Roman's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2185/files - new: https://git.openjdk.java.net/jdk/pull/2185/files/61b3dcb8..77582b33 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=02-03 Stats: 8 lines in 2 files changed: 0 ins; 2 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2185/head:pull/2185 PR: https://git.openjdk.java.net/jdk/pull/2185 From rkennke at openjdk.java.net Mon Jan 25 15:20:43 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 25 Jan 2021 15:20:43 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v4] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 15:18:02 GMT, Zhengyu Gu wrote: >> Please review this patch that enables concurrent stack processing for Shenandoah GC. >> >> After this patch, all root processing is done concurrently for concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 >> - [x] Nightly >> - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > More of Roman's comments Looks good to me! Thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2185 From shade at openjdk.java.net Mon Jan 25 15:58:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 25 Jan 2021 15:58:00 GMT Subject: RFR: 8260106: Shenandoah: refactor reference updating closures and related code [v6] In-Reply-To: References: Message-ID: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1` with Shenandoah > - [x] `tier2` with Shenandoah Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Add const ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2166/files - new: https://git.openjdk.java.net/jdk/pull/2166/files/f2c1ecdb..4f30251d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Mon Jan 25 15:58:02 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 25 Jan 2021 15:58:02 GMT Subject: RFR: 8260106: Shenandoah: refactor reference updating closures and related code [v5] In-Reply-To: References: Message-ID: <1SnRdJcfSYia2BFuiKzmIrRA8UH1RvjbU8cW-0nmtl0=.ee85977a-1a16-49f4-8335-f29c8f8765fe@github.com> On Mon, 25 Jan 2021 15:10:47 GMT, Roman Kennke wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a 
rebase. The pull request now contains 12 commits: >> >> - Renames >> - Eliminate UpdateRefsMode altogether >> - Simplify update_with_forwarded >> - Comment updates >> - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates >> - Simplify ShenandoahUpdateHeapRefsTask >> - Fix up generic update references too, introduce CONC_UPDATE >> - Simplify further after RESOLVE removal >> - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates >> - Rename maybe to atomic >> - ... and 2 more: https://git.openjdk.java.net/jdk/compare/ca20c63c...f2c1ecdb > > src/hotspot/share/gc/shenandoah/shenandoahOopClosures.hpp line 192: > >> 190: class ShenandoahUpdateRefsSuperClosure : public BasicOopIterateClosure { >> 191: protected: >> 192: ShenandoahHeap* _heap; > > While moving it around, maybe make it Shenandoah* const too? Right! Did in new changeset. ------------- PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Mon Jan 25 16:09:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 25 Jan 2021 16:09:45 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v4] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 15:18:02 GMT, Zhengyu Gu wrote: >> Please review this patch that enables concurrent stack processing for Shenandoah GC. >> >> After this patch, all root processing is done concurrently for concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 >> - [x] Nightly >> - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > More of Roman's comments Cursory review. src/hotspot/share/gc/shenandoah/shenandoahClosures.inline.hpp line 79: > 77: void ShenandoahKeepAliveClosure::do_oop(narrowOop* p) { > 78: do_oop_work(p); > 79: } These can move to declarations, right? See how the rest of `shenandoahOopClosures.hpp` does it. 
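For readers following along, here is a minimal sketch of what moving these definitions into the class declaration could look like. It is not the actual Shenandoah code: `oop`, `narrowOop`, `OopClosureSketch`, `KeepAliveClosureSketch` and the `visited()` counter are hypothetical stand-ins invented for illustration, and the real `do_oop_work` applies barriers rather than counting visits.

```cpp
#include <cstdint>

// Hypothetical stand-ins for HotSpot's oop types; illustration only.
using oop = void*;
using narrowOop = std::uint32_t;

// Hypothetical base class, loosely modelled on an oop closure interface.
class OopClosureSketch {
public:
  virtual ~OopClosureSketch() = default;
  virtual void do_oop(oop* p) = 0;
  virtual void do_oop(narrowOop* p) = 0;
};

class KeepAliveClosureSketch : public OopClosureSketch {
private:
  int _visited = 0;

  // Shared worker; both do_oop overloads forward here.
  // Assumption: the sketch only counts non-null slots it is handed.
  template <class T>
  void do_oop_work(T* p) {
    if (p != nullptr) {
      _visited++;
    }
  }

public:
  // Defined directly in the class declaration, so no separate
  // out-of-line definitions are needed in an .inline.hpp file.
  virtual void do_oop(oop* p)       { do_oop_work(p); }
  virtual void do_oop(narrowOop* p) { do_oop_work(p); }

  int visited() const { return _visited; }
};
```

Whether the bodies can really live in the declaration depends on what `do_oop_work` needs to see at that point; in this sketch the worker is a template defined in the same class, so the one-line forwarders compile without any extra includes.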
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 512: > 510: // > 511: public: > 512: bool uses_stack_watermark_barrier() const override { return true; } I wonder whether this single use of `override` caused Clang to complain about the rest of the file: https://github.com/zhengyu123/jdk/runs/1762873416 src/hotspot/share/gc/shenandoah/shenandoahStackWatermark.cpp line 59: > 57: void ShenandoahStackWatermark::change_epoch_id() { > 58: shenandoah_assert_safepoint(); > 59: _epoch_id ++; Excess space before `++`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2185 From dcubed at openjdk.java.net Mon Jan 25 17:15:48 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 25 Jan 2021 17:15:48 GMT Subject: RFR: 8260381: ProblemList com/sun/management/DiagnosticCommandMBean/DcmdMBeanTestCheckJni.java on Win with ZGC Message-ID: I'm reducing the noise in the JDK17 CI by ProblemListing this new test on Win* with ZGC. ------------- Commit messages: - 8260381: ProblemList com/sun/management/DiagnosticCommandMBean/DcmdMBeanTestCheckJni.java on Win with ZGC Changes: https://git.openjdk.java.net/jdk/pull/2225/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2225&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260381 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2225.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2225/head:pull/2225 PR: https://git.openjdk.java.net/jdk/pull/2225 From sspitsyn at openjdk.java.net Mon Jan 25 17:30:40 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Mon, 25 Jan 2021 17:30:40 GMT Subject: RFR: 8260381: ProblemList com/sun/management/DiagnosticCommandMBean/DcmdMBeanTestCheckJni.java on Win with ZGC In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 17:09:17 GMT, Daniel D. Daugherty wrote: > I'm reducing the noise in the JDK17 CI by ProblemListing this new test > on Win* with ZGC. Hi Dan, It looks good and trivial.
Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2225 From sjohanss at openjdk.java.net Mon Jan 25 18:13:45 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 25 Jan 2021 18:13:45 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v4] In-Reply-To: References: Message-ID: <_sVYAZQIiJSDJlj5QO29F6uF0k2V0ApDtlFuMwVLkwA=.d34f9b7e-a6f5-4131-9299-2e570cf3cf04@github.com> On Sun, 24 Jan 2021 01:48:57 GMT, Denghui Dong wrote: >> The GC locker stalls GC operations, so some Java threads cannot continue to run until the GC locker is released, which affects the application's response time. Adding a JFR event that reports this information helps detect the issue. >> >> For testing purposes, I added two WhiteBox methods to lock/unlock critical. > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > add GCLockerTracer::is_started() that makes the logic more clear I think this looks good now but please await a second reviewer. I took it for a spin in our internal testing and tier1-2 looks ok as well as the JFR tests. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2088 From zgu at openjdk.java.net Mon Jan 25 18:20:56 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 25 Jan 2021 18:20:56 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v5] In-Reply-To: References: Message-ID: > Please review this patch that enables concurrent stack processing for Shenandoah GC. > > After this patch, all root processing is done concurrently for concurrent GC.
> > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] Nightly > - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Aleksey's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2185/files - new: https://git.openjdk.java.net/jdk/pull/2185/files/77582b33..3cab64e8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=03-04 Stats: 6 lines in 2 files changed: 0 ins; 1 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2185/head:pull/2185 PR: https://git.openjdk.java.net/jdk/pull/2185 From sgehwolf at openjdk.java.net Mon Jan 25 18:25:40 2021 From: sgehwolf at openjdk.java.net (Severin Gehwolf) Date: Mon, 25 Jan 2021 18:25:40 GMT Subject: RFR: 8260381: ProblemList com/sun/management/DiagnosticCommandMBean/DcmdMBeanTestCheckJni.java on Win with ZGC In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 17:09:17 GMT, Daniel D. Daugherty wrote: > I'm reducing the noise in the JDK17 CI by ProblemListing this new test > on Win* with ZGC. Looks good. ------------- Marked as reviewed by sgehwolf (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2225 From dcubed at openjdk.java.net Mon Jan 25 18:25:41 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 25 Jan 2021 18:25:41 GMT Subject: RFR: 8260381: ProblemList com/sun/management/DiagnosticCommandMBean/DcmdMBeanTestCheckJni.java on Win with ZGC In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 17:27:51 GMT, Serguei Spitsyn wrote: >> I'm reducing the noise in the JDK17 CI by ProblemListing this new test >> on Win* with ZGC. > > Hi Dan, > It looks good and trivial. > Thanks, > Serguei @sspitsyn - Thanks for the fast review! And thanks for calling it trivial. 
I forgot to propose it as trivial in my review request. ------------- PR: https://git.openjdk.java.net/jdk/pull/2225 From dcubed at openjdk.java.net Mon Jan 25 18:25:43 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Mon, 25 Jan 2021 18:25:43 GMT Subject: Integrated: 8260381: ProblemList com/sun/management/DiagnosticCommandMBean/DcmdMBeanTestCheckJni.java on Win with ZGC In-Reply-To: References: Message-ID: <_m4LBMMOSCHZy-jIXewl6HpSX1egZ5iwFyh8j-CeHBA=.2f4e0305-9ae1-4be3-8bab-b6b5c08d8f3e@github.com> On Mon, 25 Jan 2021 17:09:17 GMT, Daniel D. Daugherty wrote: > I'm reducing the noise in the JDK17 CI by ProblemListing this new test > on Win* with ZGC. This pull request has now been integrated. Changeset: 5b0b24b5 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/5b0b24b5 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8260381: ProblemList com/sun/management/DiagnosticCommandMBean/DcmdMBeanTestCheckJni.java on Win with ZGC Reviewed-by: sspitsyn, sgehwolf ------------- PR: https://git.openjdk.java.net/jdk/pull/2225 From zgu at openjdk.java.net Mon Jan 25 18:27:04 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 25 Jan 2021 18:27:04 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v4] In-Reply-To: References: Message-ID: <7rVr5fwRs-uKztIZhLlFN7fRGkXPNGKpKnN5WL90Xss=.edb1484f-ad5b-436c-a1d5-e4ff7b23a006@github.com> On Mon, 25 Jan 2021 16:05:44 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: >> >> More of Roman's comments > > src/hotspot/share/gc/shenandoah/shenandoahStackWatermark.cpp line 59: > >> 57: void ShenandoahStackWatermark::change_epoch_id() { >> 58: shenandoah_assert_safepoint(); >> 59: _epoch_id ++; > > Excess space before `++`. 
Done ------------- PR: https://git.openjdk.java.net/jdk/pull/2185 From zgu at openjdk.java.net Mon Jan 25 18:27:02 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 25 Jan 2021 18:27:02 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v6] In-Reply-To: References: Message-ID: > Please review this patch that enables concurrent stack processing for Shenandoah GC. > > After this patch, all root processing is done concurrently for concurrent GC. > > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] Nightly > - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Reverted override ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2185/files - new: https://git.openjdk.java.net/jdk/pull/2185/files/3cab64e8..6ade41ad Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2185/head:pull/2185 PR: https://git.openjdk.java.net/jdk/pull/2185 From sangheon.kim at oracle.com Mon Jan 25 18:53:24 2021 From: sangheon.kim at oracle.com (Sangheon Kim) Date: Mon, 25 Jan 2021 10:53:24 -0800 Subject: [External] : Fwd: Unexpected results when enabling +UseNUMA for G1GC In-Reply-To: References: <9c42a365-db78-4699-c138-cc06d0c4708f@oracle.com> Message-ID: Hi Tal, On 1/21/21 9:39 AM, Tal Goldstein wrote: > Hey Sangheon, > Thanks for your suggestions. > I answered your questions in-line. > > Regarding your suggestion to increase the heap, > I've increased the heap size to 40GB and the container memory to 50GB, > and ran 2 deployments (numa and non-numa), each deployment has 1 pod > which runs on a dedicated physical k8s node (the same machines > mentioned previously). 
> After running it for several days I could see the following pattern: > > For several days, whenever comes the hours of the day when throughput > is at its max, > then the local memory access ratio of NUMA deployment is much better > than the non-numa deployment (5%-6% diff). > This can be seen in the charts below: > > 1. Throughput Per deployment (Numa deployment vs Non-Numa deployment): > https://drive.google.com/file/d/1tG_Qm9MNHZbtmIiXryL8KGMyUk_vylVG/view?usp=sharing > > > > 2. Local memory ratio % (kube3-10769?is the k8s node WITH NUMA, > kube3-10770 WITHOUT NUMA) > https://drive.google.com/file/d/1WmjBSPiwwMpXDX3MWsjQQN6vR3BLSro1/view?usp=sharing > > > From this I understand that the NUMA based deployment behaves better > under a higher workload, > but what's still unclear to me, is why the throughput of the non-numa > deployment is higher than numa deployment ? Sorry, I don't have good answer for that. If you want to investigate, you have to compare logs of 2 runs, both vm and endpoint(if applicable) logs. You can check average gc pause time, gc frequency etc. for vm logs. My answers are in-lined. > > Thanks, > Tal > > On Mon, Jan 11, 2021 at 10:05 PM > wrote: > Hi Tal, > I added in-line comments. > On 1/9/21 12:15 PM, Tal Goldstein wrote: > > Hi Guys, > > We're exploring the use of the flag -XX:+UseNUMA and its effect > on G1 GC in > > JDK 14. > > For that, we've created a test that consists of 2 k8s > deployments of some > > service, > > where deployment A has the UseNUMA flag enabled, and deployment > B doesn't > > have it. > > > > In order for NUMA to actually work inside the docker container, > we also > > needed to add numactl lib to the container (apk add numactl), > > and in order to measure the local/remote memory access we've > used pcm-numa ( > > https://github.com/opcm/pcm > ), > > the docker is based on an image of Alpine Linux v3.11. 
> >
> > Each deployment handles around 150 requests per second and all of the deployment's pods are running on the same kube machine.
> > When running the test, we expected to see that the (local memory access) / (total memory access) ratio on the UseNUMA deployment, is much higher than the non-numa deployment,
> > and as a result that the deployment itself handles a higher throughput of requests than the non-numa deployment.
> >
> > Surprisingly this isn't the case:
> > On the kube running deployment A which uses NUMA, we measured 20M/ 13M/ 33M (local/remote/total) memory accesses,
> > and for the kube running deployment B which doesn't use NUMA, we measured (23M/10M/33M) on the same time.
> Just curious, did you see any performance difference(other than pcm-numa) between those two?
> Does it mean you ran 2 pods in parallel(at the same time) on one physical machine?
>
> I didn't see any other significant difference.
> Yes, so there were 4 pods on the original experiment:
> 2 On each deployment (NUMA deployment, and non-NUMA deployment),
> and each deployment ran on a separate k8s physical node,
> and those nodes didn't run anything else but the 2 k8s pods.

Okay.

> > Can you help to understand if we're doing anything wrong? or maybe our expectations are wrong ?
> >
> > The 2 deployments are identical (except for the UseNUMA flag):
> > Each deployment contains 2 pods running on k8s.
> > Each pod has 10GB memory, 8GB heap, requires 2 CPUs (but not limited to 2).
> > Each deployment runs on a separate but identical kube machine with this spec:
> >     Hardware............: Supermicro SYS-2027TR-HTRF+
> >     CPU.................: Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
> >     CPUs................: 2
> >     CPU Cores...........: 12
> >     Memory..............: 63627 MB
> >
> > We've also written to a file all NUMA related logs (using
> > -Xlog:os*,gc*=trace:file=/outbrain/heapdumps/fulllog.log:hostname,time,level,tags)
> > - log file could be found here:
> > https://drive.google.com/file/d/1eZqYDtBDWKXaEakh_DoYv0P6V9bcLs6Z/view?usp=sharing
> > so we know that NUMA is indeed working, but again, it doesn't give the desired results we expected to see.
> From the shared log file, I see only 1 GC (GC id, 6761) and numa stat shows 53% of local memory allocation (gc,heap,numa) which seems okay.
> Could you share your full vm options?
>
> These are the updated vm options:
> -XX:+PerfDisableSharedMem
> -Xmx40g
> -Xms40g
> -XX:+DisableExplicitGC
> -XX:-OmitStackTraceInFastThrow
> -XX:+AlwaysPreTouch
> -Duser.country=US
> -XX:+UnlockDiagnosticVMOptions
> -XX:+DebugNonSafepoints
> -XX:+ParallelRefProcEnabled
> -XX:+UnlockExperimentalVMOptions
> -XX:G1MaxNewSizePercent=90
> -XX:InitiatingHeapOccupancyPercent=35
> -XX:-G1UseAdaptiveIHOP
> -XX:ActiveProcessorCount=2
> -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
> -XX:+UseNUMA
> -Xlog:os*,gc*=trace:file=/outbrain/heapdumps/fulllog.log:hostname,time,level,tags

Thanks

> > Any Ideas why ?
> > Is it a matter of workload ?
> Can you increase your Java heap on the testing machine?
> Your test machine has almost 64GB of memory on 2 NUMA nodes. So I assume each NUMA node will have almost 32GB of memory.
> But you are using only 8GB on Java heap which fits on one node, so I can't expect any benefit of enabling NUMA.
>
> But when the jvm is started, doesn't it spreads the heap evenly across all numa nodes ?
> And in this case, won't each NUMA node hold half of the heap (around 4GB) ?

Your statements above are all right.
From 8G of Java heap, each half of heap(4G) will be allocated to node 0 and 1.
G1 NUMA has tiny addition of 1) checking a caller thread's NUMA id and then 2) allocate memory from same node.
(compare to the non-NUMA case). If a testing environment is using very little memory and threads, all of them can reside on one node. So above tiny addition may not help. Running without above addition would work better. This is what I wanted to explain in my previous email. > > I've increased the heap to be 40GB, and the container memory to 50GB. > > As the JVM is running on Kubernetes, there could be another thing may > affect to the test. > For example, topology manager may treat a pod to allocate from a > single > NUMA node. > https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/ > > > That's very interesting, I will read about it and try to understand > more, and to understand if we're even using the topology manager. > Do you think that using k8s with toplogy manager might be the problem ? > Or that actually enabling topology manager should allow better usage > of the hardware and actually help in our case ? Sorry I don't have enough experience / knowledge on topology manager / Kubernetes. As I don't know your testing environment fully, I was trying to enumerate what could affect to your test. Thanks, Sangheon > > Are there any workloads you can suggest that > > will benefit from G1 NUMA awareness ? > I measured some performance improvements on SpecJBB2015 and > SpecJBB2005. > > > Do you happen to have a link to code that runs such a workload? > No, I don't have such link for above runs. > > Thanks, > Sangheon > > > Thanks, > > Tal From cjplummer at openjdk.java.net Mon Jan 25 20:50:41 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Mon, 25 Jan 2021 20:50:41 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes In-Reply-To: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: On Sun, 17 Jan 2021 03:57:59 GMT, Chris Plummer wrote: > See the bug for most details. A few notes here about some implementation details: > > In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: > > ` getTLAB().printOn(tty); // includes "\n" ` > > That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. > > I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. > > The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not.
The `whatis` code did this with the following: > > var dso = loadObjectContainingPC(addr); > if (dso == null) { > return ptrLoc.toString(); > } > var sym = dso.closestSymbolToPC(addr); > if (sym != null) { > return sym.name + '+' + sym.offset; > } > And now you'll see something similar in the PointerFinder code: > > loc.loadObject = cdbg.loadObjectContainingPC(a); > if (loc.loadObject != null) { > loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); > return loc; > } > Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) Ping! ------------- PR: https://git.openjdk.java.net/jdk/pull/2111 From rkennke at openjdk.java.net Mon Jan 25 20:50:55 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 25 Jan 2021 20:50:55 GMT Subject: RFR: 8260106: Shenandoah: refactor reference updating closures and related code [v6] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 15:58:00 GMT, Aleksey Shipilev wrote: >> We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). >> >> Additional testing: >> - [x] `hotspot_gc_shenandoah` >> - [x] `tier1` with Shenandoah >> - [x] `tier2` with Shenandoah > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add const Looks good! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2166 From github.com+779991+jaokim at openjdk.java.net Mon Jan 25 22:14:02 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström) Date: Mon, 25 Jan 2021 22:14:02 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v2] In-Reply-To: References: Message-ID: > **Description** > This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup. The gc-efficiency is calculated at the end of the marking phase, to be logged in the post-cleanup section. It is however not reset, meaning that the next phase's post-marking log will show the old value. > > - The gc-efficiency is initialized to -1 when it hasn't been calculated. > - Negative gc-efficiency is displayed as a hyphen "-" in the summary. > - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` > > **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) > > This fix has been tested together with the above-mentioned fix. > > **Example** > This is what the logging looks like after the fix has been applied.
> ### PHASE Post-Marking @ 410.303 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 410.305 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Marking @ 450.310 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % next-live: 1.19 MB / 29.66 % 
remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 450.312 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > > **Testing** > - Manual testing > - hs-tier1, hs-tier2 Joakim Nordstr?m has updated the pull request incrementally with one additional commit since the last revision: Fixed copyright year. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2217/files - new: https://git.openjdk.java.net/jdk/pull/2217/files/738f303a..24361880 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2217&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2217&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2217.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2217/head:pull/2217 PR: https://git.openjdk.java.net/jdk/pull/2217 From mli at openjdk.java.net Tue Jan 26 01:36:56 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 26 Jan 2021 01:36:56 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v4] In-Reply-To: References: Message-ID: > optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. 
It is not necessary to update these links for every node, as we can remove the target nodes at once and set the prev and next pointers for just 2 nodes. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2181/files - new: https://git.openjdk.java.net/jdk/pull/2181/files/27c85ec5..30ec12f1 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=02-03 Stats: 37 lines in 2 files changed: 19 ins; 16 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2181/head:pull/2181 PR: https://git.openjdk.java.net/jdk/pull/2181 From mli at openjdk.java.net Tue Jan 26 01:42:42 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 26 Jan 2021 01:42:42 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v2] In-Reply-To: References: <04tWoUPw_97DYXPqlJdJhQyDr0dBW1Vkmaja9rqPMEE=.ff792b59-ab27-49c7-bcaa-3b6d329b7650@github.com> Message-ID: On Mon, 25 Jan 2021 08:58:00 GMT, Stefan Johansson wrote: >> Thanks for reviewing. >> I just realized I made the change in a different way from exactly what you suggested, though it seems to have the same effect. Not sure if my current change with "if ASSERT" is OK, please kindly let me know if you don't think so. > > You are correct that when executing it will have the same effect, but I'm more concerned about the readability of the code in this case. So please move it to a separate function, something like `verify_region_to_remove()`. Looking a bit closer at the code I think this function should be able to be called from each round of the loop as well, I mean that's what was done before and by doing so we could get the loop a bit cleaner as well. Or have I missed anything? 
In that case you would not need to call it before the loop, but just inside it. Thank you for reviewing and clarification. Got your point, I have changed as you suggested, would you mind to have another look at it? ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From sjohanss at openjdk.java.net Tue Jan 26 08:56:45 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 26 Jan 2021 08:56:45 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v4] In-Reply-To: References: Message-ID: <1wbPMhFBB3zgn22_ajYLYIONkwkOPhpllRGSaBwi6uE=.2ae77ba8-4cc3-432b-8784-a7e9cf957bbe@github.com> On Tue, 26 Jan 2021 01:36:56 GMT, Hamlin Li wrote: >> optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting >> >> FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. it's not necessary do these settings for every node, as we can remove target nodes at once and just set prev and next pointers for just 2 nodes. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at Latest update looks good, just a few new comments. src/hotspot/share/gc/g1/heapRegionSet.cpp line 234: > 232: assert_free_region_list(_head != next, "invariant"); > 233: if (next != NULL) { > 234: assert_free_region_list(next->prev() != NULL, "invariant"); This assert could be next->prev() == curr, or am I missing some case? src/hotspot/share/gc/g1/heapRegionSet.hpp line 230: > 228: void abandon(); > 229: > 230: void verify_region_to_remove(HeapRegion* curr, HeapRegion* next) NOT_DEBUG_RETURN; Please move this to the private section, right below the `add_list_common`-functions would be fine. 
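The idea behind this change (unlink a run of N consecutive nodes and patch only the two boundary links) can be sketched outside HotSpot roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: `Node` and `List` are hypothetical stand-ins for `HeapRegion` and `FreeRegionList`, and the in-loop assert mirrors the `next->prev() == curr` check suggested in the review above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for HeapRegion; only the list links are modeled.
struct Node {
  int id;
  Node* prev = nullptr;
  Node* next = nullptr;
};

// Hypothetical stand-in for FreeRegionList.
struct List {
  Node* head = nullptr;
  Node* tail = nullptr;
  size_t length = 0;

  void push_back(Node* n) {
    n->prev = tail;
    n->next = nullptr;
    if (tail != nullptr) {
      tail->next = n;
    } else {
      head = n;
    }
    tail = n;
    ++length;
  }

  // Unlink 'count' consecutive nodes starting at 'first'.
  // The list itself is patched once, at the two boundary nodes,
  // instead of re-linking the neighbours of every removed node.
  void remove_starting_at(Node* first, size_t count) {
    assert(first != nullptr && count > 0 && count <= length);
    Node* before = first->prev;  // last node that stays, may be null
    Node* curr = first;
    for (size_t i = 0; i < count; ++i) {
      assert(curr != nullptr);
      Node* next = curr->next;
      // Per-node verification, in the spirit of verify_region_to_remove().
      assert(next == nullptr || next->prev == curr);
      // Clear the removed node's own links; neighbours are not touched here.
      curr->prev = nullptr;
      curr->next = nullptr;
      curr = next;
    }
    Node* after = curr;          // first node that stays, may be null
    if (before != nullptr) {
      before->next = after;
    } else {
      head = after;
    }
    if (after != nullptr) {
      after->prev = before;
    } else {
      tail = before;
    }
    length -= count;
  }
};
```

Each removed node still has its own links cleared, so the traversal remains O(N); the saving is only in the writes to neighbouring nodes and to head/tail.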
------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2181 From ayang at openjdk.java.net Tue Jan 26 09:12:53 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 26 Jan 2021 09:12:53 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free [v2] In-Reply-To: References: Message-ID: > Using for-loop to make the number of iterations more explicit. Direct backward iteration, `for (uint curr = reserved_length() - 1; curr >= 0; curr--)` doesn't work due to underflow of `uint` type. Therefore, I went for current approach. > > Test: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2193/files - new: https://git.openjdk.java.net/jdk/pull/2193/files/c734b63a..67ce558f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2193&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2193&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2193.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2193/head:pull/2193 PR: https://git.openjdk.java.net/jdk/pull/2193 From ayang at openjdk.java.net Tue Jan 26 09:12:54 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 26 Jan 2021 09:12:54 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free [v2] In-Reply-To: References: Message-ID: On Sat, 23 Jan 2021 14:03:13 GMT, Stefan Johansson wrote: >> I would prefer not having side effect in the condition. At first glance, it's not obvious how many iteration the loop entails, `length` or `length - 1`? > > Avoiding side effects is normally good, but in this case I think it actually make the whole intent of the code clearer. We could add to the comment that we loop backwards through all reserved regions to make it clear what the bound is. 
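The two loop shapes being weighed in this thread can be sketched roughly as below. This is only an illustration, assuming a plain `std::vector<bool>` in place of the region table; `NOT_FOUND` and both function names are hypothetical, and neither variant is claimed to be the code that was finally committed.

```cpp
#include <vector>

// Hypothetical sentinel for "no free entry".
const unsigned NOT_FOUND = ~0u;

// Variant A: the decrement is a side effect of the loop condition.
// 'curr-- > 0' tests the old value and then decrements, so the body sees
// size-1 down to 0 and the loop ends without relying on 'curr >= 0',
// which is always true for an unsigned type.
unsigned find_highest_free_a(const std::vector<bool>& is_free) {
  for (unsigned curr = static_cast<unsigned>(is_free.size()); curr-- > 0; ) {
    if (is_free[curr]) {
      return curr;
    }
  }
  return NOT_FOUND;
}

// Variant B: no side effect in the condition. A forward counter makes the
// iteration count explicit and the backward index is derived from it.
unsigned find_highest_free_b(const std::vector<bool>& is_free) {
  const unsigned len = static_cast<unsigned>(is_free.size());
  for (unsigned i = 0; i < len; i++) {
    const unsigned curr = len - 1 - i;
    if (is_free[curr]) {
      return curr;
    }
  }
  return NOT_FOUND;
}
```

Both variants visit exactly `size` entries and handle an empty table without wrapping around; the difference is purely one of readability, which is what the review comments are about.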
Updated as suggested, since you both think it's better. ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From tschatzl at openjdk.java.net Tue Jan 26 09:17:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 26 Jan 2021 09:17:43 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v2] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 22:14:02 GMT, Joakim Nordström wrote: >> **Description** >> This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup. The gc-efficiency is calculated at the end of the marking phase, to be logged in the post-cleanup section. However, it is not reset, meaning that the next phase's post-marking log will show the old value. >> >> - The gc-efficiency is initialized to -1 when it hasn't been calculated. >> - Negative gc-efficiency is displayed as a hyphen "-" in the summary. >> - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` >> >> **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) >> >> This fix has been tested together with the above-mentioned fix. >> >> **Example** >> This is what the logging looks like after the fix has been applied. 
>> ### PHASE Post-Marking @ 410.303 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 410.305 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Marking @ 450.310 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % 
next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 450.312 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> >> **Testing** >> - Manual testing >> - hs-tier1, hs-tier2 > > Joakim Nordstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Fixed copyright year. Looks good otherwise; there are some strange Windows build failures here too. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 2981: > 2979: > 2980: // Print a line for this particular region. > 2981: if(gc_eff < 0) { I would prefer instead of the code duplication, use a `%s` format specifier for the efficiency, and a `FormatBuffer` to format the actual string into it. This should result in much shorter code. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2217 From sjohanss at openjdk.java.net Tue Jan 26 09:29:42 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 26 Jan 2021 09:29:42 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free [v2] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 09:12:53 GMT, Albert Mingkun Yang wrote: >> Using for-loop to make the number of iterations more explicit. Direct backward iteration, `for (uint curr = reserved_length() - 1; curr >= 0; curr--)` doesn't work due to underflow of `uint` type. Therefore, I went for current approach. 
>> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by sjohanss (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From tschatzl at openjdk.java.net Tue Jan 26 09:52:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 26 Jan 2021 09:52:43 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v4] In-Reply-To: References: Message-ID: On Sun, 24 Jan 2021 01:48:57 GMT, Denghui Dong wrote: >> GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. >> >> For the test purpose, I add two Whitebox methods to lock/unlock critical. > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > add GCLockerTracer::is_started() that makes the logic more clear Changes requested by tschatzl (Reviewer). test/jdk/jdk/jfr/event/gc/detailed/TestGCLockerEvent.java line 34: > 32: import sun.hotspot.WhiteBox; > 33: > 34: /** This block should be the first thing in the test after the copyright notice. test/jdk/jdk/jfr/event/gc/detailed/TestGCLockerEvent.java line 116: > 114: ts[i] = new Thread(() -> { > 115: STALL_COUNT_SIGNAL.countDown(); > 116: for (int j = 0; j < LOOP; j++) { Since the test already uses WhiteBox, please use whitebox to trigger a gc instead of this dodgy method. src/hotspot/share/gc/shared/gcLocker.cpp line 101: > 99: verify_critical_count(); > 100: _needs_gc = true; > 101: GCLockerTracer::start_gc_locker(_jni_lock_count); Not really convinced that passing `_jni_lock_count` here gives a lot of information: this is the number of threads in a critical section at the point of the first thread needing a gc. It's probably better than nothing. 
At least this information should be added to the description of the event (if that is possible). test/jdk/jdk/jfr/event/gc/detailed/TestGCLockerEvent.java line 2: > 1: /* > 2: * Copyright (c) 2021 Alibaba Group Holding Limited. All Rights Reserved. Would it be possible to keep with the general format of copyright messages in other code, i.e. "Copyright (c) , . ..."? I.e. if possible please add a comma after the year. src/hotspot/share/jfr/metadata/metadata.xml line 1095: > 1093: > 1094: > 1095: Please add descriptions to the fields as mentioned above. ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From iwalulya at openjdk.java.net Tue Jan 26 10:04:39 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 26 Jan 2021 10:04:39 GMT Subject: RFR: 8260042: G1 Post-cleanup liveness printing occurs too early In-Reply-To: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> References: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> Message-ID: On Wed, 20 Jan 2021 16:00:18 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change that fixes position of the Post-cleanup liveness printing, causing wrong gc efficiencies to be printed? > > I.e. due to some older changes, the calculation of gc efficiences got moved below the printing of the Post-cleanup liveness which should be about these values. This change corrects that. > > Note that there is a sister issue about not printing the gc efficiencies in the "Post-Marking" phase. This is not scope of this change. > > Testing: manual testing that values are correct, hs-tier1+2 Marked as reviewed by iwalulya (Committer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2168 From tschatzl at openjdk.java.net Tue Jan 26 10:18:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 26 Jan 2021 10:18:40 GMT Subject: RFR: 8260042: G1 Post-cleanup liveness printing occurs too early In-Reply-To: References: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> Message-ID: <33dc_v2wWESzGuF8lItzxljpsGIJ7z1WxJ6xcjTR6jw=.4e54b9c0-a098-48b4-a755-f4b2e7ff895a@github.com> On Fri, 22 Jan 2021 09:00:43 GMT, Stefan Johansson wrote: >> Hi all, >> >> can I have reviews for this small change that fixes position of the Post-cleanup liveness printing, causing wrong gc efficiencies to be printed? >> >> I.e. due to some older changes, the calculation of gc efficiences got moved below the printing of the Post-cleanup liveness which should be about these values. This change corrects that. >> >> Note that there is a sister issue about not printing the gc efficiencies in the "Post-Marking" phase. This is not scope of this change. >> >> Testing: manual testing that values are correct, hs-tier1+2 > > Looks good! Thanks @kstefanj @walulyai for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2168 From tschatzl at openjdk.java.net Tue Jan 26 10:18:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 26 Jan 2021 10:18:42 GMT Subject: Integrated: 8260042: G1 Post-cleanup liveness printing occurs too early In-Reply-To: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> References: <25_KrAypifFeF67n3yODY_SaSYoQj937CwOF8qmGPoc=.b8df7f88-0688-4436-87ac-869dcb858ada@github.com> Message-ID: On Wed, 20 Jan 2021 16:00:18 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this small change that fixes position of the Post-cleanup liveness printing, causing wrong gc efficiencies to be printed? > > I.e. 
due to some older changes, the calculation of gc efficiences got moved below the printing of the Post-cleanup liveness which should be about these values. This change corrects that. > > Note that there is a sister issue about not printing the gc efficiencies in the "Post-Marking" phase. This is not scope of this change. > > Testing: manual testing that values are correct, hs-tier1+2 This pull request has now been integrated. Changeset: b4ace3e9 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/b4ace3e9 Stats: 10 lines in 2 files changed: 5 ins; 5 del; 0 mod 8260042: G1 Post-cleanup liveness printing occurs too early Reviewed-by: sjohanss, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/2168 From ddong at openjdk.java.net Tue Jan 26 11:06:57 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 26 Jan 2021 11:06:57 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v5] In-Reply-To: References: Message-ID: > GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. > > For the test purpose, I add two Whitebox methods to lock/unlock critical. 
Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: Add descriptions and fix the format problem ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2088/files - new: https://git.openjdk.java.net/jdk/pull/2088/files/85987c58..a5d0e0a3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=03-04 Stats: 23 lines in 2 files changed: 11 ins; 9 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2088.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2088/head:pull/2088 PR: https://git.openjdk.java.net/jdk/pull/2088 From ddong at openjdk.java.net Tue Jan 26 11:07:00 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Tue, 26 Jan 2021 11:07:00 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v4] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 09:42:59 GMT, Thomas Schatzl wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> add GCLockerTracer::is_started() that makes the logic more clear > > src/hotspot/share/gc/shared/gcLocker.cpp line 101: > >> 99: verify_critical_count(); >> 100: _needs_gc = true; >> 101: GCLockerTracer::start_gc_locker(_jni_lock_count); > > Not really convinced that passing `_jni_lock_count` here gives a lot of information: this is the number of threads in a critical section at the point of the first thread needing a gc. It's probably better than nothing. At least this information should be added to the description of the event (if that is possible). I think this field can be used to judge whether there are many threads that are often in a critical section, but I am not sure if it really helps to analyze the problem., and just as you said, it's better than nothing. An appropriate description of this field has been added. 
> test/jdk/jdk/jfr/event/gc/detailed/TestGCLockerEvent.java line 116: > >> 114: ts[i] = new Thread(() -> { >> 115: STALL_COUNT_SIGNAL.countDown(); >> 116: for (int j = 0; j < LOOP; j++) { > > Since the test already uses WhiteBox, please use whitebox to trigger a gc instead of this dodgy method. Triggering a GC is not enough, I hope these threads could be stall by the GC locker(call GCLocker::stall_until_clear) so that a correct assertion of the number of stall count could be added. I think it could not be done by WhiteBox::youngGC/fullGC, please correct me if I'm wrong. > src/hotspot/share/jfr/metadata/metadata.xml line 1095: > >> 1093: >> 1094: >> 1095: > > Please add descriptions to the fields as mentioned above. added > test/jdk/jdk/jfr/event/gc/detailed/TestGCLockerEvent.java line 2: > >> 1: /* >> 2: * Copyright (c) 2021 Alibaba Group Holding Limited. All Rights Reserved. > > Would it be possible to keep with the general format of copyright messages in other code, i.e. "Copyright (c) , . ..."? I.e. if possible please add a comma after the year. Fixed. > test/jdk/jdk/jfr/event/gc/detailed/TestGCLockerEvent.java line 34: > >> 32: import sun.hotspot.WhiteBox; >> 33: >> 34: /** > > This block should be the first thing in the test after the copyright notice. Fixed. But I notice that there are many other tests that didn't comply with this rule. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From mli at openjdk.java.net Tue Jan 26 11:58:44 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 26 Jan 2021 11:58:44 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v4] In-Reply-To: <1wbPMhFBB3zgn22_ajYLYIONkwkOPhpllRGSaBwi6uE=.2ae77ba8-4cc3-432b-8784-a7e9cf957bbe@github.com> References: <1wbPMhFBB3zgn22_ajYLYIONkwkOPhpllRGSaBwi6uE=.2ae77ba8-4cc3-432b-8784-a7e9cf957bbe@github.com> Message-ID: On Tue, 26 Jan 2021 08:49:32 GMT, Stefan Johansson wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at > > src/hotspot/share/gc/g1/heapRegionSet.cpp line 234: > >> 232: assert_free_region_list(_head != next, "invariant"); >> 233: if (next != NULL) { >> 234: assert_free_region_list(next->prev() != NULL, "invariant"); > > This assert could be next->prev() == curr, or am I missing some case? Thanks for pointing out, changed as you suggested. ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From mli at openjdk.java.net Tue Jan 26 12:04:02 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Tue, 26 Jan 2021 12:04:02 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v5] In-Reply-To: References: Message-ID: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> > optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting > > FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. 
It is not necessary to update these links for every node, as we can remove the target nodes at once and set the prev and next pointers for just 2 nodes. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2181/files - new: https://git.openjdk.java.net/jdk/pull/2181/files/30ec12f1..917d5cc3 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2181&range=03-04 Stats: 3 lines in 2 files changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2181.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2181/head:pull/2181 PR: https://git.openjdk.java.net/jdk/pull/2181 From sjohanss at openjdk.java.net Tue Jan 26 12:07:42 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 26 Jan 2021 12:07:42 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v4] In-Reply-To: <1wbPMhFBB3zgn22_ajYLYIONkwkOPhpllRGSaBwi6uE=.2ae77ba8-4cc3-432b-8784-a7e9cf957bbe@github.com> References: <1wbPMhFBB3zgn22_ajYLYIONkwkOPhpllRGSaBwi6uE=.2ae77ba8-4cc3-432b-8784-a7e9cf957bbe@github.com> Message-ID: <8-7JSSwXTiZoQTRrc7xz6uGIPxgVGKk2w9lqHopt2PY=.980f6451-6111-4eff-9832-ccf1eb17b225@github.com> On Tue, 26 Jan 2021 08:54:14 GMT, Stefan Johansson wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at > > Latest update looks good, just a few new comments. Looks good, will run it through some additional testing before approving. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From shade at openjdk.java.net Tue Jan 26 12:35:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Jan 2021 12:35:47 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v6] In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 18:27:02 GMT, Zhengyu Gu wrote: >> Please review this patch that enables concurrent stack processing for Shenandoah GC. >> >> After this patch, all root processing is done concurrently for concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 >> - [x] Nightly >> - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Reverted override Good job, I have only a few minor comments. src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp line 56: > 54: f(init_manage_tlabs, " Manage TLABs") \ > 55: f(init_update_region_states, " Update Region States") \ > 56: f(scan_roots, " Scan Roots") \ Is `scan_roots` used anywhere still? src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp line 59: > 57: SHENANDOAH_PAR_PHASE_DO(scan_, " S: ", f) \ > 58: \ > 59: f(conc_mark_roots, "Concurrent Mark Roots ") \ This seems like fixing the regression from JDK-8255765, is it not? Should be a separate issue? ------------- Changes requested by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2185 From zgu at openjdk.java.net Tue Jan 26 13:20:40 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 26 Jan 2021 13:20:40 GMT Subject: RFR: 8260106: Shenandoah: refactor reference updating closures and related code [v6] In-Reply-To: References: Message-ID: <9E721gpIaOnuSwQKVGq-Rl01vmSuAGpri9T-XactElU=.8012dc72-d561-4d55-b4e8-9e52300e4c20@github.com> On Mon, 25 Jan 2021 15:58:00 GMT, Aleksey Shipilev wrote: >> We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). >> >> Additional testing: >> - [x] `hotspot_gc_shenandoah` >> - [x] `tier1` with Shenandoah >> - [x] `tier2` with Shenandoah > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Add const Still good. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2166 From shade at openjdk.java.net Tue Jan 26 13:58:49 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Jan 2021 13:58:49 GMT Subject: RFR: 8260106: Shenandoah: refactor reference updating closures and related code [v7] In-Reply-To: References: Message-ID: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). 
> > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1` with Shenandoah > - [x] `tier2` with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Put stars in their old places - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Add const - Renames - Eliminate UpdateRefsMode altogether - Simplify update_with_forwarded - Comment updates - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Simplify ShenandoahUpdateHeapRefsTask - Fix up generic update references too, introduce CONC_UPDATE - ... and 5 more: https://git.openjdk.java.net/jdk/compare/e080ce92...09b2d4aa ------------- Changes: https://git.openjdk.java.net/jdk/pull/2166/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=06 Stats: 279 lines in 15 files changed: 86 ins; 116 del; 77 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From zgu at openjdk.java.net Tue Jan 26 14:08:59 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 26 Jan 2021 14:08:59 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v7] In-Reply-To: References: Message-ID: > Please review this patch that enables concurrent stack processing for Shenandoah GC. > > After this patch, all root processing is done concurrently for concurrent GC. 
> > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] Nightly > - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: More Aleksey's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2185/files - new: https://git.openjdk.java.net/jdk/pull/2185/files/6ade41ad..ef787198 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2185&range=05-06 Stats: 27 lines in 3 files changed: 0 ins; 26 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2185.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2185/head:pull/2185 PR: https://git.openjdk.java.net/jdk/pull/2185 From shade at openjdk.java.net Tue Jan 26 15:35:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Jan 2021 15:35:42 GMT Subject: RFR: 8256298: Shenandoah: Enable concurrent stack processing [v7] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 14:08:59 GMT, Zhengyu Gu wrote: >> Please review this patch that enables concurrent stack processing for Shenandoah GC. >> >> After this patch, all root processing is done concurrently for concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 >> - [x] Nightly >> - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > More Aleksey's comments Marked as reviewed by shade (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2185 From zgu at openjdk.java.net Tue Jan 26 16:49:44 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 26 Jan 2021 16:49:44 GMT Subject: Integrated: 8256298: Shenandoah: Enable concurrent stack processing In-Reply-To: References: Message-ID: <8GosogfhIIocCGgyf-nFB8jdMJK_HhfYcwtts8K0X84=.d242c7fc-f4d8-420e-8533-f72f17b494e4@github.com> On Thu, 21 Jan 2021 17:33:44 GMT, Zhengyu Gu wrote: > Please review this patch that enables concurrent stack processing for Shenandoah GC. > > After this patch, all root processing is done concurrently for concurrent GC. > > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] Nightly > - [x] tier1 with -XX:+UseShenandoahGC on Linux x86_32 This pull request has now been integrated. Changeset: fd00ed74 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/fd00ed74 Stats: 672 lines in 19 files changed: 464 ins; 155 del; 53 mod 8256298: Shenandoah: Enable concurrent stack processing Reviewed-by: rkennke, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/2185 From egahlin at openjdk.java.net Tue Jan 26 16:56:43 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Tue, 26 Jan 2021 16:56:43 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v5] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 11:06:57 GMT, Denghui Dong wrote: >> GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. >> >> For the test purpose, I add two Whitebox methods to lock/unlock critical. > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > Add descriptions and fix the format problem Marked as reviewed by egahlin (Reviewer). 
src/hotspot/share/jfr/metadata/metadata.xml line 1094: > 1092: > 1093: > 1094: I would suggest changing this to: "The number of Java threads in a critical section when the GC locker is started" "The number of Java threads stalled by the GC locker" src/hotspot/share/jfr/metadata/metadata.xml line 1093: > 1091: > 1092: > 1093: "GC Locker Information" is not very useful. Remove the description completely or provide more information. ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From zgu at openjdk.java.net Tue Jan 26 17:00:48 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 26 Jan 2021 17:00:48 GMT Subject: RFR: 8260421: Shenandoah: Fix conc_mark_roots timing name and indentations Message-ID: Please review this trivial patch that renames conc_mark_roots timing name and fixes indentations Test: - [x] hotspot_gc_shenandoah ------------- Commit messages: - JDK-8260421 Changes: https://git.openjdk.java.net/jdk/pull/2241/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2241&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260421 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2241.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2241/head:pull/2241 PR: https://git.openjdk.java.net/jdk/pull/2241 From rkennke at openjdk.java.net Tue Jan 26 17:04:40 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 26 Jan 2021 17:04:40 GMT Subject: RFR: 8260421: Shenandoah: Fix conc_mark_roots timing name and indentations In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 16:55:02 GMT, Zhengyu Gu wrote: > Please review this trivial patch that renames conc_mark_roots timing name and fixes indentations > > Test: > - [x] hotspot_gc_shenandoah Ok! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2241 From shade at openjdk.java.net Tue Jan 26 17:39:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Jan 2021 17:39:40 GMT Subject: RFR: 8260421: Shenandoah: Fix conc_mark_roots timing name and indentations In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 16:55:02 GMT, Zhengyu Gu wrote: > Please review this trivial patch that renames conc_mark_roots timing name and fixes indentations > > Test: > - [x] hotspot_gc_shenandoah Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2241 From rkennke at openjdk.java.net Tue Jan 26 18:05:48 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 26 Jan 2021 18:05:48 GMT Subject: RFR: 8260449: Remove stale declaration of SATBMarkQueue::apply_closure_and_empty() Message-ID: <341y1eXZgv7GWiMEqSxIqfrmgk2UbR8YZ7VAl6GPcaE=.731cf048-c9bf-446e-bb99-8d34550a9c5e@github.com> JDK-8258742 removed apply_closure_and_empty(), but curiously JDK-8260263 reintroduced its declaration (but no definition). 
Testing: - [x] build fastdebug/release on Linux/x86_64 - [x] tier1 ------------- Commit messages: - 8260449: Remove stale declaration of SATBMarkQueue::apply_closure_and_empty() Changes: https://git.openjdk.java.net/jdk/pull/2242/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2242&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260449 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2242.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2242/head:pull/2242 PR: https://git.openjdk.java.net/jdk/pull/2242 From shade at openjdk.java.net Tue Jan 26 18:18:50 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 26 Jan 2021 18:18:50 GMT Subject: RFR: 8260106: Shenandoah: refactor reference updating closures and related code [v8] In-Reply-To: References: Message-ID: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1` with Shenandoah > - [x] `tier2` with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Put stars in their old places - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Add const - Renames - Eliminate UpdateRefsMode altogether - Simplify update_with_forwarded - Comment updates - Merge branch 'master' into JDK-8260106-shenandoah-simplify-updates - Simplify ShenandoahUpdateHeapRefsTask - ... 
and 6 more: https://git.openjdk.java.net/jdk/compare/fd00ed74...b2905776 ------------- Changes: https://git.openjdk.java.net/jdk/pull/2166/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2166&range=07 Stats: 279 lines in 15 files changed: 86 ins; 116 del; 77 mod Patch: https://git.openjdk.java.net/jdk/pull/2166.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2166/head:pull/2166 PR: https://git.openjdk.java.net/jdk/pull/2166 From zgu at openjdk.java.net Tue Jan 26 20:28:41 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 26 Jan 2021 20:28:41 GMT Subject: Integrated: 8260421: Shenandoah: Fix conc_mark_roots timing name and indentations In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 16:55:02 GMT, Zhengyu Gu wrote: > Please review this trivial patch that renames conc_mark_roots timing name and fixes indentations > > Test: > - [x] hotspot_gc_shenandoah This pull request has now been integrated. Changeset: 1bebd418 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/1bebd418 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8260421: Shenandoah: Fix conc_mark_roots timing name and indentations Reviewed-by: rkennke, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/2241 From tschatzl at openjdk.java.net Tue Jan 26 23:30:39 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 26 Jan 2021 23:30:39 GMT Subject: RFR: 8260449: Remove stale declaration of SATBMarkQueue::apply_closure_and_empty() In-Reply-To: <341y1eXZgv7GWiMEqSxIqfrmgk2UbR8YZ7VAl6GPcaE=.731cf048-c9bf-446e-bb99-8d34550a9c5e@github.com> References: <341y1eXZgv7GWiMEqSxIqfrmgk2UbR8YZ7VAl6GPcaE=.731cf048-c9bf-446e-bb99-8d34550a9c5e@github.com> Message-ID: On Tue, 26 Jan 2021 18:01:12 GMT, Roman Kennke wrote: > JDK-8258742 removed apply_closure_and_empty(), but curiously JDK-8260263 reintroduced its declaration (but no definition). > > Testing: > - [x] build fastdebug/release on Linux/x86_64 > - [x] tier1 Lgtm and trivial. 
Sorry for introducing this. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2242 From kbarrett at openjdk.java.net Wed Jan 27 00:30:42 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 27 Jan 2021 00:30:42 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free [v2] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 09:12:53 GMT, Albert Mingkun Yang wrote: >> Using for-loop to make the number of iterations more explicit. Direct backward iteration, `for (uint curr = reserved_length() - 1; curr >= 0; curr--)` doesn't work due to underflow of `uint` type. Therefore, I went for current approach. >> >> Test: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From mli at openjdk.java.net Wed Jan 27 01:04:40 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 27 Jan 2021 01:04:40 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v4] In-Reply-To: <8-7JSSwXTiZoQTRrc7xz6uGIPxgVGKk2w9lqHopt2PY=.980f6451-6111-4eff-9832-ccf1eb17b225@github.com> References: <1wbPMhFBB3zgn22_ajYLYIONkwkOPhpllRGSaBwi6uE=.2ae77ba8-4cc3-432b-8784-a7e9cf957bbe@github.com> <8-7JSSwXTiZoQTRrc7xz6uGIPxgVGKk2w9lqHopt2PY=.980f6451-6111-4eff-9832-ccf1eb17b225@github.com> Message-ID: On Tue, 26 Jan 2021 12:05:09 GMT, Stefan Johansson wrote: > Looks good, will run it through some additional testing before approving. Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From ddong at openjdk.java.net Wed Jan 27 02:25:01 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Wed, 27 Jan 2021 02:25:01 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v5] In-Reply-To: References: Message-ID: <1a-o8CEtmWfhrp2K9gzs8pqgt_ISmH8BEivGEqS4Rlo=.d16eda2f-b559-4083-a16e-25a1dd0d325a@github.com> On Tue, 26 Jan 2021 16:49:37 GMT, Erik Gahlin wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> Add descriptions and fix the format problem > > src/hotspot/share/jfr/metadata/metadata.xml line 1094: > >> 1092: >> 1093: >> 1094: > > I would suggest changing this to: > > "The number of Java threads in a critical section when the GC locker is started" > "The number of Java threads stalled by the GC locker" changed. > src/hotspot/share/jfr/metadata/metadata.xml line 1093: > >> 1091: >> 1092: >> 1093: > > "GC Locker Information" is not very useful. Remove the description completely or provide more information. removed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From ddong at openjdk.java.net Wed Jan 27 02:25:00 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Wed, 27 Jan 2021 02:25:00 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v6] In-Reply-To: References: Message-ID: > GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. > > For the test purpose, I add two Whitebox methods to lock/unlock critical. 
Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: improve descriptions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2088/files - new: https://git.openjdk.java.net/jdk/pull/2088/files/a5d0e0a3..7efac60f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2088&range=04-05 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2088.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2088/head:pull/2088 PR: https://git.openjdk.java.net/jdk/pull/2088 From shade at openjdk.java.net Wed Jan 27 07:20:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 27 Jan 2021 07:20:42 GMT Subject: Integrated: 8260106: Shenandoah: refactor reference updating closures and related code In-Reply-To: References: Message-ID: On Wed, 20 Jan 2021 15:06:20 GMT, Aleksey Shipilev wrote: > We have a block in `ShenandoahHeap::maybe_update_with_forwarded` that is irrelevant after JDK-8231086. Additionally, "resolve and update" paths are really only used by STW GCs, and thus do not require atomic updates. This leads to considerable simplifications in the code, and improves performance on the common paths (especially in fastdebug builds that drop many irrelevant asserts). > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [x] `tier1` with Shenandoah > - [x] `tier2` with Shenandoah This pull request has now been integrated. 
Changeset: bd2744dd Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/bd2744dd Stats: 279 lines in 15 files changed: 86 ins; 116 del; 77 mod 8260106: Shenandoah: refactor reference updating closures and related code Reviewed-by: zgu, rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2166 From ayang at openjdk.java.net Wed Jan 27 07:38:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 27 Jan 2021 07:38:41 GMT Subject: RFR: 8253420: Refactor HeapRegionManager::find_highest_free [v2] In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 00:27:27 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Marked as reviewed by kbarrett (Reviewer). Thank you for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From rkennke at openjdk.java.net Wed Jan 27 09:35:40 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 27 Jan 2021 09:35:40 GMT Subject: Integrated: 8260449: Remove stale declaration of SATBMarkQueue::apply_closure_and_empty() In-Reply-To: <341y1eXZgv7GWiMEqSxIqfrmgk2UbR8YZ7VAl6GPcaE=.731cf048-c9bf-446e-bb99-8d34550a9c5e@github.com> References: <341y1eXZgv7GWiMEqSxIqfrmgk2UbR8YZ7VAl6GPcaE=.731cf048-c9bf-446e-bb99-8d34550a9c5e@github.com> Message-ID: <_jzC3WmgGO39izNjfZgbqjcaNanUwLZ_gFhw4My5tq0=.c69d03df-3df6-4dc5-80b9-24c5fc81bc07@github.com> On Tue, 26 Jan 2021 18:01:12 GMT, Roman Kennke wrote: > JDK-8258742 removed apply_closure_and_empty(), but curiously JDK-8260263 reintroduced its declaration (but no definition). > > Testing: > - [x] build fastdebug/release on Linux/x86_64 > - [x] tier1 This pull request has now been integrated. 
Changeset: 4d004c94 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/4d004c94 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod 8260449: Remove stale declaration of SATBMarkQueue::apply_closure_and_empty() Reviewed-by: tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2242 From sjohanss at openjdk.java.net Wed Jan 27 10:00:47 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 27 Jan 2021 10:00:47 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v5] In-Reply-To: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> References: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> Message-ID: On Tue, 26 Jan 2021 12:04:02 GMT, Hamlin Li wrote: >> optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting >> >> FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. it's not necessary do these settings for every node, as we can remove target nodes at once and just set prev and next pointers for just 2 nodes. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at Testing looks good. ------------- Marked as reviewed by sjohanss (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2181 From ayang at openjdk.java.net Wed Jan 27 10:01:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 27 Jan 2021 10:01:41 GMT Subject: Integrated: 8253420: Refactor HeapRegionManager::find_highest_free In-Reply-To: References: Message-ID: On Fri, 22 Jan 2021 08:59:51 GMT, Albert Mingkun Yang wrote: > Using for-loop to make the number of iterations more explicit. Direct backward iteration, `for (uint curr = reserved_length() - 1; curr >= 0; curr--)` doesn't work due to underflow of `uint` type. Therefore, I went for current approach. > > Test: hotspot_gc This pull request has now been integrated. Changeset: fa40a966 Author: Albert Mingkun Yang Committer: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/fa40a966 Stats: 12 lines in 1 file changed: 2 ins; 7 del; 3 mod 8253420: Refactor HeapRegionManager::find_highest_free Reviewed-by: sjohanss, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2193 From mli at openjdk.java.net Wed Jan 27 11:21:43 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 27 Jan 2021 11:21:43 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v5] In-Reply-To: References: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> Message-ID: On Wed, 27 Jan 2021 09:57:48 GMT, Stefan Johansson wrote: > Testing looks good. Hi Stefan, Thank you for reviewing. :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From mli at openjdk.java.net Wed Jan 27 11:25:42 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Wed, 27 Jan 2021 11:25:42 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v5] In-Reply-To: References: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> Message-ID: On Wed, 27 Jan 2021 11:18:51 GMT, Hamlin Li wrote: >> Testing looks good. 
> >> Testing looks good. > > Hi Stefan, Thank you for reviewing. :-) @tschatzl , Hi Thomas, bot just added the "ready" label, is this PR ready to be integrated? It's only approved by 1 "R". Thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From tschatzl at openjdk.java.net Wed Jan 27 12:42:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 27 Jan 2021 12:42:43 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v5] In-Reply-To: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> References: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> Message-ID: <5i7qxWOLTHkqgMR8SPriW1XN0e1gO_UBsiIA_XYoDyE=.e515eb2c-9064-45de-8271-e983f31ba098@github.com> On Tue, 26 Jan 2021 12:04:02 GMT, Hamlin Li wrote: >> optimize FreeRegionList::remove_starting_at by removing unnecessary reading and setting >> >> FreeRegionList::remove_starting_at(...) traverses from a node and removes subsequent N nodes from free list. But when traverses the free list, it removes nodes one by one by setting the prev and next pointers of prev and next node. it's not necessary do these settings for every node, as we can remove target nodes at once and just set prev and next pointers for just 2 nodes. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at Marked as reviewed by tschatzl (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From sjohanss at openjdk.java.net Wed Jan 27 14:34:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 27 Jan 2021 14:34:40 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v4] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 09:50:02 GMT, Thomas Schatzl wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> add GCLockerTracer::is_started() that makes the logic more clear > > Changes requested by tschatzl (Reviewer). Since @tschatzl requested changes yesterday I will wait for him to sponsor this. ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From tschatzl at openjdk.java.net Wed Jan 27 15:28:47 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 27 Jan 2021 15:28:47 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v5] In-Reply-To: <5i7qxWOLTHkqgMR8SPriW1XN0e1gO_UBsiIA_XYoDyE=.e515eb2c-9064-45de-8271-e983f31ba098@github.com> References: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> <5i7qxWOLTHkqgMR8SPriW1XN0e1gO_UBsiIA_XYoDyE=.e515eb2c-9064-45de-8271-e983f31ba098@github.com> Message-ID: On Wed, 27 Jan 2021 12:39:50 GMT, Thomas Schatzl wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at > > Marked as reviewed by tschatzl (Reviewer). You have two "R"eviewers now :) The official rule is one "R"eviewer, and (currently) one "C"ommitter, but I am good with just a second reviewer. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From tschatzl at openjdk.java.net Wed Jan 27 15:31:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 27 Jan 2021 15:31:43 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v4] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 11:03:15 GMT, Denghui Dong wrote: >> test/jdk/jdk/jfr/event/gc/detailed/TestGCLockerEvent.java line 116: >> >>> 114: ts[i] = new Thread(() -> { >>> 115: STALL_COUNT_SIGNAL.countDown(); >>> 116: for (int j = 0; j < LOOP; j++) { >> >> Since the test already uses WhiteBox, please use whitebox to trigger a gc instead of this dodgy method. > > Triggering a GC is not enough, I hope these threads could be stalled by the GC locker(call GCLocker::stall_until_clear) so that a correct assertion of the number of stall count could be added. > I think it could not be done by WhiteBox::youngGC/fullGC, please correct me if I'm wrong. I agree, although this is more a theoretical concern since it's not checked. Let's keep this for now as is though. ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From tschatzl at openjdk.java.net Wed Jan 27 15:31:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 27 Jan 2021 15:31:42 GMT Subject: RFR: 8259808: Add JFR event to detect GC locker stall [v6] In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 02:25:00 GMT, Denghui Dong wrote: >> GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. >> >> For the test purpose, I add two Whitebox methods to lock/unlock critical. > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > improve descriptions Looks good. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2088 From ddong at openjdk.java.net Wed Jan 27 15:31:46 2021 From: ddong at openjdk.java.net (Denghui Dong) Date: Wed, 27 Jan 2021 15:31:46 GMT Subject: Integrated: 8259808: Add JFR event to detect GC locker stall In-Reply-To: References: Message-ID: <0jX_KfLI5gBLg8Ew1dRZnDcMpwWMB-xysSIoARaCE_w=.144f412d-4b8f-4dc6-bc64-b2d23607ec90@github.com> On Fri, 15 Jan 2021 02:42:20 GMT, Denghui Dong wrote: > GC locker will stall the operation of GC, resulting in some Java threads can not continue to run until GC locker is released, thus affecting the response time of the application. Add a JFR event to report this information is helpful to detect this issue. > > For the test purpose, I add two Whitebox methods to lock/unlock critical. This pull request has now been integrated. Changeset: 311a0a91 Author: Denghui Dong Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/311a0a91 Stats: 224 lines in 10 files changed: 224 ins; 0 del; 0 mod 8259808: Add JFR event to detect GC locker stall Reviewed-by: sjohanss, tschatzl, egahlin ------------- PR: https://git.openjdk.java.net/jdk/pull/2088 From rkennke at openjdk.java.net Wed Jan 27 15:47:51 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 27 Jan 2021 15:47:51 GMT Subject: RFR: 8260497: Shenandoah: Improve SATB flushing Message-ID: Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if it doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to try and avoid having too much left-over work when we reach final-mark. 
We can do better than that: when concurrent mark is done we can handshake all threads, and let them flush their respective SATB queues, and re-enter concurrent mark loop again, until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out leftover work properly. I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each: baseline: Finish Mark = 0,251 s (a = 688 us) (n = 364) (lvls, us = 125, 486, 621, 824, 4156) Finish Mark = 0,338 s (a = 922 us) (n = 366) (lvls, us = 131, 494, 652, 852, 72948) Finish Mark = 0,257 s (a = 699 us) (n = 368) (lvls, us = 111, 492, 645, 826, 4447) patched: Finish Mark = 0,112 s (a = 301 us) (n = 370) (lvls, us = 115, 207, 250, 281, 3709) Finish Mark = 0,107 s (a = 292 us) (n = 368) (lvls, us = 107, 209, 248, 287, 3329) Finish Mark = 0,114 s (a = 310 us) (n = 367) (lvls, us = 115, 211, 254, 285, 3819) It reliably lowers all timings for finish-mark. It also doesn't cause any regressions in throughput. Testing: - [x] hotspot_gc_shenandoah - [x] benchmarks ------------- Commit messages: - Some typing touch-ups - Merge remote-tracking branch 'upstream/master' into conc-flush-satb - Some cleanups, according to Aleksey's suggestions - Use SATBMarkQueue's enqueued counter; Aleksey's comments - Remove some more unrelated changes - Remove unrelated changes - Merge branch 'master' into conc-flush-satb - Remove old force-flush impl; retry until no more SATB enqueues - Simpler flushing - Merge branch 'master' into conc-flush-satb - ... 
and 2 more: https://git.openjdk.java.net/jdk/compare/7ed591cc...8df11fc1 Changes: https://git.openjdk.java.net/jdk/pull/2254/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2254&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260497 Stats: 94 lines in 10 files changed: 30 ins; 59 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2254.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2254/head:pull/2254 PR: https://git.openjdk.java.net/jdk/pull/2254 From shade at openjdk.java.net Wed Jan 27 15:47:53 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 27 Jan 2021 15:47:53 GMT Subject: RFR: 8260497: Shenandoah: Improve SATB flushing In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 10:15:19 GMT, Roman Kennke wrote: > Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if it doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to try and avoid having too much left-over work when we reach final-mark. > > We can do better than that: when concurrent mark is done we can handshake all threads, and let them flush their respective SATB queues, and re-enter concurrent mark loop again, until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out leftover work properly. 
> > I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each: > > baseline: > Finish Mark = 0,251 s (a = 688 us) (n = 364) (lvls, us = 125, 486, 621, 824, 4156) > Finish Mark = 0,338 s (a = 922 us) (n = 366) (lvls, us = 131, 494, 652, 852, 72948) > Finish Mark = 0,257 s (a = 699 us) (n = 368) (lvls, us = 111, 492, 645, 826, 4447) > > patched: > Finish Mark = 0,112 s (a = 301 us) (n = 370) (lvls, us = 115, 207, 250, 281, 3709) > Finish Mark = 0,107 s (a = 292 us) (n = 368) (lvls, us = 107, 209, 248, 287, 3329) > Finish Mark = 0,114 s (a = 310 us) (n = 367) (lvls, us = 115, 211, 254, 285, 3819) > > It reliably lowers all timings for finish-mark. It also doesn't cause any regressions in throughput. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] benchmarks Changes requested by shade (Reviewer). This looks nice, although you will want to fix some of the build failures: https://github.com/rkennke/jdk/runs/1776231024?check_suite_focus=true Changes requested by shade (Reviewer). This looks good to me! src/hotspot/share/gc/shared/satbMarkQueue.hpp line 118: > 116: // Return true if the queue's buffer should be enqueued, even if not full. > 117: // The default method uses the buffer enqueue threshold. > 118: bool should_enqueue_buffer(SATBMarkQueue& queue); Why drop `virtual` here? Is it because Shenandoah was the only virtual override of it, and now we can do the non-virtual call? src/hotspot/share/gc/shenandoah/shenandoahSATBMarkQueueSet.cpp line 59: > 57: void ShenandoahSATBMarkQueueSet::enqueue_completed_buffer(BufferNode* node) { > 58: SATBMarkQueueSet::enqueue_completed_buffer(node); > 59: Atomic::inc(&_enqueued_count); I believe `SATBMarkQueueSet` already tracks this, and we could instead use `SATBMarkQueueSet::completed_buffers_num`? 
src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 256: > 254: > 255: ShenandoahFlushSATBHandshakeClosure flush_satb; > 256: ShenandoahSATBMarkQueueSet& qset = ShenandoahBarrierSet::satb_mark_queue_set(); Since you have the `qset` here, you might as well pass it to the closure. src/hotspot/share/gc/shenandoah/shenandoahSATBMarkQueueSet.hpp line 35: > 33: class ShenandoahSATBMarkQueueSet : public SATBMarkQueueSet { > 34: private: > 35: volatile int _enqueued_count; I have a suspicion that `int` would overflow at some point in the long-running application. `size_t` would fit better, but then see the other comment that `SATBMQ` already tracks it itself. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 266: > 264: workers->run_task(&task); > 265: > 266: enqueued_count_before = qset.completed_buffers_num(); Suggestion for names: `completed_before`, `completed_after`. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 270: > 268: enqueued_count_after = qset.completed_buffers_num(); > 269: flushes++; > 270: } while (enqueued_count_before != enqueued_count_after && flushes < max_flushes); So, how does this interact with cancellation? Shouldn't we check for `cancelled_gc()` here as well? src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 257: > 255: ShenandoahSATBMarkQueueSet& qset = ShenandoahBarrierSet::satb_mark_queue_set(); > 256: ShenandoahFlushSATBHandshakeClosure flush_satb(qset); > 257: for (int flushes = 0; flushes < ShenandoahMaxSATBBufferFlushes; flushes++) { Should probably be `uint` to match the unsigned `ShenandoahMaxSATBBufferFlushes`. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 267: > 265: } > 266: > 267: int before = qset.completed_buffers_num(); Should probably be `size_t`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2254 Marked as reviewed by shade (Reviewer). 
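The retry loop discussed in the review comments above can be sketched outside HotSpot as a small simulation. This is only a minimal sketch under stated assumptions: `FakeQueueSet`, `FakeMutators`, `FlushResult` and `mark_until_quiescent` are hypothetical names invented for illustration, the "marking task" is modelled as simply draining the completed buffers, and the "handshake" is modelled as a function call. It is not the code from the pull request; it only shows the control flow (bounded number of flushes, early exit on cancellation, exit when a flush yields no new buffers), using the unsigned loop counter and `size_t` buffer counts suggested in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SATB queue set: it only counts completed buffers.
struct FakeQueueSet {
  std::size_t completed;
  std::size_t completed_buffers_num() const { return completed; }
};

// Hypothetical mutator state: pending_per_flush[i] is the number of buffers
// that the i-th flush hands over; later flushes hand over nothing.
struct FakeMutators {
  std::vector<std::size_t> pending_per_flush;
  std::size_t next;

  // Models the handshake in which every thread flushes its local SATB buffer.
  void flush_into(FakeQueueSet& qset) {
    if (next < pending_per_flush.size()) {
      qset.completed += pending_per_flush[next++];
    }
  }
};

struct FlushResult {
  unsigned rounds;   // number of marking rounds that were run
  bool cancelled;    // true if the loop stopped because of cancellation
};

// Runs "mark, then flush" rounds until a flush produces no new buffers,
// the cap max_flushes is reached, or is_cancelled(round) reports cancellation.
FlushResult mark_until_quiescent(FakeQueueSet& qset, FakeMutators& mutators,
                                 unsigned max_flushes,
                                 bool (*is_cancelled)(unsigned)) {
  assert(is_cancelled != nullptr);
  FlushResult result{0, false};
  for (unsigned flushes = 0; flushes < max_flushes; flushes++) {
    qset.completed = 0;  // stand-in for the marking task draining all buffers
    result.rounds++;

    if (is_cancelled(flushes)) {
      // GC is cancelled, break out.
      result.cancelled = true;
      break;
    }

    const std::size_t before = qset.completed_buffers_num();
    mutators.flush_into(qset);
    const std::size_t after = qset.completed_buffers_num();
    if (before == after) {
      // The flush produced no new work, so no more retries are needed.
      break;
    }
  }
  return result;
}
```

In this model, mutators that hand over work on two successive flushes cause three marking rounds: two productive flushes followed by one that finds nothing. If the cap is hit first, any leftover buffers would remain for the final-mark pause, which is the situation the cap trades off against.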
From rkennke at openjdk.java.net Wed Jan 27 15:47:55 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 27 Jan 2021 15:47:55 GMT Subject: RFR: 8260497: Shenandoah: Improve SATB flushing In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 10:31:47 GMT, Aleksey Shipilev wrote: >> Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if it doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to try and avoid having too much left-over work when we reach final-mark. >> >> We can do better than that: when concurrent mark is done we can handshake all threads, and let them flush their respective SATB queues, and re-enter concurrent mark loop again, until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out leftover work properly. >> >> I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each: >> >> baseline: >> Finish Mark = 0,251 s (a = 688 us) (n = 364) (lvls, us = 125, 486, 621, 824, 4156) >> Finish Mark = 0,338 s (a = 922 us) (n = 366) (lvls, us = 131, 494, 652, 852, 72948) >> Finish Mark = 0,257 s (a = 699 us) (n = 368) (lvls, us = 111, 492, 645, 826, 4447) >> >> patched: >> Finish Mark = 0,112 s (a = 301 us) (n = 370) (lvls, us = 115, 207, 250, 281, 3709) >> Finish Mark = 0,107 s (a = 292 us) (n = 368) (lvls, us = 107, 209, 248, 287, 3329) >> Finish Mark = 0,114 s (a = 310 us) (n = 367) (lvls, us = 115, 211, 254, 285, 3819) >> >> It reliably lowers all timings for finish-mark. It also doesn't cause any regressions in throughput. 
>> >> Testing: >> - [x] hotspot_gc_shenandoah >> - [x] benchmarks > > src/hotspot/share/gc/shared/satbMarkQueue.hpp line 118: > >> 116: // Return true if the queue's buffer should be enqueued, even if not full. >> 117: // The default method uses the buffer enqueue threshold. >> 118: bool should_enqueue_buffer(SATBMarkQueue& queue); > > Why drop `virtual` here? Is it because Shenandoah was the only virtual override of it, and now we can do the non-virtual call? Yes. IIRC we introduced that when we upstreamed Shenandoah, and can drop it again, thus restoring the original non-virtual version. > src/hotspot/share/gc/shenandoah/shenandoahSATBMarkQueueSet.cpp line 59: > >> 57: void ShenandoahSATBMarkQueueSet::enqueue_completed_buffer(BufferNode* node) { >> 58: SATBMarkQueueSet::enqueue_completed_buffer(node); >> 59: Atomic::inc(&_enqueued_count); > > I believe `SATBMarkQueueSet` already tracks this, and we could instead use `SATBMarkQueueSet::completed_buffers_num`? Ohh nice! Will give it a try! > src/hotspot/share/gc/shenandoah/shenandoahSATBMarkQueueSet.hpp line 35: > >> 33: class ShenandoahSATBMarkQueueSet : public SATBMarkQueueSet { >> 34: private: >> 35: volatile int _enqueued_count; > > I have a suspicion that `int` would overflow at some point in the long-running application. `size_t` would fit better, but then see the other comment that `SATBMQ` already tracks it itself. Right. 
(I only ever compare before != after, so overflow would be ok, but it doesn't matter b/c I'll remove it) ------------- PR: https://git.openjdk.java.net/jdk/pull/2254 From shade at openjdk.java.net Wed Jan 27 15:47:55 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 27 Jan 2021 15:47:55 GMT Subject: RFR: 8260497: Shenandoah: Improve SATB flushing In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 10:50:21 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shared/satbMarkQueue.hpp line 118: >> >>> 116: // Return true if the queue's buffer should be enqueued, even if not full. >>> 117: // The default method uses the buffer enqueue threshold. >>> 118: bool should_enqueue_buffer(SATBMarkQueue& queue); >> >> Why drop `virtual` here? Is it because Shenandoah was the only virtual override of it, and now we can do the non-virtual call? > > Yes. IIRC we introduced that when we upstreamed Shenandoah, and can drop it again, thus restoring the original non-virtual version. Okay then! ------------- PR: https://git.openjdk.java.net/jdk/pull/2254 From shade at openjdk.java.net Wed Jan 27 15:47:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 27 Jan 2021 15:47:56 GMT Subject: RFR: 8260497: Shenandoah: Improve SATB flushing In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 11:05:51 GMT, Aleksey Shipilev wrote: >> Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if it doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to try and avoid having too much left-over work when we reach final-mark. 
>> >> We can do better than that: when concurrent mark is done we can handshake all threads, and let them flush their respective SATB queues, and re-enter concurrent mark loop again, until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out leftover work properly. >> >> I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each: >> >> baseline: >> Finish Mark = 0,251 s (a = 688 us) (n = 364) (lvls, us = 125, 486, 621, 824, 4156) >> Finish Mark = 0,338 s (a = 922 us) (n = 366) (lvls, us = 131, 494, 652, 852, 72948) >> Finish Mark = 0,257 s (a = 699 us) (n = 368) (lvls, us = 111, 492, 645, 826, 4447) >> >> patched: >> Finish Mark = 0,112 s (a = 301 us) (n = 370) (lvls, us = 115, 207, 250, 281, 3709) >> Finish Mark = 0,107 s (a = 292 us) (n = 368) (lvls, us = 107, 209, 248, 287, 3329) >> Finish Mark = 0,114 s (a = 310 us) (n = 367) (lvls, us = 115, 211, 254, 285, 3819) >> >> It reliably lowers all timings for finish-mark. It also doesn't cause any regressions in throughput. >> >> Testing: >> - [x] hotspot_gc_shenandoah >> - [x] benchmarks > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 270: > >> 268: enqueued_count_after = qset.completed_buffers_num(); >> 269: flushes++; >> 270: } while (enqueued_count_before != enqueued_count_after && flushes < max_flushes); > > So, how does this interact with cancellation? Shouldn't we check for `cancelled_gc()` here as well? I think this would be cleaner:

    ShenandoahFlushSATBHandshakeClosure flush_satb(qset);
    for (int flushes = 0; flushes < ShenandoahMaxSATBBufferFlushes; flushes++) {
      TaskTerminator terminator(nworkers, task_queues());
      ShenandoahConcurrentMarkingTask task(this, &terminator);
      workers->run_task(&task);

      if (cancelled_gc()) {
        // GC is cancelled, break out.
        break;
      }

      int before = qset.completed_buffers_num();
      Handshake::execute(&flush_satb);
      int after = qset.completed_buffers_num();
      if (before == after) {
        // No more retries needed, break out.
        break;
      }
    }

------------- PR: https://git.openjdk.java.net/jdk/pull/2254 From zgu at openjdk.java.net Wed Jan 27 15:51:49 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 27 Jan 2021 15:51:49 GMT Subject: RFR: 8255837: Shenandoah: Remove ShenandoahConcurrentRoots class Message-ID: <0qG_GQeV5H4SJ-at5sZYqb9DfQXxB8iTKqu5Lrq9cX0=.dc182daf-dd84-453c-9f3b-e9cf81acfc02@github.com> The class was introduced for 2 purposes: 1) a platform supports concurrent class unloading (e.g. the platform supports nmethod_entry_barrier) 2) should perform concurrent class unloading for particular gc cycle (e.g. STW vs. concurrent GC) Now, concurrent class unloading is supported on all Shenandoah supported platforms. Furthermore, STW and concurrent GC are isolated (JDK-8255765), the class becomes superfluous. Test: - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 - [x] nightly ------------- Commit messages: - Fixed copyright years - Fixed merge issue - Update - Merge - ClassUnloading -> heap->unload_classes() - JDK-8255837 - Merge master - Merge branch 'JDK-8255765-isolate-gcs' into JDK-8256298-conc-stack-proc - Fixed indentation - More from Aleksey's review - ... 
and 127 more: https://git.openjdk.java.net/jdk/compare/fd00ed74...172438ec Changes: https://git.openjdk.java.net/jdk/pull/2262/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2262&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8255837 Stats: 165 lines in 16 files changed: 0 ins; 124 del; 41 mod Patch: https://git.openjdk.java.net/jdk/pull/2262.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2262/head:pull/2262 PR: https://git.openjdk.java.net/jdk/pull/2262 From amith.pawar at gmail.com Wed Jan 27 16:03:21 2021 From: amith.pawar at gmail.com (Amit Pawar) Date: Wed, 27 Jan 2021 21:33:21 +0530 Subject: RFR: Convert old-gen single threaded pretouch to multi-threaded during In-Reply-To: <92A36846-F059-47A4-8AEF-086135651CED@oracle.com> References: <92A36846-F059-47A4-8AEF-086135651CED@oracle.com> Message-ID: Thanks Kim for replying. On Sun, Jan 24, 2021 at 8:07 PM Kim Barrett wrote: > > On Jan 8, 2021, at 8:08 AM, Amit Pawar wrote: > > > > Hi > > > > I am trying to improve the pre-touch time taken during old-gen resizing. > > I need your suggestions on whether the following change would be accepted. > > > > What is happening? > > Every GC thread resizes the old-gen during object promotion if there is not enough room for the object. After expanding, the GC thread pre-touches the pages alone and can't pre-touch in parallel using the PretouchTask, as it is already executing a GC task. The total GC pause time depends on the resize size and the number of resizes. > > > > What is the fix? > > Create another WorkGang so that the GC thread can execute the pre-touch task with this new WorkGang to reduce the pre-touch time taken. The code change is given below. > > I don't think adding a work gang is the right approach here. The threads in that new work gang may just end up competing for CPUs with the already in-progress work gang doing the normal GC work. 
> A better approach would be to refactor pretouch parallization to allow > threads to join the fray as needed. Then arrange for the in-progress work > gang threads to join the pretouch if they would otherwise be waiting for it > to complete. > > OK. > I've recently been looking at the relevant parts of ParallelGC, and it > looks > like it shouldn't be too hard to allow threads waiting for expansion to > cooperate in any ongoing pretouch, esp. after some other recent RFEs have > been dealt with. I've filed JDK-8260332 for this. I haven't looked at the > G1 > side of things yet. > It will be useful if you can share those RFEs or suggest which part of the code to refer to. Thanks, Amit From zgu at openjdk.java.net Wed Jan 27 16:04:40 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 27 Jan 2021 16:04:40 GMT Subject: RFR: 8260497: Shenandoah: Improve SATB flushing In-Reply-To: References: Message-ID: <_A61XN_bwigElezlBwG3GVF3up9pCswYqeOI8VjUN8k=.699ddc36-faf8-4485-92b4-e2781040f56d@github.com> On Wed, 27 Jan 2021 10:15:19 GMT, Roman Kennke wrote: > Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if it doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to try and avoid having too much left-over work when we reach final-mark. > > We can do better than that: when concurrent mark is done we can handshake all threads, and let them flush their respective SATB queues, and re-enter concurrent mark loop again, until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out leftover work properly. 
> > I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each: > > baseline: > Finish Mark = 0,251 s (a = 688 us) (n = 364) (lvls, us = 125, 486, 621, 824, 4156) > Finish Mark = 0,338 s (a = 922 us) (n = 366) (lvls, us = 131, 494, 652, 852, 72948) > Finish Mark = 0,257 s (a = 699 us) (n = 368) (lvls, us = 111, 492, 645, 826, 4447) > > patched: > Finish Mark = 0,112 s (a = 301 us) (n = 370) (lvls, us = 115, 207, 250, 281, 3709) > Finish Mark = 0,107 s (a = 292 us) (n = 368) (lvls, us = 107, 209, 248, 287, 3329) > Finish Mark = 0,114 s (a = 310 us) (n = 367) (lvls, us = 115, 211, 254, 285, 3819) > > It reliably lowers all timings for finish-mark. It also doesn't cause any regressions in throughput. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] benchmarks Marked as reviewed by zgu (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2254 From zgu at openjdk.java.net Wed Jan 27 18:21:59 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 27 Jan 2021 18:21:59 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC Message-ID: Please review this patch that renames ShenandoahMarkCompact to ShenandoahFullGC, to be consistent with other GCs. 
------------- Commit messages: - JDK-8260004-rename-fullgc Changes: https://git.openjdk.java.net/jdk/pull/2266/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2266&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260004 Stats: 38 lines in 9 files changed: 4 ins; 6 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/2266.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2266/head:pull/2266 PR: https://git.openjdk.java.net/jdk/pull/2266 From zgu at openjdk.java.net Wed Jan 27 18:27:00 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 27 Jan 2021 18:27:00 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC [v2] In-Reply-To: References: Message-ID: <8OECF112bR0GUAr_PNDnJReQUqAgAQ687_os35TMeaY=.c061338c-6ddb-4f1f-a68f-b70de8b495e3@github.com> > Please review this patch that renames ShenandoahMarkCompact to ShenandoahFullGC, to be consistent with other GCs. Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge master - JDK-8260004-rename-fullgc ------------- Changes: https://git.openjdk.java.net/jdk/pull/2266/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2266&range=01 Stats: 38 lines in 9 files changed: 4 ins; 6 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/2266.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2266/head:pull/2266 PR: https://git.openjdk.java.net/jdk/pull/2266 From rkennke at openjdk.java.net Wed Jan 27 19:02:39 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 27 Jan 2021 19:02:39 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 18:16:09 GMT, Zhengyu Gu wrote: > Please review this patch that renames ShenandoahMarkCompact to ShenandoahFullGC, to be consistent with other GCs. 
This nomenclature might become problematic when generational Shenandoah becomes a thing. Then, what we do now, collecting the complete heap, as opposed to only the young generation, might be mistaken as 'full gc' too. ------------- PR: https://git.openjdk.java.net/jdk/pull/2266 From kim.barrett at oracle.com Wed Jan 27 20:38:23 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 27 Jan 2021 15:38:23 -0500 Subject: RFR: Convert old-gen single threaded pretouch to multi-threaded during In-Reply-To: References: <92A36846-F059-47A4-8AEF-086135651CED@oracle.com> Message-ID: <60BBB17B-235D-490C-B691-65A48BBEE129@oracle.com> > On Jan 27, 2021, at 11:03 AM, Amit Pawar wrote: > On Sun, Jan 24, 2021 at 8:07 PM Kim Barrett wrote: >> I've recently been looking at the relevant parts of ParallelGC, and it looks >> like it shouldn't be too hard to allow threads waiting for expansion to >> cooperate in any ongoing pretouch, esp. after some other recent RFEs have >> been dealt with. I've filed JDK-8260332 for this. I haven't looked at the G1 >> side of things yet. > It will be useful if you can share those RFEs or suggest which part of the code to refer to. See the linked issues for JDK-8260332. Also JDK-8259776, JDK-8259778, and JDK-8259862, which are all somewhat precursors, or at least I plan to deal with them first. From zgu at openjdk.java.net Wed Jan 27 21:04:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 27 Jan 2021 21:04:39 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 18:59:51 GMT, Roman Kennke wrote: > This nomenclature might become problematic when generational Shenandoah becomes a thing. Then, what we do now, collecting the complete heap, as opposed to only the young generation, might be mistaken as 'full gc' too. Does "full GC" always mean complete heap collection? G1 calls it FullCollector and supporting classes all prefix with FullGC ... 
------------- PR: https://git.openjdk.java.net/jdk/pull/2266 From kbarrett at openjdk.java.net Wed Jan 27 23:12:46 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 27 Jan 2021 23:12:46 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace Message-ID: Please review this change which merges ImmutableSpace into MutableSpace, eliminating the former. There were no interesting uses of ImmutableSpace, other than as the base class for MutableSpace. The name ImmutableSpace is kind of a misnomer given that usage. Testing: mach5 tier1-3 ------------- Commit messages: - remove immutablespace Changes: https://git.openjdk.java.net/jdk/pull/2271/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2271&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259778 Stats: 325 lines in 8 files changed: 51 ins; 258 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/2271.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2271/head:pull/2271 PR: https://git.openjdk.java.net/jdk/pull/2271 From sspitsyn at openjdk.java.net Wed Jan 27 23:55:39 2021 From: sspitsyn at openjdk.java.net (Serguei Spitsyn) Date: Wed, 27 Jan 2021 23:55:39 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 23:06:41 GMT, Kim Barrett wrote: > Please review this change which merges ImmutableSpace into MutableSpace, > eliminating the former. There were no interesting uses of ImmutableSpace, > other than as the base class for MutableSpace. The name ImmutableSpace is > kind of a misnomer given that usage. > > Testing: > mach5 tier1-3 LGTM ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2271 From mli at openjdk.java.net Thu Jan 28 00:48:39 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 28 Jan 2021 00:48:39 GMT Subject: RFR: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at [v5] In-Reply-To: References: <-mD5dYNjnUozTsx1G0PdpULRyFuc3sqsQSGR0-AZOpo=.2f140db3-0bc1-4412-9627-692fe9f3cb0a@github.com> <5i7qxWOLTHkqgMR8SPriW1XN0e1gO_UBsiIA_XYoDyE=.e515eb2c-9064-45de-8271-e983f31ba098@github.com> Message-ID: On Wed, 27 Jan 2021 15:25:27 GMT, Thomas Schatzl wrote: > You have two "R"eviewers now :) The official rule is one "R"eviewer, and (currently) one "C"ommitter, but I am good with just a second reviewer. Thanks Thomas! :-) ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From mli at openjdk.java.net Thu Jan 28 00:48:40 2021 From: mli at openjdk.java.net (Hamlin Li) Date: Thu, 28 Jan 2021 00:48:40 GMT Subject: Integrated: JDK-8260200 G1: Remove unnecessary update in FreeRegionList::remove_starting_at In-Reply-To: References: Message-ID: On Thu, 21 Jan 2021 11:12:18 GMT, Hamlin Li wrote: > Optimize FreeRegionList::remove_starting_at by removing unnecessary reads and writes. > > FreeRegionList::remove_starting_at(...) starts at a given node and removes the subsequent N nodes from the free list. But while traversing the free list, it removes the nodes one by one, updating the prev and next pointers of each node's neighbours. It is not necessary to do these updates for every node, as we can remove the target nodes all at once and set the prev and next pointers of just 2 nodes. This pull request has now been integrated. 
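The idea in the description above (unlink a whole run of nodes by relinking only its two outer neighbours) can be illustrated on a plain doubly-linked list. This is a minimal sketch, not the G1 code: `Node`, `List`, `append` and this `remove_starting_at` are hypothetical stand-ins for illustration, and the real `FreeRegionList` presumably also maintains bookkeeping that is omitted here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal list node; not the HotSpot HeapRegion type.
struct Node {
  int id;
  Node* prev;
  Node* next;
};

// Hypothetical minimal list; not the HotSpot FreeRegionList type.
struct List {
  Node* head;
  Node* tail;
  std::size_t length;
};

// Appends n at the tail of the list.
void append(List& list, Node* n) {
  n->prev = list.tail;
  n->next = nullptr;
  if (list.tail != nullptr) {
    list.tail->next = n;
  } else {
    list.head = n;
  }
  list.tail = n;
  list.length++;
}

// Removes `count` consecutive nodes starting at `first`.
// The caller must guarantee that `first` is in `list` and that the run of
// `count` nodes does not extend past the tail.
void remove_starting_at(List& list, Node* first, std::size_t count) {
  assert(first != nullptr && count > 0 && count <= list.length);

  Node* before = first->prev;  // last node that stays, left of the run
  Node* cur = first;
  for (std::size_t i = 0; i < count; i++) {
    assert(cur != nullptr);
    Node* next = cur->next;
    // Only the removed node's own links are cleared; its neighbours are
    // not touched inside the loop.
    cur->prev = nullptr;
    cur->next = nullptr;
    cur = next;
  }
  Node* after = cur;           // first node that stays, right of the run

  // Relink just the two boundary nodes (or the head/tail of the list).
  if (before != nullptr) {
    before->next = after;
  } else {
    list.head = after;
  }
  if (after != nullptr) {
    after->prev = before;
  } else {
    list.tail = before;
  }
  list.length -= count;
}
```

Compared with unlinking node by node, the traversal is still linear in the number of removed nodes, but the surviving list is written to only at its two boundary positions.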
Changeset: 7030d2e0 Author: Hamlin Li URL: https://git.openjdk.java.net/jdk/commit/7030d2e0 Stats: 60 lines in 2 files changed: 38 ins; 19 del; 3 mod 8260200: G1: Remove unnecessary update in FreeRegionList::remove_starting_at Reviewed-by: ayang, sjohanss, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2181 From kbarrett at openjdk.java.net Thu Jan 28 04:22:52 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 28 Jan 2021 04:22:52 GMT Subject: RFR: 8259487: Remove unused StarTask Message-ID: Please review this change which removes the StarTask class. It was superseded by ScannerTask in JDK-8244684 and JDK-8245022, and is no longer used. Testing: mach5 tier1 ------------- Commit messages: - remove StarTask Changes: https://git.openjdk.java.net/jdk/pull/2277/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2277&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259487 Stats: 31 lines in 1 file changed: 0 ins; 30 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2277.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2277/head:pull/2277 PR: https://git.openjdk.java.net/jdk/pull/2277 From iklam at openjdk.java.net Thu Jan 28 04:30:41 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 28 Jan 2021 04:30:41 GMT Subject: RFR: 8259487: Remove unused StarTask In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 04:17:02 GMT, Kim Barrett wrote: > Please review this change which removes the StarTask class. It was > superseded by ScannerTask in JDK-8244684 and JDK-8245022, and is no longer > used. > > Testing: > mach5 tier1 LGTM ------------- Marked as reviewed by iklam (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2277 From dholmes at openjdk.java.net Thu Jan 28 05:22:39 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Thu, 28 Jan 2021 05:22:39 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 23:06:41 GMT, Kim Barrett wrote: > Please review this change which merges ImmutableSpace into MutableSpace, > eliminating the former. There were no interesting uses of ImmutableSpace, > other than as the base class for MutableSpace. The name ImmutableSpace is > kind of a misnomer given that usage. > > Testing: > mach5 tier1-3 This looks good to me. And I definitely agree that a MutableSpace is-not-a ImmutableSpace! One minor nit below. Thanks, David src/hotspot/share/gc/parallel/mutableSpace.hpp line 47: > 45: // > 46: // Invariant: bottom() <= top() <= end() > 47: // top() is inclusive and end() is exclusive. If end() is exclusive then shouldn't the invariant be `< end()`? ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2271 From shade at openjdk.java.net Thu Jan 28 07:59:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 07:59:42 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC [v2] In-Reply-To: <8OECF112bR0GUAr_PNDnJReQUqAgAQ687_os35TMeaY=.c061338c-6ddb-4f1f-a68f-b70de8b495e3@github.com> References: <8OECF112bR0GUAr_PNDnJReQUqAgAQ687_os35TMeaY=.c061338c-6ddb-4f1f-a68f-b70de8b495e3@github.com> Message-ID: On Wed, 27 Jan 2021 18:27:00 GMT, Zhengyu Gu wrote: >> Please review this patch that renames ShenandoahMarkCompact to ShenandoahFullGC, to be consistent with other GCs. > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge master > - JDK-8260004-rename-fullgc Marked as reviewed by shade (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2266 From pliden at openjdk.java.net Thu Jan 28 08:01:47 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 28 Jan 2021 08:01:47 GMT Subject: [jdk16] Integrated: 8259765: ZGC: Handle incorrect processor id reported by the operating system In-Reply-To: References: Message-ID: On Fri, 15 Jan 2021 13:48:26 GMT, Per Liden wrote: > Some environments (e.g. OpenVZ containers) incorrectly report a logical processor id that is higher than the number of processors available. This is problematic, for example, when implementing CPU-local data structures, where the processor id is used to index into an array of length processor_count(). > > We've received crash reports from Jelastic (a Virtuozzo/OpenVZ user) where they run into this problem. We can work around the problem in the JVM, until the underlying problem is fixed. Without this workaround ZGC can't be used in this environment. > > This is currently a ZGC-specific issue, since ZGC is currently the only part of HotSpot that is using CPU-local data structures, but that could change in the future. > > Just to clarify. In a Virtuozzo/OpenVZ environment, it seems the underlying problem is not necessarily that sched_getcpu() returns an incorrect processor id, but rather that sysconf(_SC_NPROCESSORS_CONF) returns a number that is too low. Either way, sched_getcpu() and sysconf(_SC_NPROCESSORS_CONF) seem to have different views of the world. This is not an issue in container environments such as Docker. > > This patch works around this problem by letting os::processor_id() on Linux detect incorrect processor ids, and convert them to processor id 0. As mentioned in the comment in the code, this is safe, but not optimal for performance if the system actually has more than one processor. There's also a warning printed the first time this happens. > > Testing: Manual testing with various fake/incorrect values returned from sched_getcpu(). 
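The workaround described above (treat an out-of-range processor id as processor 0 and warn once) can be sketched in isolation. This is a minimal sketch under stated assumptions: `sanitize_processor_id` and its parameters are hypothetical names chosen for illustration, the reported id is passed in rather than read from `sched_getcpu()`, and the warning text is invented. It is not the code from the patch.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper, not the HotSpot implementation.
// Maps an OS-reported processor id to an index that is always valid for an
// array of length processor_count. Ids outside [0, processor_count) are
// converted to 0, and a warning is printed the first time that happens.
// `warned` carries the "already warned" state between calls.
int sanitize_processor_id(int reported_id, int processor_count, bool& warned) {
  assert(processor_count > 0);
  if (reported_id >= 0 && reported_id < processor_count) {
    return reported_id;
  }
  if (!warned) {
    std::fprintf(stderr,
                 "warning: processor id %d is outside [0, %d), using 0 instead\n",
                 reported_id, processor_count);
    warned = true;
  }
  // Safe for indexing CPU-local arrays, but all such threads now share slot 0,
  // which is not optimal for performance on a multi-processor system.
  return 0;
}
```

A caller indexing a CPU-local array of length `processor_count` would pass every reported id through this function before using it as an index.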
This pull request has now been integrated. Changeset: e68eac9c Author: Per Liden URL: https://git.openjdk.java.net/jdk16/commit/e68eac9c Stats: 37 lines in 1 file changed: 29 ins; 2 del; 6 mod 8259765: ZGC: Handle incorrect processor id reported by the operating system Reviewed-by: ayang, eosterlund ------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From pliden at openjdk.java.net Thu Jan 28 08:01:46 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 28 Jan 2021 08:01:46 GMT Subject: [jdk16] RFR: 8259765: ZGC: Handle incorrect processor id reported by the operating system [v2] In-Reply-To: <75qnWumnRn-LDG_hwMAxgqeQCn0HtJWEPadgb9i2_qE=.376ff161-9d5f-4332-9719-a4a5d2beae00@github.com> References: <75qnWumnRn-LDG_hwMAxgqeQCn0HtJWEPadgb9i2_qE=.376ff161-9d5f-4332-9719-a4a5d2beae00@github.com> Message-ID: On Fri, 22 Jan 2021 11:18:55 GMT, Per Liden wrote: >> So we have to penalize all correctly functioning users because of one broken environment? Can we not detect this broken environment at startup and inject a workaround then? >> >> Why is this an environment that is important enough that OpenJDK has to make changes to deal with a broken environment? >> >> Cheers, >> David > > @dholmes-ora Do you still have questions or concerns here, or can I go ahead and integrate this? > > I've gone through all uses of sysconf(_SC_NPROCESSORS_*) and sched_getaffinity() we have, and they look fine. I've also looked at how the OSContainer stuff behaves in this environment, and it also looks fine. In summary, the only problem I can spot is related to sched_getcpu(). Ok, thanks all for reviewing. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/124 From shade at openjdk.java.net Thu Jan 28 07:59:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 07:59:43 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 21:02:24 GMT, Zhengyu Gu wrote: >> This nomenclature might become problematic when generational Shenandoah becomes a thing. Then, what we do now, collecting the complete heap, as opposed to only the young generation, might be mistaken as 'full gc' too. > >> This nomenclature might become problematic when generational Shenandoah becomes a thing. Then, what we do now, collecting the complete heap, as opposed to only the young generation, might be mistaken as 'full gc' too. > > Does "full GC" always mean complete heap collection? G1 calls it FullCollector and supporting classes all prefix with FullGC ... I vote for this rename. We already report "Pause Full" for this operation, so it is "Full GC". The generational extension would probably call their phases "young" and "mixed" concurrent GC? ------------- PR: https://git.openjdk.java.net/jdk/pull/2266 From neliasso at openjdk.java.net Thu Jan 28 08:10:40 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 28 Jan 2021 08:10:40 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 10:05:56 GMT, 王超 wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > > Testing: all Vector API related tests have passed. The patch looks good. Please turn the reproducer into a regression test for this bug. Contact me if you need any help with that. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From tschatzl at openjdk.java.net Thu Jan 28 08:37:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 28 Jan 2021 08:37:40 GMT Subject: RFR: 8259487: Remove unused StarTask In-Reply-To: References: Message-ID: <90RyUuGd8sDBeWjKWIrsz549aXYm8ZXPr1e-PtRK_Oc=.2be447d1-aecd-4d59-b38e-322a66b42722@github.com> On Thu, 28 Jan 2021 04:17:02 GMT, Kim Barrett wrote: > Please review this change which removes the StarTask class. It was > superseded by ScannerTask in JDK-8244684 and JDK-8245022, and is no longer > used. > > Testing: > mach5 tier1 Looks good and trivial. Thanks. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2277 From github.com+779991+jaokim at openjdk.java.net Thu Jan 28 08:44:54 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström) Date: Thu, 28 Jan 2021 08:44:54 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v3] In-Reply-To: References: Message-ID: > **Description** > This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup. The gc-efficiency is calculated in the end of the marking phase, to be logged in the post-cleanup section. It is however not reset, meaning that next phase's post-marking log will show the old value. > > - The gc-efficiency is initialized to -1 when it hasn't been calculated. > - Negative gc-efficiency is displayed as a hyphen "-" in the summary. > - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` > > **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. 
See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) > > This fix has been tested together with the above mentioned fix. > > **Example** > This is what logging like after fix has been applied. > ### PHASE Post-Marking @ 410.303 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 410.305 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Marking @ 450.310 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 
16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 450.312 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > > **Testing** > - Manual testing > - hs-tier1, hs-tier2 Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: Added format buffer and fixed duplicated code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2217/files - new: https://git.openjdk.java.net/jdk/pull/2217/files/24361880..97b022ff Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2217&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2217&range=01-02 Stats: 44 lines in 1 file changed: 17 ins; 25 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2217.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2217/head:pull/2217 PR: https://git.openjdk.java.net/jdk/pull/2217 From github.com+779991+jaokim at openjdk.java.net Thu Jan 28 08:44:56 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström) Date: Thu, 28 Jan 2021 08:44:56 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v2] In-Reply-To: References: Message-ID: On Tue, 26 Jan 2021 09:14:20 GMT, Thomas Schatzl wrote: >> Joakim Nordström has 
updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed copyright year. > > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 2981: > >> 2979: >> 2980: // Print a line for this particular region. >> 2981: if(gc_eff < 0) { > > I would prefer instead of the code duplication, use a `%s` format specifier for the efficiency, and a `FormatBuffer` to format the actual string into it. This should result in much shorter code. Yes, I absolutely agree! Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/2217 From vlivanov at openjdk.java.net Thu Jan 28 09:04:42 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 28 Jan 2021 09:04:42 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: Message-ID: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com> On Wed, 27 Jan 2021 10:05:56 GMT, 王超 wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > > Testing: all Vector API related tests have passed. What about migrating it to `GraphKit::access_load_at` instead? ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From tschatzl at openjdk.java.net Thu Jan 28 09:04:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 28 Jan 2021 09:04:42 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 23:06:41 GMT, Kim Barrett wrote: > Please review this change which merges ImmutableSpace into MutableSpace, > eliminating the former. There were no interesting uses of ImmutableSpace, > other than as the base class for MutableSpace. The name ImmutableSpace is > kind of a misnomer given that usage. 
> > Testing: > mach5 tier1-3 I assume that tier1-3 includes the SA tests :) Looks good other than that nit. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2271 From tschatzl at openjdk.java.net Thu Jan 28 09:09:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 28 Jan 2021 09:09:41 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 05:13:57 GMT, David Holmes wrote: >> Please review this change which merges ImmutableSpace into MutableSpace, >> eliminating the former. There were no interesting uses of ImmutableSpace, >> other than as the base class for MutableSpace. The name ImmutableSpace is >> kind of a misnomer given that usage. >> >> Testing: >> mach5 tier1-3 > > src/hotspot/share/gc/parallel/mutableSpace.hpp line 47: > >> 45: // >> 46: // Invariant: bottom() <= top() <= end() >> 47: // top() is inclusive and end() is exclusive. > > If end() is exclusive then shouldn't the invariant be `< end()`? I also think that top() is also exclusive as in other collectors. @dholmes-ora : e.g. bottom == top == end means the space is empty. These two lines are not disagreeing with each other. ------------- PR: https://git.openjdk.java.net/jdk/pull/2271 From ayang at openjdk.java.net Thu Jan 28 09:15:52 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 28 Jan 2021 09:15:52 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots Message-ID: This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`. 
Tested: hotspot_gc ------------- Commit messages: - StrongRootsScope sequential support - remove SubTasksDone in serial gc Changes: https://git.openjdk.java.net/jdk/pull/2280/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2280&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260574 Stats: 57 lines in 7 files changed: 5 ins; 34 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/2280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2280/head:pull/2280 PR: https://git.openjdk.java.net/jdk/pull/2280 From tschatzl at openjdk.java.net Thu Jan 28 09:20:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 28 Jan 2021 09:20:41 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v3] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 08:44:54 GMT, Joakim Nordström wrote: >> **Description** >> This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup. The gc-efficiency is calculated in the end of the marking phase, to be logged in the post-cleanup section. It is however not reset, meaning that next phase's post-marking log will show the old value. >> >> - The gc-efficiency is initialized to -1 when it hasn't been calculated. >> - Negative gc-efficiency is displayed as a hyphen "-" in the summary. >> - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` >> >> **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) >> >> This fix has been tested together with the above mentioned fix. >> >> **Example** >> This is what logging like after fix has been applied. 
>> ### PHASE Post-Marking @ 410.303 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 410.305 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Marking @ 450.310 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % 
next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 450.312 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> >> **Testing** >> - Manual testing >> - hs-tier1, hs-tier2 > > Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: > > Added format buffer and fixed duplicated code Note that I think the code given is correct, so I am good if you think using `snprintf` is better but I want your feedback about this. Then I'll approve. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 2983: > 2981: > 2982: if(gc_eff < 0) { > 2983: snprintf(gc_efficiency, G1PPRL_DOUBLE_FORMAT_LEN+1, G1PPRL_DOUBLE_H_FORMAT, "-"); snprintf is fine with me too, but I had imagined something like this: FormatBuffer<> efficiency(""); // maybe better name this if (gc_eff < 0.0) { efficiency.append("-"); } else { efficiency.append(G1PPRL_DOUBLE_H_FORMAT, gc_eff); } and in the `log_trace` use `%s` and `efficiency.buffer()`. That seems a lot easier to understand (for me) than wrangling with `snprintf`. Maybe there is a reason to not use this? Also, I am not sure that using `G1PPRL_DOUBLE_H_FORMAT` for this `snprintf` here to print only `-` is really correct. I would naively have expected it to require `%s`. ------------- Changes requested by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2217 From rkennke at openjdk.java.net Thu Jan 28 09:53:43 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 28 Jan 2021 09:53:43 GMT Subject: Integrated: 8260497: Shenandoah: Improve SATB flushing In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 10:15:19 GMT, Roman Kennke wrote: > Currently, we periodically force flushing of SATB queues. This works by activating a flag every 100ms in every thread, which causes that thread to enqueue its SATB buffer the next time it overflows, even if it doesn't meet its threshold after filtering. This is somewhat problematic when a thread does not actually overflow its SATB queue in time. The whole point of the exercise is to try and avoid having too much left-over work when we reach final-mark. > > We can do better than that: when concurrent mark is done we can handshake all threads, and let them flush their respective SATB queues, and re-enter concurrent mark loop again, until flushing yields no more work. Experiments show that it usually takes 1-3 flushes to clean out leftover work properly. > > I ran benchmarks, 3 high-pressure preset runs of SPECjbb2015, 10 minutes each: > > baseline: > Finish Mark = 0,251 s (a = 688 us) (n = 364) (lvls, us = 125, 486, 621, 824, 4156) > Finish Mark = 0,338 s (a = 922 us) (n = 366) (lvls, us = 131, 494, 652, 852, 72948) > Finish Mark = 0,257 s (a = 699 us) (n = 368) (lvls, us = 111, 492, 645, 826, 4447) > > patched: > Finish Mark = 0,112 s (a = 301 us) (n = 370) (lvls, us = 115, 207, 250, 281, 3709) > Finish Mark = 0,107 s (a = 292 us) (n = 368) (lvls, us = 107, 209, 248, 287, 3329) > Finish Mark = 0,114 s (a = 310 us) (n = 367) (lvls, us = 115, 211, 254, 285, 3819) > > It reliably lowers all timings for finish-mark. It also doesn't cause any regressions in throughput. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] benchmarks This pull request has now been integrated. 
Changeset: 316d52c1 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/316d52c1 Stats: 94 lines in 10 files changed: 30 ins; 59 del; 5 mod 8260497: Shenandoah: Improve SATB flushing Reviewed-by: shade, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2254 From github.com+779991+jaokim at openjdk.java.net Thu Jan 28 09:55:43 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström) Date: Thu, 28 Jan 2021 09:55:43 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v3] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 09:14:37 GMT, Thomas Schatzl wrote: >> Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision: >> >> Added format buffer and fixed duplicated code > > src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 2983: > >> 2981: >> 2982: if(gc_eff < 0) { >> 2983: snprintf(gc_efficiency, G1PPRL_DOUBLE_FORMAT_LEN+1, G1PPRL_DOUBLE_H_FORMAT, "-"); > > snprintf is fine with me too, but I had imagined something like this: > > FormatBuffer<> efficiency(""); // maybe better name this > if (gc_eff < 0.0) { > efficiency.append("-"); > } else { > efficiency.append(G1PPRL_DOUBLE_H_FORMAT, gc_eff); > } > > and in the `log_trace` use `%s` and `efficiency.buffer()`. That seems a lot easier to understand (for me) than wrangling with `snprintf`. > > Maybe there is a reason to not use this? > > Also, I am not sure that using `G1PPRL_DOUBLE_H_FORMAT` for this `snprintf` here to print only `-` is really correct. I would naively have expected it to require `%s`. Again, you're absolutely right. I looked for a FormatBuffer class in the codebase when you mentioned it (kind of what I wanted from the beginning). For whatever reason I couldn't find one (typo, temporary blindness, ignorance??). So I went with snprintf (even though I don't particularly like it). I'll fix this. Sorry for all the bother. 
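As a rough illustration of the approach discussed in this exchange (format the efficiency column into a string once, then print it with a single `%s`-style specifier), here is a minimal standalone sketch. It does not use HotSpot's `FormatBuffer`; `format_gc_efficiency`, `format_region_line`, the `%.1f` precision and the 14-character column width are all hypothetical choices made for this example, not the code from the PR.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the FormatBuffer idea: produce the text of the
// gc-eff column once, so the caller needs only one string specifier.
std::string format_gc_efficiency(double gc_eff) {
  if (gc_eff < 0.0) {
    return "-";  // a negative value means "not calculated yet"
  }
  char buf[32];
  // snprintf never writes past the buffer; very large values are truncated.
  std::snprintf(buf, sizeof(buf), "%.1f", gc_eff);
  return std::string(buf);
}

// Builds a heavily simplified region line; the real log has more columns.
std::string format_region_line(const char* type, double gc_eff) {
  char line[64];
  std::snprintf(line, sizeof(line), "### %-4s %14s", type,
                format_gc_efficiency(gc_eff).c_str());
  return std::string(line);
}
```

With this shape there is one print path for both cases, which is the duplication the review comment asked to remove.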
------------- PR: https://git.openjdk.java.net/jdk/pull/2217 From rkennke at openjdk.java.net Thu Jan 28 09:55:42 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 28 Jan 2021 09:55:42 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC [v2] In-Reply-To: <8OECF112bR0GUAr_PNDnJReQUqAgAQ687_os35TMeaY=.c061338c-6ddb-4f1f-a68f-b70de8b495e3@github.com> References: <8OECF112bR0GUAr_PNDnJReQUqAgAQ687_os35TMeaY=.c061338c-6ddb-4f1f-a68f-b70de8b495e3@github.com> Message-ID: On Wed, 27 Jan 2021 18:27:00 GMT, Zhengyu Gu wrote: >> Please review this patch that renames ShenandoahMarkCompact to ShenandoahFullGC, to be consistent with other GCs. > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge master > - JDK-8260004-rename-fullgc Looks good to me! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2266 From rkennke at openjdk.java.net Thu Jan 28 09:55:45 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 28 Jan 2021 09:55:45 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC [v2] In-Reply-To: References: <8OECF112bR0GUAr_PNDnJReQUqAgAQ687_os35TMeaY=.c061338c-6ddb-4f1f-a68f-b70de8b495e3@github.com> Message-ID: <3IEoNigLVUkdAY5VjNN-NWs2vY99DpWDUmtR9UBa4YU=.24aacfcc-7406-4ce9-9dcf-1863981b821a@github.com> On Thu, 28 Jan 2021 07:57:22 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge master >> - JDK-8260004-rename-fullgc > > Marked as reviewed by shade (Reviewer). > I vote for this rename. We already report "Pause Full" for this operation, so it is "Full GC". The generational extension would probably call their phases "young" and "mixed" concurrent GC? Yes ok. 
We can consider what to do when generational GC lands, if it is a problem at all. ------------- PR: https://git.openjdk.java.net/jdk/pull/2266 From tschatzl at openjdk.java.net Thu Jan 28 09:59:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 28 Jan 2021 09:59:41 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots In-Reply-To: References: Message-ID: <6HXzkr0DUWBD_CWBwp9-EIHCFked8ITdxTOrRobVu8k=.03d0cac5-6121-4198-9710-2933267f68cf@github.com> On Thu, 28 Jan 2021 09:10:52 GMT, Albert Mingkun Yang wrote: > This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`. > > Tested: hotspot_gc Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2280 From david.holmes at oracle.com Thu Jan 28 10:56:45 2021 From: david.holmes at oracle.com (David Holmes) Date: Thu, 28 Jan 2021 20:56:45 +1000 Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: References: Message-ID: On 28/01/2021 7:09 pm, Thomas Schatzl wrote: > On Thu, 28 Jan 2021 05:13:57 GMT, David Holmes wrote: > >>> Please review this change which merges ImmutableSpace into MutableSpace, >>> eliminating the former. There were no interesting uses of ImmutableSpace, >>> other than as the base class for MutableSpace. The name ImmutableSpace is >>> kind of a misnomer given that usage. >>> >>> Testing: >>> mach5 tier1-3 >> >> src/hotspot/share/gc/parallel/mutableSpace.hpp line 47: >> >>> 45: // >>> 46: // Invariant: bottom() <= top() <= end() >>> 47: // top() is inclusive and end() is exclusive. >> >> If end() is exclusive then shouldn't the invariant be `< end()`? > > I also think that top() is also exclusive as in other collectors. > > @dholmes-ora : e.g. bottom == top == end means the space is empty. 
These two lines are not disagreeing with each other. If one is exclusive and one is inclusive then I don't see how they can be equal, as that implies they are then both inclusive and exclusive at the same time. ?? If end() is exclusive then I would expect an empty space to be one where bottom and end are adjacent, not coincident. Cheers, David > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/2271 > From neliasso at openjdk.java.net Thu Jan 28 11:16:42 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Thu, 28 Jan 2021 11:16:42 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com> References: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com> Message-ID: <0CCcwl_AOsuDgmWZTBrWEAe9pqKDshyQeItgEiCEiZs=.6cd6be67-0951-4373-9546-fb496f48a41b@github.com> On Thu, 28 Jan 2021 09:02:22 GMT, Vladimir Ivanov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8260473 >> >> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. >> >> >> Testing: all Vector API related tests have passed. > > What about migrating it to `GraphKit::access_load_at` instead? @iwanowww GraphKit::access_load_at is for parse time only. C2OptAccess must be used here. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From tschatzl at openjdk.java.net Thu Jan 28 12:10:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 28 Jan 2021 12:10:41 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: References: Message-ID: <8uaoMsqDox6nwjHu4flmgzVPTDJ6SDjHJgGBaqOaWEE=.b8a41eb6-ad15-45ea-91b6-08b02d3b7438@github.com> On Thu, 28 Jan 2021 09:01:44 GMT, Thomas Schatzl wrote: >> Please review this change which merges ImmutableSpace into MutableSpace, >> eliminating the former. 
There were no interesting uses of ImmutableSpace, >> other than as the base class for MutableSpace. The name ImmutableSpace is >> kind of a misnomer given that usage. >> >> Testing: >> mach5 tier1-3 > > I assume that tier1-3 includes the SA tests :) > > Looks good other than that nit. Hi David, > _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [serviceability-dev](mailto:serviceability-dev at openjdk.java.net):_ > > On 28/01/2021 7:09 pm, Thomas Schatzl wrote: > > > On Thu, 28 Jan 2021 05:13:57 GMT, David Holmes wrote: > > > > Please review this change which merges ImmutableSpace into MutableSpace, > > > > eliminating the former. There were no interesting uses of ImmutableSpace, > > > > other than as the base class for MutableSpace. The name ImmutableSpace is > > > > kind of a misnomer given that usage. > > > > Testing: > > > > mach5 tier1-3 > > > > > > src/hotspot/share/gc/parallel/mutableSpace.hpp line 47: > > > > 45: // > > > > 46: // Invariant: bottom() <= top() <= end() > > > > 47: // top() is inclusive and end() is exclusive. > > > > > > If end() is exclusive then shouldn't the invariant be `< end()`? > > > > I also think that top() is also exclusive as in other collectors. > > @dholmes-ora : e.g. bottom == top == end means the space is empty. These two lines are not disagreeing with each other. > > If one is exclusive and one is inclusive then I don't see how they can > be equal, as that implies they are then both inclusive and exclusive at > the same time. ?? If end() is exclusive then I would expect an empty > space to be one where bottom and end are adjacent, not coincident. > > Cheers, > David The original comment about top() being inclusive is wrong. top() is also exclusive like in all other collectors as stated elsewhere in my review comment. My "also" in "I also think that top() is also exclusive as in other collectors." probably threw you off after re-reading it, which is wrong. Sorry. 
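The convention described in this reply (bottom inclusive, top and end exclusive, with the invariant bottom <= top <= end) can be sketched as a small standalone type. `SpaceSketch` and its member names are hypothetical and word-indexed for simplicity; this is not the `MutableSpace` class itself.

```cpp
#include <cstddef>

// Hypothetical word-addressed space: bottom is inclusive, top and end are
// exclusive, so sizes are plain differences and an empty space has all
// three values equal.
struct SpaceSketch {
  std::size_t bottom;
  std::size_t top;
  std::size_t end;

  bool invariant_holds() const { return bottom <= top && top <= end; }
  std::size_t capacity_in_words() const { return end - bottom; }
  std::size_t used_in_words() const { return top - bottom; }
  std::size_t free_in_words() const { return end - top; }
};
```

Under this reading, a space with bottom = 200, top = 200, end = 201 has a capacity of one word and one free word, and top == end simply means nothing is left to allocate.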
Maybe some examples help: bottom = 200, top = 200, end = 200 is an "empty" space (i.e. of size zero). Whether that empty space is "free" or "fully allocated" or both or neither is another question :) bottom = 200, top = 200, end = 201 contains one word and is (completely) free (not allocated into at all). bottom = 200, top = 201, end = 201 contains one word and is full (fully allocated). top() and end() are exclusive and bottom() is inclusive, which is also what the code assumes as far as I can tell from a quick look. The invariant is still bottom <= top <= end in all cases. Thanks, Thomas ------------- PR: https://git.openjdk.java.net/jdk/pull/2271 From github.com+25214855+casparcwang at openjdk.java.net Thu Jan 28 12:19:07 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Thu, 28 Jan 2021 12:19:07 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v2] In-Reply-To: References: Message-ID: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > > Testing: all Vector API related tests have passed. 王超 
has updated the pull request incrementally with one additional commit since the last revision: Add the regression test ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2253/files - new: https://git.openjdk.java.net/jdk/pull/2253/files/5859bbbc..fbaddfb7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=00-01 Stats: 132 lines in 1 file changed: 132 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2253.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2253/head:pull/2253 PR: https://git.openjdk.java.net/jdk/pull/2253 From shade at openjdk.java.net Thu Jan 28 12:20:50 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 12:20:50 GMT Subject: RFR: 8260584: Shenandoah: simplify "Concurrent Thread Roots" logging Message-ID: <_tmnF3EWmNl7yricNI7zUIGGOfcA52knD6hFUGzgTmg=.de301c69-4c25-44e0-a943-a5d778d8357a@github.com> There are two separate counters now: f(conc_thread_roots, "Concurrent Stack Processing") \ f(conc_thread_roots_work, " Threads") \ SHENANDOAH_PAR_PHASE_DO(conc_thread_roots_work_, " CT: ", f) \ ...and `_work` counter is unused, and `conc_thread_roots` is used to report worker stats. So the log says ``, where `Thread Roots` should have been mentioned: [34.169s][info][gc,stats] Concurrent Stack Processing 11341 us, parallelism: 7.93x [34.169s][info][gc,stats] Threads 89908 us [34.169s][info][gc,stats] CT: 89908 us, workers (us): 11231, 11270, 11251, 11252, 11237, 11230, 11214, 11223, Fixed log says: [99.797s][info][gc,stats] Concurrent Thread Roots 3929 us, parallelism: 7.45x [99.797s][info][gc,stats] CTR: 29273 us [99.797s][info][gc,stats] CTR: Thread Roots 29273 us, workers (us): 3652, 3643, 3622, 3623, 3623, 3676, 3606, 3829, Also, I believe it should be called "Concurrent Thread Roots", in symmetry with "Concurrent Update Thread Roots" later. 
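The figures in the log excerpts above are consistent with the "parallelism" value being the summed per-worker time divided by the wall-clock time of the phase (for example 89908 us / 11341 us, reported as 7.93x). The sketch below reproduces only that arithmetic; the function names are hypothetical and the definition is an assumption inferred from the numbers, not taken from the Shenandoah sources.

```cpp
#include <numeric>
#include <vector>

// Sum of the per-worker times, in microseconds.
long total_worker_us(const std::vector<long>& workers_us) {
  return std::accumulate(workers_us.begin(), workers_us.end(), 0L);
}

// Assumed definition: summed worker time divided by phase wall-clock time.
double parallelism(long total_us, long wall_us) {
  return static_cast<double>(total_us) / static_cast<double>(wall_us);
}
```

Feeding in the eight worker times from the first excerpt gives a total of 89908 us, matching the total line in that log.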
------------- Commit messages: - 8260584: Shenandoah: simplify "Concurrent Thread Roots" logging Changes: https://git.openjdk.java.net/jdk/pull/2287/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2287&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260584 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2287.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2287/head:pull/2287 PR: https://git.openjdk.java.net/jdk/pull/2287 From github.com+25214855+casparcwang at openjdk.java.net Thu Jan 28 12:21:58 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Thu, 28 Jan 2021 12:21:58 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v3] In-Reply-To: References: Message-ID: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > > Testing: all Vector API related tests have passed. 王超 
has updated the pull request incrementally with one additional commit since the last revision: rm trailing whitespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2253/files - new: https://git.openjdk.java.net/jdk/pull/2253/files/fbaddfb7..b1db113f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2253.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2253/head:pull/2253 PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Thu Jan 28 12:25:39 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Thu, 28 Jan 2021 12:25:39 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v3] In-Reply-To: <0CCcwl_AOsuDgmWZTBrWEAe9pqKDshyQeItgEiCEiZs=.6cd6be67-0951-4373-9546-fb496f48a41b@github.com> References: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com> <0CCcwl_AOsuDgmWZTBrWEAe9pqKDshyQeItgEiCEiZs=.6cd6be67-0951-4373-9546-fb496f48a41b@github.com> Message-ID: <1_hVbQCFz5VshMGIRCNKt8BZpxiuGfptHD7_dTZ5z9M=.44499d58-f526-4d3a-bad2-72be3063dc68@github.com> On Thu, 28 Jan 2021 11:13:37 GMT, Nils Eliasson wrote: >> What about migrating it to `GraphKit::access_load_at` instead? > > @iwanowww GraphKit::access_load_at is for parse time only. C2OptAccess must be used here. The test is part of test/jdk/jdk/incubator/vector/VectorReshapeTests.java. Running the original big test with ZGC didn't reproduce the failure, so I just added a new small test. The small test was provided by Stuart Monteith in JBS. Thanks for providing the test. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From rkennke at openjdk.java.net Thu Jan 28 12:28:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 28 Jan 2021 12:28:41 GMT Subject: RFR: 8260584: Shenandoah: simplify "Concurrent Thread Roots" logging In-Reply-To: <_tmnF3EWmNl7yricNI7zUIGGOfcA52knD6hFUGzgTmg=.de301c69-4c25-44e0-a943-a5d778d8357a@github.com> References: <_tmnF3EWmNl7yricNI7zUIGGOfcA52knD6hFUGzgTmg=.de301c69-4c25-44e0-a943-a5d778d8357a@github.com> Message-ID: <9R3kGR4n0fw6TRqBMStWzocNVdykrZ4gEBVeGjwkIKk=.551bafdb-dc76-4b21-8018-83daf8016fd8@github.com> On Thu, 28 Jan 2021 12:15:32 GMT, Aleksey Shipilev wrote: > There are two separate counters now: > > f(conc_thread_roots, "Concurrent Stack Processing") \ > f(conc_thread_roots_work, " Threads") \ > SHENANDOAH_PAR_PHASE_DO(conc_thread_roots_work_, " CT: ", f) \ > ...and `_work` counter is unused, and `conc_thread_roots` is used to report worker stats. So the log says ``, where `Thread Roots` should have been mentioned: > > [34.169s][info][gc,stats] Concurrent Stack Processing 11341 us, parallelism: 7.93x > [34.169s][info][gc,stats] Threads 89908 us > [34.169s][info][gc,stats] CT: 89908 us, workers (us): 11231, 11270, 11251, 11252, 11237, 11230, 11214, 11223, > > Fixed log says: > > [99.797s][info][gc,stats] Concurrent Thread Roots 3929 us, parallelism: 7.45x > [99.797s][info][gc,stats] CTR: 29273 us > [99.797s][info][gc,stats] CTR: Thread Roots 29273 us, workers (us): 3652, 3643, 3622, 3623, 3623, 3676, 3606, 3829, > > Also, I believe it should be called "Concurrent Thread Roots", in symmetry with "Concurrent Update Thread Roots" later. Looks good to me. Thank you! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2287

From jiefu at openjdk.java.net Thu Jan 28 12:40:39 2021
From: jiefu at openjdk.java.net (Jie Fu)
Date: Thu, 28 Jan 2021 12:40:39 GMT
Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v3]
In-Reply-To: <1_hVbQCFz5VshMGIRCNKt8BZpxiuGfptHD7_dTZ5z9M=.44499d58-f526-4d3a-bad2-72be3063dc68@github.com>
References: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com>
 <0CCcwl_AOsuDgmWZTBrWEAe9pqKDshyQeItgEiCEiZs=.6cd6be67-0951-4373-9546-fb496f48a41b@github.com>
 <1_hVbQCFz5VshMGIRCNKt8BZpxiuGfptHD7_dTZ5z9M=.44499d58-f526-4d3a-bad2-72be3063dc68@github.com>
Message-ID:

On Thu, 28 Jan 2021 12:22:35 GMT, 王超 wrote:

>> @iwanowww GraphKit::access_load_at is for parse time only. C2OptAccess must be used here.
>
> The test is part of test/jdk/jdk/incubator/vector/VectorReshapeTests.java.
> And run the original big test with zgc didn't reproduce the failure, so just add a new small test.
>
> The small test is provided by Stuart Monteith in the JBS. Thanks for providing the test.

I suggest adding the test under test/hotspot/jtreg/compiler/vectorapi.
Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2253

From shade at openjdk.java.net Thu Jan 28 12:44:56 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 28 Jan 2021 12:44:56 GMT
Subject: RFR: 8260586: Shenandoah: simplify "Concurrent Weak References" logging
Message-ID:

Concurrent Weak References always does parallel worker operation. Therefore "Process" counter is redundant, and we might as well make the root counter the per-worker one. This simplifies GC logging.
Old log: [95.220s][info][gc,stats] Concurrent Weak References 1709 us [95.220s][info][gc,stats] Process 1588 us, parallelism: 1.30x [95.220s][info][gc,stats] CWRF: 2056 us [95.220s][info][gc,stats] CWRF: Weak References 2056 us, workers (us): 454, 1450, 2, 145, 4, 1, 0, 0, New log: [39.583s][info][gc,stats] Concurrent Weak References 651 us, parallelism: 1.52x [39.583s][info][gc,stats] CWRF: 986 us [39.583s][info][gc,stats] CWRF: Weak References 986 us, workers (us): 183, 29, 145, 627, 1, 0, 0, 0, ------------- Commit messages: - 8260586: Shenandoah: simplify "Concurrent Weak References" logging Changes: https://git.openjdk.java.net/jdk/pull/2288/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2288&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260586 Stats: 6 lines in 3 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2288.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2288/head:pull/2288 PR: https://git.openjdk.java.net/jdk/pull/2288 From jiefu at openjdk.java.net Thu Jan 28 12:47:40 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 28 Jan 2021 12:47:40 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v3] In-Reply-To: References: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com> <0CCcwl_AOsuDgmWZTBrWEAe9pqKDshyQeItgEiCEiZs=.6cd6be67-0951-4373-9546-fb496f48a41b@github.com> <1_hVbQCFz5VshMGIRCNKt8BZpxiuGfptHD7_dTZ5z9M=.44499d58-f526-4d3a-bad2-72be3063dc68@github.com> Message-ID: <9t9pW4ro734dO5B6Rv_mv8MqoOq9Yr-XlAojRItL7M0=.1fbab424-d892-4e17-9d26-0383780a73ba@github.com> On Thu, 28 Jan 2021 12:38:11 GMT, Jie Fu wrote: >> The test is part of test/jdk/jdk/incubator/vector/VectorReshapeTests.java. >> And run the original big test with zgc didn't reproduce the failure, so just add a new small test. >> >> The small test is provided by Stuart Monteith in the JBS. Thanks for providing the test. 
>
> I suggest adding the test under test/hotspot/jtreg/compiler/vectorapi.
> Thanks.

Since the 32-bit VM doesn't support ZGC, the test should only run on 64-bit.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2253

From vlivanov at openjdk.java.net Thu Jan 28 12:47:43 2021
From: vlivanov at openjdk.java.net (Vladimir Ivanov)
Date: Thu, 28 Jan 2021 12:47:43 GMT
Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v3]
In-Reply-To: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com>
References: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com>
Message-ID:

On Thu, 28 Jan 2021 09:02:22 GMT, Vladimir Ivanov wrote:

>> 王超 has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   rm trailing whitespace
>
> What about migrating it to `GraphKit::access_load_at` instead?

> @iwanowww GraphKit::access_load_at is for parse time only. C2OptAccess must be used here.

I see. That's unfortunate.

Actually, `PhaseVector::optimize_vector_boxes()` sets `C->inlining_incrementally() == true` and it enables the code to use `GraphKit` and, moreover, perform late inlining of vector reboxing operations. But I haven't thought through all the implications yet.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2253

From github.com+779991+jaokim at openjdk.java.net Thu Jan 28 12:48:55 2021
From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström)
Date: Thu, 28 Jan 2021 12:48:55 GMT
Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v4]
In-Reply-To:
References:
Message-ID: <95B6j1ZSceUGfTTDsZfF3a5ZbggYlBiv9WJkHKkzO0w=.edd53e67-02ae-4c8a-ae0f-3a50c7ac0676@github.com>

> **Description**
> This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup.
The gc-efficiency is calculated in the end of the marking phase, to be logged in the post-cleanup section. It is however not reset, meaning that next phase's post-marking log will show the old value. > > - The gc-efficiency is initialized to -1 when it hasn't been calculated. > - Negative gc-efficiency is displayed as a hyphen "-" in the summary. > - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` > > **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) > > This fix has been tested together with the above mentioned fix. > > **Example** > This is what logging like after fix has been applied. > ### PHASE Post-Marking @ 410.303 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 410.305 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > 
### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Marking @ 450.310 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 450.312 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > > **Testing** > - Manual testing > - hs-tier1, hs-tier2 Joakim Nordstr?m has updated the pull request incrementally with one additional commit since the last revision: Using FormatBuffer instead of snprintf. Changed defines to more descriptive names. 
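The convention described in this message (a gc-efficiency of -1 means "not calculated yet" and is shown as a hyphen) could be sketched as below. This is an illustration only, not the HotSpot change: the names `kGcEfficiencyUnset` and `format_gc_efficiency` are hypothetical, and plain `snprintf` is used here rather than the `FormatBuffer` mentioned in the commit note.

```cpp
#include <cstdio>
#include <string>

// Hypothetical sentinel: any negative value means "not calculated yet".
constexpr double kGcEfficiencyUnset = -1.0;

// Hypothetical helper: renders the gc-eff column of the liveness table.
// Unset (negative) values are shown as "-", everything else with one decimal.
std::string format_gc_efficiency(double gc_eff) {
  if (gc_eff < 0.0) {
    return "-";
  }
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.1f", gc_eff);
  return std::string(buf);
}
```

Resetting the stored value to the sentinel at the start of marking is what keeps a stale number from one cycle out of the next cycle's Post-Marking table.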
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2217/files - new: https://git.openjdk.java.net/jdk/pull/2217/files/97b022ff..201f785d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2217&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2217&range=02-03 Stats: 11 lines in 1 file changed: 1 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2217.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2217/head:pull/2217 PR: https://git.openjdk.java.net/jdk/pull/2217 From rkennke at openjdk.java.net Thu Jan 28 12:51:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 28 Jan 2021 12:51:41 GMT Subject: RFR: 8260586: Shenandoah: simplify "Concurrent Weak References" logging In-Reply-To: References: Message-ID: <8d4s3167oTU6HMK8oeWjdZ02ViaFRlLVnzO_MXI66u0=.4f63a1dc-70d5-4933-9ab8-bdcd6e7e5d25@github.com> On Thu, 28 Jan 2021 12:36:58 GMT, Aleksey Shipilev wrote: > Concurrent Weak References always does parallel worker operation. Therefore "Process" counter is redundant, and we might as well make the root counter the per-worker one. This simplifies GC logging. > > Old log: > > [95.220s][info][gc,stats] Concurrent Weak References 1709 us > [95.220s][info][gc,stats] Process 1588 us, parallelism: 1.30x > [95.220s][info][gc,stats] CWRF: 2056 us > [95.220s][info][gc,stats] CWRF: Weak References 2056 us, workers (us): 454, 1450, 2, 145, 4, 1, 0, 0, > > New log: > > [39.583s][info][gc,stats] Concurrent Weak References 651 us, parallelism: 1.52x > [39.583s][info][gc,stats] CWRF: 986 us > [39.583s][info][gc,stats] CWRF: Weak References 986 us, workers (us): 183, 29, 145, 627, 1, 0, 0, 0, Looks good! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2288

From github.com+25214855+casparcwang at openjdk.java.net Thu Jan 28 12:53:03 2021
From: github.com+25214855+casparcwang at openjdk.java.net (王超)
Date: Thu, 28 Jan 2021 12:53:03 GMT
Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4]
In-Reply-To:
References:
Message-ID:

> https://bugs.openjdk.java.net/browse/JDK-8260473
>
> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers.
>
>
> Testing: all Vector API related tests have passed.

王超 has updated the pull request incrementally with one additional commit since the last revision:

  Change the directory & fix the include order

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2253/files
  - new: https://git.openjdk.java.net/jdk/pull/2253/files/b1db113f..d52fd4c4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=02-03

  Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2253.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2253/head:pull/2253

PR: https://git.openjdk.java.net/jdk/pull/2253

From tschatzl at openjdk.java.net Thu Jan 28 12:56:40 2021
From: tschatzl at openjdk.java.net (Thomas Schatzl)
Date: Thu, 28 Jan 2021 12:56:40 GMT
Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v4]
In-Reply-To: <95B6j1ZSceUGfTTDsZfF3a5ZbggYlBiv9WJkHKkzO0w=.edd53e67-02ae-4c8a-ae0f-3a50c7ac0676@github.com>
References: <95B6j1ZSceUGfTTDsZfF3a5ZbggYlBiv9WJkHKkzO0w=.edd53e67-02ae-4c8a-ae0f-3a50c7ac0676@github.com>
Message-ID:

On Thu, 28 Jan 2021 12:48:55 GMT, Joakim Nordström wrote:

>> **Description**
>> This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup.
The gc-efficiency is calculated in the end of the marking phase, to be logged in the post-cleanup section. It is however not reset, meaning that next phase's post-marking log will show the old value. >> >> - The gc-efficiency is initialized to -1 when it hasn't been calculated. >> - Negative gc-efficiency is displayed as a hyphen "-" in the summary. >> - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` >> >> **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) >> >> This fix has been tested together with the above mentioned fix. >> >> **Example** >> This is what logging like after fix has been applied. >> ### PHASE Post-Marking @ 410.303 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 410.305 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 
21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Marking @ 450.310 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 450.312 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> >> **Testing** >> - Manual testing >> - hs-tier1, hs-tier2 > > Joakim Nordstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Using FormatBuffer instead of snprintf. Changed defines to more descriptive names. Lgtm, thanks for your effort. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2217

From github.com+25214855+casparcwang at openjdk.java.net Thu Jan 28 13:04:39 2021
From: github.com+25214855+casparcwang at openjdk.java.net (王超)
Date: Thu, 28 Jan 2021 13:04:39 GMT
Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4]
In-Reply-To:
References: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com>
Message-ID: <8CdueoVmisyBg7xQko_cVOVPJDw5qhAOVGL2lyRDn54=.6f498c27-cced-460a-9198-514c5f8cd605@github.com>

On Thu, 28 Jan 2021 12:45:39 GMT, Vladimir Ivanov wrote:

> > @iwanowww GraphKit::access_load_at is for parse time only. C2OptAccess must be used here.
>
> I see. That's unfortunate.
>
> Actually, `PhaseVector::optimize_vector_boxes()` sets `C->inlining_incrementally() == true` and it enables the code to use `GraphKit` and, moreover, perform late inlining of vector reboxing operations. But I haven't thought through all the implications yet.

`ArrayCopyNode::load` performs the same work as it does here in `PhaseVector::optimize_vector_boxes`.
Is there a need to provide a similar function in PhaseVector or GraphKit?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2253

From github.com+779991+jaokim at openjdk.java.net Thu Jan 28 13:11:41 2021
From: github.com+779991+jaokim at openjdk.java.net (Joakim Nordström)
Date: Thu, 28 Jan 2021 13:11:41 GMT
Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v4]
In-Reply-To:
References: <95B6j1ZSceUGfTTDsZfF3a5ZbggYlBiv9WJkHKkzO0w=.edd53e67-02ae-4c8a-ae0f-3a50c7ac0676@github.com>
Message-ID:

On Thu, 28 Jan 2021 12:53:59 GMT, Thomas Schatzl wrote:

>> Joakim Nordström has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Using FormatBuffer instead of snprintf. Changed defines to more descriptive names.
>
> Lgtm, thanks for your effort.
Thanks @tschatzl for review and comments!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2217

From eosterlund at openjdk.java.net Thu Jan 28 13:12:42 2021
From: eosterlund at openjdk.java.net (Erik Österlund)
Date: Thu, 28 Jan 2021 13:12:42 GMT
Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4]
In-Reply-To: <8CdueoVmisyBg7xQko_cVOVPJDw5qhAOVGL2lyRDn54=.6f498c27-cced-460a-9198-514c5f8cd605@github.com>
References: <4GliJeBE2JZSVgnDH5pAMIADyJdgA0YGEcCv6j9GdWY=.5d156f50-6ea9-470a-ad0a-db9056507108@github.com>
 <8CdueoVmisyBg7xQko_cVOVPJDw5qhAOVGL2lyRDn54=.6f498c27-cced-460a-9198-514c5f8cd605@github.com>
Message-ID:

On Thu, 28 Jan 2021 13:01:36 GMT, 王超 wrote:

>>> @iwanowww GraphKit::access_load_at is for parse time only. C2OptAccess must be used here.
>>
>> I see. That's unfortunate.
>>
>> Actually, `PhaseVector::optimize_vector_boxes()` sets `C->inlining_incrementally() == true` and it enables the code to use `GraphKit` and, moreover, perform late inlining of vector reboxing operations. But I haven't thought through all the implications yet.
>
>> > @iwanowww GraphKit::access_load_at is for parse time only. C2OptAccess must be used here.
>>
>> I see. That's unfortunate.
>>
>> Actually, `PhaseVector::optimize_vector_boxes()` sets `C->inlining_incrementally() == true` and it enables the code to use `GraphKit` and, moreover, perform late inlining of vector reboxing operations. But I haven't thought through all the implications yet.
>
> `ArrayCopyNode::load` performs the same work as it does here in `PhaseVector::optimize_vector_boxes`.
> Is there a need to provide a similar function in PhaseVector or GraphKit?

Worth mentioning is that we have optimization-time loads and stores in the arraycopy/clone code. When that was added, it was the only place where we had such accesses, so a little load/store wrapper utility was written in the arraycopy code. But perhaps now that there is more than one place, that utility belongs in a more central place, so the boilerplate can be reduced. I'm okay with doing that in a follow-up RFE though, as it is a refactoring only, and getting the actual bug fix in soon is of value.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2253

From ayang at openjdk.java.net Thu Jan 28 13:49:05 2021
From: ayang at openjdk.java.net (Albert Mingkun Yang)
Date: Thu, 28 Jan 2021 13:49:05 GMT
Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v2]
In-Reply-To:
References:
Message-ID: <9IzPl1Vjckbc8hGFU-x-3lUOaXPNZdbwHZNGELPzxsg=.aba6ae73-0169-4866-8da2-30f5d16a95aa@github.com>

> This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`.
>
> Tested: hotspot_gc

Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision:

  statically known sequential

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2280/files
  - new: https://git.openjdk.java.net/jdk/pull/2280/files/a13ce118..fbd1e5b6

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2280&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2280&range=00-01

  Stats: 9 lines in 4 files changed: 3 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2280.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2280/head:pull/2280

PR: https://git.openjdk.java.net/jdk/pull/2280

From zgu at openjdk.java.net Thu Jan 28 13:56:42 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Thu, 28 Jan 2021 13:56:42 GMT
Subject: RFR: 8260584: Shenandoah: simplify "Concurrent Thread Roots" logging
In-Reply-To: <_tmnF3EWmNl7yricNI7zUIGGOfcA52knD6hFUGzgTmg=.de301c69-4c25-44e0-a943-a5d778d8357a@github.com>
References:
<_tmnF3EWmNl7yricNI7zUIGGOfcA52knD6hFUGzgTmg=.de301c69-4c25-44e0-a943-a5d778d8357a@github.com> Message-ID: <5HU-5OewpWehUi6Y6TaJ9_3Gqk5HoNoxsvWGa1oCTlY=.277e1431-1001-4c98-89b8-28fbb91b28fd@github.com> On Thu, 28 Jan 2021 12:15:32 GMT, Aleksey Shipilev wrote: > There are two separate counters now: > > f(conc_thread_roots, "Concurrent Stack Processing") \ > f(conc_thread_roots_work, " Threads") \ > SHENANDOAH_PAR_PHASE_DO(conc_thread_roots_work_, " CT: ", f) \ > ...and `_work` counter is unused, and `conc_thread_roots` is used to report worker stats. So the log says ``, where `Thread Roots` should have been mentioned: > > [34.169s][info][gc,stats] Concurrent Stack Processing 11341 us, parallelism: 7.93x > [34.169s][info][gc,stats] Threads 89908 us > [34.169s][info][gc,stats] CT: 89908 us, workers (us): 11231, 11270, 11251, 11252, 11237, 11230, 11214, 11223, > > Fixed log says: > > [99.797s][info][gc,stats] Concurrent Thread Roots 3929 us, parallelism: 7.45x > [99.797s][info][gc,stats] CTR: 29273 us > [99.797s][info][gc,stats] CTR: Thread Roots 29273 us, workers (us): 3652, 3643, 3622, 3623, 3623, 3676, 3606, 3829, > > Also, I believe it should be called "Concurrent Thread Roots", in symmetry with "Concurrent Update Thread Roots" later. Looks good. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2287 From shade at openjdk.java.net Thu Jan 28 14:09:52 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 14:09:52 GMT Subject: RFR: 8260591: Shenandoah: improve parallelism for concurrent thread root scans Message-ID: Following JDK-8256298, there are a few minor performance issues with the implementation. First, in the spirit of JDK-8246100, we should be scanning the Java threads the last, as they have the most parallelism. Less parallel, or lightweight roots should be scanned before them to improve overall parallelism. Second, claiming each thread dominates the per-thread processing cost. 
We should really be doing chunked processing. Motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, and thread root scan speed is important. Before: # Baseline [56.176s][info][gc,stats] Concurrent Mark Roots = 0.308 s (a = 1452 us) (n = 212) (lvls, us = 305, 398, 457, 719, 11216) [56.176s][info][gc,stats] CMR: = 1.236 s (a = 5832 us) (n = 212) (lvls, us = 2676, 3535, 4199, 5391, 54522) [56.176s][info][gc,stats] CMR: Thread Roots = 1.179 s (a = 5563 us) (n = 212) (lvls, us = 2441, 3242, 3945, 5156, 54288) [56.176s][info][gc,stats] CMR: VM Strong Roots = 0.005 s (a = 23 us) (n = 212) (lvls, us = 12, 19, 21, 23, 204) [56.176s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 247 us) (n = 212) (lvls, us = 73, 203, 252, 293, 562) ... [56.176s][info][gc,stats] Concurrent Stack Processing = 0.124 s (a = 5149 us) (n = 24) (lvls, us = 535, 607, 885, 6387, 27177) [56.176s][info][gc,stats] Threads = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) [56.176s][info][gc,stats] CT: = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) After: [56.010s][info][gc,stats] Concurrent Mark Roots = 0.116 s (a = 587 us) (n = 198) (lvls, us = 312, 371, 400, 502, 4316) [56.010s][info][gc,stats] CMR: = 0.931 s (a = 4703 us) (n = 198) (lvls, us = 2402, 3438, 3770, 4453, 62629) [56.010s][info][gc,stats] CMR: Thread Roots = 0.864 s (a = 4366 us) (n = 198) (lvls, us = 1914, 3125, 3477, 4199, 54075) [56.010s][info][gc,stats] CMR: VM Strong Roots = 0.015 s (a = 76 us) (n = 198) (lvls, us = 20, 31, 35, 38, 4693) [56.010s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 261 us) (n = 198) (lvls, us = 61, 172, 256, 299, 3861) ... 
[56.010s][info][gc,stats] Concurrent Stack Processing = 0.081 s (a = 3671 us) (n = 22) (lvls, us = 457, 537, 770, 3359, 24003) [56.010s][info][gc,stats] Threads = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) [56.010s][info][gc,stats] CT: = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) ------------- Commit messages: - 8260591: Shenandoah: improve parallelism for concurrent thread root scans Changes: https://git.openjdk.java.net/jdk/pull/2290/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2290&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260591 Stats: 39 lines in 3 files changed: 20 ins; 7 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/2290.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2290/head:pull/2290 PR: https://git.openjdk.java.net/jdk/pull/2290 From zgu at openjdk.java.net Thu Jan 28 14:27:42 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 28 Jan 2021 14:27:42 GMT Subject: RFR: 8260591: Shenandoah: improve parallelism for concurrent thread root scans In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 14:04:07 GMT, Aleksey Shipilev wrote: > Following JDK-8256298, there are a few minor performance issues with the implementation. > > First, in the spirit of JDK-8246100, we should be scanning the Java threads the last, as they have the most parallelism. Less parallel, or lightweight roots should be scanned before them to improve overall parallelism. > > Second, claiming each thread dominates the per-thread processing cost. We should really be doing chunked processing. > > Motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, and thread root scan speed is important. 
> > Before: > # Baseline > [56.176s][info][gc,stats] Concurrent Mark Roots = 0.308 s (a = 1452 us) (n = 212) (lvls, us = 305, 398, 457, 719, 11216) > [56.176s][info][gc,stats] CMR: = 1.236 s (a = 5832 us) (n = 212) (lvls, us = 2676, 3535, 4199, 5391, 54522) > [56.176s][info][gc,stats] CMR: Thread Roots = 1.179 s (a = 5563 us) (n = 212) (lvls, us = 2441, 3242, 3945, 5156, 54288) > [56.176s][info][gc,stats] CMR: VM Strong Roots = 0.005 s (a = 23 us) (n = 212) (lvls, us = 12, 19, 21, 23, 204) > [56.176s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 247 us) (n = 212) (lvls, us = 73, 203, 252, 293, 562) > > ... > [56.176s][info][gc,stats] Concurrent Stack Processing = 0.124 s (a = 5149 us) (n = 24) (lvls, us = 535, 607, 885, 6387, 27177) > [56.176s][info][gc,stats] Threads = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) > [56.176s][info][gc,stats] CT: = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) > > After: > [56.010s][info][gc,stats] Concurrent Mark Roots = 0.116 s (a = 587 us) (n = 198) (lvls, us = 312, 371, 400, 502, 4316) > [56.010s][info][gc,stats] CMR: = 0.931 s (a = 4703 us) (n = 198) (lvls, us = 2402, 3438, 3770, 4453, 62629) > [56.010s][info][gc,stats] CMR: Thread Roots = 0.864 s (a = 4366 us) (n = 198) (lvls, us = 1914, 3125, 3477, 4199, 54075) > [56.010s][info][gc,stats] CMR: VM Strong Roots = 0.015 s (a = 76 us) (n = 198) (lvls, us = 20, 31, 35, 38, 4693) > [56.010s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 261 us) (n = 198) (lvls, us = 61, 172, 256, 299, 3861) > ... > [56.010s][info][gc,stats] Concurrent Stack Processing = 0.081 s (a = 3671 us) (n = 22) (lvls, us = 457, 537, 770, 3359, 24003) > [56.010s][info][gc,stats] Threads = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) > [56.010s][info][gc,stats] CT: = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) Change looks good. 
I thought fetch_and_add is pretty cheap now, and claiming one thread at a time can balance workload better ... ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2290 From shade at openjdk.java.net Thu Jan 28 14:32:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 14:32:42 GMT Subject: RFR: 8260591: Shenandoah: improve parallelism for concurrent thread root scans In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 14:24:35 GMT, Zhengyu Gu wrote: > I thought fetch_and_add is pretty cheap now, and claiming one thread at a time can balance workload better ... Atomics are cheap when uncontended. When per-thread work is small, we run into contended atomic, and get unnecessary slowdowns. Chunking might indeed make balancing less precise, but we might not care as much as contending with many GC worker threads. ------------- PR: https://git.openjdk.java.net/jdk/pull/2290 From rkennke at openjdk.java.net Thu Jan 28 14:45:40 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 28 Jan 2021 14:45:40 GMT Subject: RFR: 8260591: Shenandoah: improve parallelism for concurrent thread root scans In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 14:04:07 GMT, Aleksey Shipilev wrote: > Following JDK-8256298, there are a few minor performance issues with the implementation. > > First, in the spirit of JDK-8246100, we should be scanning the Java threads the last, as they have the most parallelism. Less parallel, or lightweight roots should be scanned before them to improve overall parallelism. > > Second, claiming each thread dominates the per-thread processing cost. We should really be doing chunked processing. > > Motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, and thread root scan speed is important. 
> > Before: > # Baseline > [56.176s][info][gc,stats] Concurrent Mark Roots = 0.308 s (a = 1452 us) (n = 212) (lvls, us = 305, 398, 457, 719, 11216) > [56.176s][info][gc,stats] CMR: = 1.236 s (a = 5832 us) (n = 212) (lvls, us = 2676, 3535, 4199, 5391, 54522) > [56.176s][info][gc,stats] CMR: Thread Roots = 1.179 s (a = 5563 us) (n = 212) (lvls, us = 2441, 3242, 3945, 5156, 54288) > [56.176s][info][gc,stats] CMR: VM Strong Roots = 0.005 s (a = 23 us) (n = 212) (lvls, us = 12, 19, 21, 23, 204) > [56.176s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 247 us) (n = 212) (lvls, us = 73, 203, 252, 293, 562) > > ... > [56.176s][info][gc,stats] Concurrent Stack Processing = 0.124 s (a = 5149 us) (n = 24) (lvls, us = 535, 607, 885, 6387, 27177) > [56.176s][info][gc,stats] Threads = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) > [56.176s][info][gc,stats] CT: = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) > > After: > [56.010s][info][gc,stats] Concurrent Mark Roots = 0.116 s (a = 587 us) (n = 198) (lvls, us = 312, 371, 400, 502, 4316) > [56.010s][info][gc,stats] CMR: = 0.931 s (a = 4703 us) (n = 198) (lvls, us = 2402, 3438, 3770, 4453, 62629) > [56.010s][info][gc,stats] CMR: Thread Roots = 0.864 s (a = 4366 us) (n = 198) (lvls, us = 1914, 3125, 3477, 4199, 54075) > [56.010s][info][gc,stats] CMR: VM Strong Roots = 0.015 s (a = 76 us) (n = 198) (lvls, us = 20, 31, 35, 38, 4693) > [56.010s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 261 us) (n = 198) (lvls, us = 61, 172, 256, 299, 3861) > ... > [56.010s][info][gc,stats] Concurrent Stack Processing = 0.081 s (a = 3671 us) (n = 22) (lvls, us = 457, 537, 770, 3359, 24003) > [56.010s][info][gc,stats] Threads = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) > [56.010s][info][gc,stats] CT: = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) It's ok by me. 
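A minimal sketch of the chunked claiming discussed in this thread, assuming a plain index range of work items rather than the real Java thread list. `ChunkedClaimer`, `worker_loop`, and the chunk size are illustrative names chosen here, not the Shenandoah implementation:

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the Shenandoah implementation.
// Hands out half-open index ranges [begin, end), so one atomic
// fetch_add claims up to `chunk` items instead of a single item.
class ChunkedClaimer {
  std::atomic<std::size_t> _next{0};
  const std::size_t _total;
  const std::size_t _chunk;  // assumed to be > 0

public:
  ChunkedClaimer(std::size_t total, std::size_t chunk)
    : _total(total), _chunk(chunk) {}

  // Returns false once all items have been claimed.
  bool claim(std::size_t& begin, std::size_t& end) {
    const std::size_t start = _next.fetch_add(_chunk, std::memory_order_relaxed);
    if (start >= _total) {
      return false;
    }
    begin = start;
    end = std::min(start + _chunk, _total);
    return true;
  }
};

// Loop that every worker thread would run; returns the number of
// successful claims this worker made while draining the items.
std::size_t worker_loop(ChunkedClaimer& claimer, std::vector<int>& visits) {
  std::size_t claims = 0;
  std::size_t begin = 0;
  std::size_t end = 0;
  while (claimer.claim(begin, end)) {
    claims++;
    for (std::size_t i = begin; i < end; i++) {
      visits[i]++;  // stand-in for scanning one thread's roots
    }
  }
  return claims;
}
```

Each GC worker would run `worker_loop` concurrently on the same claimer. A larger chunk means fewer contended `fetch_add` calls at the cost of coarser load balancing, which is the trade-off raised in the review.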
I wonder if we could collapse all concurrent-FOO tasks after final-mark into a single task and benefit from even better parallelism there? ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2290 From pliden at openjdk.java.net Thu Jan 28 14:50:47 2021 From: pliden at openjdk.java.net (Per Liden) Date: Thu, 28 Jan 2021 14:50:47 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 12:53:03 GMT, ?? wrote: >> https://bugs.openjdk.java.net/browse/JDK-8260473 >> >> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. >> >> >> Testing: all Vector API related tests have passed. > > ?? has updated the pull request incrementally with one additional commit since the last revision: > > Change the directory & fix the include order test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 38: > 36: * @test > 37: * @bug 8260473 > 38: * @modules jdk.incubator.vector This test should have a `@requires vm.gc.Z` tag. That will make sure it's only executed on platforms where ZGC is supported (and that ZGC is included in the build). ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From amith.pawar at gmail.com Thu Jan 28 15:10:29 2021 From: amith.pawar at gmail.com (Amit Pawar) Date: Thu, 28 Jan 2021 20:40:29 +0530 Subject: RFR: Convert old-gen single threaded pretouch to multi-threaded during In-Reply-To: <60BBB17B-235D-490C-B691-65A48BBEE129@oracle.com> References: <92A36846-F059-47A4-8AEF-086135651CED@oracle.com> <60BBB17B-235D-490C-B691-65A48BBEE129@oracle.com> Message-ID: Thanks. 
On Thu, Jan 28, 2021 at 2:08 AM Kim Barrett wrote: > > On Jan 27, 2021, at 11:03 AM, Amit Pawar wrote: > > On Sun, Jan 24, 2021 at 8:07 PM Kim Barrett > wrote: > >> I've recently been looking at the relevant parts of ParallelGC, and it > looks > >> like it shouldn't be too hard to allow threads waiting for expansion to > >> cooperate in any ongoing pretouch, esp. after some other recent RFEs > have > >> been dealt with. I've filed JDK-8260332 for this. I haven't looked at > the G1 > >> side of things yet. > > It will be useful if you can share those RFEs or suggest which part of > the code to refer to. > > See the linked issues for JDK-8260332. Also JDK-8259776, JDK-8259778, > and JDK-8259862, that are all somewhat precursors, or at least I plan > to deal with them first. > > -- With best regards, amit pawar From shade at openjdk.java.net Thu Jan 28 16:34:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 16:34:41 GMT Subject: Integrated: 8260584: Shenandoah: simplify "Concurrent Thread Roots" logging In-Reply-To: <_tmnF3EWmNl7yricNI7zUIGGOfcA52knD6hFUGzgTmg=.de301c69-4c25-44e0-a943-a5d778d8357a@github.com> References: <_tmnF3EWmNl7yricNI7zUIGGOfcA52knD6hFUGzgTmg=.de301c69-4c25-44e0-a943-a5d778d8357a@github.com> Message-ID: On Thu, 28 Jan 2021 12:15:32 GMT, Aleksey Shipilev wrote: > There are two separate counters now: > > f(conc_thread_roots, "Concurrent Stack Processing") \ > f(conc_thread_roots_work, " Threads") \ > SHENANDOAH_PAR_PHASE_DO(conc_thread_roots_work_, " CT: ", f) \ > ...and `_work` counter is unused, and `conc_thread_roots` is used to report worker stats. 
So the log says `<total>`, where `Thread Roots` should have been mentioned:
>
> [34.169s][info][gc,stats] Concurrent Stack Processing 11341 us, parallelism: 7.93x
> [34.169s][info][gc,stats] Threads 89908 us
> [34.169s][info][gc,stats] CT: <total> 89908 us, workers (us): 11231, 11270, 11251, 11252, 11237, 11230, 11214, 11223,
>
> Fixed log says:
>
> [99.797s][info][gc,stats] Concurrent Thread Roots 3929 us, parallelism: 7.45x
> [99.797s][info][gc,stats] CTR: <total> 29273 us
> [99.797s][info][gc,stats] CTR: Thread Roots 29273 us, workers (us): 3652, 3643, 3622, 3623, 3623, 3676, 3606, 3829,
>
> Also, I believe it should be called "Concurrent Thread Roots", in symmetry with "Concurrent Update Thread Roots" later.

This pull request has now been integrated.

Changeset: 1de3c554
Author: Aleksey Shipilev
URL: https://git.openjdk.java.net/jdk/commit/1de3c554
Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod

8260584: Shenandoah: simplify "Concurrent Thread Roots" logging

Reviewed-by: rkennke, zgu

-------------

PR: https://git.openjdk.java.net/jdk/pull/2287

From shade at openjdk.java.net Thu Jan 28 17:05:00 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 28 Jan 2021 17:05:00 GMT
Subject: RFR: 8260586: Shenandoah: simplify "Concurrent Weak References" logging [v2]
In-Reply-To:
References:
Message-ID:

> Concurrent Weak References always does parallel worker operation. Therefore "Process" counter is redundant, and we might as well make the root counter the per-worker one. This simplifies GC logging.
> > Old log: > > [95.220s][info][gc,stats] Concurrent Weak References 1709 us > [95.220s][info][gc,stats] Process 1588 us, parallelism: 1.30x > [95.220s][info][gc,stats] CWRF: 2056 us > [95.220s][info][gc,stats] CWRF: Weak References 2056 us, workers (us): 454, 1450, 2, 145, 4, 1, 0, 0, > > New log: > > [39.583s][info][gc,stats] Concurrent Weak References 651 us, parallelism: 1.52x > [39.583s][info][gc,stats] CWRF: 986 us > [39.583s][info][gc,stats] CWRF: Weak References 986 us, workers (us): 183, 29, 145, 627, 1, 0, 0, 0, Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8260586-sh-log-cwr - 8260586: Shenandoah: simplify "Concurrent Weak References" logging ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2288/files - new: https://git.openjdk.java.net/jdk/pull/2288/files/8b9bf2e1..a493df7c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2288&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2288&range=00-01 Stats: 426 lines in 28 files changed: 232 ins; 64 del; 130 mod Patch: https://git.openjdk.java.net/jdk/pull/2288.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2288/head:pull/2288 PR: https://git.openjdk.java.net/jdk/pull/2288 From zgu at openjdk.java.net Thu Jan 28 17:29:44 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 28 Jan 2021 17:29:44 GMT Subject: RFR: 8260586: Shenandoah: simplify "Concurrent Weak References" logging [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 17:05:00 GMT, Aleksey Shipilev wrote: >> Concurrent Weak References always does parallel worker operation. Therefore "Process" counter is redundant, and we might as well make the root counter the per-worker one. This simplifies GC logging. 
>> >> Old log: >> >> [95.220s][info][gc,stats] Concurrent Weak References 1709 us >> [95.220s][info][gc,stats] Process 1588 us, parallelism: 1.30x >> [95.220s][info][gc,stats] CWRF: 2056 us >> [95.220s][info][gc,stats] CWRF: Weak References 2056 us, workers (us): 454, 1450, 2, 145, 4, 1, 0, 0, >> >> New log: >> >> [39.583s][info][gc,stats] Concurrent Weak References 651 us, parallelism: 1.52x >> [39.583s][info][gc,stats] CWRF: 986 us >> [39.583s][info][gc,stats] CWRF: Weak References 986 us, workers (us): 183, 29, 145, 627, 1, 0, 0, 0, > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8260586-sh-log-cwr > - 8260586: Shenandoah: simplify "Concurrent Weak References" logging Looks good. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2288 From zgu at openjdk.java.net Thu Jan 28 17:40:52 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 28 Jan 2021 17:40:52 GMT Subject: RFR: 8259404: Shenandoah: Fix time tracking in parallel_cleaning Message-ID: Please review this patch fixes timing tracking for parallel cleaning. 
Before: [9.844s][info][gc,stats] System Purge = 0.000 s (a = 76 us) (n = 1) (lvls, us = 76, 76, 76, 76, 76) **_<<=== looks wrong_** [9.844s][info][gc,stats] Unload Classes = 0.001 s (a = 541 us) (n = 1) (lvls, us = 541, 541, 541, 541, 541) [9.844s][info][gc,stats] Weak Roots = 0.000 s (a = 75 us) (n = 1) (lvls, us = 75, 75, 75, 75, 75) [9.844s][info][gc,stats] CLDG = 0.000 s (a = 0 us) (n = 1) (lvls, us = 0, 0, 0, 0, 0) After: [9.936s][info][gc,stats] System Purge = 0.001 s (a = 611 us) (n = 1) (lvls, us = 609, 609, 609, 609, 611) [9.936s][info][gc,stats] Unload Classes = 0.000 s (a = 475 us) (n = 1) (lvls, us = 475, 475, 475, 475, 475) [9.936s][info][gc,stats] DCU: = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162) [9.936s][info][gc,stats] DCU: Code Cache Roots = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162) [9.936s][info][gc,stats] Weak Roots = 0.000 s (a = 105 us) (n = 1) (lvls, us = 105, 105, 105, 105, 105) [9.936s][info][gc,stats] DWR: = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209, 210) [9.936s][info][gc,stats] DWR: VM Weak Roots = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209 ------------- Commit messages: - Merge master - Fix indentation and removed unused phase - Merge - Update - Fix indentations - init update - cleanup - JDK-8259377: init update Changes: https://git.openjdk.java.net/jdk/pull/2073/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2073&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259404 Stats: 53 lines in 5 files changed: 19 ins; 12 del; 22 mod Patch: https://git.openjdk.java.net/jdk/pull/2073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2073/head:pull/2073 PR: https://git.openjdk.java.net/jdk/pull/2073 From smonteith at openjdk.java.net Thu Jan 28 17:59:44 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Thu, 28 Jan 2021 17:59:44 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with 
ZGC enabled [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 28 Jan 2021 12:53:03 GMT, 王超 wrote:

>> https://bugs.openjdk.java.net/browse/JDK-8260473
>>
>> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers.
>>
>>
>> Testing: all Vector API related tests have passed.
>
> 王超 has updated the pull request incrementally with one additional commit since the last revision:
>
> Change the directory & fix the include order

Looks good, but needs some editing.

test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 86:

> 84: System.out.println("output: "+Arrays.toString(output));
> 85: // Assert.assertEquals(expected, output);
> 86: assert(expected.equals(output)); // SRDM

"SRDM" are my initials. You can remove this line and replace it with the commented-out line above, uncommented.
I was structuring this to work outwith the jtreg framework.

test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 44:

> 42: */
> 43:
> 44: public class VectorReshapeTest {

The test only has this name because it is a cut-down version of test/jdk/jdk/incubator/vector/VectorReshapeTests.java.
The problem was originally intermittent, but was narrowed down somewhat to what we have here. Perhaps this could be renamed?

test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 40:

> 38: * @modules jdk.incubator.vector
> 39: * @modules java.base/jdk.internal.vm.annotation
> 40: * @run main/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer

-XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer restricts the compilation to a single method for diagnostic purposes. The test runs much quicker without it, and still reproduces the issue.
test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 45: > 43: > 44: public class VectorReshapeTest { > 45: static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100); The name "jdk.incubator.vector.test.loop-iterations" should probably be "jtreg.compiler.vectorapi.vectorreshapetest.loop-iterations". In addition, it should be reset to "1000" to ensure the test is compiled and executed with a chance of GCing to occur. ------------- Changes requested by smonteith (Author). PR: https://git.openjdk.java.net/jdk/pull/2253 From shade at openjdk.java.net Thu Jan 28 18:10:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 18:10:45 GMT Subject: RFR: 8259404: Shenandoah: Fix time tracking in parallel_cleaning In-Reply-To: References: Message-ID: On Thu, 14 Jan 2021 01:39:02 GMT, Zhengyu Gu wrote: > Please review this patch fixes timing tracking for parallel cleaning. > > Before: > `[9.844s][info][gc,stats] System Purge = 0.000 s (a = 76 us) (n = 1) (lvls, us = 76, 76, 76, 76, 76)` **<<== looks wrong** > `[9.844s][info][gc,stats] Unload Classes = 0.001 s (a = 541 us) (n = 1) (lvls, us = 541, 541, 541, 541, 541)` > `[9.844s][info][gc,stats] Weak Roots = 0.000 s (a = 75 us) (n = 1) (lvls, us = 75, 75, 75, 75, 75)` > `[9.844s][info][gc,stats] CLDG = 0.000 s (a = 0 us) (n = 1) (lvls, us = 0, 0, 0, 0, 0)` > After: > `[9.936s][info][gc,stats] System Purge = 0.001 s (a = 611 us) (n = 1) (lvls, us = 609, 609, 609, 609, 611)` > `[9.936s][info][gc,stats] Unload Classes = 0.000 s (a = 475 us) (n = 1) (lvls, us = 475, 475, 475, 475, 475)` > `[9.936s][info][gc,stats] DCU: = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` > `[9.936s][info][gc,stats] DCU: Code Cache Roots = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` > `[9.936s][info][gc,stats] Weak Roots = 0.000 s (a = 105 us) (n = 1) (lvls, us = 105, 105, 105, 105, 105)` > `[9.936s][info][gc,stats] DWR: = 0.000 s 
(a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209, 210)`
> `[9.936s][info][gc,stats] DWR: VM Weak Roots = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209)`

It is okay, but I have suggestions.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1795:

> 1793: // Unload classes and purge SystemDictionary.
> 1794: {
> 1795: ShenandoahPhaseTimings::Phase p = full_gc ?

Please name the local variable `phase`. In GC code, `p` usually denotes a "location".

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 1854:

> 1852: assert(SafepointSynchronize::is_at_safepoint(), "Must be at a safepoint");
> 1853: assert(is_stw_gc_in_progress(), "Only for Degenerated and Full GC");
> 1854: ShenandoahGCPhase root_phase(full_gc ?

Name it `phase`?

src/hotspot/share/gc/shenandoah/shenandoahParallelCleaning.cpp line 49:

> 47: {
> 48: ShenandoahWorkerTimingsTracker x(_phase, ShenandoahPhaseTimings::CodeCacheRoots, worker_id);
> 49: _code_cache_task.work(worker_id);

This does not look like a "normal" code cache root operation, though, right? Consider adding another type to `SHENANDOAH_PAR_PHASE_DO` instead? I.e. `CodeCacheUnload`?

src/hotspot/share/gc/shenandoah/shenandoahParallelCleaning.cpp line 56:

> 54: if (_unloading_occurred) {
> 55: ShenandoahWorkerTimingsTracker x(_phase, ShenandoahPhaseTimings::CLDGRoots, worker_id);
> 56: _klass_cleaning_task.work();

Same thing: maybe a new type for `SHENANDOAH_PAR_PHASE_DO`?

src/hotspot/share/gc/shenandoah/shenandoahParallelCleaning.inline.hpp line 60:

> 58: _weak_processing_task.work(worker_id, _is_alive, _keep_alive);
> 59: }
> 60: _dedup_roots.oops_do(_is_alive, _keep_alive, worker_id);

This might need `ShenandoahWorkerTimingsTracker(... ShenandoahPhaseTimings::StringDedupTableRoots, ...)`?

-------------

Changes requested by shade (Reviewer).
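
For readers following the review comments above, here is a minimal sketch of scoped per-worker timing with a dedicated type per kind of work. It is not the HotSpot code: `PhaseTable`, `ScopedWorkerTimer`, `cleaning_work`, and the `ClassCleaning` and `StringDedup` enumerators are hypothetical names, and only `CodeCacheUnload` is taken from the suggestion in the review.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical work types; one slot per type and worker.
enum RootType { CodeCacheUnload = 0, ClassCleaning, StringDedup, RootTypeCount };

// Hypothetical table that accumulates elapsed time and sample counts.
class PhaseTable {
  std::size_t _workers;
  std::vector<long long> _us;     // accumulated microseconds
  std::vector<int> _samples;      // number of recorded scopes

  std::size_t index(RootType type, std::size_t worker) const {
    return static_cast<std::size_t>(type) * _workers + worker;
  }

public:
  explicit PhaseTable(std::size_t workers)
    : _workers(workers),
      _us(RootTypeCount * workers, 0),
      _samples(RootTypeCount * workers, 0) {}

  void record(RootType type, std::size_t worker, long long us) {
    _us[index(type, worker)] += us;
    _samples[index(type, worker)]++;
  }
  long long total_us(RootType type, std::size_t worker) const { return _us[index(type, worker)]; }
  int samples(RootType type, std::size_t worker) const { return _samples[index(type, worker)]; }
};

// RAII timer: starts on construction, records into the table on destruction.
class ScopedWorkerTimer {
  PhaseTable& _table;
  RootType _type;
  std::size_t _worker;
  std::chrono::steady_clock::time_point _start;

public:
  ScopedWorkerTimer(PhaseTable& table, RootType type, std::size_t worker)
    : _table(table), _type(type), _worker(worker),
      _start(std::chrono::steady_clock::now()) {}

  ~ScopedWorkerTimer() {
    const auto elapsed = std::chrono::steady_clock::now() - _start;
    _table.record(_type, _worker,
                  std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count());
  }

  ScopedWorkerTimer(const ScopedWorkerTimer&) = delete;
  ScopedWorkerTimer& operator=(const ScopedWorkerTimer&) = delete;
};

// Each step gets its own scope and its own type, so no time is
// attributed to an unrelated counter.
void cleaning_work(PhaseTable& table, std::size_t worker, bool unloading_occurred) {
  {
    ScopedWorkerTimer timer(table, CodeCacheUnload, worker);
    // code cache cleaning would run here
  }
  if (unloading_occurred) {
    ScopedWorkerTimer timer(table, ClassCleaning, worker);
    // class cleaning would run here
  }
  {
    ScopedWorkerTimer timer(table, StringDedup, worker);
    // string deduplication root processing would run here
  }
}
```

The point of the separate enumerators is the one made in the review: reusing a root-scanning type for cleanup work would make the per-worker statistics misleading.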
PR: https://git.openjdk.java.net/jdk/pull/2073 From kbarrett at openjdk.java.net Thu Jan 28 18:14:44 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 28 Jan 2021 18:14:44 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: <8uaoMsqDox6nwjHu4flmgzVPTDJ6SDjHJgGBaqOaWEE=.b8a41eb6-ad15-45ea-91b6-08b02d3b7438@github.com> References: <8uaoMsqDox6nwjHu4flmgzVPTDJ6SDjHJgGBaqOaWEE=.b8a41eb6-ad15-45ea-91b6-08b02d3b7438@github.com> Message-ID: On Thu, 28 Jan 2021 12:07:52 GMT, Thomas Schatzl wrote: >> I assume that tier1-3 includes the SA tests :) >> >> Looks good other than that nit. > > Hi David, > >> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [serviceability-dev](mailto:serviceability-dev at openjdk.java.net):_ >> >> On 28/01/2021 7:09 pm, Thomas Schatzl wrote: >> >> > On Thu, 28 Jan 2021 05:13:57 GMT, David Holmes wrote: >> > > > Please review this change which merges ImmutableSpace into MutableSpace, >> > > > eliminating the former. There were no interesting uses of ImmutableSpace, >> > > > other than as the base class for MutableSpace. The name ImmutableSpace is >> > > > kind of a misnomer given that usage. >> > > > Testing: >> > > > mach5 tier1-3 >> > > >> > > src/hotspot/share/gc/parallel/mutableSpace.hpp line 47: >> > > > 45: // >> > > > 46: // Invariant: bottom() <= top() <= end() >> > > > 47: // top() is inclusive and end() is exclusive. >> > > >> > > If end() is exclusive then shouldn't the invariant be `< end()`? >> > >> > I also think that top() is also exclusive as in other collectors. >> > @dholmes-ora : e.g. bottom == top == end means the space is empty. These two lines are not disagreeing with each other. >> >> If one is exclusive and one is inclusive then I don't see how they can >> be equal, as that implies they are then both inclusive and exclusive at >> the same time. ?? 
If end() is exclusive then I would expect an empty >> space to be one where bottom and end are adjacent, not coincident. >> >> Cheers, >> David > > The original comment about top() being inclusive is wrong. top() is also exclusive like in all other collectors as stated elsewhere in my review comment. My "also" in "I also think that top() is also exclusive as in other collectors." probably threw you off after re-reading it, which is wrong. Sorry. > > Maybe some examples help: > > bottom = 200, top = 200, end = 200 is an "empty" space (i.e. is of size zero). Whether that empty space is "free" or "fully allocated" or both or neither is another question :) > > bottom = 200, top = 200, end = 201 contains one word and is (completely) free (not allocated into at all). > > bottom = 200, top = 201, end = 201 contains one word and is full(y allocated). > > Top/end are exclusive, and bottom inclusive as does the code assume from what I can tell by quickly looking at it. Still the invariant is bottom <= top <= end in all cases. > > Thanks, > Thomas What @tschatzl said. The comment saying `top()` is inclusive is simply wrong. I'll fix it before integrating. ------------- PR: https://git.openjdk.java.net/jdk/pull/2271 From shade at openjdk.java.net Thu Jan 28 19:07:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 2021 19:07:42 GMT Subject: Integrated: 8260586: Shenandoah: simplify "Concurrent Weak References" logging In-Reply-To: References: Message-ID: <2C0ix6hcOaQ7AitU9NcCG_hKqHfLvE-95s5XR_4YWJI=.0b34752f-ee2a-4f22-bf6b-7e5198a27e8e@github.com> On Thu, 28 Jan 2021 12:36:58 GMT, Aleksey Shipilev wrote: > Concurrent Weak References always does parallel worker operation. Therefore "Process" counter is redundant, and we might as well make the root counter the per-worker one. This simplifies GC logging. 
> > Old log: > > [95.220s][info][gc,stats] Concurrent Weak References 1709 us > [95.220s][info][gc,stats] Process 1588 us, parallelism: 1.30x > [95.220s][info][gc,stats] CWRF: 2056 us > [95.220s][info][gc,stats] CWRF: Weak References 2056 us, workers (us): 454, 1450, 2, 145, 4, 1, 0, 0, > > New log: > > [39.583s][info][gc,stats] Concurrent Weak References 651 us, parallelism: 1.52x > [39.583s][info][gc,stats] CWRF: 986 us > [39.583s][info][gc,stats] CWRF: Weak References 986 us, workers (us): 183, 29, 145, 627, 1, 0, 0, 0, This pull request has now been integrated. Changeset: 71128cf4 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/71128cf4 Stats: 6 lines in 3 files changed: 0 ins; 2 del; 4 mod 8260586: Shenandoah: simplify "Concurrent Weak References" logging Reviewed-by: rkennke, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2288 From zgu at openjdk.java.net Thu Jan 28 19:12:54 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 28 Jan 2021 19:12:54 GMT Subject: RFR: 8259404: Shenandoah: Fix time tracking in parallel_cleaning [v2] In-Reply-To: References: Message-ID: > Please review this patch fixes timing tracking for parallel cleaning. 
> > Before: > `[9.844s][info][gc,stats] System Purge = 0.000 s (a = 76 us) (n = 1) (lvls, us = 76, 76, 76, 76, 76)` **<<== looks wrong** > `[9.844s][info][gc,stats] Unload Classes = 0.001 s (a = 541 us) (n = 1) (lvls, us = 541, 541, 541, 541, 541)` > `[9.844s][info][gc,stats] Weak Roots = 0.000 s (a = 75 us) (n = 1) (lvls, us = 75, 75, 75, 75, 75)` > `[9.844s][info][gc,stats] CLDG = 0.000 s (a = 0 us) (n = 1) (lvls, us = 0, 0, 0, 0, 0)` > After: > `[9.936s][info][gc,stats] System Purge = 0.001 s (a = 611 us) (n = 1) (lvls, us = 609, 609, 609, 609, 611)` > `[9.936s][info][gc,stats] Unload Classes = 0.000 s (a = 475 us) (n = 1) (lvls, us = 475, 475, 475, 475, 475)` > `[9.936s][info][gc,stats] DCU: = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` > `[9.936s][info][gc,stats] DCU: Code Cache Roots = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` > `[9.936s][info][gc,stats] Weak Roots = 0.000 s (a = 105 us) (n = 1) (lvls, us = 105, 105, 105, 105, 105)` > `[9.936s][info][gc,stats] DWR: = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209, 210)` > `[9.936s][info][gc,stats] DWR: VM Weak Roots = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209)` Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Aleksey's comments ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2073/files - new: https://git.openjdk.java.net/jdk/pull/2073/files/3ae33066..33be84c9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2073&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2073&range=00-01 Stats: 21 lines in 3 files changed: 6 ins; 4 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/2073.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2073/head:pull/2073 PR: https://git.openjdk.java.net/jdk/pull/2073 From shade at openjdk.java.net Thu Jan 28 19:12:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 28 Jan 
2021 19:12:54 GMT Subject: RFR: 8259404: Shenandoah: Fix time tracking in parallel_cleaning [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 19:09:49 GMT, Zhengyu Gu wrote: >> Please review this patch fixes timing tracking for parallel cleaning. >> >> Before: >> `[9.844s][info][gc,stats] System Purge = 0.000 s (a = 76 us) (n = 1) (lvls, us = 76, 76, 76, 76, 76)` **<<== looks wrong** >> `[9.844s][info][gc,stats] Unload Classes = 0.001 s (a = 541 us) (n = 1) (lvls, us = 541, 541, 541, 541, 541)` >> `[9.844s][info][gc,stats] Weak Roots = 0.000 s (a = 75 us) (n = 1) (lvls, us = 75, 75, 75, 75, 75)` >> `[9.844s][info][gc,stats] CLDG = 0.000 s (a = 0 us) (n = 1) (lvls, us = 0, 0, 0, 0, 0)` >> After: >> `[9.936s][info][gc,stats] System Purge = 0.001 s (a = 611 us) (n = 1) (lvls, us = 609, 609, 609, 609, 611)` >> `[9.936s][info][gc,stats] Unload Classes = 0.000 s (a = 475 us) (n = 1) (lvls, us = 475, 475, 475, 475, 475)` >> `[9.936s][info][gc,stats] DCU: = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` >> `[9.936s][info][gc,stats] DCU: Code Cache Roots = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` >> `[9.936s][info][gc,stats] Weak Roots = 0.000 s (a = 105 us) (n = 1) (lvls, us = 105, 105, 105, 105, 105)` >> `[9.936s][info][gc,stats] DWR: = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209, 210)` >> `[9.936s][info][gc,stats] DWR: VM Weak Roots = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209)` > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Aleksey's comments Marked as reviewed by shade (Reviewer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2073 From vlivanov at openjdk.java.net Thu Jan 28 21:39:43 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Thu, 28 Jan 2021 21:39:43 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 17:56:39 GMT, Stuart Monteith wrote: >> ?? has updated the pull request incrementally with one additional commit since the last revision: >> >> Change the directory & fix the include order > > Looks good, but needs some editing. > ArrayCopyNode::load performs the same work as it does here in PhaseVector::optimize_vector_boxes . > Is there a need to provide a similar function in PhaseVector or GraphKit? My point is since PhaseVector effectively enters the parsing phase (by signaling about the possibility of post-parse inlining), technically I don't see why `GraphKit::access_load_at` won't work. But I need to spend more time looking into the details. So far, I took a look at the review thread of 8212243 (which introduced `ArrayCopyNode::load`) and found the following discussion between Roland and Erik: https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030971.html > ... Also it beats me that this is strictly speaking a load barrier for loads performed in > arraycopy. Would it be possible to use something like access_load_at instead? ... ... GraphKit is a parse time only thing. So the existing gc interface doesn't offer any way to add barriers once parsing is over. This code runs after parsing in optimization phases. ... Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". 
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 02:19:06 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Jan 2021 02:19:06 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v5] In-Reply-To: References: Message-ID: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > > Testing: all Vector API related tests have passed. ?? has updated the pull request incrementally with one additional commit since the last revision: Several modification: 1, require Z gc 2, Use testng Assert 3, Rename the test 4, get the right INVOC_COUNT ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2253/files - new: https://git.openjdk.java.net/jdk/pull/2253/files/d52fd4c4..da06b0aa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=03-04 Stats: 293 lines in 2 files changed: 162 ins; 131 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2253.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2253/head:pull/2253 PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 02:19:07 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Jan 2021 02:19:07 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: <2lCFUfFm21gM4ut27Uz_mYoh7GqajJFaDub5w2Y9WXo=.58895381-1f9c-4122-be83-36d3a2c3ffa3@github.com> On Thu, 28 Jan 2021 14:47:50 GMT, Per Liden wrote: >> ?? 
has updated the pull request incrementally with one additional commit since the last revision: >> >> Change the directory & fix the include order > > test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 38: > >> 36: * @test >> 37: * @bug 8260473 >> 38: * @modules jdk.incubator.vector > > This test should have a `@requires vm.gc.Z` tag. That will make sure it's only executed on platforms where ZGC is supported (and that ZGC is included in the build). done ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 02:19:09 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Jan 2021 02:19:09 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 17:09:16 GMT, Stuart Monteith wrote: >> ?? has updated the pull request incrementally with one additional commit since the last revision: >> >> Change the directory & fix the include order > > test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 86: > >> 84: System.out.println("output: "+Arrays.toString(output)); >> 85: // Assert.assertEquals(expected, output); >> 86: assert(expected.equals(output)); // SRDM > > "SRDM" are my initials. You can remove this line and replace it with the uncommented line above. > I was structuring this to work outwith the jtreg framework. done > test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 44: > >> 42: */ >> 43: >> 44: public class VectorReshapeTest { > > This simply has this name as it is a cut-down version of test/jdk/jdk/incubator/vector/VectorReshapeTests.java > The problem was originally intermittent, but was narrowed somewhat down to what we have here. Perhaps this could be renamed? 
The test has been renamed to VectorRebracket128Test.java.

> test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 40:
>
>> 38: * @modules jdk.incubator.vector
>> 39: * @modules java.base/jdk.internal.vm.annotation
>> 40: * @run main/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer
>
> -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer restricts the compilation to a single method for diagnostic purposes. The test runs much quicker without it, and still reproduces the issue.

The test now runs in 'testng' mode. Without the compileonly option, the test passes the assert under the jtreg test framework; with the option, the assert fails. So the option is left unchanged.

> test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 45:
>
>> 43:
>> 44: public class VectorReshapeTest {
>> 45: static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100);
>
> The name "jdk.incubator.vector.test.loop-iterations" should probably be "jtreg.compiler.vectorapi.vectorreshapetest.loop-iterations".
> In addition, it should be reset to "1000" to ensure the test is compiled and executed with a chance of GCing to occur.

done

-------------

PR: https://git.openjdk.java.net/jdk/pull/2253

From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 02:27:01 2021
From: github.com+25214855+casparcwang at openjdk.java.net (王超)
Date: Fri, 29 Jan 2021 02:27:01 GMT
Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v6]
In-Reply-To:
References:
Message-ID: <5j0GBfBaAXcZboEk5DKBBIsbV2U77X7zQSxjwhmLp7c=.b6ee34c1-15f7-4034-b796-ead45f91f230@github.com>

> https://bugs.openjdk.java.net/browse/JDK-8260473
>
> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers.
>
>
> Testing: all Vector API related tests have passed.

王超
has updated the pull request incrementally with one additional commit since the last revision: Remove redundent empty line & import force inline ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2253/files - new: https://git.openjdk.java.net/jdk/pull/2253/files/da06b0aa..db0e596d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=04-05 Stats: 2 lines in 1 file changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2253.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2253/head:pull/2253 PR: https://git.openjdk.java.net/jdk/pull/2253 From kbarrett at openjdk.java.net Fri Jan 29 03:37:59 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 03:37:59 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace [v2] In-Reply-To: References: Message-ID: > Please review this change which merges ImmutableSpace into MutableSpace, > eliminating the former. There were no interesting uses of ImmutableSpace, > other than as the base class for MutableSpace. The name ImmutableSpace is > kind of a misnomer given that usage. 
> > Testing: > mach5 tier1-3 Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: fix comment ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2271/files - new: https://git.openjdk.java.net/jdk/pull/2271/files/e1d95459..c957962b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2271&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2271&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2271.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2271/head:pull/2271 PR: https://git.openjdk.java.net/jdk/pull/2271 From kbarrett at openjdk.java.net Fri Jan 29 03:38:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 03:38:41 GMT Subject: RFR: 8259487: Remove unused StarTask In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 04:27:29 GMT, Ioi Lam wrote: >> Please review this change which removes the StarTask class. It was >> superseded by ScannerTask in JDK-8244684 and JDK-8245022, and is no longer >> used. >> >> Testing: >> mach5 tier1 > > LGTM Thanks @iklam and @tschatzl for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2277 From kbarrett at openjdk.java.net Fri Jan 29 03:38:42 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 03:38:42 GMT Subject: Integrated: 8259487: Remove unused StarTask In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 04:17:02 GMT, Kim Barrett wrote: > Please review this change which removes the StarTask class. It was > superseded by ScannerTask in JDK-8244684 and JDK-8245022, and is no longer > used. > > Testing: > mach5 tier1 This pull request has now been integrated. 
Changeset: 251c6419 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/251c6419 Stats: 31 lines in 1 file changed: 0 ins; 30 del; 1 mod 8259487: Remove unused StarTask Reviewed-by: iklam, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2277 From xgong at openjdk.java.net Fri Jan 29 03:43:41 2021 From: xgong at openjdk.java.net (Xiaohong Gong) Date: Fri, 29 Jan 2021 03:43:41 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v2] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 12:19:07 GMT, ?? wrote: >> https://bugs.openjdk.java.net/browse/JDK-8260473 >> >> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. >> >> >> Testing: all Vector API related tests have passed. > > ?? has updated the pull request incrementally with one additional commit since the last revision: > > Add the regression test test/jdk/jdk/incubator/vector/VectorReshapeTest.java line 2: > 1: /* > 2: * Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved. The Copyright should be " Copyright (c) 2021," since it's a new file added in 2021. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From kbarrett at openjdk.java.net Fri Jan 29 03:52:58 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 03:52:58 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace [v3] In-Reply-To: <8uaoMsqDox6nwjHu4flmgzVPTDJ6SDjHJgGBaqOaWEE=.b8a41eb6-ad15-45ea-91b6-08b02d3b7438@github.com> References: <8uaoMsqDox6nwjHu4flmgzVPTDJ6SDjHJgGBaqOaWEE=.b8a41eb6-ad15-45ea-91b6-08b02d3b7438@github.com> Message-ID: On Thu, 28 Jan 2021 12:07:52 GMT, Thomas Schatzl wrote: >> I assume that tier1-3 includes the SA tests :) >> >> Looks good other than that nit. 
> > Hi David, > >> _Mailing list message from [David Holmes](mailto:david.holmes at oracle.com) on [serviceability-dev](mailto:serviceability-dev at openjdk.java.net):_ >> >> On 28/01/2021 7:09 pm, Thomas Schatzl wrote: >> >> > On Thu, 28 Jan 2021 05:13:57 GMT, David Holmes wrote: >> > > > Please review this change which merges ImmutableSpace into MutableSpace, >> > > > eliminating the former. There were no interesting uses of ImmutableSpace, >> > > > other than as the base class for MutableSpace. The name ImmutableSpace is >> > > > kind of a misnomer given that usage. >> > > > Testing: >> > > > mach5 tier1-3 >> > > >> > > src/hotspot/share/gc/parallel/mutableSpace.hpp line 47: >> > > > 45: // >> > > > 46: // Invariant: bottom() <= top() <= end() >> > > > 47: // top() is inclusive and end() is exclusive. >> > > >> > > If end() is exclusive then shouldn't the invariant be `< end()`? >> > >> > I also think that top() is also exclusive as in other collectors. >> > @dholmes-ora : e.g. bottom == top == end means the space is empty. These two lines are not disagreeing with each other. >> >> If one is exclusive and one is inclusive then I don't see how they can >> be equal, as that implies they are then both inclusive and exclusive at >> the same time. If end() is exclusive then I would expect an empty >> space to be one where bottom and end are adjacent, not coincident. >> >> Cheers, >> David > > The original comment about top() being inclusive is wrong. top() is also exclusive like in all other collectors as stated elsewhere in my review comment. My "also" in "I also think that top() is also exclusive as in other collectors." probably threw you off after re-reading it, which is wrong. Sorry. > > Maybe some examples help: > > bottom = 200, top = 200, end = 200 is an "empty" space (i.e. is of size zero).
Whether that empty space is "free" or "fully allocated" or both or neither is another question :) > > bottom = 200, top = 200, end = 201 contains one word and is (completely) free (not allocated into at all). > > bottom = 200, top = 201, end = 201 contains one word and is full(y allocated). > > Top/end are exclusive, and bottom inclusive as does the code assume from what I can tell by quickly looking at it. Still the invariant is bottom <= top <= end in all cases. > > Thanks, > Thomas Thanks @tschatzl , @sspitsyn , and @dholmes-ora for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2271 From kbarrett at openjdk.java.net Fri Jan 29 03:52:57 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 03:52:57 GMT Subject: RFR: 8259778: Merge MutableSpace and ImmutableSpace [v3] In-Reply-To: References: Message-ID: > Please review this change which merges ImmutableSpace into MutableSpace, > eliminating the former. There were no interesting uses of ImmutableSpace, > other than as the base class for MutableSpace. The name ImmutableSpace is > kind of a misnomer given that usage. > > Testing: > mach5 tier1-3 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into merge_spaces - fix comment - remove immutablespace ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2271/files - new: https://git.openjdk.java.net/jdk/pull/2271/files/c957962b..56d5d989 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2271&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2271&range=01-02 Stats: 14767 lines in 242 files changed: 2780 ins; 3810 del; 8177 mod Patch: https://git.openjdk.java.net/jdk/pull/2271.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2271/head:pull/2271 PR: https://git.openjdk.java.net/jdk/pull/2271 From kbarrett at openjdk.java.net Fri Jan 29 03:52:58 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 03:52:58 GMT Subject: Integrated: 8259778: Merge MutableSpace and ImmutableSpace In-Reply-To: References: Message-ID: <9kxw45RFUwH335HVpBLzUlQWyBZkl5n4MqwXz5_vT8Y=.fa80db05-1492-4c18-8b27-f888136d1638@github.com> On Wed, 27 Jan 2021 23:06:41 GMT, Kim Barrett wrote: > Please review this change which merges ImmutableSpace into MutableSpace, > eliminating the former. There were no interesting uses of ImmutableSpace, > other than as the base class for MutableSpace. The name ImmutableSpace is > kind of a misnomer given that usage. > > Testing: > mach5 tier1-3 This pull request has now been integrated. 
Changeset: ea2c4474 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/ea2c4474 Stats: 326 lines in 8 files changed: 51 ins; 258 del; 17 mod 8259778: Merge MutableSpace and ImmutableSpace Reviewed-by: sspitsyn, dholmes, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2271 From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 04:05:42 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Jan 2021 04:05:42 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 21:36:39 GMT, Vladimir Ivanov wrote: > > ArrayCopyNode::load performs the same work as it does here in PhaseVector::optimize_vector_boxes . > > Is there a need to provide a similar function in PhaseVector or GraphKit? > > My point is since PhaseVector effectively enters the parsing phase (by signaling about the possibility of post-parse inlining), technically I don't see why `GraphKit::access_load_at` won't work. But I need to spend more time looking into the details. > > So far, I took a look at the review thread of 8212243 (which introduced `ArrayCopyNode::load`) and found the following discussion between Roland and Erik: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030971.html > > ``` > > ... Also it beats me that this is strictly speaking a load barrier for loads performed in > > arraycopy. Would it be possible to use something like access_load_at instead? ... > ... > GraphKit is a parse time only thing. So the existing gc interface > doesn't offer any way to add barriers once parsing is over. This code > runs after parsing in optimization phases. > ... > ``` > > Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". 
Calling `kit.access_load_at` here makes the test pass, but causes another test (test/hotspot/jtreg/compiler/vectorapi/VectorReinterpretTest.java) to fail; I'm trying to find out why.
diff --git a/src/hotspot/share/opto/vector.cpp b/src/hotspot/share/opto/vector.cpp
index 671083e..69c00c5 100644
--- a/src/hotspot/share/opto/vector.cpp
+++ b/src/hotspot/share/opto/vector.cpp
@@ -414,17 +414,12 @@ void PhaseVector::expand_vunbox_node(VectorUnboxNode* vec_unbox) {
   Node* mem = vec_unbox->mem();
   Node* ctrl = vec_unbox->in(0);
-  Node* vec_field_ld;
-  {
-    DecoratorSet decorators = C2_READ_ACCESS | C2_CONTROL_DEPENDENT_LOAD | IN_HEAP;
-    C2AccessValuePtr addr(vec_adr, vec_adr->bottom_type()->is_ptr());
-    MergeMemNode* local_mem = MergeMemNode::make(mem);
-    gvn.record_for_igvn(local_mem);
-    BarrierSetC2* bs = BarrierSet::barrier_set()->barrier_set_c2();
-    C2OptAccess access(gvn, ctrl, local_mem, decorators, T_OBJECT, obj, addr);
-    const Type* type = TypeOopPtr::make_from_klass(field->type()->as_klass());
-    vec_field_ld = bs->load_at(access, type);
-  }
+  Node* vec_field_ld = kit.access_load_at(obj,
+                                          vec_adr,
+                                          vec_adr->bottom_type()->is_ptr(),
+                                          TypeOopPtr::make_from_klass(field->type()->as_klass()),
+                                          T_OBJECT,
+                                          C2_READ_ACCESS | C2_CONTROL_DEPENDENT_LOAD | IN_HEAP);
   // For proper aliasing, attach concrete payload type.
   ciKlass* payload_klass = ciTypeArrayKlass::make(bt);
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 04:14:03 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Fri, 29 Jan 2021 04:14:03 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v7] In-Reply-To: References: Message-ID: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers.
> > > Testing: all Vector API related tests have passed. ?? has updated the pull request incrementally with one additional commit since the last revision: Fix the copyright date ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2253/files - new: https://git.openjdk.java.net/jdk/pull/2253/files/db0e596d..33039344 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2253&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2253.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2253/head:pull/2253 PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 04:14:04 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Jan 2021 04:14:04 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v2] In-Reply-To: References: Message-ID: <4LoMwq9yY7iVtvAEGd3f-gdXt2buzKF2WpyY6YWUGBE=.665be558-2df1-422c-b9a1-ae4f0677f92f@github.com> On Fri, 29 Jan 2021 03:40:22 GMT, Xiaohong Gong wrote: > The Copyright should be " Copyright (c) 2021," since it's a new file added in 2021. done ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Fri Jan 29 06:51:43 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Fri, 29 Jan 2021 06:51:43 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 04:02:34 GMT, ?? wrote: > > ArrayCopyNode::load performs the same work as it does here in PhaseVector::optimize_vector_boxes . > > Is there a need to provide a similar function in PhaseVector or GraphKit? 
> > My point is since PhaseVector effectively enters the parsing phase (by signaling about the possibility of post-parse inlining), technically I don't see why `GraphKit::access_load_at` won't work. But I need to spend more time looking into the details. > > So far, I took a look at the review thread of 8212243 (which introduced `ArrayCopyNode::load`) and found the following discussion between Roland and Erik: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030971.html > > ``` > > ... Also it beats me that this is strictly speaking a load barrier for loads performed in > > arraycopy. Would it be possible to use something like access_load_at instead? ... > ... > GraphKit is a parse time only thing. So the existing gc interface > doesn't offer any way to add barriers once parsing is over. This code > runs after parsing in optimization phases. > ... > ``` > > Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". As far as I can see, during the parse phase, GraphKit contains the JVM state info, which can be used to get the control and memory for creating new nodes. But during optimization, the JVM state info may be missing, as in `PhaseVector::optimize_vector_boxes` or Macro Expansion. So it should use C2OptAccess to create the Load Node directly by providing control and memory nodes. I think an API similar to `GraphKit::access_load_at` should be provided for use during optimization stages, but where should the API be placed? GraphKit or PhaseIterGVN or somewhere else?
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From qingfeng.yy at alibaba-inc.com Fri Jan 29 08:06:32 2021 From: qingfeng.yy at alibaba-inc.com (Yang Yi) Date: Fri, 29 Jan 2021 16:06:32 +0800 Subject: Build fails when excluding Serial GC Message-ID: <9912a911-53a3-4abd-941c-4a22bbd647d3.qingfeng.yy@alibaba-inc.com> Hi, It's quite easy to reproduce this problem: ./configure --with-jvm-features=-serialgc ... ; make images I got the following output ``` ... === Output from failing command(s) repeated here === * For target hotspot_variant-server_libjvm_objs_genCollectedHeap.o: /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp: In member function 'virtual void GenCollectedHeap::post_initialize()': /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp:206:3: error: 'MarkSweep' has not been declared 206 | MarkSweep::initialize(); | ^~~~~~~~~ * All command lines available in /home/qingfeng.yy/openjdk16_so_warning/jdk/build/linux-x86_64-server-release/make-support/failure-logs. === End of repeated output === ``` I found current JVM features contain the serial gc, but actually I can not build an image that does not contain serial gc. This problem has existed from jdk 11 to jdk head. I am somewhat surprised, so I haven't filed an issue on JBS. Is this really a bug? Or actually we should revise the building document and remove all INCLUDE_SERIALGC macros? Cheers, Yang Yi From eosterlund at openjdk.java.net Fri Jan 29 08:26:42 2021 From: eosterlund at openjdk.java.net (Erik Österlund) Date: Fri, 29 Jan 2021 08:26:42 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: <6nZPJh_IZbeLrS2D1lrwq7NIIry0zGQ8EzAXD6fkSrE=.4b476693-5877-434e-9e97-b26f73870e33@github.com> On Fri, 29 Jan 2021 06:48:36 GMT, 王超
wrote: > Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". The main thing to make sure you get right is the aliasing. I'm not sure that will work right after parsing, the way it works now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From shade at openjdk.java.net Fri Jan 29 08:26:51 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Jan 2021 08:26:51 GMT Subject: [jdk16] RFR: 8260632: Build failures after JDK-8253353 Message-ID: [JDK-8253353](https://bugs.openjdk.java.net/browse/JDK-8253353) changed the field to `uint16_t`, and now the `shenandoahSupport.cpp` code runs into ambiguity choosing between `uint8_t` and `uint16_t` when instantiating the `MAX2` macro: max_depth = MAX2(max_depth, lpt->_nest); ^ In file included from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/metaprogramming/primitiveConversions.hpp:30:0, from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/oops/oopHandle.hpp:28, from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/systemDictionary.hpp:28, from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/javaClasses.hpp:28, from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:27: ------------- Commit messages: - 8260632: Build failures after JDK-8253353 Changes: https://git.openjdk.java.net/jdk16/pull/138/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=138&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260632 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk16/pull/138.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/138/head:pull/138 PR: https://git.openjdk.java.net/jdk16/pull/138 From kbarrett at openjdk.java.net Fri Jan 29 08:33:49 2021 From: kbarrett at openjdk.java.net (Kim
Barrett) Date: Fri, 29 Jan 2021 08:33:49 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc Message-ID: Please review this change to ParallelGC to avoid unnecessary full GCs when concurrent threads attempt oldgen allocations during evacuation. When a GC thread fails an oldgen allocation it expands the heap and retries the allocation. If the second allocation attempt fails then allocation failure is reported to the caller, which can lead to a full GC. But the retried allocation could fail because, after expansion, some other thread allocated enough of the available space that the retry fails. This can happen even though there is plenty of space available, if only that retry were to perform another expansion. Rather than trying to combine the allocation retry with the expansion (it's not clear there's a way to do so without breaking invariants), we instead simply loop on the allocation attempt + expand, until either the allocation succeeds or the expand fails. If some other thread "steals" space from the expanding thread and causes its next allocation attempt to fail and do another expansion, that's functionally no different from the expanding thread succeeding and causing the other thread to fail allocation and do the expand instead. This change includes modifying PSOldGen::expand_to_reserved to return false when there is no space available, where it previously returned true. It's not clear why it returned true; that seems wrong, but was harmless. But it must not do so with the new looping behavior for allocation, else it would never terminate. 
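As an illustration only (this is not the code in the pull request), the "loop on allocation attempt + expand" idea described above can be sketched as a single-threaded toy. `ToyOldGen` and all of its members are hypothetical names invented for this sketch; the real code runs with concurrent GC threads, which is exactly why a single retry is not enough there.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an old generation; NOT the HotSpot PSOldGen class.
struct ToyOldGen {
  std::size_t committed;    // words currently committed
  std::size_t reserved;     // upper bound for committed
  std::size_t used;         // words already handed out
  std::size_t expand_step;  // words added by one successful expansion

  // Try to take `words` from the committed space; false if it does not fit.
  bool try_allocate(std::size_t words) {
    assert(used <= committed && committed <= reserved);
    if (committed - used < words) return false;
    used += words;
    return true;
  }

  // Commit more space. Returns false when nothing is left to commit,
  // which is what lets the loop below terminate.
  bool expand() {
    if (committed == reserved) return false;
    std::size_t room = reserved - committed;
    committed += (room < expand_step) ? room : expand_step;
    return true;
  }

  // Keep alternating "allocate, else expand" until the allocation
  // succeeds or the expansion fails.
  bool allocate_with_expansion(std::size_t words) {
    while (!try_allocate(words)) {
      if (!expand()) return false;
    }
    return true;
  }
};
```

In this sketch, an `expand()` that reported success with no space left would make `allocate_with_expansion` spin forever, which mirrors the reason given above for changing the return value of `PSOldGen::expand_to_reserved`.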
Testing: mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) ------------- Commit messages: - retry failed allocation if expand succeeds Changes: https://git.openjdk.java.net/jdk/pull/2309/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2309&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260044 Stats: 15 lines in 2 files changed: 5 ins; 0 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2309.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2309/head:pull/2309 PR: https://git.openjdk.java.net/jdk/pull/2309 From stuefe at openjdk.java.net Fri Jan 29 08:38:42 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Fri, 29 Jan 2021 08:38:42 GMT Subject: [jdk16] RFR: 8260632: Build failures after JDK-8253353 In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 08:20:10 GMT, Aleksey Shipilev wrote: > [JDK-8253353](https://bugs.openjdk.java.net/browse/JDK-8253353) changed the field to `uint16_t`, and now `shenadoahSupport.cpp` code runs into ambiguity choosing between `uint8_t` and `uint16_t` when instantiating `MAX2` macro: > > > max_depth = MAX2(max_depth, lpt->_nest); > ^ > In file included from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/metaprogramming/primitiveConversions.hpp:30:0, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/oops/oopHandle.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/systemDictionary.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/javaClasses.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:27: Seems fine and trivial to me. ..Thomas ------------- Marked as reviewed by stuefe (Reviewer). 
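For readers unfamiliar with this kind of build failure, here is a minimal sketch of the ambiguity, assuming `MAX2` behaves like a function template with a single type parameter. `max2`, `ToyLoop` and `deepest` are hypothetical names for this example, and the fix shown (giving the accumulator the same type as the field) is only one possible fix, not necessarily the one in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-type max helper, modelled on the assumed shape of MAX2.
template <typename T>
inline T max2(T a, T b) { return (a > b) ? a : b; }

// Hypothetical stand-in for a loop-tree node whose nesting field was widened.
struct ToyLoop {
  std::uint16_t nest;
};

inline std::uint16_t deepest(const ToyLoop* loops, int count) {
  // If this were declared as std::uint8_t, the call below would not compile:
  // T would be deduced as uint8_t from the first argument and as uint16_t
  // from the second, and template deduction does not pick one for us.
  std::uint16_t max_depth = 0;
  for (int i = 0; i < count; i++) {
    max_depth = max2(max_depth, loops[i].nest);
  }
  return max_depth;
}
```

Other ways to resolve the same error would be an explicit template argument, as in `max2<std::uint16_t>(a, b)`, or a cast on one operand.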
PR: https://git.openjdk.java.net/jdk16/pull/138 From thartmann at openjdk.java.net Fri Jan 29 08:53:41 2021 From: thartmann at openjdk.java.net (Tobias Hartmann) Date: Fri, 29 Jan 2021 08:53:41 GMT Subject: [jdk16] RFR: 8260632: Build failures after JDK-8253353 In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 08:20:10 GMT, Aleksey Shipilev wrote: > [JDK-8253353](https://bugs.openjdk.java.net/browse/JDK-8253353) changed the field to `uint16_t`, and now `shenadoahSupport.cpp` code runs into ambiguity choosing between `uint8_t` and `uint16_t` when instantiating `MAX2` macro: > > > max_depth = MAX2(max_depth, lpt->_nest); > ^ > In file included from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/metaprogramming/primitiveConversions.hpp:30:0, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/oops/oopHandle.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/systemDictionary.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/javaClasses.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:27: Marked as reviewed by thartmann (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk16/pull/138 From magnus.ihse.bursie at oracle.com Fri Jan 29 09:49:55 2021 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Fri, 29 Jan 2021 10:49:55 +0100 Subject: Build fails when excluding Serial GC In-Reply-To: <7e2adbed-1b0b-4693-92c0-5c03963b3c55.qingfeng.yy@alibaba-inc.com> References: <7e2adbed-1b0b-4693-92c0-5c03963b3c55.qingfeng.yy@alibaba-inc.com> Message-ID: <88f8f4b4-941a-5df3-6a89-28741d2f6c7b@oracle.com> On 2021-01-29 09:03, Yang Yi wrote: > Hi, > > It's quite easy to reproduce this problem: > ./configure --with-jvm-features=-serialgc ... ; make images > > I got the following output > ``` > ... 
> === Output from failing command(s) repeated here === > * For target hotspot_variant-server_libjvm_objs_genCollectedHeap.o: > /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp: In member function 'virtual void GenCollectedHeap::post_initialize()': > /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp:206:3: error: 'MarkSweep' has not been declared > 206 | MarkSweep::initialize(); > | ^~~~~~~~~ > * All command lines available in /home/qingfeng.yy/openjdk16_so_warning/jdk/build/linux-x86_64-server-release/make-support/failure-logs. > === End of repeated output === > ``` > I found current JVM features contain the serial gc, but actually I can not > build an image that does not contain serial gc. This problem has existed > from jdk 11 to jdk head. I am somewhat surprised, so I haven't filed an > issue on JBS. Is this really a bug? Or actually we should revise the building > document and remove all INCLUDE_SERIALGC macros? About a year ago I opened https://bugs.openjdk.java.net/browse/JDK-8240224, to fix this (and other things). This caused quite a heated debate [1], and the result was that I closed the bug again. In summary, my understanding is that hotspot developers view the serialgc as essential, and that there exists no reason beyond toy applications to remove it from compilation. But furthermore the INCLUDE_SERIALGC macros should remain, even though they do not really work, since they function as markers of intent for the code. I don't agree 100% with this stance, but it's not my code to complain about. :-) Possibly, the configure script should be changed so it does not look like it's possible to exclude the serialgc...
/Magnus [1] https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028779.html > > Cheers,Yang Yi > From tschatzl at openjdk.java.net Fri Jan 29 10:06:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 29 Jan 2021 10:06:42 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v2] In-Reply-To: <9IzPl1Vjckbc8hGFU-x-3lUOaXPNZdbwHZNGELPzxsg=.aba6ae73-0169-4866-8da2-30f5d16a95aa@github.com> References: <9IzPl1Vjckbc8hGFU-x-3lUOaXPNZdbwHZNGELPzxsg=.aba6ae73-0169-4866-8da2-30f5d16a95aa@github.com> Message-ID: <9OpauK9GnxHt0S7TlVXUyYTm_RqEzFFukemyPGR4ns8=.b87210a2-f869-4f9b-aed4-4d3883d56df0@github.com> On Thu, 28 Jan 2021 13:49:05 GMT, Albert Mingkun Yang wrote: >> This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`. >> >> Tested: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > statically known sequential Marked as reviewed by tschatzl (Reviewer). src/hotspot/share/gc/serial/serialHeap.cpp line 101: > 99: > 100: rem_set()->at_younger_refs_iterate(); > 101: old_gen()->younger_refs_iterate(old_gen_closure, 0); A more complete fix of this issue (failures when setting `StrongRootsScope::n_threads == 1`) would probably be ripping out this second parameter and see what it ripples down to. This extends quite a bit through to `CardGeneration` and `CardTableRS` since serial gc is the only user of this after CMS removal and the parallel code can be removed. Feel free to create another CR for this though. src/hotspot/share/gc/serial/genMarkSweep.cpp line 188: > 186: > 187: gch->full_process_roots(&srs, > 188: false, // not the adjust phase The `srs` parameter is also only used to get the number of threads in the callees. 
It might be better to remove this parameter similar to the other comment in `serialHeap.cpp` as all that and related code is only used in Serial GC and all the parallel paths here can be removed since CMS is no more. ------------- PR: https://git.openjdk.java.net/jdk/pull/2280 From stefan.karlsson at oracle.com Fri Jan 29 10:07:48 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 29 Jan 2021 11:07:48 +0100 Subject: Build fails when excluding Serial GC In-Reply-To: <7e2adbed-1b0b-4693-92c0-5c03963b3c55.qingfeng.yy@alibaba-inc.com> References: <7e2adbed-1b0b-4693-92c0-5c03963b3c55.qingfeng.yy@alibaba-inc.com> Message-ID: On 2021-01-29 09:03, Yang Yi wrote: > Hi, > > It's quite easy to reproduce this problem: > ./configure --with-jvm-features=-serialgc ... ; make images > > I got the following output > ``` > ... > === Output from failing command(s) repeated here === > * For target hotspot_variant-server_libjvm_objs_genCollectedHeap.o: > /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp: In member function 'virtual void GenCollectedHeap::post_initialize()': > /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp:206:3: error: 'MarkSweep' has not been declared > 206 | MarkSweep::initialize(); > | ^~~~~~~~~ > * All command lines available in /home/qingfeng.yy/openjdk16_so_warning/jdk/build/linux-x86_64-server-release/make-support/failure-logs. > === End of repeated output === > ``` > I found current JVM features contain the serial gc, but actually I can not > build an image that does not contain serial gc. This problem has existed > from jdk 11 to jdk head. I am somewhat surprised, so I haven't filed an > issue on JBS. Is this really a bug? Or actually we should revise the building > document and remove all INCLUDE_SERIALGC macros? It's sort-of a known issue, but since we (Oracle) don't build without the Serial GC it has been left to be fixed later. 
Maybe by us, or maybe by someone who wants to be able to exclude the Serial GC. I won't mind if you want to fix this. However, note that there is some legacy code here because GenCollectedHeap used to be shared code between the Serial GC and CMS. Now that CMS has been removed, GenCollectedHeap is actually only used by the Serial GC. If you look at the code you'll see that Serial code calls back and forth between the GenCollectedHeap (and associated classes). Cleaning that up is on our backlog, but is not a high-priority task. When/if that gets done, some of the code in the shared/ directory will be moved into the serial/ directory. With that, I think most of the INCLUDE_SERIALGC problems will go away. StefanK > > Cheers,Yang Yi > From stefan.karlsson at oracle.com Fri Jan 29 10:19:04 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Fri, 29 Jan 2021 11:19:04 +0100 Subject: Build fails when excluding Serial GC In-Reply-To: <88f8f4b4-941a-5df3-6a89-28741d2f6c7b@oracle.com> References: <7e2adbed-1b0b-4693-92c0-5c03963b3c55.qingfeng.yy@alibaba-inc.com> <88f8f4b4-941a-5df3-6a89-28741d2f6c7b@oracle.com> Message-ID: <2514512e-68d5-868a-5f05-c9d765ae3486@oracle.com> On 2021-01-29 10:49, Magnus Ihse Bursie wrote: > > > On 2021-01-29 09:03, Yang Yi wrote: >> Hi, >> >> It's quite easy to reproduce this problem: >> ./configure --with-jvm-features=-serialgc ... ; make images >> >> I got the following output >> ``` >> ... >> === Output from failing command(s) repeated here === >> * For target hotspot_variant-server_libjvm_objs_genCollectedHeap.o: >> /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp: >> In member function 'virtual void GenCollectedHeap::post_initialize()': >> /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp:206:3: >> error: 'MarkSweep' has not been declared >> 206 | MarkSweep::initialize(); >> |
^~~~~~~~~ >> * All command lines available in >> /home/qingfeng.yy/openjdk16_so_warning/jdk/build/linux-x86_64-server-release/make-support/failure-logs. >> === End of repeated output === >> ``` >> I found current JVM features contain the serial gc, but actually I >> can not >> build an image that does not contain serial gc. This problem has existed >> from jdk 11 to jdk head. I am somewhat surprised, so I haven't filed an >> issue on JBS. Is this really a bug? Or actually we should revise the >> building >> document and remove all INCLUDE_SERIALGC macros? > > About a year ago I opened > https://bugs.openjdk.java.net/browse/JDK-8240224, to fix this (and > other things). This caused quite a heated debate [1], and the result > was that I closed the bug again. > > In summary, my understanding is that hotspot developers view the > serialgc as essential, and that there exists no reason beyond toy > applications to remove it from compilation. But furthermore the > INCLUDE_SERIALGC macros should remain, even though they do not really > work, since they function as markers of intent for the code. I don't > agree 100% with this stance, but it's not my code to complain about. :-) I think you got pushback on some of the changes. To me and many others the gcConfig.* changes were really controversial. It doesn't mean that fixes to clean this up won't be accepted. In that mail thread, there was a reference to this bug '8234502: Merge GenCollectedHeap and SerialHeap'. Chipping away at that would be good. Fixing that would not only make it possible to build without Serial GC, but also help with the maintainability of our code. StefanK > > Possibly, the configure script should be changed so it does not look > like it's possible to exclude the serialgc... 
> > /Magnus > > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028779.html > >> >> Cheers,Yang Yi >> > From tschatzl at openjdk.java.net Fri Jan 29 10:20:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 29 Jan 2021 10:20:46 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc In-Reply-To: References: Message-ID: <8lue9mX77xZJlpcwsI-k2qqPMPbsdxjbVFbkuIWp__Y=.39bede45-3c80-4cd1-93ea-d037abb3a85d@github.com> On Fri, 29 Jan 2021 08:24:13 GMT, Kim Barrett wrote: > Please review this change to ParallelGC to avoid unnecessary full GCs when > concurrent threads attempt oldgen allocations during evacuation. > > When a GC thread fails an oldgen allocation it expands the heap and retries > the allocation. If the second allocation attempt fails then allocation > failure is reported to the caller, which can lead to a full GC. But the > retried allocation could fail because, after expansion, some other thread > allocated enough of the available space that the retry fails. This can > happen even though there is plenty of space available, if only that retry > were to perform another expansion. > > Rather than trying to combine the allocation retry with the expansion (it's > not clear there's a way to do so without breaking invariants), we instead > simply loop on the allocation attempt + expand, until either the allocation > succeeds or the expand fails. If some other thread "steals" space from the > expanding thread and causes its next allocation attempt to fail and do > another expansion, that's functionally no different from the expanding > thread succeeding and causing the other thread to fail allocation and do the > expand instead. > > This change includes modifying PSOldGen::expand_to_reserved to return false > when there is no space available, where it previously returned true. It's > not clear why it returned true; that seems wrong, but was harmless. 
But it > must not do so with the new looping behavior for allocation, else it would > never terminate. > > Testing: > mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) src/hotspot/share/gc/parallel/psOldGen.hpp line 141: > 139: do { > 140: res = cas_allocate_noexpand(word_size); > 141: // Retry failed allocation if expand succeeds. "... but allocation did not." would be nice to be added to this comment to be complete. src/hotspot/share/gc/parallel/psOldGen.cpp line 192: > 190: bool PSOldGen::expand(size_t bytes) { > 191: if (bytes == 0) { > 192: return true; I'd prefer if the code would `guarantee` or at least `assert` that `bytes > 0` because returning `true` here seems scary wrt to the loop. All code paths seem to cover this situation already, i.e. with `word_size == 0` this should not be called. But if you think it's not a big issue, we can keep it. This is pre-existing of course. ------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From ayang at openjdk.java.net Fri Jan 29 10:58:53 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 29 Jan 2021 10:58:53 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v3] In-Reply-To: References: Message-ID: > This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`. 
> > Tested: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: remove srs argument ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2280/files - new: https://git.openjdk.java.net/jdk/pull/2280/files/fbd1e5b6..9585038c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2280&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2280&range=01-02 Stats: 8 lines in 3 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2280/head:pull/2280 PR: https://git.openjdk.java.net/jdk/pull/2280 From ayang at openjdk.java.net Fri Jan 29 10:58:54 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 29 Jan 2021 10:58:54 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v2] In-Reply-To: <9OpauK9GnxHt0S7TlVXUyYTm_RqEzFFukemyPGR4ns8=.b87210a2-f869-4f9b-aed4-4d3883d56df0@github.com> References: <9IzPl1Vjckbc8hGFU-x-3lUOaXPNZdbwHZNGELPzxsg=.aba6ae73-0169-4866-8da2-30f5d16a95aa@github.com> <9OpauK9GnxHt0S7TlVXUyYTm_RqEzFFukemyPGR4ns8=.b87210a2-f869-4f9b-aed4-4d3883d56df0@github.com> Message-ID: On Fri, 29 Jan 2021 10:00:54 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> statically known sequential > > src/hotspot/share/gc/serial/genMarkSweep.cpp line 188: > >> 186: >> 187: gch->full_process_roots(&srs, >> 188: false, // not the adjust phase > > The `srs` parameter is also only used to get the number of threads in the callees. > It might be better to remove this parameter similar to the other comment in `serialHeap.cpp` as all that and related code is only used in Serial GC and all the parallel paths here can be removed since CMS is no more. I only removed its usage in the body, but forgot it in the argument list. Good catch. 
> src/hotspot/share/gc/serial/serialHeap.cpp line 101: > >> 99: >> 100: rem_set()->at_younger_refs_iterate(); >> 101: old_gen()->younger_refs_iterate(old_gen_closure, 0); > > A more complete fix of this issue (failures when setting `StrongRootsScope::n_threads == 1`) would probably be ripping out this second parameter and see what it ripples down to. This extends quite a bit through to `CardGeneration` and `CardTableRS` since serial gc is the only user of this after CMS removal and the parallel code can be removed. > Feel free to create another CR for this though. I think it's best to deal with that in another PR. ------------- PR: https://git.openjdk.java.net/jdk/pull/2280 From tschatzl at openjdk.java.net Fri Jan 29 11:31:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 29 Jan 2021 11:31:43 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v2] In-Reply-To: <9OpauK9GnxHt0S7TlVXUyYTm_RqEzFFukemyPGR4ns8=.b87210a2-f869-4f9b-aed4-4d3883d56df0@github.com> References: <9IzPl1Vjckbc8hGFU-x-3lUOaXPNZdbwHZNGELPzxsg=.aba6ae73-0169-4866-8da2-30f5d16a95aa@github.com> <9OpauK9GnxHt0S7TlVXUyYTm_RqEzFFukemyPGR4ns8=.b87210a2-f869-4f9b-aed4-4d3883d56df0@github.com> Message-ID: On Fri, 29 Jan 2021 10:04:17 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> statically known sequential > > Marked as reviewed by tschatzl (Reviewer). Even better. Thanks! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2280 From kbarrett at openjdk.java.net Fri Jan 29 12:55:42 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 12:55:42 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc In-Reply-To: <8lue9mX77xZJlpcwsI-k2qqPMPbsdxjbVFbkuIWp__Y=.39bede45-3c80-4cd1-93ea-d037abb3a85d@github.com> References: <8lue9mX77xZJlpcwsI-k2qqPMPbsdxjbVFbkuIWp__Y=.39bede45-3c80-4cd1-93ea-d037abb3a85d@github.com> Message-ID: <9Veb4gCU7bPa6rUzOP5pXU8HVOjnBAZGhzzIfY210v4=.44e4b4b7-506f-4b8a-8403-0fa67683a691@github.com> On Fri, 29 Jan 2021 10:08:53 GMT, Thomas Schatzl wrote: >> Please review this change to ParallelGC to avoid unnecessary full GCs when >> concurrent threads attempt oldgen allocations during evacuation. >> >> When a GC thread fails an oldgen allocation it expands the heap and retries >> the allocation. If the second allocation attempt fails then allocation >> failure is reported to the caller, which can lead to a full GC. But the >> retried allocation could fail because, after expansion, some other thread >> allocated enough of the available space that the retry fails. This can >> happen even though there is plenty of space available, if only that retry >> were to perform another expansion. >> >> Rather than trying to combine the allocation retry with the expansion (it's >> not clear there's a way to do so without breaking invariants), we instead >> simply loop on the allocation attempt + expand, until either the allocation >> succeeds or the expand fails. If some other thread "steals" space from the >> expanding thread and causes its next allocation attempt to fail and do >> another expansion, that's functionally no different from the expanding >> thread succeeding and causing the other thread to fail allocation and do the >> expand instead. 
>> >> This change includes modifying PSOldGen::expand_to_reserved to return false >> when there is no space available, where it previously returned true. It's >> not clear why it returned true; that seems wrong, but was harmless. But it >> must not do so with the new looping behavior for allocation, else it would >> never terminate. >> >> Testing: >> mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) > > src/hotspot/share/gc/parallel/psOldGen.hpp line 141: > >> 139: do { >> 140: res = cas_allocate_noexpand(word_size); >> 141: // Retry failed allocation if expand succeeds. > > "... but allocation did not." would be nice to be added to this comment to be complete. That's a "failed allocation". ------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From kbarrett at openjdk.java.net Fri Jan 29 12:58:43 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 29 Jan 2021 12:58:43 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc In-Reply-To: <8lue9mX77xZJlpcwsI-k2qqPMPbsdxjbVFbkuIWp__Y=.39bede45-3c80-4cd1-93ea-d037abb3a85d@github.com> References: <8lue9mX77xZJlpcwsI-k2qqPMPbsdxjbVFbkuIWp__Y=.39bede45-3c80-4cd1-93ea-d037abb3a85d@github.com> Message-ID: On Fri, 29 Jan 2021 10:15:28 GMT, Thomas Schatzl wrote: >> Please review this change to ParallelGC to avoid unnecessary full GCs when >> concurrent threads attempt oldgen allocations during evacuation. >> >> When a GC thread fails an oldgen allocation it expands the heap and retries >> the allocation. If the second allocation attempt fails then allocation >> failure is reported to the caller, which can lead to a full GC. But the >> retried allocation could fail because, after expansion, some other thread >> allocated enough of the available space that the retry fails. This can >> happen even though there is plenty of space available, if only that retry >> were to perform another expansion. 
>> >> Rather than trying to combine the allocation retry with the expansion (it's >> not clear there's a way to do so without breaking invariants), we instead >> simply loop on the allocation attempt + expand, until either the allocation >> succeeds or the expand fails. If some other thread "steals" space from the >> expanding thread and causes its next allocation attempt to fail and do >> another expansion, that's functionally no different from the expanding >> thread succeeding and causing the other thread to fail allocation and do the >> expand instead. >> >> This change includes modifying PSOldGen::expand_to_reserved to return false >> when there is no space available, where it previously returned true. It's >> not clear why it returned true; that seems wrong, but was harmless. But it >> must not do so with the new looping behavior for allocation, else it would >> never terminate. >> >> Testing: >> mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) > > src/hotspot/share/gc/parallel/psOldGen.cpp line 192: > >> 190: bool PSOldGen::expand(size_t bytes) { >> 191: if (bytes == 0) { >> 192: return true; > > I'd prefer if the code would `guarantee` or at least `assert` that `bytes > 0` because returning `true` here seems scary wrt to the loop. > > All code paths seem to cover this situation already, i.e. with `word_size == 0` this should not be called. > > But if you think it's not a big issue, we can keep it. This is pre-existing of course. Good point. I will make sure a 0 size never gets here and assert/guarantee, or otherwise figure out what to do. 
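For readers following the thread, the "loop on the allocation attempt + expand" idea from the quoted PR description can be modelled outside HotSpot roughly as below. This is only a minimal sketch under stated assumptions, not the PSOldGen code: `ToySpace`, `allocate_noexpand`, `expand`, and `failed` are hypothetical names, sizes are counted in abstract words, and expansion is reduced to bumping a counter under a lock. It also asserts `words > 0` in `expand`, along the lines suggested in the review, and `expand` returns false once nothing is left to commit so the loop is guaranteed to terminate.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Toy bump-pointer space that can grow up to a reserved limit.
// Hypothetical stand-in for illustration only; not the HotSpot implementation.
class ToySpace {
  std::atomic<size_t> _top{0};      // words handed out so far
  std::atomic<size_t> _committed;   // words currently usable
  const size_t _reserved;           // hard upper bound on _committed
  std::mutex _expand_lock;

public:
  static constexpr size_t failed = static_cast<size_t>(-1);

  ToySpace(size_t committed, size_t reserved)
    : _committed(committed), _reserved(reserved) {}

  // Lock-free allocation attempt that never expands the space.
  size_t allocate_noexpand(size_t words) {
    size_t old_top = _top.load();
    do {
      if (words > _committed.load() - old_top) return failed;
    } while (!_top.compare_exchange_weak(old_top, old_top + words));
    return old_top;  // offset of the allocated block
  }

  // Commits more space. Returns false when nothing is left to commit,
  // which is what lets the retry loop in allocate() terminate.
  bool expand(size_t words) {
    assert(words > 0 && "zero-sized requests should never reach expand");
    std::lock_guard<std::mutex> guard(_expand_lock);
    size_t committed = _committed.load();
    if (committed == _reserved) return false;
    _committed.store(committed + std::min(words, _reserved - committed));
    return true;
  }

  // Retry the allocation for as long as the previous attempt failed
  // and an expansion succeeded in between.
  size_t allocate(size_t words) {
    size_t res;
    do {
      res = allocate_noexpand(words);
    } while (res == failed && expand(words));
    return res;
  }
};
```

In this model, if another thread consumes the freshly committed space between `expand` and the next `allocate_noexpand`, the loop simply expands again; failure is reported only once `expand` itself reports that the reserved limit has been reached.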
------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From zgu at openjdk.java.net Fri Jan 29 13:04:40 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 29 Jan 2021 13:04:40 GMT Subject: Integrated: 8259404: Shenandoah: Fix time tracking in parallel_cleaning In-Reply-To: References: Message-ID: On Thu, 14 Jan 2021 01:39:02 GMT, Zhengyu Gu wrote: > Please review this patch, which fixes time tracking for parallel cleaning. > > Before: > `[9.844s][info][gc,stats] System Purge = 0.000 s (a = 76 us) (n = 1) (lvls, us = 76, 76, 76, 76, 76)` **<<== looks wrong** > `[9.844s][info][gc,stats] Unload Classes = 0.001 s (a = 541 us) (n = 1) (lvls, us = 541, 541, 541, 541, 541)` > `[9.844s][info][gc,stats] Weak Roots = 0.000 s (a = 75 us) (n = 1) (lvls, us = 75, 75, 75, 75, 75)` > `[9.844s][info][gc,stats] CLDG = 0.000 s (a = 0 us) (n = 1) (lvls, us = 0, 0, 0, 0, 0)` > After: > `[9.936s][info][gc,stats] System Purge = 0.001 s (a = 611 us) (n = 1) (lvls, us = 609, 609, 609, 609, 611)` > `[9.936s][info][gc,stats] Unload Classes = 0.000 s (a = 475 us) (n = 1) (lvls, us = 475, 475, 475, 475, 475)` > `[9.936s][info][gc,stats] DCU: = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` > `[9.936s][info][gc,stats] DCU: Code Cache Roots = 0.000 s (a = 162 us) (n = 1) (lvls, us = 160, 160, 160, 160, 162)` > `[9.936s][info][gc,stats] Weak Roots = 0.000 s (a = 105 us) (n = 1) (lvls, us = 105, 105, 105, 105, 105)` > `[9.936s][info][gc,stats] DWR: = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209, 210)` > `[9.936s][info][gc,stats] DWR: VM Weak Roots = 0.000 s (a = 210 us) (n = 1) (lvls, us = 209, 209, 209, 209)` This pull request has now been integrated. 
Changeset: a5fb5173 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/a5fb5173 Stats: 63 lines in 5 files changed: 25 ins; 16 del; 22 mod 8259404: Shenandoah: Fix time tracking in parallel_cleaning Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/2073 From rkennke at openjdk.java.net Fri Jan 29 13:42:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 29 Jan 2021 13:42:41 GMT Subject: RFR: 8255837: Shenandoah: Remove ShenandoahConcurrentRoots class In-Reply-To: <0qG_GQeV5H4SJ-at5sZYqb9DfQXxB8iTKqu5Lrq9cX0=.dc182daf-dd84-453c-9f3b-e9cf81acfc02@github.com> References: <0qG_GQeV5H4SJ-at5sZYqb9DfQXxB8iTKqu5Lrq9cX0=.dc182daf-dd84-453c-9f3b-e9cf81acfc02@github.com> Message-ID: <9MAE7v82bOSMMGJ_i-JoxH_jKTHtFYFrms-1BrrrBkQ=.dd609312-c1ba-4d27-a4c2-e490bf0584d8@github.com> On Wed, 27 Jan 2021 14:14:50 GMT, Zhengyu Gu wrote: > The class was introduced for 2 purposes: > 1) a platform supports concurrent class unloading (e.g. the platform supports nmethod_entry_barrier) > 2) should perform concurrent class unloading for particular gc cycle (e.g. STW vs. concurrent GC) > > Now, concurrent class unloading is supported on all Shenandoah supported platforms. Furthermore, STW and concurrent GC are isolated (JDK-8255765), the class not only becomes superfluous, but also looks weird, e.g. > > `bool do_nmethods = heap->unload_classes() && !ShenandoahConcurrentRoots::can_do_concurrent_class_unloading(); ` > > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] nightly Looks good to me! ------------- Marked as reviewed by rkennke (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2262 From zgu at openjdk.java.net Fri Jan 29 14:24:43 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 29 Jan 2021 14:24:43 GMT Subject: Integrated: 8255837: Shenandoah: Remove ShenandoahConcurrentRoots class In-Reply-To: <0qG_GQeV5H4SJ-at5sZYqb9DfQXxB8iTKqu5Lrq9cX0=.dc182daf-dd84-453c-9f3b-e9cf81acfc02@github.com> References: <0qG_GQeV5H4SJ-at5sZYqb9DfQXxB8iTKqu5Lrq9cX0=.dc182daf-dd84-453c-9f3b-e9cf81acfc02@github.com> Message-ID: On Wed, 27 Jan 2021 14:14:50 GMT, Zhengyu Gu wrote: > The class was introduced for 2 purposes: > 1) a platform supports concurrent class unloading (e.g. the platform supports nmethod_entry_barrier) > 2) should perform concurrent class unloading for particular gc cycle (e.g. STW vs. concurrent GC) > > Now, concurrent class unloading is supported on all Shenandoah supported platforms. Furthermore, STW and concurrent GC are isolated (JDK-8255765), the class not only becomes superfluous, but also looks weird, e.g. > > `bool do_nmethods = heap->unload_classes() && !ShenandoahConcurrentRoots::can_do_concurrent_class_unloading(); ` > > Test: > - [x] hotspot_gc_shenandoah Linux x86_64 and x86_32 > - [x] nightly This pull request has now been integrated. 
Changeset: 22bfa5b0 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/22bfa5b0 Stats: 165 lines in 16 files changed: 0 ins; 124 del; 41 mod 8255837: Shenandoah: Remove ShenandoahConcurrentRoots class Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2262 From shade at openjdk.java.net Fri Jan 29 14:34:57 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Jan 2021 14:34:57 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet In-Reply-To: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: On Fri, 22 Jan 2021 19:03:14 GMT, Roman Kennke wrote: > We collected some cruft in ShenandoahBarrierSet. Time to clean it up. > > This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) > > One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitly use ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? > > Testing: > - [x] hotspot_gc_shenandoah release, fastdebug What is up with this PR? 
There are no recorded changes, and tests are failing ;) ------------- PR: https://git.openjdk.java.net/jdk/pull/2202 From rkennke at openjdk.java.net Fri Jan 29 14:34:57 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 29 Jan 2021 14:34:57 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet Message-ID: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> We collected some cruft in ShenandoahBarrierSet. Time to clean it up. This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitly use ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? 
Testing: - [x] hotspot_gc_shenandoah release, fastdebug ------------- Commit messages: - 8260309: Shenandoah: Clean up ShenandoahBarrierSet Changes: https://git.openjdk.java.net/jdk/pull/2202/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2202&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260309 Stats: 32 lines in 6 files changed: 3 ins; 21 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2202.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2202/head:pull/2202 PR: https://git.openjdk.java.net/jdk/pull/2202 From rkennke at openjdk.java.net Fri Jan 29 14:34:57 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 29 Jan 2021 14:34:57 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet In-Reply-To: References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: On Fri, 29 Jan 2021 11:05:31 GMT, Aleksey Shipilev wrote: > What is up with this PR? There are no recorded changes, and tests are failing ;) Weird, I seem to have messed it up. I'll fix it shortly. ------------- PR: https://git.openjdk.java.net/jdk/pull/2202 From rkennke at openjdk.java.net Fri Jan 29 14:45:50 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 29 Jan 2021 14:45:50 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet [v2] In-Reply-To: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: > We collected some cruft in ShenandoahBarrierSet. Time to clean it up. > > This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. 
Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) > > One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitly use ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? > > Testing: > - [x] hotspot_gc_shenandoah release, fastdebug Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Restore some changes that have been lost during merge - Merge branch 'master' into JDK-8260309 - 8260309: Shenandoah: Clean up ShenandoahBarrierSet ------------- Changes: https://git.openjdk.java.net/jdk/pull/2202/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2202&range=01 Stats: 31 lines in 6 files changed: 4 ins; 19 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2202.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2202/head:pull/2202 PR: https://git.openjdk.java.net/jdk/pull/2202 From rkennke at openjdk.java.net Fri Jan 29 15:00:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 29 Jan 2021 15:00:41 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet In-Reply-To: References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: On Fri, 29 Jan 2021 11:16:39 GMT, Roman Kennke wrote: > > What is up with this PR? There are no recorded changes, and tests are failing ;) > > Weird, I seem to have messed it up. I'll fix it shortly. It should be good to review, now. 
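As a side note on the `override` remark in the quoted PR description: the way `override` catches leftovers such as `is_a()` after a superclass drops the corresponding virtual can be illustrated with a small stand-alone sketch. The `Base` and `Derived` types and the `describe` helper are hypothetical and unrelated to the real `BarrierSet` hierarchy; the commented-out line marks where the compiler would reject a stale override.

```cpp
#include <string>

// Hypothetical superclass. Assume an earlier version also declared
//   virtual bool is_a(int kind);
// and that the declaration has since been removed.
struct Base {
  virtual ~Base() = default;
  virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
  // Still matches a virtual in Base, so this compiles.
  std::string name() const override { return "Derived"; }

  // With 'override', a leftover like the following becomes a compile
  // error ("marked 'override', but does not override") instead of
  // silently surviving as an unrelated member function:
  // bool is_a(int kind) override { return kind == 0; }
};

// Dispatches through the base interface.
std::string describe(const Base& b) { return b.name(); }
```

Without the `override` specifier, the stale `is_a()` would keep compiling as an ordinary, never-called member, which is how such cruft tends to accumulate.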
------------- PR: https://git.openjdk.java.net/jdk/pull/2202 From kvn at openjdk.java.net Fri Jan 29 16:32:43 2021 From: kvn at openjdk.java.net (Vladimir Kozlov) Date: Fri, 29 Jan 2021 16:32:43 GMT Subject: [jdk16] RFR: 8260632: Build failures after JDK-8253353 In-Reply-To: References: Message-ID: <-2ay1k0rXBxWJKLEAFelmBeTw3xcp-Q3xqb-BFjKeLI=.0d2ec250-d835-4432-8a14-4459edad72ad@github.com> On Fri, 29 Jan 2021 08:20:10 GMT, Aleksey Shipilev wrote: > [JDK-8253353](https://bugs.openjdk.java.net/browse/JDK-8253353) changed the field to `uint16_t`, and now `shenandoahSupport.cpp` code runs into ambiguity choosing between `uint8_t` and `uint16_t` when instantiating `MAX2` macro: > > > max_depth = MAX2(max_depth, lpt->_nest); > ^ > In file included from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/metaprogramming/primitiveConversions.hpp:30:0, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/oops/oopHandle.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/systemDictionary.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/javaClasses.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:27: > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah Good. I approved fix request for JDK 16. ------------- Marked as reviewed by kvn (Reviewer). 
PR: https://git.openjdk.java.net/jdk16/pull/138 From neliasso at openjdk.java.net Fri Jan 29 16:34:40 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 29 Jan 2021 16:34:40 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: <6nZPJh_IZbeLrS2D1lrwq7NIIry0zGQ8EzAXD6fkSrE=.4b476693-5877-434e-9e97-b26f73870e33@github.com> References: <6nZPJh_IZbeLrS2D1lrwq7NIIry0zGQ8EzAXD6fkSrE=.4b476693-5877-434e-9e97-b26f73870e33@github.com> Message-ID: On Fri, 29 Jan 2021 08:24:13 GMT, Erik Österlund wrote: >>> > ArrayCopyNode::load performs the same work as it does here in PhaseVector::optimize_vector_boxes . >>> > Is there a need to provide a similar function in PhaseVector or GraphKit? >>> >>> My point is since PhaseVector effectively enters the parsing phase (by signaling about the possibility of post-parse inlining), technically I don't see why `GraphKit::access_load_at` won't work. But I need to spend more time looking into the details. >>> >>> So far, I took a look at the review thread of 8212243 (which introduced `ArrayCopyNode::load`) and found the following discussion between Roland and Erik: >>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030971.html >>> >>> ``` >>> > ... Also it beats me that this is strictly speaking a load barrier for loads performed in >>> > arraycopy. Would it be possible to use something like access_load_at instead? ... >>> ... >>> GraphKit is a parse time only thing. So the existing gc interface >>> doesn't offer any way to add barriers once parsing is over. This code >>> runs after parsing in optimization phases. >>> ... >>> ``` >>> >>> Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". 
>> >> As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing like the situation in `PhaseVector::optimize_vector_boxes` or Macro Expansion. So it should use C2OptAccess to create the Load Node directly by providing control and memory nodes. >> >> I think a similar api like `GraphKit::access_load_at ` should be provided for usage during optimization stages, but where should the API be placed? GraphKit or PhaseIterGVN or somewhere else? > >> Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". > > The main thing to make sure you get right, is the aliasing. I'm not sure that will work right after parsing, the way it works now. I suggest you keep this CR as it is since 16 is in rampdown and we need to get approval and push it before Feb 4th (and we do want some margin). Open an enhancement on 17 to fix the api. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From neliasso at openjdk.java.net Fri Jan 29 16:38:41 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 29 Jan 2021 16:38:41 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v7] In-Reply-To: References: Message-ID: <3qxMI_2q1rdcDoLFZ5Qil__d_shHCQaAkVU39VAGPdU=.9fbb67fa-a195-49d6-aa20-5991f34b61d0@github.com> On Fri, 29 Jan 2021 04:14:03 GMT, ?? wrote: >> https://bugs.openjdk.java.net/browse/JDK-8260473 >> >> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. >> >> >> Testing: all Vector API related tests have passed. > > ?? 
has updated the pull request incrementally with one additional commit since the last revision: > > Fix the copyright date src/hotspot/share/opto/vector.cpp line 419: > 417: Node* vec_field_ld; > 418: { > 419: DecoratorSet decorators = C2_READ_ACCESS | C2_CONTROL_DEPENDENT_LOAD | IN_HEAP; C2_READ_ACCESS will be set by "bs->load_at" so you can skip that. MO_UNORDERED is missing. That corresponds to "MemNode::unordered" in the original code. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From vlivanov at openjdk.java.net Fri Jan 29 16:46:41 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 29 Jan 2021 16:46:41 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: <6nZPJh_IZbeLrS2D1lrwq7NIIry0zGQ8EzAXD6fkSrE=.4b476693-5877-434e-9e97-b26f73870e33@github.com> Message-ID: On Fri, 29 Jan 2021 16:31:47 GMT, Nils Eliasson wrote: > I suggest you keep this CR as it is since 16 is in rampdown and we need to get approval and push it before Feb 4th (and we do want some margin). I agree. @casparcwang, please, file an RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From vlivanov at openjdk.java.net Fri Jan 29 16:46:43 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 29 Jan 2021 16:46:43 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v7] In-Reply-To: <3qxMI_2q1rdcDoLFZ5Qil__d_shHCQaAkVU39VAGPdU=.9fbb67fa-a195-49d6-aa20-5991f34b61d0@github.com> References: <3qxMI_2q1rdcDoLFZ5Qil__d_shHCQaAkVU39VAGPdU=.9fbb67fa-a195-49d6-aa20-5991f34b61d0@github.com> Message-ID: On Fri, 29 Jan 2021 16:35:39 GMT, Nils Eliasson wrote: >> ?? 
has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix the copyright date > > src/hotspot/share/opto/vector.cpp line 419: > >> 417: Node* vec_field_ld; >> 418: { >> 419: DecoratorSet decorators = C2_READ_ACCESS | C2_CONTROL_DEPENDENT_LOAD | IN_HEAP; > > C2_READ_ACCESS will be set by "bs->load_at" so you can skip that. > MO_UNORDERED is missing. That corresponds to "MemNode::unordered" in the original code. `C2_CONTROL_DEPENDENT_LOAD` is also redundant (though the original code sets it too): it's just a plain load from a final instance field. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From vlivanov at openjdk.java.net Fri Jan 29 16:50:40 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Fri, 29 Jan 2021 16:50:40 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 06:48:36 GMT, 王超 wrote: > As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing like the situation in PhaseVector::optimize_vector_boxes or Macro Expansion. JVM state is irrelevant here (otherwise, the `VectorUnbox` node would have captured the relevant info during construction). What is actually missing is that the `GraphKit` instance lacks info about control and memory. You need to set them explicitly using `GraphKit::set_control()` and `GraphKit::set_all_memory()`.
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From neliasso at openjdk.java.net Fri Jan 29 17:14:44 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Fri, 29 Jan 2021 17:14:44 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: <_4PyarggOjbeRGvLGk2nFhLcuCMJCilw5kOaj0C3c4Y=.8f0e5910-1c43-43e7-ac56-edbf87d3bf09@github.com> On Fri, 29 Jan 2021 16:47:53 GMT, Vladimir Ivanov wrote: >>> > ArrayCopyNode::load performs the same work as it does here in PhaseVector::optimize_vector_boxes . >>> > Is there a need to provide a similar function in PhaseVector or GraphKit? >>> >>> My point is since PhaseVector effectively enters the parsing phase (by signaling about the possibility of post-parse inlining), technically I don't see why `GraphKit::access_load_at` won't work. But I need to spend more time looking into the details. >>> >>> So far, I took a look at the review thread of 8212243 (which introduced `ArrayCopyNode::load`) and found the following discussion between Roland and Erik: >>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030971.html >>> >>> ``` >>> > ... Also it beats me that this is strictly speaking a load barrier for loads performed in >>> > arraycopy. Would it be possible to use something like access_load_at instead? ... >>> ... >>> GraphKit is a parse time only thing. So the existing gc interface >>> doesn't offer any way to add barriers once parsing is over. This code >>> runs after parsing in optimization phases. >>> ... >>> ``` >>> >>> Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". >> >> As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. 
But during optimization, the jvm state info may be missing, as in `PhaseVector::optimize_vector_boxes` or Macro Expansion. So it should use C2OptAccess to create the Load Node directly by providing control and memory nodes. >> >> I think an API similar to `GraphKit::access_load_at` should be provided for use during optimization stages, but where should the API be placed? GraphKit or PhaseIterGVN or somewhere else? > >> As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing, as in PhaseVector::optimize_vector_boxes or Macro Expansion. > > JVM state is irrelevant here (otherwise, the `VectorUnbox` node would have captured the relevant info during construction). What is actually missing is that the `GraphKit` instance lacks info about control and memory. You need to set them explicitly using `GraphKit::set_control()` and `GraphKit::set_all_memory()`. We need this patch to be based on the JDK 16 repository. I will help out with the fix request and sponsorship.
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From shade at openjdk.java.net Fri Jan 29 17:50:46 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Jan 2021 17:50:46 GMT Subject: [jdk16] RFR: 8260632: Build failures after JDK-8253353 In-Reply-To: <-2ay1k0rXBxWJKLEAFelmBeTw3xcp-Q3xqb-BFjKeLI=.0d2ec250-d835-4432-8a14-4459edad72ad@github.com> References: <-2ay1k0rXBxWJKLEAFelmBeTw3xcp-Q3xqb-BFjKeLI=.0d2ec250-d835-4432-8a14-4459edad72ad@github.com> Message-ID: On Fri, 29 Jan 2021 16:29:44 GMT, Vladimir Kozlov wrote: >> [JDK-8253353](https://bugs.openjdk.java.net/browse/JDK-8253353) changed the field to `uint16_t`, and now the `shenandoahSupport.cpp` code runs into ambiguity choosing between `uint8_t` and `uint16_t` when instantiating the `MAX2` template: >> >> >> max_depth = MAX2(max_depth, lpt->_nest); >> ^ >> In file included from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/metaprogramming/primitiveConversions.hpp:30:0, >> from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/oops/oopHandle.hpp:28, >> from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/systemDictionary.hpp:28, >> from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/javaClasses.hpp:28, >> from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:27: >> >> Additional testing: >> - [x] Linux x86_64 `hotspot_gc_shenandoah` >> - [x] Linux x86_64 `tier1` with Shenandoah > > Good. I approved fix request for JDK 16. Thank you.
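The ambiguity reported in 8260632 can be reproduced in isolation. The following is a minimal sketch, assuming `MAX2` behaves like a function template with a single type parameter shared by both arguments. The names `max2`, `LoopInfo`, `deepest_nest` and `max_mixed` are hypothetical stand-ins, not HotSpot code, and the one-line fix in the actual changeset may take a different form.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a single-type max helper: both arguments must deduce to the same T.
template <typename T>
constexpr T max2(T a, T b) { return (a > b) ? a : b; }

// Hypothetical loop-tree node whose nesting field was widened to uint16_t.
struct LoopInfo {
  std::uint16_t nest;
};

// With a uint8_t accumulator, max2(max_depth, loops[i].nest) would not compile:
// T deduces to uint8_t from the first argument and to uint16_t from the second.
// Giving the accumulator the field's type makes the deduction unambiguous.
inline std::uint16_t deepest_nest(const LoopInfo* loops, int count) {
  assert(count >= 0);
  std::uint16_t max_depth = 0;
  for (int i = 0; i < count; ++i) {
    max_depth = max2(max_depth, loops[i].nest);
  }
  return max_depth;
}

// Alternative: keep mixed argument types and name T explicitly, so no
// deduction takes place and the narrower value is converted to uint16_t.
inline std::uint16_t max_mixed(std::uint8_t narrow, std::uint16_t wide) {
  return max2<std::uint16_t>(narrow, wide);
}
```

Either approach removes the conflicting deduction; matching the accumulator's type to the field is the less intrusive of the two when the accumulator is a local variable.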
------------- PR: https://git.openjdk.java.net/jdk16/pull/138 From shade at openjdk.java.net Fri Jan 29 17:50:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 29 Jan 2021 17:50:47 GMT Subject: [jdk16] Integrated: 8260632: Build failures after JDK-8253353 In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 08:20:10 GMT, Aleksey Shipilev wrote: > [JDK-8253353](https://bugs.openjdk.java.net/browse/JDK-8253353) changed the field to `uint16_t`, and now the `shenandoahSupport.cpp` code runs into ambiguity choosing between `uint8_t` and `uint16_t` when instantiating the `MAX2` template: > > > max_depth = MAX2(max_depth, lpt->_nest); > ^ > In file included from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/metaprogramming/primitiveConversions.hpp:30:0, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/oops/oopHandle.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/systemDictionary.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/classfile/javaClasses.hpp:28, > from /home/buildbot/worker/build-jdk16u-linux/build/src/hotspot/share/gc/shenandoah/c2/shenandoahSupport.cpp:27: > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux x86_64 `tier1` with Shenandoah This pull request has now been integrated.
Changeset: bc41bb10 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk16/commit/bc41bb10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8260632: Build failures after JDK-8253353 Reviewed-by: stuefe, thartmann, kvn ------------- PR: https://git.openjdk.java.net/jdk16/pull/138 From smonteith at openjdk.java.net Fri Jan 29 21:06:41 2021 From: smonteith at openjdk.java.net (Stuart Monteith) Date: Fri, 29 Jan 2021 21:06:41 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: <9WQB6haJP9fXJNp2yEPX3L9S7KQWznjuFrDxDGpEANE=.121fcca5-1f77-414d-8ba5-fe427281867c@github.com> On Fri, 29 Jan 2021 02:14:06 GMT, 王超 wrote: >> test/hotspot/jtreg/compiler/vectorapi/VectorReshapeTest.java line 40: >> >>> 38: * @modules jdk.incubator.vector >>> 39: * @modules java.base/jdk.internal.vm.annotation >>> 40: * @run main/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer >> >> -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer restricts the compilation to a single method for diagnostic purposes. The test runs much quicker without it, and still reproduces the issue. > > The test was changed to 'testng' mode. Removing the compileonly option makes the test pass the assert in the jtreg test framework, whereas keeping the option makes it fail the assert, so the option is left unchanged. I accept that - running it outside of standalone mode changes the behaviour enough that the CompileCommand is necessary to run the test (the standalone test was developed without the CompileCommand being necessary).
------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From kbarrett at openjdk.java.net Sat Jan 30 00:21:46 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 30 Jan 2021 00:21:46 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v3] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 10:58:53 GMT, Albert Mingkun Yang wrote: >> This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`. >> >> Tested: hotspot_gc > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > remove srs argument Other than the now extraneous block scopes, this looks good. src/hotspot/share/gc/shared/genCollectedHeap.cpp line 803: > 801: assert(code_roots != NULL, "code root closure should always be set"); > 802: > 803: { I'd prefer these unneeded scoping brackets were removed too. src/hotspot/share/gc/shared/strongRootsScope.cpp line 44: > 42: // cases, so they expect the thread claim token to be updated. > 43: if (_n_threads != 0) { > 44: Threads::change_thread_claim_token(); I wonder if we want SerialGC code to deal with StrongRootsScope at all. But that's beyond the scope of this change. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2280 From ayang at openjdk.java.net Sat Jan 30 00:59:54 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 30 Jan 2021 00:59:54 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v4] In-Reply-To: References: Message-ID: <7R098X4DGqaFp25gZwHf-tPoYUeIItp1IZ2u-kyf4Rk=.c4348816-05c1-481a-9464-2e364c3273fd@github.com> > This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`. > > Tested: hotspot_gc Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: remove surrounding braces ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2280/files - new: https://git.openjdk.java.net/jdk/pull/2280/files/9585038c..f81abed6 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2280&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2280&range=02-03 Stats: 30 lines in 1 file changed: 4 ins; 12 del; 14 mod Patch: https://git.openjdk.java.net/jdk/pull/2280.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2280/head:pull/2280 PR: https://git.openjdk.java.net/jdk/pull/2280 From kbarrett at openjdk.java.net Sat Jan 30 05:56:52 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 30 Jan 2021 05:56:52 GMT Subject: RFR: 8259862: MutableSpace's end should be atomic Message-ID: Please review this change to MutableSpace, making its _end member volatile and using Atomic operations to access the _top and _end members. Some unused accessor functions that would otherwise need updating are removed. 
Testing: mach5 tier1 ------------- Commit messages: - make _end volatile and use atomic access Changes: https://git.openjdk.java.net/jdk/pull/2323/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2323&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259862 Stats: 19 lines in 4 files changed: 4 ins; 9 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2323.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2323/head:pull/2323 PR: https://git.openjdk.java.net/jdk/pull/2323 From kbarrett at openjdk.java.net Sat Jan 30 06:09:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 30 Jan 2021 06:09:01 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: Message-ID: > Please review this change to ParallelGC to avoid unnecessary full GCs when > concurrent threads attempt oldgen allocations during evacuation. > > When a GC thread fails an oldgen allocation it expands the heap and retries > the allocation. If the second allocation attempt fails then allocation > failure is reported to the caller, which can lead to a full GC. But the > retried allocation could fail because, after expansion, some other thread > allocated enough of the available space that the retry fails. This can > happen even though there is plenty of space available, if only that retry > were to perform another expansion. > > Rather than trying to combine the allocation retry with the expansion (it's > not clear there's a way to do so without breaking invariants), we instead > simply loop on the allocation attempt + expand, until either the allocation > succeeds or the expand fails. If some other thread "steals" space from the > expanding thread and causes its next allocation attempt to fail and do > another expansion, that's functionally no different from the expanding > thread succeeding and causing the other thread to fail allocation and do the > expand instead. 
> > This change includes modifying PSOldGen::expand_to_reserved to return false > when there is no space available, where it previously returned true. It's > not clear why it returned true; that seems wrong, but was harmless. But it > must not do so with the new looping behavior for allocation, else it would > never terminate. > > Testing: > mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: require non-zero expand size ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2309/files - new: https://git.openjdk.java.net/jdk/pull/2309/files/12500d49..d67d5e20 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2309&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2309&range=00-01 Stats: 14 lines in 1 file changed: 1 ins; 6 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2309.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2309/head:pull/2309 PR: https://git.openjdk.java.net/jdk/pull/2309 From kbarrett at openjdk.java.net Sat Jan 30 06:09:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 30 Jan 2021 06:09:02 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: <8lue9mX77xZJlpcwsI-k2qqPMPbsdxjbVFbkuIWp__Y=.39bede45-3c80-4cd1-93ea-d037abb3a85d@github.com> Message-ID: On Fri, 29 Jan 2021 12:55:53 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/parallel/psOldGen.cpp line 192: >> >>> 190: bool PSOldGen::expand(size_t bytes) { >>> 191: if (bytes == 0) { >>> 192: return true; >> >> I'd prefer if the code would `guarantee` or at least `assert` that `bytes > 0` because returning `true` here seems scary wrt to the loop. >> >> All code paths seem to cover this situation already, i.e. with `word_size == 0` this should not be called. >> >> But if you think it's not a big issue, we can keep it. 
This is pre-existing of course. > > Good point. I will make sure a 0 size never gets here and assert/guarantee, or otherwise figure out what to do. I've changed various quick returns on zero size to instead be asserts, since none of them should ever be called with a zero size. ------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From kbarrett at openjdk.java.net Sat Jan 30 10:19:57 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 30 Jan 2021 10:19:57 GMT Subject: RFR: 8258508: Merge G1RedirtyCardsQueue into qset Message-ID: Please review this change to G1RedirtyCardsLocalQueueSet to directly incorporate the associated queue, simplifying usage. Testing: mach5 tier1 ------------- Commit messages: - merge redirty cards queue into local qset Changes: https://git.openjdk.java.net/jdk/pull/2325/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2325&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8258508 Stats: 55 lines in 5 files changed: 12 ins; 26 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/2325.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2325/head:pull/2325 PR: https://git.openjdk.java.net/jdk/pull/2325 From ayang at openjdk.java.net Sat Jan 30 11:35:44 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 30 Jan 2021 11:35:44 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v3] In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 00:12:10 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> remove srs argument > > src/hotspot/share/gc/shared/genCollectedHeap.cpp line 803: > >> 801: assert(code_roots != NULL, "code root closure should always be set"); >> 802: >> 803: { > > I'd prefer these unneeded scoping brackets were removed too. Done. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2280 From ayang at openjdk.java.net Sat Jan 30 11:40:53 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 30 Jan 2021 11:40:53 GMT Subject: RFR: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots [v3] In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 00:19:08 GMT, Kim Barrett wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> remove srs argument > > Other than the now extraneous block scopes, this looks good. Thank you for the reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2280 From github.com+25214855+casparcwang at openjdk.java.net Sat Jan 30 12:08:56 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Sat, 30 Jan 2021 12:08:56 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results … In-Reply-To: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: <_Sakvj07dhVSGp-YnE5Tvu-Yb_WBI5Oo4ODNlTOP-XY=.853eddb8-a2a6-4371-bb41-380caf16a3da@github.com> On Sat, 30 Jan 2021 12:02:25 GMT, 王超 wrote: > …with ZGC enabled > > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > Testing: all Vector API related tests have passed.
> > Original pr: https://github.com/openjdk/jdk/pull/2253 ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From github.com+25214855+casparcwang at openjdk.java.net Sat Jan 30 12:08:55 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Sat, 30 Jan 2021 12:08:55 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results … Message-ID: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> …with ZGC enabled https://bugs.openjdk.java.net/browse/JDK-8260473 Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. Testing: all Vector API related tests have passed. Original pr: https://github.com/openjdk/jdk/pull/2253 ------------- Commit messages: - 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled Changes: https://git.openjdk.java.net/jdk16/pull/139/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=139&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260473 Stats: 174 lines in 2 files changed: 165 ins; 0 del; 9 mod Patch: https://git.openjdk.java.net/jdk16/pull/139.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/139/head:pull/139 PR: https://git.openjdk.java.net/jdk16/pull/139 From github.com+25214855+casparcwang at openjdk.java.net Sat Jan 30 12:14:41 2021 From: github.com+25214855+casparcwang at openjdk.java.net (王超) Date: Sat, 30 Jan 2021 12:14:41 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v7] In-Reply-To: References: <3qxMI_2q1rdcDoLFZ5Qil__d_shHCQaAkVU39VAGPdU=.9fbb67fa-a195-49d6-aa20-5991f34b61d0@github.com> Message-ID: On Fri, 29 Jan 2021 16:42:05 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/vector.cpp line 419: >> >>> 417: Node* vec_field_ld; >>> 418: { >>> 419: DecoratorSet decorators = C2_READ_ACCESS |
C2_CONTROL_DEPENDENT_LOAD | IN_HEAP; >> >> C2_READ_ACCESS will be set by "bs->load_at" so you can skip that. >> MO_UNORDERED is missing. That corresponds to "MemNode::unordered" in the original code. > > `C2_CONTROL_DEPENDENT_LOAD` is also redundant (though original code does that): it's just a plain load from a final instance field). Fixed in the new pr: https://github.com/openjdk/jdk16/pull/139 ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Sat Jan 30 12:18:41 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Sat, 30 Jan 2021 12:18:41 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 16:47:53 GMT, Vladimir Ivanov wrote: > > As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing like the situation in PhaseVector::optimize_vector_boxes or Macro Expansion. > > JVM state is irrelevant here (otherwise, `VectorUnbox` node would have captured relevant info during construction). What is actually missing is `GraphKit` instance lacks info about control and memory. You need to explicitly set it using `GraphKit::set_control()` and `GraphKit::set_all_memory()`. Thank you for the explanation. @iwanowww > We need this patch to be based on the JDK 16 repository. > > I will help out with the fix-request and sponsor-ship. Thank you very much. 
@neliasso I have created a new patch based on the JDK 16 repo: https://github.com/openjdk/jdk16/pull/139 ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From ayang at openjdk.java.net Sat Jan 30 12:40:54 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 30 Jan 2021 12:40:54 GMT Subject: RFR: 8259862: MutableSpace's end should be atomic In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 05:51:38 GMT, Kim Barrett wrote: > Please review this change to MutableSpace, making its _end member volatile > and using Atomic operations to access the _top and _end members. Some > unused accessor functions that would otherwise need updating are removed. > > Testing: > mach5 tier1 Marked as reviewed by ayang (Author). src/hotspot/share/gc/parallel/mutableSpace.hpp line 62: > 60: HeapWord* _bottom; > 61: HeapWord* volatile _top; > 62: HeapWord* volatile _end; Maybe add some comments explaining how `_top` and `_end` are used in the concurrent setting. ------------- PR: https://git.openjdk.java.net/jdk/pull/2323 From vlivanov at openjdk.java.net Sat Jan 30 18:06:49 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Sat, 30 Jan 2021 18:06:49 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sat, 30 Jan 2021 12:02:25 GMT, 王超 wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > Testing: all Vector API related tests have passed. > > Original pr: https://github.com/openjdk/jdk/pull/2253 Overall, looks good.
test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java line 47: > 45: * @modules java.base/jdk.internal.vm.annotation > 46: * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer > 47: * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test Why are `-XX:CICompilerCount=1` and `-Xmx256m` needed? ------------- Marked as reviewed by vlivanov (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/139 From jiefu at openjdk.java.net Sun Jan 31 00:43:46 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Sun, 31 Jan 2021 00:43:46 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sat, 30 Jan 2021 17:33:08 GMT, Vladimir Ivanov wrote: > Why are `-XX:CICompilerCount=1` and `-Xmx256m` needed? Thanks @iwanowww for your review. I discussed the same question with @casparcwang offline. The reason is: - A small heap (`-Xmx256m`) helps to trigger a GC. - `compileonly` and `-XX:CICompilerCount=1` make the VM run slowly enough for a GC to finish. @casparcwang also told me that the bug does not reproduce reliably without these JVM args. Thanks. ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From ayang at openjdk.java.net Sun Jan 31 17:06:45 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sun, 31 Jan 2021 17:06:45 GMT Subject: Integrated: 8260574: Remove parallel constructs in GenCollectedHeap::process_roots In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 09:10:52 GMT, Albert Mingkun Yang wrote: > This PR is broken into two commits for easier reviewing, one for removing the usage of `SubTasksDone` in `GenCollectedHeap`, and the other for removing `StrongRootsScope` in the call chain of `GenCollectedHeap::process_roots`.
> > Tested: hotspot_gc This pull request has now been integrated. Changeset: 8a9004da Author: Albert Mingkun Yang Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/8a9004da Stats: 95 lines in 8 files changed: 12 ins; 50 del; 33 mod 8260574: Remove parallel constructs in GenCollectedHeap::process_roots Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2280 From tschatzl at openjdk.java.net Sun Jan 31 17:11:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Sun, 31 Jan 2021 17:11:42 GMT Subject: RFR: 8259862: MutableSpace's end should be atomic In-Reply-To: References: Message-ID: <2cqINp5FrcuaAL6UFRaKW7P3wvCD9hzL_gj8X8t5wdw=.99335a03-1827-4f1c-a0eb-03ef2da13468@github.com> On Sat, 30 Jan 2021 05:51:38 GMT, Kim Barrett wrote: > Please review this change to MutableSpace, making its _end member volatile > and using Atomic operations to access the _top and _end members. Some > unused accessor functions that would otherwise need updating are removed. > > Testing: > mach5 tier1 Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2323 From tschatzl at openjdk.java.net Sun Jan 31 17:13:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Sun, 31 Jan 2021 17:13:46 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 06:09:01 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC to avoid unnecessary full GCs when >> concurrent threads attempt oldgen allocations during evacuation. >> >> When a GC thread fails an oldgen allocation it expands the heap and retries >> the allocation. If the second allocation attempt fails then allocation >> failure is reported to the caller, which can lead to a full GC. 
But the >> retried allocation could fail because, after expansion, some other thread >> allocated enough of the available space that the retry fails. This can >> happen even though there is plenty of space available, if only that retry >> were to perform another expansion. >> >> Rather than trying to combine the allocation retry with the expansion (it's >> not clear there's a way to do so without breaking invariants), we instead >> simply loop on the allocation attempt + expand, until either the allocation >> succeeds or the expand fails. If some other thread "steals" space from the >> expanding thread and causes its next allocation attempt to fail and do >> another expansion, that's functionally no different from the expanding >> thread succeeding and causing the other thread to fail allocation and do the >> expand instead. >> >> This change includes modifying PSOldGen::expand_to_reserved to return false >> when there is no space available, where it previously returned true. It's >> not clear why it returned true; that seems wrong, but was harmless. But it >> must not do so with the new looping behavior for allocation, else it would >> never terminate. >> >> Testing: >> mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > require non-zero expand size Lgtm. Thanks. ------------- Marked as reviewed by tschatzl (Reviewer). 
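The allocate-then-expand loop described in the 8260044 review can be sketched outside HotSpot as follows. This is a toy model under stated assumptions: `ToySpace` and all of its members are hypothetical, sizes are plain word counts rather than heap addresses, and expansion here is a CAS on `_end`, which is a simplification chosen to keep the sketch short and is not a claim about how the real expansion is synchronized. What it illustrates is the termination argument from the description: `expand` must return false once nothing more can be committed, otherwise the retry loop would never exit.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical bump-pointer space measured in words; not HotSpot's MutableSpace.
class ToySpace {
  std::atomic<std::size_t> _top{0};   // next free word, advanced by allocators
  std::atomic<std::size_t> _end;      // end of committed space, advanced by expand
  const std::size_t _reserved;        // hard upper bound for _end

 public:
  static constexpr std::size_t npos = static_cast<std::size_t>(-1);

  ToySpace(std::size_t committed, std::size_t reserved)
      : _end(committed), _reserved(reserved) {
    assert(committed <= reserved);
  }

  // One lock-free allocation attempt: returns the start offset, or npos if
  // the committed space cannot hold `words` right now.
  std::size_t cas_allocate_noexpand(std::size_t words) {
    assert(words > 0);  // zero-sized requests are a caller bug, not a quick return
    std::size_t old_top = _top.load();
    for (;;) {
      std::size_t end = _end.load();  // read after top, so end >= old_top
      if (end - old_top < words) return npos;
      if (_top.compare_exchange_weak(old_top, old_top + words)) return old_top;
      // On failure old_top now holds the current top; re-check against end.
    }
  }

  // Commits up to `words` more words. Returns false only when no space at all
  // could be added, which is what lets the loop in allocate() terminate.
  bool expand(std::size_t words) {
    assert(words > 0);
    std::size_t end = _end.load();
    for (;;) {
      if (end == _reserved) return false;
      std::size_t room = _reserved - end;
      std::size_t grow = (words < room) ? words : room;
      if (_end.compare_exchange_weak(end, end + grow)) return true;
    }
  }

  // Retry the allocation for as long as an expansion succeeded. Another thread
  // may consume the newly committed space first; then this thread expands again.
  std::size_t allocate(std::size_t words) {
    std::size_t res;
    do {
      res = cas_allocate_noexpand(words);
    } while (res == npos && expand(words));
    return res;
  }
};
```

With `ToySpace space(4, 10)`, successive calls `space.allocate(3)`, `space.allocate(3)` and `space.allocate(4)` succeed by expanding as needed, and a further `space.allocate(1)` reports failure because the reserved limit has been reached.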
PR: https://git.openjdk.java.net/jdk/pull/2309 From tschatzl at openjdk.java.net Sun Jan 31 17:13:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Sun, 31 Jan 2021 17:13:46 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: <9Veb4gCU7bPa6rUzOP5pXU8HVOjnBAZGhzzIfY210v4=.44e4b4b7-506f-4b8a-8403-0fa67683a691@github.com> References: <8lue9mX77xZJlpcwsI-k2qqPMPbsdxjbVFbkuIWp__Y=.39bede45-3c80-4cd1-93ea-d037abb3a85d@github.com> <9Veb4gCU7bPa6rUzOP5pXU8HVOjnBAZGhzzIfY210v4=.44e4b4b7-506f-4b8a-8403-0fa67683a691@github.com> Message-ID: On Fri, 29 Jan 2021 12:53:20 GMT, Kim Barrett wrote: >> src/hotspot/share/gc/parallel/psOldGen.hpp line 141: >> >>> 139: do { >>> 140: res = cas_allocate_noexpand(word_size); >>> 141: // Retry failed allocation if expand succeeds. >> >> "... but allocation did not." would be nice to be added to this comment to be complete. > > That's a "failed allocation". Agreed :) ------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From neliasso at openjdk.java.net Sun Jan 31 21:32:46 2021 From: neliasso at openjdk.java.net (Nils Eliasson) Date: Sun, 31 Jan 2021 21:32:46 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sat, 30 Jan 2021 12:02:25 GMT, ?? wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > Testing: all Vector API related tests have passed. > > Original pr: https://github.com/openjdk/jdk/pull/2253 Approved. Now awaiting release team approval. ------------- Marked as reviewed by neliasso (Reviewer). 
PR: https://git.openjdk.java.net/jdk16/pull/139