From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:31:49 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:31:49 GMT Subject: [jdk16] Integrated: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sat, 30 Jan 2021 12:02:25 GMT, ?? wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > Testing: all Vector API related tests have passed. > > Original pr: https://github.com/openjdk/jdk/pull/2253 This pull request has now been integrated. Changeset: 0fdf9cdd Author: casparcwang Committer: Jie Fu URL: https://git.openjdk.java.net/jdk16/commit/0fdf9cdd Stats: 174 lines in 2 files changed: 165 ins; 0 del; 9 mod 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled Co-authored-by: Stuart Monteith Co-authored-by: Wang Chao Reviewed-by: vlivanov, neliasso ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:45:46 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:45:46 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: Message-ID: On Fri, 29 Jan 2021 16:47:53 GMT, Vladimir Ivanov wrote: >>> > ArrayCopyNode::load performs the same work as it does here in PhaseVector::optimize_vector_boxes . >>> > Is there a need to provide a similar function in PhaseVector or GraphKit? >>> >>> My point is since PhaseVector effectively enters the parsing phase (by signaling about the possibility of post-parse inlining), technically I don't see why `GraphKit::access_load_at` won't work. But I need to spend more time looking into the details. >>> >>> So far, I took a look at the review thread of 8212243 (which introduced `ArrayCopyNode::load`) and found the following discussion between Roland and Erik: >>> https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2018-October/030971.html >>> >>> ``` >>> > ... Also it beats me that this is strictly speaking a load barrier for loads performed in >>> > arraycopy. Would it be possible to use something like access_load_at instead? ... >>> ... >>> GraphKit is a parse time only thing. So the existing gc interface >>> doesn't offer any way to add barriers once parsing is over. This code >>> runs after parsing in optimization phases. >>> ... >>> ``` >>> >>> Considering `PhaseVector::optimize_vector_boxes()` already has access to a usable `GraphKit` instance, it is possible that `GraphKit::access_load_at` will "just work". >> >> As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing like the situation in `PhaseVector::optimize_vector_boxes` or Macro Expansion. So it should use C2OptAccess to create the Load Node directly by providing control and memory nodes. >> >> I think a similar api like `GraphKit::access_load_at ` should be provided for usage during optimization stages, but where should the API be placed? 
GraphKit or PhaseIterGVN or somewhere else? > >> As far as I can see, during the parse phase, GraphKit contains the jvm state info which can be used to get the control and memory for creating new nodes. But during optimization, the jvm state info may be missing like the situation in PhaseVector::optimize_vector_boxes or Macro Expansion. > > JVM state is irrelevant here (otherwise, `VectorUnbox` node would have captured relevant info during construction). What is actually missing is `GraphKit` instance lacks info about control and memory. You need to explicitly set it using `GraphKit::set_control()` and `GraphKit::set_all_memory()`. Thanks @iwanowww @neliasso @pliden @stooart-mo @XiaohongGong @fisk @DamonFool for the reviews and helping. The patch has integrated in jdk16 (https://github.com/openjdk/jdk16/pull/139), and this pr should be closed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:45:45 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:45:45 GMT Subject: RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled [v4] In-Reply-To: References: <6nZPJh_IZbeLrS2D1lrwq7NIIry0zGQ8EzAXD6fkSrE=.4b476693-5877-434e-9e97-b26f73870e33@github.com> Message-ID: On Fri, 29 Jan 2021 16:43:54 GMT, Vladimir Ivanov wrote: > > I suggest you keep this CR as it is since 16 is in rampdown and we need to get approval and push it before Feb 4th (and we do want some margin). > > I agree. @casparcwang, please, file an RFE. Jie Fu @DamonFool has helped to create an RFE. https://bugs.openjdk.java.net/browse/JDK-8260682 ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 01:45:46 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 01:45:46 GMT Subject: Withdrawn: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: Message-ID: On Wed, 27 Jan 2021 10:05:56 GMT, ?? wrote: > https://bugs.openjdk.java.net/browse/JDK-8260473 > > Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. > > > Testing: all Vector API related tests have passed. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/2253 From shade at openjdk.java.net Mon Feb 1 08:52:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Feb 2021 08:52:41 GMT Subject: Integrated: 8260591: Shenandoah: improve parallelism for concurrent thread root scans In-Reply-To: References: Message-ID: On Thu, 28 Jan 2021 14:04:07 GMT, Aleksey Shipilev wrote: > Following JDK-8256298, there are a few minor performance issues with the implementation. > > First, in the spirit of JDK-8246100, we should be scanning the Java threads the last, as they have the most parallelism. Less parallel, or lightweight roots should be scanned before them to improve overall parallelism. > > Second, claiming each thread dominates the per-thread processing cost. We should really be doing chunked processing. > > Motivating example is SPECjvm2008 serial, which has very fast concurrent cycles, and thread root scan speed is important. 
> > Before: > # Baseline > [56.176s][info][gc,stats] Concurrent Mark Roots = 0.308 s (a = 1452 us) (n = 212) (lvls, us = 305, 398, 457, 719, 11216) > [56.176s][info][gc,stats] CMR: = 1.236 s (a = 5832 us) (n = 212) (lvls, us = 2676, 3535, 4199, 5391, 54522) > [56.176s][info][gc,stats] CMR: Thread Roots = 1.179 s (a = 5563 us) (n = 212) (lvls, us = 2441, 3242, 3945, 5156, 54288) > [56.176s][info][gc,stats] CMR: VM Strong Roots = 0.005 s (a = 23 us) (n = 212) (lvls, us = 12, 19, 21, 23, 204) > [56.176s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 247 us) (n = 212) (lvls, us = 73, 203, 252, 293, 562) > > ... > [56.176s][info][gc,stats] Concurrent Stack Processing = 0.124 s (a = 5149 us) (n = 24) (lvls, us = 535, 607, 885, 6387, 27177) > [56.176s][info][gc,stats] Threads = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) > [56.176s][info][gc,stats] CT: = 0.632 s (a = 26345 us) (n = 24) (lvls, us = 6465, 8086, 10742, 39453, 145679) > > After: > [56.010s][info][gc,stats] Concurrent Mark Roots = 0.116 s (a = 587 us) (n = 198) (lvls, us = 312, 371, 400, 502, 4316) > [56.010s][info][gc,stats] CMR: = 0.931 s (a = 4703 us) (n = 198) (lvls, us = 2402, 3438, 3770, 4453, 62629) > [56.010s][info][gc,stats] CMR: Thread Roots = 0.864 s (a = 4366 us) (n = 198) (lvls, us = 1914, 3125, 3477, 4199, 54075) > [56.010s][info][gc,stats] CMR: VM Strong Roots = 0.015 s (a = 76 us) (n = 198) (lvls, us = 20, 31, 35, 38, 4693) > [56.010s][info][gc,stats] CMR: CLDG Roots = 0.052 s (a = 261 us) (n = 198) (lvls, us = 61, 172, 256, 299, 3861) > ... > [56.010s][info][gc,stats] Concurrent Stack Processing = 0.081 s (a = 3671 us) (n = 22) (lvls, us = 457, 537, 770, 3359, 24003) > [56.010s][info][gc,stats] Threads = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) > [56.010s][info][gc,stats] CT: = 0.469 s (a = 21309 us) (n = 22) (lvls, us = 6016, 6855, 8711, 18945, 103939) This pull request has now been integrated. Changeset: ab727f0a Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/ab727f0a Stats: 39 lines in 3 files changed: 20 ins; 7 del; 12 mod 8260591: Shenandoah: improve parallelism for concurrent thread root scans Reviewed-by: zgu, rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2290 From shade at openjdk.java.net Mon Feb 1 09:14:44 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Feb 2021 09:14:44 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet [v2] In-Reply-To: References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: On Fri, 29 Jan 2021 14:45:50 GMT, Roman Kennke wrote: >> We collected some cruft in ShenandoahBarrierSet. Time to clean it up. >> >> This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) >> >> One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitely using ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? 
>> >> Testing: >> - [x] hotspot_gc_shenandoah release, fastdebug > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Restore some changes that have been lost during merge > - Merge branch 'master' into JDK-8260309 > - 8260309: Shenandoah: Clean up ShenandoahBarrierSet Looks fine, but I have a minor question. src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 28: > 26: #define SHARE_GC_SHENANDOAH_SHENANDOAHBARRIERSET_INLINE_HPP > 27: > 28: #include "gc/shared/accessBarrierSupport.hpp" Should it be `accessBarrierSupport.inline.hpp`? Other `*BarrierSet.inline.hpp`-s seem to include that. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2202 From tschatzl at openjdk.java.net Mon Feb 1 09:46:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 1 Feb 2021 09:46:41 GMT Subject: RFR: 8258508: Merge G1RedirtyCardsQueue into qset In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 10:14:42 GMT, Kim Barrett wrote: > Please review this change to G1RedirtyCardsLocalQueueSet to directly > incorporate the associated queue, simplifying usage. > > Testing: > mach5 tier1 Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2325 From sjohanss at openjdk.java.net Mon Feb 1 09:54:39 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 1 Feb 2021 09:54:39 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v4] In-Reply-To: <95B6j1ZSceUGfTTDsZfF3a5ZbggYlBiv9WJkHKkzO0w=.edd53e67-02ae-4c8a-ae0f-3a50c7ac0676@github.com> References: <95B6j1ZSceUGfTTDsZfF3a5ZbggYlBiv9WJkHKkzO0w=.edd53e67-02ae-4c8a-ae0f-3a50c7ac0676@github.com> Message-ID: On Thu, 28 Jan 2021 12:48:55 GMT, Joakim Nordstr?m wrote: >> **Description** >> This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup. The gc-efficiency is calculated in the end of the marking phase, to be logged in the post-cleanup section. It is however not reset, meaning that next phase's post-marking log will show the old value. >> >> - The gc-efficiency is initialized to -1 when it hasn't been calculated. >> - Negative gc-efficiency is displayed as a hyphen "-" in the summary. >> - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` >> >> **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) >> >> This fix has been tested together with the above mentioned fix. >> >> **Example** >> This is what logging like after fix has been applied. 
>> ### PHASE Post-Marking @ 410.303 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 410.305 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Marking @ 450.310 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB >> ### PHASE Post-Cleanup @ 450.312 >> ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 >> ### >> ### type address-range used prev-live next-live gc-eff remset state code-roots >> ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) >> ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 >> ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 >> ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 >> ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 >> ### >> >> **Testing** >> - Manual testing >> - hs-tier1, hs-tier2 > > Joakim Nordstr?m has updated the pull request incrementally with one additional commit since the last revision: > > Using FormatBuffer instead of snprintf. Changed defines to more descriptive names. Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2217 From iwalulya at openjdk.java.net Mon Feb 1 09:57:45 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 1 Feb 2021 09:57:45 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 06:09:01 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC to avoid unnecessary full GCs when >> concurrent threads attempt oldgen allocations during evacuation. 
>> >> When a GC thread fails an oldgen allocation it expands the heap and retries >> the allocation. If the second allocation attempt fails then allocation >> failure is reported to the caller, which can lead to a full GC. But the >> retried allocation could fail because, after expansion, some other thread >> allocated enough of the available space that the retry fails. This can >> happen even though there is plenty of space available, if only that retry >> were to perform another expansion. >> >> Rather than trying to combine the allocation retry with the expansion (it's >> not clear there's a way to do so without breaking invariants), we instead >> simply loop on the allocation attempt + expand, until either the allocation >> succeeds or the expand fails. If some other thread "steals" space from the >> expanding thread and causes its next allocation attempt to fail and do >> another expansion, that's functionally no different from the expanding >> thread succeeding and causing the other thread to fail allocation and do the >> expand instead. >> >> This change includes modifying PSOldGen::expand_to_reserved to return false >> when there is no space available, where it previously returned true. It's >> not clear why it returned true; that seems wrong, but was harmless. But it >> must not do so with the new looping behavior for allocation, else it would >> never terminate. >> >> Testing: >> mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > require non-zero expand size Lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/2309 From sjohanss at openjdk.java.net Mon Feb 1 09:57:46 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 1 Feb 2021 09:57:46 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 06:09:01 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC to avoid unnecessary full GCs when >> concurrent threads attempt oldgen allocations during evacuation. >> >> When a GC thread fails an oldgen allocation it expands the heap and retries >> the allocation. If the second allocation attempt fails then allocation >> failure is reported to the caller, which can lead to a full GC. But the >> retried allocation could fail because, after expansion, some other thread >> allocated enough of the available space that the retry fails. This can >> happen even though there is plenty of space available, if only that retry >> were to perform another expansion. >> >> Rather than trying to combine the allocation retry with the expansion (it's >> not clear there's a way to do so without breaking invariants), we instead >> simply loop on the allocation attempt + expand, until either the allocation >> succeeds or the expand fails. If some other thread "steals" space from the >> expanding thread and causes its next allocation attempt to fail and do >> another expansion, that's functionally no different from the expanding >> thread succeeding and causing the other thread to fail allocation and do the >> expand instead. >> >> This change includes modifying PSOldGen::expand_to_reserved to return false >> when there is no space available, where it previously returned true. It's >> not clear why it returned true; that seems wrong, but was harmless. 
But it >> must not do so with the new looping behavior for allocation, else it would >> never terminate. >> >> Testing: >> mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > require non-zero expand size Looks good! ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2309 From iwalulya at openjdk.java.net Mon Feb 1 10:23:49 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Mon, 1 Feb 2021 10:23:49 GMT Subject: RFR: 8258508: Merge G1RedirtyCardsQueue into qset In-Reply-To: References: Message-ID: <4HRS-gf7zltSRZ6CmxYhrpDOcqPjXVqHXHlbxQUPJ6M=.b1a935fe-c9f5-4fb1-8102-882904205bfb@github.com> On Sat, 30 Jan 2021 10:14:42 GMT, Kim Barrett wrote: > Please review this change to G1RedirtyCardsLocalQueueSet to directly > incorporate the associated queue, simplifying usage. > > Testing: > mach5 tier1 Looks good! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/2325 From rkennke at openjdk.java.net Mon Feb 1 11:00:59 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 1 Feb 2021 11:00:59 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet [v3] In-Reply-To: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: <2u_BCcnM4QkcVVj6MVeFDfDgjB789ouIQuBfqY5p6vo=.2a63e65b-4fcd-4c79-9623-ac203c3ba056@github.com> > We collected some cruft in ShenandoahBarrierSet. Time to clean it up. > > This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) > > One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitely using ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? 
> > Testing: > - [x] hotspot_gc_shenandoah release, fastdebug Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Include accessBarrierSupport.inline.hpp instead of accessBarrierSupport.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2202/files - new: https://git.openjdk.java.net/jdk/pull/2202/files/bd7da1e2..5f68a73b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2202&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2202&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2202.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2202/head:pull/2202 PR: https://git.openjdk.java.net/jdk/pull/2202 From rkennke at openjdk.java.net Mon Feb 1 11:01:02 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 1 Feb 2021 11:01:02 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet [v2] In-Reply-To: References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: On Mon, 1 Feb 2021 09:06:42 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Restore some changes that have been lost during merge >> - Merge branch 'master' into JDK-8260309 >> - 8260309: Shenandoah: Clean up ShenandoahBarrierSet > > src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.inline.hpp line 28: > >> 26: #define SHARE_GC_SHENANDOAH_SHENANDOAHBARRIERSET_INLINE_HPP >> 27: >> 28: #include "gc/shared/accessBarrierSupport.hpp" > > Should it be `accessBarrierSupport.inline.hpp`? Other `*BarrierSet.inline.hpp`-s seem to include that. Right. I changed that. Thanks! ------------- PR: https://git.openjdk.java.net/jdk/pull/2202 From vlivanov at openjdk.java.net Mon Feb 1 11:38:49 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 1 Feb 2021 11:38:49 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sun, 31 Jan 2021 00:41:11 GMT, Jie Fu wrote: > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. That's a strange way to provoke the bug. You could just increase the number of iterations instead. But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From tschatzl at openjdk.java.net Mon Feb 1 11:57:48 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 1 Feb 2021 11:57:48 GMT Subject: RFR: 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate() Message-ID: Hi all, can I have reviews for this change that removes parallel handling in `CardTableRS::younger_refs_in_space_iterate` as it is always called with n_threads <= 1, making the parallel code handling there obsolete. A larger cleanup of `CardTableRS` will follow in JDK-8234534. 
Testing: tier1,2 ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/2333/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2333&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260643 Stats: 103 lines in 7 files changed: 3 ins; 72 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/2333.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2333/head:pull/2333 PR: https://git.openjdk.java.net/jdk/pull/2333 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 12:10:45 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 12:10:45 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> On Mon, 1 Feb 2021 11:35:13 GMT, Vladimir Ivanov wrote: > > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. > > That's a strange way to provoke the bug. You could just increase the number of iterations instead. > > But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. Yes, it's very weird to provoke the bug like this. If CICompilerCount=1 is removed, the test failed 60% roughly on my machine. And the iteration has already changed from 100 to 1000, the run time of the test is nearly 30s on release version of jvm. If I add the following patch, the test always fails on my machine, diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java index 1843ec0..959b29a 100644 --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; * @modules jdk.incubator.vector * @modules java.base/jdk.internal.vm.annotation * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test */ @Test @@ -125,6 +125,14 @@ public class VectorRebracket128Test { @ForceInline static void testVectorRebracket(VectorSpecies a, VectorSpecies b, byte[] input, byte[] output) { + new Thread(() -> { + while (true) { + try { + System.gc(); + Thread.sleep(100); + } catch (Exception e) {} + } + }).start(); Vector av = a.fromByteArray(input, 0, ByteOrder.nativeOrder()); int block; assert(input.length == output.length); ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From github.com+25214855+casparcwang at openjdk.java.net Mon Feb 1 12:18:44 2021 From: github.com+25214855+casparcwang at openjdk.java.net (=?UTF-8?B?546L6LaF?=) Date: Mon, 1 Feb 2021 12:18:44 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> References: 
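For illustration only, the shape of this kind of cleanup is roughly the following simplified sketch; the names are hypothetical and this is not the actual CardTableRS code. When every caller passes n_threads <= 1, the runtime branch that selects a parallel path is dead and only the serial path needs to remain:

#include <cassert>

struct Space {};     // hypothetical stand-ins for the real HotSpot types
struct Closure {};

// Before: a serial and a parallel path, selected by n_threads at runtime.
void younger_refs_iterate_before(Space* sp, Closure* cl, unsigned n_threads) {
  if (n_threads > 1) {
    // ... claim card-table chunks and process them with n_threads workers ...
  } else {
    // ... walk the dirty cards covering 'sp' serially and apply 'cl' ...
  }
}

// After: only the serial path remains; an assert documents the contract.
void younger_refs_iterate_after(Space* sp, Closure* cl, unsigned n_threads) {
  assert(n_threads <= 1 && "only ever called single-threaded");
  // ... walk the dirty cards covering 'sp' serially and apply 'cl' ...
}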
<5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> Message-ID: On Mon, 1 Feb 2021 12:06:26 GMT, ?? wrote: >>> compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. >> >> That's a strange way to provoke the bug. You could just increase the number of iterations instead. >> >> But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. > >> > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. >> >> That's a strange way to provoke the bug. You could just increase the number of iterations instead. >> >> But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. > > Yes, it's very weird to provoke the bug like this. If CICompilerCount=1 is removed, the test failed 60% roughly on my machine. > And the iteration has already changed from 100 to 1000, the run time of the test is nearly 30s on release version of jvm. > > If I add the following patch, the test always fails on my machine, > > diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > index 1843ec0..959b29a 100644 > --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; > * @modules jdk.incubator.vector > * @modules java.base/jdk.internal.vm.annotation > * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer > - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test > + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test > */ > > @Test > @@ -125,6 +125,14 @@ public class VectorRebracket128Test { > @ForceInline > static > void testVectorRebracket(VectorSpecies a, VectorSpecies b, byte[] input, byte[] output) { > + new Thread(() -> { > + while (true) { > + try { > + System.gc(); > + Thread.sleep(100); > + } catch (Exception e) {} > + } > + }).start(); > Vector av = a.fromByteArray(input, 0, ByteOrder.nativeOrder()); > int block; > assert(input.length == output.length); sorry for the wrong patch above, the failed reason of the patch above is due to stack creation failure (create 1000 threads). The following is the right stress gc patch. 
diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java index 6b266db..a761ea2 100644 --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; * @modules jdk.incubator.vector * @modules java.base/jdk.internal.vm.annotation * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test */ @Test @@ -59,6 +59,19 @@ public class VectorRebracket128Test { static final VectorSpecies bspec128 = ByteVector.SPECIES_128; static final VectorSpecies sspec128 = ShortVector.SPECIES_128; + static { + Thread t = new Thread(() -> { + while (true) { + try { + System.gc(); + Thread.sleep(100); + } catch (Exception e) {} + } + }); + t.setDaemon(true); + t.start(); + } + static IntFunction withToString(String s, IntFunction f) { return new IntFunction() { @Override ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From magnus.ihse.bursie at oracle.com Mon Feb 1 12:29:30 2021 From: magnus.ihse.bursie at oracle.com (Magnus Ihse Bursie) Date: Mon, 1 Feb 2021 13:29:30 +0100 Subject: Build fails when excluding Serial GC In-Reply-To: <2514512e-68d5-868a-5f05-c9d765ae3486@oracle.com> References: <7e2adbed-1b0b-4693-92c0-5c03963b3c55.qingfeng.yy@alibaba-inc.com> <88f8f4b4-941a-5df3-6a89-28741d2f6c7b@oracle.com> <2514512e-68d5-868a-5f05-c9d765ae3486@oracle.com> Message-ID: <3ea7def6-025f-3b63-6598-df001fa8258a@oracle.com> On 2021-01-29 11:19, Stefan Karlsson wrote: > On 2021-01-29 10:49, Magnus Ihse Bursie wrote: >> >> >> On 2021-01-29 09:03, Yang Yi wrote: >>> Hi, >>> >>> It's quite easy to reproduce this problem: >>> ./configure --with-jvm-features=-serialgc ... ; make images >>> >>> I got the following output >>> ``` >>> ... >>> === Output from failing command(s) repeated here === >>> * For target hotspot_variant-server_libjvm_objs_genCollectedHeap.o: >>> /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp: >>> In member function 'virtual void GenCollectedHeap::post_initialize()': >>> /home/qingfeng.yy/openjdk16_so_warning/jdk/src/hotspot/share/gc/shared/genCollectedHeap.cpp:206:3: >>> error: 'MarkSweep' has not been declared >>> ?? 206 |?? MarkSweep::initialize(); >>> ?????? |?? ^~~~~~~~~ >>> * All command lines available in >>> /home/qingfeng.yy/openjdk16_so_warning/jdk/build/linux-x86_64-server-release/make-support/failure-logs. >>> === End of repeated output === >>> ``` >>> I found current JVM features contain the serial gc, but actually I >>> can not >>> build an image that does not contain serial gc. This problem has >>> existed >>> from jdk 11 to jdk head. I am somewhat surprised, so I haven't filed an >>> issue on JBS. Is this really a bug? Or actually we should revise the >>> building >>> document and remove all INCLUDE_SERIALGC macros? >> >> About a year ago I opened >> https://bugs.openjdk.java.net/browse/JDK-8240224, to fix this (and >> other things). This caused quite a heated debate [1], and the result >> was that I closed the bug again. 
>> >> In summary, my understanding is that hotspot developers view the >> serialgc as essential, and that there exists no reason beyond toy >> applications to remove it from compilation. But furthermore the >> INCLUDE_SERIALGC macros should remain, even though they do not really >> work, since they function as markers of intent for the code. I don't? >> agree 100% with this stance, but it's not my code to complain about. :-) > > I think you got push back on some of the changes. To me and many > others the gcConfig.* changes were really controversial. It doesn't > mean that fixes to clean this up won't be accepted. Fair enough. I think the road to make it possible to exclude serial gc is to first fix the hotspot code for e.g. JDK-8234502, and then we can revisit the build changes needed. But I did get the impression from some developers that this was a futile exercise. /Magnus > In that mail thread, there was a reference to this bug '8234502: Merge > GenCollectedHeap and SerialHeap'. Chipping away at that would be good. > Fixing that would not only make it possible to build without Serial > GC, but also help with the maintainability of our code. > > StefanK > >> >> Possibly, the configure script should be changed so it does not look >> like it's possible to exclude the serialgc... >> >> /Magnus >> >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-March/028779.html >> >>> >>> Cheers,Yang Yi >>> >> > From vlivanov at openjdk.java.net Mon Feb 1 12:47:46 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Mon, 1 Feb 2021 12:47:46 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> Message-ID: <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com> On Mon, 1 Feb 2021 12:15:38 GMT, ?? wrote: >>> > compileonly and compilercount=1 will let the VM run slow enough to wait for a gc to be finished. >>> >>> That's a strange way to provoke the bug. You could just increase the number of iterations instead. >>> >>> But the right way to fix it is to stress ZGC to continuously run in the background while the test case aggressively unboxes vectors in compiled code. `-Xmx256m` helps with that while `-XX:CICompilerCount=1` is irrelevant. >> >> Yes, it's very weird to provoke the bug like this. If CICompilerCount=1 is removed, the test failed 60% roughly on my machine. >> And the iteration has already changed from 100 to 1000, the run time of the test is nearly 30s on release version of jvm. 
>> >> If I add the following patch, the test always fails on my machine, >> >> diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java >> index 1843ec0..959b29a 100644 >> --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java >> +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java >> @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; >> * @modules jdk.incubator.vector >> * @modules java.base/jdk.internal.vm.annotation >> * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer >> - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test >> + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test >> */ >> >> @Test >> @@ -125,6 +125,14 @@ public class VectorRebracket128Test { >> @ForceInline >> static >> void testVectorRebracket(VectorSpecies a, VectorSpecies b, byte[] input, byte[] output) { >> + new Thread(() -> { >> + while (true) { >> + try { >> + System.gc(); >> + Thread.sleep(100); >> + } catch (Exception e) {} >> + } >> + }).start(); >> Vector av = a.fromByteArray(input, 0, ByteOrder.nativeOrder()); >> int block; >> assert(input.length == output.length); > > sorry for the wrong patch above, the failed reason of the patch above is due to stack creation failure (create 1000 threads). The following is the right stress gc patch. > > diff --git a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > index 6b266db..a761ea2 100644 > --- a/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > +++ b/test/hotspot/jtreg/compiler/vectorapi/VectorRebracket128Test.java > @@ -44,7 +44,7 @@ import jdk.internal.vm.annotation.ForceInline; > * @modules jdk.incubator.vector > * @modules java.base/jdk.internal.vm.annotation > * @run testng/othervm -XX:CompileCommand=compileonly,jdk/incubator/vector/ByteVector.fromByteBuffer > - * -XX:-TieredCompilation -XX:CICompilerCount=1 -XX:+UseZGC -Xbatch -Xmx256m VectorRebracket128Test > + * -XX:-TieredCompilation -XX:+UseZGC -Xmx256m VectorRebracket128Test > */ > > @Test > @@ -59,6 +59,19 @@ public class VectorRebracket128Test { > static final VectorSpecies bspec128 = ByteVector.SPECIES_128; > static final VectorSpecies sspec128 = ShortVector.SPECIES_128; > > + static { > + Thread t = new Thread(() -> { > + while (true) { > + try { > + System.gc(); > + Thread.sleep(100); > + } catch (Exception e) {} > + } > + }); > + t.setDaemon(true); > + t.start(); > + } > + > static IntFunction withToString(String s, IntFunction f) { > return new IntFunction() { > @Override Good. Please, file a follow-up RFE to improve the test. 
------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From shade at openjdk.java.net Mon Feb 1 13:16:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 1 Feb 2021 13:16:42 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet [v3] In-Reply-To: <2u_BCcnM4QkcVVj6MVeFDfDgjB789ouIQuBfqY5p6vo=.2a63e65b-4fcd-4c79-9623-ac203c3ba056@github.com> References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> <2u_BCcnM4QkcVVj6MVeFDfDgjB789ouIQuBfqY5p6vo=.2a63e65b-4fcd-4c79-9623-ac203c3ba056@github.com> Message-ID: <73Eq3oBXjf_ANTph4ijuK4hfVx-6do89K2w3ChROVlA=.33dfa07d-0a6c-405a-8f65-0671be7cf980@github.com> On Mon, 1 Feb 2021 11:00:59 GMT, Roman Kennke wrote: >> We collected some cruft in ShenandoahBarrierSet. Time to clean it up. >> >> This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) >> >> One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitely using ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? >> >> Testing: >> - [x] hotspot_gc_shenandoah release, fastdebug > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Include accessBarrierSupport.inline.hpp instead of accessBarrierSupport.hpp Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2202 From ayang at openjdk.java.net Mon Feb 1 13:56:46 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 1 Feb 2021 13:56:46 GMT Subject: RFR: 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate() In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 11:53:00 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that removes parallel handling in `CardTableRS::younger_refs_in_space_iterate` as it is always called with n_threads <= 1, making the parallel code handling there obsolete. > > A larger cleanup of `CardTableRS` will follow in JDK-8234534. > > Testing: > tier1,2 Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2333 From sjohanss at openjdk.java.net Mon Feb 1 15:35:56 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 1 Feb 2021 15:35:56 GMT Subject: RFR: 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate() In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 11:53:00 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that removes parallel handling in `CardTableRS::younger_refs_in_space_iterate` as it is always called with n_threads <= 1, making the parallel code handling there obsolete. > > A larger cleanup of `CardTableRS` will follow in JDK-8234534. > > Testing: > tier1,2 Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2333 From zgu at openjdk.java.net Mon Feb 1 15:38:09 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 1 Feb 2021 15:38:09 GMT Subject: RFR: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC [v3] In-Reply-To: References: Message-ID: > Please review this patch that renames ShenandoahMarkCompact to ShenandoahFullGC, to be consistent with other GCs. Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge branch 'master' into JDK-8260004-rename-fullgc - Merge master - JDK-8260004-rename-fullgc ------------- Changes: https://git.openjdk.java.net/jdk/pull/2266/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2266&range=02 Stats: 38 lines in 9 files changed: 4 ins; 6 del; 28 mod Patch: https://git.openjdk.java.net/jdk/pull/2266.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2266/head:pull/2266 PR: https://git.openjdk.java.net/jdk/pull/2266 From github.com+779991+jaokim at openjdk.java.net Mon Feb 1 15:44:41 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Mon, 1 Feb 2021 15:44:41 GMT Subject: RFR: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency [v4] In-Reply-To: References: <95B6j1ZSceUGfTTDsZfF3a5ZbggYlBiv9WJkHKkzO0w=.edd53e67-02ae-4c8a-ae0f-3a50c7ac0676@github.com> Message-ID: On Mon, 1 Feb 2021 09:52:19 GMT, Stefan Johansson wrote: >> Joakim Nordstr?m has updated the pull request incrementally with one additional commit since the last revision: >> >> Using FormatBuffer instead of snprintf. Changed defines to more descriptive names. > > Looks good. Thanks for review @kstefanj. ------------- PR: https://git.openjdk.java.net/jdk/pull/2217 From zgu at openjdk.java.net Mon Feb 1 16:07:42 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 1 Feb 2021 16:07:42 GMT Subject: RFR: 8260309: Shenandoah: Clean up ShenandoahBarrierSet [v3] In-Reply-To: <2u_BCcnM4QkcVVj6MVeFDfDgjB789ouIQuBfqY5p6vo=.2a63e65b-4fcd-4c79-9623-ac203c3ba056@github.com> References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> <2u_BCcnM4QkcVVj6MVeFDfDgjB789ouIQuBfqY5p6vo=.2a63e65b-4fcd-4c79-9623-ac203c3ba056@github.com> Message-ID: <79a4QDkcjzuW0lJPfAaJawWwBn4pejdpTzzDaZxaFl0=.72084bab-c67f-4559-9ab2-235a0126da5d@github.com> On Mon, 1 Feb 2021 11:00:59 GMT, Roman Kennke wrote: >> We collected some cruft in ShenandoahBarrierSet. Time to clean it up. >> >> This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) >> >> One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitely using ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? 
>> >> Testing: >> - [x] hotspot_gc_shenandoah release, fastdebug > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Include accessBarrierSupport.inline.hpp instead of accessBarrierSupport.hpp Looks good. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2202 From rkennke at openjdk.java.net Mon Feb 1 17:32:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 1 Feb 2021 17:32:41 GMT Subject: Integrated: 8260309: Shenandoah: Clean up ShenandoahBarrierSet In-Reply-To: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> References: <5t_ZDBfj_4BxoJLoWh3R0r6OCh2Q0wc-DNJntvfhW1Q=.925a092e-c1d3-41df-b216-1cbb0b936959@github.com> Message-ID: On Fri, 22 Jan 2021 19:03:14 GMT, Roman Kennke wrote: > We collected some cruft in ShenandoahBarrierSet. Time to clean it up. > > This fixes/removes a number of includes, fixes some comments and it also removes is_a() and is_aligned() which look like leftovers/requirements from earlier incarnations of the superclass BarrierSet. Using the override keyword would be useful for such situations (btw, are we ok to start using override, nullptr, auto etc in Shenandoah, or do we want to keep it C++ for backporting ease?) > > One thing I was not sure about is the ShenandoahHeap* _heap field. Making it const will likely help the compiler avoid repeated access (e.g. in a number of perf-critical paths like the LRB impl). However, maybe we should get rid of the field altogether and make it explicitely using ShenandoahHeap::heap() and avoid repeated access instead of helping the compiler and hoping for the best? > > Testing: > - [x] hotspot_gc_shenandoah release, fastdebug This pull request has now been integrated. Changeset: df33595e Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/df33595e Stats: 31 lines in 6 files changed: 4 ins; 19 del; 8 mod 8260309: Shenandoah: Clean up ShenandoahBarrierSet Reviewed-by: shade, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2202 From zgu at openjdk.java.net Mon Feb 1 18:13:46 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 1 Feb 2021 18:13:46 GMT Subject: Integrated: 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC In-Reply-To: References: Message-ID: <9hC5B8QLUCOrNRRz4LN22Zyv_rPDN50nl3rdG_okC6w=.88d570f7-153a-49f8-b41b-ef0678d776d7@github.com> On Wed, 27 Jan 2021 18:16:09 GMT, Zhengyu Gu wrote: > Please review this patch that renames ShenandoahMarkCompact to ShenandoahFullGC, to be consistent with other GCs. This pull request has now been integrated. Changeset: e963ebd7 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/e963ebd7 Stats: 38 lines in 9 files changed: 4 ins; 6 del; 28 mod 8260004: Shenandoah: Rename ShenandoahMarkCompact to ShenandoahFullGC Reviewed-by: shade, rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2266 From github.com+779991+jaokim at openjdk.java.net Mon Feb 1 18:22:42 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Mon, 1 Feb 2021 18:22:42 GMT Subject: Integrated: 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency In-Reply-To: References: Message-ID: On Mon, 25 Jan 2021 11:52:26 GMT, Joakim Nordstr?m wrote: > **Description** > This fix addresses the issue where gc-efficiency is printed incorrectly when logging post-marking and post-cleanup. 
The gc-efficiency is calculated in the end of the marking phase, to be logged in the post-cleanup section. It is however not reset, meaning that next phase's post-marking log will show the old value. > > - The gc-efficiency is initialized to -1 when it hasn't been calculated. > - Negative gc-efficiency is displayed as a hyphen "-" in the summary. > - The gc-efficiency is reset to -1 in `HeapRegion::note_start_of_marking()` > > **Note:** there is a sister issue that moves the post-cleanup printing to a later stage. Without this fix, the logging will still be incorrect, so both fixes are needed. See: [JDK-8260042: G1 Post-cleanup liveness printing occurs too early](https://github.com/openjdk/jdk/pull/2168) > > This fix has been tested together with the above mentioned fix. > > **Example** > This is what logging like after fix has been applied. > ### PHASE Post-Marking @ 410.303 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8464 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 - 2544 UPDAT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 410.305 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 132856 132856 132856 1352923.9 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.15 MB / 28.67 % prev-live: 1.15 MB / 28.67 % next-live: 1.15 MB / 28.67 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Marking @ 450.310 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UPDAT 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 - 2544 UPDAT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > ### SUMMARY capacity: 4.00 MB used: 1.19 MB / 29.66 % prev-live: 1.19 MB / 29.66 % next-live: 1.19 MB / 29.66 % remset: 0.02 MB code-roots: 0.01 MB > ### PHASE Post-Cleanup @ 450.312 > ### HEAP reserved: 0x0ffc00000-0x100000000 region-size: 1048576 > ### > ### type address-range used prev-live next-live gc-eff remset state code-roots > ### (bytes) (bytes) (bytes) (bytes/ms) (bytes) (bytes) > ### OLD 0x0ffc00000-0x0ffd00000 1048368 1048368 1048368 - 8624 UNTRA 6096 > ### OLD 0x0ffd00000-0x0ffe00000 174456 174456 174456 1266519.2 2544 CMPLT 16 > ### SURV 0x0ffe00000-0x0fff00000 21368 21368 21368 - 2544 CMPLT 16 > ### FREE 0x0fff00000-0x100000000 0 0 0 - 2384 UNTRA 16 > ### > > **Testing** > - Manual testing > - hs-tier1, hs-tier2 This pull request has now been integrated. 
Changeset: 50f9a70f Author: JSNORDST Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/50f9a70f Stats: 30 lines in 3 files changed: 10 ins; 0 del; 20 mod 8217327: G1 Post-Cleanup region liveness printing should not print out-of-date efficiency Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2217 From zgu at openjdk.java.net Mon Feb 1 20:56:49 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 1 Feb 2021 20:56:49 GMT Subject: RFR: 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families Message-ID: 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families ------------- Commit messages: - update - Merge branch 'master' into JDK-8260736-cleanup-includes-gc - update - init Changes: https://git.openjdk.java.net/jdk/pull/2339/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2339&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260736 Stats: 37 lines in 10 files changed: 2 ins; 30 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2339/head:pull/2339 PR: https://git.openjdk.java.net/jdk/pull/2339 From zgu at openjdk.java.net Mon Feb 1 21:25:54 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 1 Feb 2021 21:25:54 GMT Subject: RFR: 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families [v2] In-Reply-To: References: Message-ID: > 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Added back vmThread.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2339/files - new: https://git.openjdk.java.net/jdk/pull/2339/files/001c3094..87924b4c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2339&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2339&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2339/head:pull/2339 PR: https://git.openjdk.java.net/jdk/pull/2339 From kbarrett at openjdk.java.net Mon Feb 1 21:51:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 1 Feb 2021 21:51:54 GMT Subject: [jdk16] RFR: 8260704: ParallelGC: oldgen expansion needs release-store for _end Message-ID: Please review this change that ensures MutableSpace::_end is updated after everything else that is relevant when expanding, by using a release_store to perform the update. With this change the storestore that was added by JDK-8257999 is no longer needed. 
Testing: mach5 tier1-3, tier5 ------------- Commit messages: - Move JDK-8257999 barrier to correct location Changes: https://git.openjdk.java.net/jdk16/pull/141/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk16&pr=141&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260704 Stats: 11 lines in 2 files changed: 4 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk16/pull/141.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/141/head:pull/141 PR: https://git.openjdk.java.net/jdk16/pull/141 From jiefu at openjdk.java.net Tue Feb 2 02:01:48 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Tue, 2 Feb 2021 02:01:48 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com> References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com> Message-ID: On Mon, 1 Feb 2021 12:44:59 GMT, Vladimir Ivanov wrote: > Good. Please, file a follow-up RFE to improve the test. OK. I will help to file a JBS bug once the fix has been merged into the jdk mainline. It will be only fixed in the jdk17, right? Thanks. ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From iklam at openjdk.java.net Tue Feb 2 04:34:47 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Tue, 2 Feb 2021 04:34:47 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp Message-ID: collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. In many cases, an object file only directly includes this file via: - memAllocator.hpp (which does not actually use collectedHeap.hpp) - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. Build time of HotSpot is reduced for about 1%. Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. 
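The general technique is worth spelling out. Below is a hedged, single-file illustration (fake names, not the actual JDK headers) of how an assert in a widely-included header can be routed through a small out-of-line wrapper so that only one .cpp file pays for the heavy include; the review comments further down show the real patch taking this shape via Universe::is_in_heap().

```
// Single-file illustration of the technique, not the JDK sources.
#include <cassert>
#include <cstddef>

// --- "universe.hpp": cheap header, no collectedHeap.hpp needed --------------
namespace Universe {
  bool is_in_heap(const void* p);          // defined out of line
  bool is_in_heap_or_null(const void* p);
}

// --- "oop.inline.hpp": hot header included by hundreds of .o files ----------
inline void verify_oop(const void* obj) {
  // The assert no longer needs the CollectedHeap definition here.
  assert(Universe::is_in_heap_or_null(obj), "oop must be in the heap or null");
}

// --- "universe.cpp": the only translation unit paying for the heavy header --
// #include "gc/shared/collectedHeap.hpp"  // confined to this one .cpp
struct FakeHeap { bool is_in(const void* p) const { return p != nullptr; } };
static FakeHeap g_heap;

bool Universe::is_in_heap(const void* p)         { return g_heap.is_in(p); }
bool Universe::is_in_heap_or_null(const void* p) { return p == nullptr || is_in_heap(p); }

int main() {
  int x = 0;
  verify_oop(&x);        // passes the (fake) heap check in this sketch
  verify_oop(nullptr);   // null is explicitly allowed
  return 0;
}
```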
------------- Commit messages: - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp Changes: https://git.openjdk.java.net/jdk/pull/2347/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260012 Stats: 110 lines in 60 files changed: 63 ins; 7 del; 40 mod Patch: https://git.openjdk.java.net/jdk/pull/2347.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347 PR: https://git.openjdk.java.net/jdk/pull/2347 From tschatzl at openjdk.java.net Tue Feb 2 07:59:46 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 2 Feb 2021 07:59:46 GMT Subject: [jdk16] RFR: 8260704: ParallelGC: oldgen expansion needs release-store for _end In-Reply-To: References: Message-ID: <1YY9KmPpKnvdDeecG5Y8Ckb-eCG3vjgnl7O7R1hB1sQ=.435ae6c7-3ed0-4a80-a4ea-22bdefd5811c@github.com> On Mon, 1 Feb 2021 10:10:48 GMT, Kim Barrett wrote: > Please review this change that ensures MutableSpace::_end is updated after > everything else that is relevant when expanding, by using a release_store to > perform the update. With this change the storestore that was added by > JDK-8257999 is no longer needed. > > Testing: > mach5 tier1-3, tier5 Lgtm ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/141 From sjohanss at openjdk.java.net Tue Feb 2 09:02:47 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 2 Feb 2021 09:02:47 GMT Subject: [jdk16] RFR: 8260704: ParallelGC: oldgen expansion needs release-store for _end In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 10:10:48 GMT, Kim Barrett wrote: > Please review this change that ensures MutableSpace::_end is updated after > everything else that is relevant when expanding, by using a release_store to > perform the update. With this change the storestore that was added by > JDK-8257999 is no longer needed. > > Testing: > mach5 tier1-3, tier5 Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk16/pull/141 From tschatzl at openjdk.java.net Tue Feb 2 11:44:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 2 Feb 2021 11:44:59 GMT Subject: RFR: 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate() In-Reply-To: References: Message-ID: <81e4IJaRYyC_fIcd2uyNXwvRXM5rnNytai82oDNQk4w=.51b2a370-66e4-4a2d-9b15-8736ea7a7a30@github.com> On Mon, 1 Feb 2021 15:32:53 GMT, Stefan Johansson wrote: >> Hi all, >> >> can I have reviews for this change that removes parallel handling in `CardTableRS::younger_refs_in_space_iterate` as it is always called with n_threads <= 1, making the parallel code handling there obsolete. >> >> A larger cleanup of `CardTableRS` will follow in JDK-8234534. >> >> Testing: >> tier1,2 > > Looks good. Thanks @kstefanj @albertnetymk for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2333 From tschatzl at openjdk.java.net Tue Feb 2 11:45:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 2 Feb 2021 11:45:00 GMT Subject: RFR: 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate() In-Reply-To: <81e4IJaRYyC_fIcd2uyNXwvRXM5rnNytai82oDNQk4w=.51b2a370-66e4-4a2d-9b15-8736ea7a7a30@github.com> References: <81e4IJaRYyC_fIcd2uyNXwvRXM5rnNytai82oDNQk4w=.51b2a370-66e4-4a2d-9b15-8736ea7a7a30@github.com> Message-ID: On Tue, 2 Feb 2021 11:01:13 GMT, Thomas Schatzl wrote: >> Looks good. 
> > Thanks @kstefanj @albertnetymk for your reviews. Fwiw I re-ran tier1+2 with no issues ------------- PR: https://git.openjdk.java.net/jdk/pull/2333 From tschatzl at openjdk.java.net Tue Feb 2 11:45:02 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 2 Feb 2021 11:45:02 GMT Subject: Integrated: 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate() In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 11:53:00 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that removes parallel handling in `CardTableRS::younger_refs_in_space_iterate` as it is always called with n_threads <= 1, making the parallel code handling there obsolete. > > A larger cleanup of `CardTableRS` will follow in JDK-8234534. > > Testing: > tier1,2 This pull request has now been integrated. Changeset: 288a4fed Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/288a4fed Stats: 103 lines in 7 files changed: 3 ins; 72 del; 28 mod 8260643: Remove parallel version handling in CardTableRS::younger_refs_in_space_iterate() Reviewed-by: ayang, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2333 From vlivanov at openjdk.java.net Tue Feb 2 11:45:43 2021 From: vlivanov at openjdk.java.net (Vladimir Ivanov) Date: Tue, 2 Feb 2021 11:45:43 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> Message-ID: On Sun, 31 Jan 2021 21:29:40 GMT, Nils Eliasson wrote: >> https://bugs.openjdk.java.net/browse/JDK-8260473 >> >> Function "PhaseVector::expand_vunbox_node" creates a LoadNode, but forgets to make the LoadNode to pass gc barriers. >> >> Testing: all Vector API related tests have passed. >> >> Original pr: https://github.com/openjdk/jdk/pull/2253 > > Approved. > > Now awaiting release team approval. > It will be only fixed in the jdk17, right? Yes, I'm OK with that. ------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From stefank at openjdk.java.net Tue Feb 2 12:14:45 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Tue, 2 Feb 2021 12:14:45 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 04:18:24 GMT, Ioi Lam wrote: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Looks good. A few things that you might want to consider, but I'm also fine with the patch as it is. 
src/hotspot/share/gc/shared/memAllocator.hpp line 30: > 28: #include "memory/memRegion.hpp" > 29: #include "oops/oopsHierarchy.hpp" > 30: #include "runtime/thread.hpp" If we want to, this could be changed to a forward declaration if we removed the default value (Thread* thread = Thread::current()) of the constructors. Not needed for this RFE though. src/hotspot/cpu/arm/frame_arm.cpp line 518: > 516: obj = *(oop*)res_addr; > 517: } > 518: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); Could have been changed to is_in_heap_or_null. src/hotspot/cpu/ppc/frame_ppc.cpp line 308: > 306: case T_ARRAY: { > 307: oop obj = *(oop*)tos_addr; > 308: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); Could have been changed to is_in_heap_or_null. src/hotspot/cpu/s390/frame_s390.cpp line 321: > 319: case T_ARRAY: { > 320: oop obj = *(oop*)tos_addr; > 321: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); Could have been changed to is_in_heap_or_null. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2347 From tschatzl at openjdk.java.net Tue Feb 2 12:33:47 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 2 Feb 2021 12:33:47 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 04:18:24 GMT, Ioi Lam wrote: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Checked a few includes for missing ones; obviously they are included transitively so add as you see fit. src/hotspot/share/gc/shared/memAllocator.hpp line 30: > 28: #include "memory/memRegion.hpp" > 29: #include "oops/oopsHierarchy.hpp" > 30: #include "runtime/thread.hpp" `utilities/globalDefinitions.hpp` for `HeapWord` is missing. src/hotspot/share/oops/compressedOops.inline.hpp line 28: > 26: #define SHARE_OOPS_COMPRESSEDOOPS_INLINE_HPP > 27: > 28: #include "gc/shared/collectedHeap.hpp" `utilities/globalDefinitions.hpp` for `*PTR_FORMAT` and others is missing. src/hotspot/share/oops/oop.inline.hpp line 28: > 26: #define SHARE_OOPS_OOP_INLINE_HPP > 27: > 28: #include "gc/shared/collectedHeap.hpp" `utilities/globalDefinitions.hpp` for `HeapWord` is missing. `globals.hpp` for some globals. `oopsHierarchy.hpp` for `narrowKlass` `utilties/debug.hpp` for `assert` ------------- Marked as reviewed by tschatzl (Reviewer). 
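For readers unfamiliar with the trade-off mentioned for memAllocator.hpp, here is a hedged, simplified sketch (stand-in names, not the JDK's MemAllocator or Thread): a default argument of Thread::current() forces the header to include the full Thread definition, whereas passing the thread explicitly lets a forward declaration suffice.

```
// Illustrative only -- simplified stand-ins, not the JDK's MemAllocator/Thread.
#include <cstddef>

// Variant that forces the heavy include: a default argument of
// Thread::current() needs Thread's class definition, so the header would
// have to #include "runtime/thread.hpp":
//
//   class MemAllocator {
//    public:
//     MemAllocator(std::size_t size, Thread* thread = Thread::current());
//   };

// Variant that only needs a forward declaration, because the caller now
// supplies the thread explicitly:
class Thread;                                     // forward declaration suffices

class MemAllocator {
 public:
  MemAllocator(std::size_t size, Thread* thread) : _size(size), _thread(thread) {}
 private:
  std::size_t _size;
  Thread*     _thread;
};

// Only to make this sketch build; in the real code the definition lives in
// runtime/thread.hpp and each call site would pass Thread::current() itself.
class Thread {};

int main() {
  Thread t;
  MemAllocator allocator(64, &t);
  (void)allocator;
  return 0;
}
```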
PR: https://git.openjdk.java.net/jdk/pull/2347 From kbarrett at openjdk.java.net Tue Feb 2 19:23:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 2 Feb 2021 19:23:01 GMT Subject: [jdk16] RFR: 8260704: ParallelGC: oldgen expansion needs release-store for _end [v2] In-Reply-To: References: Message-ID: > Please review this change that ensures MutableSpace::_end is updated after > everything else that is relevant when expanding, by using a release_store to > perform the update. With this change the storestore that was added by > JDK-8257999 is no longer needed. > > Testing: > mach5 tier1-3, tier5 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into move_barrier - Move JDK-8257999 barrier to correct location ------------- Changes: - all: https://git.openjdk.java.net/jdk16/pull/141/files - new: https://git.openjdk.java.net/jdk16/pull/141/files/91d8be35..3929bb7f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk16&pr=141&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk16&pr=141&range=00-01 Stats: 242 lines in 29 files changed: 100 ins; 112 del; 30 mod Patch: https://git.openjdk.java.net/jdk16/pull/141.diff Fetch: git fetch https://git.openjdk.java.net/jdk16 pull/141/head:pull/141 PR: https://git.openjdk.java.net/jdk16/pull/141 From kbarrett at openjdk.java.net Tue Feb 2 19:23:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 2 Feb 2021 19:23:02 GMT Subject: [jdk16] RFR: 8260704: ParallelGC: oldgen expansion needs release-store for _end [v2] In-Reply-To: <1YY9KmPpKnvdDeecG5Y8Ckb-eCG3vjgnl7O7R1hB1sQ=.435ae6c7-3ed0-4a80-a4ea-22bdefd5811c@github.com> References: <1YY9KmPpKnvdDeecG5Y8Ckb-eCG3vjgnl7O7R1hB1sQ=.435ae6c7-3ed0-4a80-a4ea-22bdefd5811c@github.com> Message-ID: On Tue, 2 Feb 2021 07:56:56 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into move_barrier >> - Move JDK-8257999 barrier to correct location > > Lgtm Thanks @tschatzl and @kstefanj for reviews. ------------- PR: https://git.openjdk.java.net/jdk16/pull/141 From kbarrett at openjdk.java.net Tue Feb 2 19:23:03 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 2 Feb 2021 19:23:03 GMT Subject: [jdk16] Integrated: 8260704: ParallelGC: oldgen expansion needs release-store for _end In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 10:10:48 GMT, Kim Barrett wrote: > Please review this change that ensures MutableSpace::_end is updated after > everything else that is relevant when expanding, by using a release_store to > perform the update. With this change the storestore that was added by > JDK-8257999 is no longer needed. > > Testing: > mach5 tier1-3, tier5 This pull request has now been integrated. Changeset: afd5eefd Author: Kim Barrett URL: https://git.openjdk.java.net/jdk16/commit/afd5eefd Stats: 11 lines in 2 files changed: 4 ins; 1 del; 6 mod 8260704: ParallelGC: oldgen expansion needs release-store for _end Move JDK-8257999 barrier to correct location. 
Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk16/pull/141 From kbarrett at openjdk.java.net Tue Feb 2 19:25:42 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 2 Feb 2021 19:25:42 GMT Subject: RFR: 8258508: Merge G1RedirtyCardsQueue into qset In-Reply-To: References: Message-ID: <5R8m4TmkUnxjS3PFRv9ZoCwmmmQimtukejn019nBUk8=.ee65a633-9e57-4ab9-92c1-a5c352a1e5ef@github.com> On Mon, 1 Feb 2021 09:44:11 GMT, Thomas Schatzl wrote: >> Please review this change to G1RedirtyCardsLocalQueueSet to directly >> incorporate the associated queue, simplifying usage. >> >> Testing: >> mach5 tier1 > > Lgtm. Thanks @tschatzl and @walulyai for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2325 From cjplummer at openjdk.java.net Tue Feb 2 19:51:50 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Tue, 2 Feb 2021 19:51:50 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes In-Reply-To: References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: On Mon, 25 Jan 2021 20:00:41 GMT, Chris Plummer wrote: >> See the bug for most details. A few notes here about some implementation details: >> >> In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: >> >> ` getTLAB().printOn(tty); // includes "\n" ` >> >> That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. >> >> I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. >> >> The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: >> >> var dso = loadObjectContainingPC(addr); >> if (dso == null) { >> return ptrLoc.toString(); >> } >> var sym = dso.closestSymbolToPC(addr); >> if (sym != null) { >> return sym.name + '+' + sym.offset; >> } >> And now you'll see something similar in the PointerFinder code: >> >> loc.loadObject = cdbg.loadObjectContainingPC(a); >> if (loc.loadObject != null) { >> loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); >> return loc; >> } >> Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) > > Ping! Ping again. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2111 From zgu at openjdk.java.net Tue Feb 2 21:35:51 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 2 Feb 2021 21:35:51 GMT Subject: RFR: 8260998: Shenandoah: Restore reference processing statistics reporting Message-ID: Please review this patch that restores reporting of reference processing statistics after JDK-8254315 ------------- Commit messages: - JDK-8260998-ref-proc-stats Changes: https://git.openjdk.java.net/jdk/pull/2362/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2362&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260998 Stats: 20 lines in 3 files changed: 18 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2362.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2362/head:pull/2362 PR: https://git.openjdk.java.net/jdk/pull/2362 From ysuenaga at openjdk.java.net Tue Feb 2 23:24:40 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Tue, 2 Feb 2021 23:24:40 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes In-Reply-To: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: <-09XRqbxFbZGkzqDVewiXrJjVNjuLMdZqfxjnxJf3Oc=.2da660b7-a5c1-40e1-81af-8dc814e199ca@github.com> On Sun, 17 Jan 2021 03:57:59 GMT, Chris Plummer wrote: > See the bug for most details. A few notes here about some implementation details: > > In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: > > ` getTLAB().printOn(tty); // includes "\n" ` > > That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. > > I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. > > The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: > > var dso = loadObjectContainingPC(addr); > if (dso == null) { > return ptrLoc.toString(); > } > var sym = dso.closestSymbolToPC(addr); > if (sym != null) { > return sym.name + '+' + sym.offset; > } > And now you'll see something similar in the PointerFinder code: > > loc.loadObject = cdbg.loadObjectContainingPC(a); > if (loc.loadObject != null) { > loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); > return loc; > } > Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) LGTM ------------- Marked as reviewed by ysuenaga (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2111 From kbarrett at openjdk.java.net Wed Feb 3 00:57:02 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 3 Feb 2021 00:57:02 GMT Subject: RFR: 8258508: Merge G1RedirtyCardsQueue into qset [v2] In-Reply-To: References: Message-ID: > Please review this change to G1RedirtyCardsLocalQueueSet to directly > incorporate the associated queue, simplifying usage. > > Testing: > mach5 tier1 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into merge_redirty_queue - Merge branch 'master' into merge_redirty_queue - merge redirty cards queue into local qset ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2325/files - new: https://git.openjdk.java.net/jdk/pull/2325/files/06057eb0..fbf891ba Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2325&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2325&range=00-01 Stats: 4520 lines in 346 files changed: 2030 ins; 983 del; 1507 mod Patch: https://git.openjdk.java.net/jdk/pull/2325.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2325/head:pull/2325 PR: https://git.openjdk.java.net/jdk/pull/2325 From kbarrett at openjdk.java.net Wed Feb 3 00:57:03 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 3 Feb 2021 00:57:03 GMT Subject: Integrated: 8258508: Merge G1RedirtyCardsQueue into qset In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 10:14:42 GMT, Kim Barrett wrote: > Please review this change to G1RedirtyCardsLocalQueueSet to directly > incorporate the associated queue, simplifying usage. > > Testing: > mach5 tier1 This pull request has now been integrated. Changeset: d423d368 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/d423d368 Stats: 55 lines in 5 files changed: 12 ins; 26 del; 17 mod 8258508: Merge G1RedirtyCardsQueue into qset Reviewed-by: tschatzl, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/2325 From iklam at openjdk.java.net Wed Feb 3 06:40:08 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 06:40:08 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:09:22 GMT, Stefan Karlsson wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - @tschatzl and @stefank comments >> - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp >> - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp > > src/hotspot/share/gc/shared/memAllocator.hpp line 30: > >> 28: #include "memory/memRegion.hpp" >> 29: #include "oops/oopsHierarchy.hpp" >> 30: #include "runtime/thread.hpp" > > If we want to, this could be changed to a forward declaration if we removed the default value (Thread* thread = Thread::current()) of the constructors. Not needed for this RFE though. memAllocator.hpp is not included very often (only 65 out of ~1000 .o files), so I decided to leave it as is. 
> src/hotspot/cpu/arm/frame_arm.cpp line 518: > >> 516: obj = *(oop*)res_addr; >> 517: } >> 518: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); > > Could have been changed to is_in_heap_or_null. Fixed > src/hotspot/cpu/ppc/frame_ppc.cpp line 308: > >> 306: case T_ARRAY: { >> 307: oop obj = *(oop*)tos_addr; >> 308: assert(obj == NULL || Universe::is_in_heap(obj), "sanity check"); > > Could have been changed to is_in_heap_or_null. Fixed. I also change other frame_.cpp files to use is_in_heap_or_null. ------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Wed Feb 3 06:40:04 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 06:40:04 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - @tschatzl and @stefank comments - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2347/files - new: https://git.openjdk.java.net/jdk/pull/2347/files/a1bdc2f7..529e77e4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=00-01 Stats: 3635 lines in 268 files changed: 1458 ins; 983 del; 1194 mod Patch: https://git.openjdk.java.net/jdk/pull/2347.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347 PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Wed Feb 3 06:40:11 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Wed, 3 Feb 2021 06:40:11 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:22:50 GMT, Thomas Schatzl wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - @tschatzl and @stefank comments >> - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp >> - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp > > src/hotspot/share/gc/shared/memAllocator.hpp line 30: > >> 28: #include "memory/memRegion.hpp" >> 29: #include "oops/oopsHierarchy.hpp" >> 30: #include "runtime/thread.hpp" > > `utilities/globalDefinitions.hpp` for `HeapWord` is missing. Fixed. > src/hotspot/share/oops/compressedOops.inline.hpp line 28: > >> 26: #define SHARE_OOPS_COMPRESSEDOOPS_INLINE_HPP >> 27: >> 28: #include "gc/shared/collectedHeap.hpp" > > `utilities/globalDefinitions.hpp` for `*PTR_FORMAT` and others is missing. Fixed. > src/hotspot/share/oops/oop.inline.hpp line 28: > >> 26: #define SHARE_OOPS_OOP_INLINE_HPP >> 27: >> 28: #include "gc/shared/collectedHeap.hpp" > > `utilities/globalDefinitions.hpp` for `HeapWord` is missing. > `globals.hpp` for some globals. > `oopsHierarchy.hpp` for `narrowKlass` > `utilties/debug.hpp` for `assert` Fixed. Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From shade at openjdk.java.net Wed Feb 3 08:39:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 3 Feb 2021 08:39:42 GMT Subject: RFR: 8260998: Shenandoah: Restore reference processing statistics reporting In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 20:49:04 GMT, Zhengyu Gu wrote: > Please review this patch that restores reporting of reference processing statistics after JDK-8254315 Looks fine to me. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2362 From tschatzl at openjdk.java.net Wed Feb 3 09:51:52 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 09:51:52 GMT Subject: RFR: 8261023: Add comment why memory pretouch must be a store Message-ID: Hi all, may I have reviews for this additional comment that explains why `os::pretouch_memory` needs to use a store and must not use a read which would be more convenient? Basically on some (all?) OSes memory pages are only actually backed with physical memory on a store to that page. Before that a common "zero page" may be used to satisfy reads. This is not what is intended here. A previous comment (that has been removed long ago) seems to have been a bit confused about the actual issue: - // Note the use of a write here; originally we tried just a read, but - // since the value read was unused, the optimizer removed the read. - // If we ever have a concurrent touchahead thread, we'll want to use - // a read, to avoid the potential of overwriting data (if a mutator - // thread beats the touchahead thread to a page). There are various - // ways of making sure this read is not optimized away: for example, - // generating the code for a read procedure at runtime. It indicates that the reason for using a store has been that the compiler would optimize away the reads (which begs the question why a `volatile` read has not been used). Maybe these zero page optimizations came later than that original implementation though. Testing: local compilation - it's adding a comment only, really. 
Thanks, Thomas ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/2373/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2373&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261023 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2373.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2373/head:pull/2373 PR: https://git.openjdk.java.net/jdk/pull/2373 From shade at openjdk.java.net Wed Feb 3 10:06:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 3 Feb 2021 10:06:45 GMT Subject: RFR: 8261023: Add comment why memory pretouch must be a store In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 09:47:04 GMT, Thomas Schatzl wrote: > Hi all, > > may I have reviews for this additional comment that explains why `os::pretouch_memory` needs to use a store and must not use a read which would be more convenient? > > Basically on some (all?) OSes memory pages are only actually backed with physical memory on a store to that page. Before that a common "zero page" may be used to satisfy reads. This is not what is intended here. > > A previous comment (that has been removed long ago) seems to have been a bit confused about the actual issue: > > - // Note the use of a write here; originally we tried just a read, but > - // since the value read was unused, the optimizer removed the read. > - // If we ever have a concurrent touchahead thread, we'll want to use > - // a read, to avoid the potential of overwriting data (if a mutator > - // thread beats the touchahead thread to a page). There are various > - // ways of making sure this read is not optimized away: for example, > - // generating the code for a read procedure at runtime. > > It indicates that the reason for using a store has been that the compiler would optimize away the reads (which begs the question why a `volatile` read has not been used). > > Maybe these zero page optimizations came later than that original implementation though. > > Testing: local compilation - it's adding a comment only, really. > > Thanks, > Thomas Looks fine. Bikeshedding suggestions below. src/hotspot/share/runtime/os.cpp line 1819: > 1817: // optimization where only writes trigger actual backing of memory. Reads > 1818: // access a single shared zero page at first and so will not achieve the > 1819: // desired effect. Consider: Note: this must be a store, not a load. On many OSes loads from the fresh memory would be satisfied from a single mapped zero page. We need to store something to each page to get them backed by their own memory, which is what we want as the effect here. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2373 From iwalulya at openjdk.java.net Wed Feb 3 10:06:45 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 3 Feb 2021 10:06:45 GMT Subject: RFR: 8261023: Add comment why memory pretouch must be a store In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 09:47:04 GMT, Thomas Schatzl wrote: > Hi all, > > may I have reviews for this additional comment that explains why `os::pretouch_memory` needs to use a store and must not use a read which would be more convenient? > > Basically on some (all?) OSes memory pages are only actually backed with physical memory on a store to that page. Before that a common "zero page" may be used to satisfy reads. This is not what is intended here. 
> > A previous comment (that has been removed long ago) seems to have been a bit confused about the actual issue: > > - // Note the use of a write here; originally we tried just a read, but > - // since the value read was unused, the optimizer removed the read. > - // If we ever have a concurrent touchahead thread, we'll want to use > - // a read, to avoid the potential of overwriting data (if a mutator > - // thread beats the touchahead thread to a page). There are various > - // ways of making sure this read is not optimized away: for example, > - // generating the code for a read procedure at runtime. > > It indicates that the reason for using a store has been that the compiler would optimize away the reads (which begs the question why a `volatile` read has not been used). > > Maybe these zero page optimizations came later than that original implementation though. > > Testing: local compilation - it's adding a comment only, really. > > Thanks, > Thomas lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/2373 From tschatzl at openjdk.java.net Wed Feb 3 10:09:52 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 10:09:52 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: <47QCVJeDPnsUak4dH0LXGJxDmqutyQeY91MIcPwyi-Q=.19b4ed7a-750a-47a8-aee0-76427c1752cf@github.com> On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8250941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. > > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) (latest tier1-4 testing still stuck on linux-aarch64, but everything else passed. I think there is no particular aarch64 specific change in there...) ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Wed Feb 3 10:09:52 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 10:09:52 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal Message-ID: Hi, can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8250941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. 
Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) ------------- Commit messages: - Initial commit Changes: https://git.openjdk.java.net/jdk/pull/2354/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2354&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8234534 Stats: 197 lines in 7 files changed: 0 ins; 185 del; 12 mod Patch: https://git.openjdk.java.net/jdk/pull/2354.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2354/head:pull/2354 PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Wed Feb 3 10:31:44 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 10:31:44 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v2] In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 12:30:51 GMT, Thomas Schatzl wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - @tschatzl and @stefank comments >> - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp >> - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp > > Checked a few includes for missing ones; obviously they are included transitively so add as you see fit. Still good. ------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From jiefu at openjdk.java.net Wed Feb 3 11:05:56 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 3 Feb 2021 11:05:56 GMT Subject: RFR: 8261028: ZGC: SIGFPE when MaxVirtMemFraction=0 Message-ID: Hi all, The SIGFPE was caused by this line [1] when MaxVirtMemFraction=0. But according to this comment [2], 0 should not be allowed for MaxVirtMemFraction. Thanks. Best regards, Jie [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/z/zAddressSpaceLimit.cpp#L51 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/gc_globals.hpp#L345 ------------- Commit messages: - 8261028: ZGC: SIGFPE when MaxVirtMemFraction=0 Changes: https://git.openjdk.java.net/jdk/pull/2374/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2374&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261028 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2374.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2374/head:pull/2374 PR: https://git.openjdk.java.net/jdk/pull/2374 From tschatzl at openjdk.java.net Wed Feb 3 11:28:57 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 3 Feb 2021 11:28:57 GMT Subject: RFR: 8261023: Document why memory pretouch must be a store [v2] In-Reply-To: References: Message-ID: > Hi all, > > may I have reviews for this additional comment that explains why `os::pretouch_memory` needs to use a store and must not use a read which would be more convenient? > > Basically on some (all?) OSes memory pages are only actually backed with physical memory on a store to that page. Before that a common "zero page" may be used to satisfy reads. This is not what is intended here. > > A previous comment (that has been removed long ago) seems to have been a bit confused about the actual issue: > > - // Note the use of a write here; originally we tried just a read, but > - // since the value read was unused, the optimizer removed the read. 
> - // If we ever have a concurrent touchahead thread, we'll want to use > - // a read, to avoid the potential of overwriting data (if a mutator > - // thread beats the touchahead thread to a page). There are various > - // ways of making sure this read is not optimized away: for example, > - // generating the code for a read procedure at runtime. > > It indicates that the reason for using a store has been that the compiler would optimize away the reads (which begs the question why a `volatile` read has not been used). > > Maybe these zero page optimizations came later than that original implementation though. > > Testing: local compilation - it's adding a comment only, really. > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: shade review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2373/files - new: https://git.openjdk.java.net/jdk/pull/2373/files/d30fff80..2009527e Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2373&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2373&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2373.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2373/head:pull/2373 PR: https://git.openjdk.java.net/jdk/pull/2373 From shade at openjdk.java.net Wed Feb 3 11:32:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 3 Feb 2021 11:32:43 GMT Subject: RFR: 8261023: Document why memory pretouch must be a store [v2] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 11:28:57 GMT, Thomas Schatzl wrote: >> Hi all, >> >> may I have reviews for this additional comment that explains why `os::pretouch_memory` needs to use a store and must not use a read which would be more convenient? >> >> Basically on some (all?) OSes memory pages are only actually backed with physical memory on a store to that page. Before that a common "zero page" may be used to satisfy reads. This is not what is intended here. >> >> A previous comment (that has been removed long ago) seems to have been a bit confused about the actual issue: >> >> - // Note the use of a write here; originally we tried just a read, but >> - // since the value read was unused, the optimizer removed the read. >> - // If we ever have a concurrent touchahead thread, we'll want to use >> - // a read, to avoid the potential of overwriting data (if a mutator >> - // thread beats the touchahead thread to a page). There are various >> - // ways of making sure this read is not optimized away: for example, >> - // generating the code for a read procedure at runtime. >> >> It indicates that the reason for using a store has been that the compiler would optimize away the reads (which begs the question why a `volatile` read has not been used). >> >> Maybe these zero page optimizations came later than that original implementation though. >> >> Testing: local compilation - it's adding a comment only, really. >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > shade review Marked as reviewed by shade (Reviewer). 
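A minimal sketch of the point the new comment documents (this is not HotSpot's actual os::pretouch_memory, and it assumes a page-aligned, writable range): touching each page with a store forces the OS to give it its own backing memory, while a read of untouched anonymous memory can be satisfied from a single shared zero page.

```
// Illustration only -- simplified version of a pretouch loop.
#include <cstddef>

void pretouch_memory(void* start, void* end, std::size_t page_size) {
  for (volatile char* p = static_cast<char*>(start);
       p < static_cast<char*>(end);
       p += page_size) {
    // Store: faults the page in and commits distinct physical memory for it.
    *p = 0;
    // A load (even a volatile one the compiler cannot elide) would not be
    // enough on OSes that satisfy reads of untouched anonymous memory from
    // one shared, copy-on-write zero page -- the caveat the added comment in
    // os::pretouch_memory spells out.
  }
}

int main() {
  const std::size_t page = 4096, bytes = 16 * page;
  char* buf = new char[bytes];
  pretouch_memory(buf, buf + bytes, page);   // usage example
  delete[] buf;
  return 0;
}
```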
------------- PR: https://git.openjdk.java.net/jdk/pull/2373 From stefank at openjdk.java.net Wed Feb 3 11:59:38 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Wed, 3 Feb 2021 11:59:38 GMT Subject: RFR: 8261028: ZGC: SIGFPE when MaxVirtMemFraction=0 In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 11:01:35 GMT, Jie Fu wrote: > Hi all, > > The SIGFPE was caused by this line [1] when MaxVirtMemFraction=0. > But according to this comment [2], 0 should not be allowed for MaxVirtMemFraction. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/z/zAddressSpaceLimit.cpp#L51 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/gc_globals.hpp#L345 Looks good. Thanks for fixing! ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2374 From pliden at openjdk.java.net Wed Feb 3 12:06:40 2021 From: pliden at openjdk.java.net (Per Liden) Date: Wed, 3 Feb 2021 12:06:40 GMT Subject: RFR: 8261028: ZGC: SIGFPE when MaxVirtMemFraction=0 In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 11:01:35 GMT, Jie Fu wrote: > Hi all, > > The SIGFPE was caused by this line [1] when MaxVirtMemFraction=0. > But according to this comment [2], 0 should not be allowed for MaxVirtMemFraction. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/z/zAddressSpaceLimit.cpp#L51 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/gc_globals.hpp#L345 Looks good! ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2374 From jiefu at openjdk.java.net Wed Feb 3 12:26:44 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Wed, 3 Feb 2021 12:26:44 GMT Subject: RFR: 8261028: ZGC: SIGFPE when MaxVirtMemFraction=0 In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 11:57:16 GMT, Stefan Karlsson wrote: >> Hi all, >> >> The SIGFPE was caused by this line [1] when MaxVirtMemFraction=0. >> But according to this comment [2], 0 should not be allowed for MaxVirtMemFraction. >> >> Thanks. >> Best regards, >> Jie >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/z/zAddressSpaceLimit.cpp#L51 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/gc_globals.hpp#L345 > > Looks good. Thanks for fixing! Thanks @stefank and @pliden for your review. Will push it tomorrow. ------------- PR: https://git.openjdk.java.net/jdk/pull/2374 From zgu at openjdk.java.net Wed Feb 3 13:19:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 3 Feb 2021 13:19:53 GMT Subject: Integrated: 8260998: Shenandoah: Restore reference processing statistics reporting In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 20:49:04 GMT, Zhengyu Gu wrote: > Please review this patch that restores reporting of reference processing statistics after JDK-8254315 This pull request has now been integrated. 
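Looking back at the MaxVirtMemFraction=0 fix (JDK-8261028) reviewed above, a hedged illustration of the failure mode: the flag is used as a divisor when deriving the address-space limit, so a value of 0 turns that computation into an integer division by zero (SIGFPE). The function below is only a sketch with made-up names, not the ZGC source or the actual one-line fix.

```
// Failure-mode illustration only -- not zAddressSpaceLimit.cpp.
#include <cassert>
#include <cstddef>

std::size_t address_space_limit(std::size_t total_virtual_memory,
                                std::size_t max_virt_mem_fraction) {
  // With a fraction of 0 this integer division raises SIGFPE, which is why
  // the flag's documentation says 0 should not be allowed.
  assert(max_virt_mem_fraction >= 1, "MaxVirtMemFraction must be at least 1");
  return total_virtual_memory / max_virt_mem_fraction;
}

int main() {
  // Usage sketch: cap the limit at roughly 1/2 of the available virtual memory.
  (void)address_space_limit(static_cast<std::size_t>(1) << 40, 2);
  return 0;
}
```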
Changeset: 5324b5c5 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/5324b5c5 Stats: 20 lines in 3 files changed: 18 ins; 1 del; 1 mod 8260998: Shenandoah: Restore reference processing statistics reporting Reviewed-by: shade ------------- PR: https://git.openjdk.java.net/jdk/pull/2362 From zgu at openjdk.java.net Wed Feb 3 20:10:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 3 Feb 2021 20:10:53 GMT Subject: RFR: 8259647: Add support for JFR event ObjectCountAfterGC to Shenandoah Message-ID: Please review this patch that adds JFR ObjectCountAfterGC event support. AFAICT, the event is off by default. If it is enabled, it distorts Shenandoah pause characteristics, since it performs heap walk during final mark pause. When event is disabled: `[191.033s][info][gc,stats] Pause Init Mark (G) 454 us` `[191.033s][info][gc,stats] Pause Init Mark (N) 13 us` When event is enabled: `[396.631s][info][gc,stats] Pause Final Mark (G) 43199 us` `[396.631s][info][gc,stats] Pause Final Mark (N) 42982 us` Test: - [x] hotspot_gc_shenandoah ------------- Commit messages: - JDK-8259647-object_count_jfr Changes: https://git.openjdk.java.net/jdk/pull/2386/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2386&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8259647 Stats: 12 lines in 3 files changed: 6 ins; 4 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2386.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2386/head:pull/2386 PR: https://git.openjdk.java.net/jdk/pull/2386 From jiefu at openjdk.java.net Thu Feb 4 00:08:54 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 4 Feb 2021 00:08:54 GMT Subject: Integrated: 8261028: ZGC: SIGFPE when MaxVirtMemFraction=0 In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 11:01:35 GMT, Jie Fu wrote: > Hi all, > > The SIGFPE was caused by this line [1] when MaxVirtMemFraction=0. > But according to this comment [2], 0 should not be allowed for MaxVirtMemFraction. > > Thanks. > Best regards, > Jie > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/z/zAddressSpaceLimit.cpp#L51 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/gc_globals.hpp#L345 This pull request has now been integrated. Changeset: e2516e41 Author: Jie Fu URL: https://git.openjdk.java.net/jdk/commit/e2516e41 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8261028: ZGC: SIGFPE when MaxVirtMemFraction=0 Reviewed-by: stefank, pliden ------------- PR: https://git.openjdk.java.net/jdk/pull/2374 From iklam at openjdk.java.net Thu Feb 4 02:00:07 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 02:00:07 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v3] In-Reply-To: References: Message-ID: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. 
Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - @tschatzl and @stefank comments - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2347/files - new: https://git.openjdk.java.net/jdk/pull/2347/files/529e77e4..7d9015d2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=01-02 Stats: 2516 lines in 114 files changed: 1237 ins; 850 del; 429 mod Patch: https://git.openjdk.java.net/jdk/pull/2347.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347 PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Thu Feb 4 04:09:06 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 04:09:06 GMT Subject: RFR: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp [v4] In-Reply-To: References: Message-ID: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - Merge branch 'master' of https://github.com/openjdk/jdk into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - @tschatzl and @stefank comments - Merge branch 'master' into 8260012-reduce-inclue-collectedHeap-heapInspection-hpp - 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2347/files - new: https://git.openjdk.java.net/jdk/pull/2347/files/7d9015d2..cfd70b3c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2347&range=02-03 Stats: 2645 lines in 56 files changed: 2497 ins; 69 del; 79 mod Patch: https://git.openjdk.java.net/jdk/pull/2347.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2347/head:pull/2347 PR: https://git.openjdk.java.net/jdk/pull/2347 From iklam at openjdk.java.net Thu Feb 4 04:09:07 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Thu, 4 Feb 2021 04:09:07 GMT Subject: Integrated: 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 04:18:24 GMT, Ioi Lam wrote: > collectedHeap.hpp is included by 477 out of 1000 .o files in HotSpot. This file in turn includes many other complex header files. > > In many cases, an object file only directly includes this file via: > > - memAllocator.hpp (which does not actually use collectedHeap.hpp) > - oop.inline.hpp and compressedOops.inline.hpp (only use collectedHeap.hpp in asserts via `Universe::heap()->is_in()`). > > By refactoring the above 3 files, we can reduce the .o files that include collectedHeap.hpp to 242. > > This RFE also removes the unnecessary inclusion of heapInspection.hpp from collectedHeap.hpp. > > Build time of HotSpot is reduced for about 1%. > > Tested with mach5: tier1, builds-tier2, builds-tier3, builds-tier4 and builds-tier5. Also locally: aarch64, arm, ppc64, s390, x86, and zero. This pull request has now been integrated. Changeset: 82028e70 Author: Ioi Lam URL: https://git.openjdk.java.net/jdk/commit/82028e70 Stats: 110 lines in 60 files changed: 69 ins; 7 del; 34 mod 8260012: Reduce inclusion of collectedHeap.hpp and heapInspection.hpp Reviewed-by: stefank, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2347 From ayang at openjdk.java.net Thu Feb 4 10:15:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 4 Feb 2021 10:15:41 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8250941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. 
> > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) Marked as reviewed by ayang (Author). src/hotspot/share/gc/shared/cardTableRS.cpp line 442: > 440: CardTable(whole_heap, scanned_concurrently) { } > 441: > 442: CardTableRS::~CardTableRS() { } Now that it's empty, is it possible to remove it completely? src/hotspot/share/gc/shared/cardTableRS.hpp line 55: > 53: virtual void verify_used_region_at_save_marks(Space* sp) const NOT_DEBUG_RETURN; > 54: > 55: void inline_write_ref_field_gc(void* field, oop new_val) { It seems that the arg `new_val` is not used. Maybe remove it or add a comment saying it's an intentional omission. ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From kbarrett at openjdk.java.net Thu Feb 4 10:31:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Thu, 4 Feb 2021 10:31:41 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8250941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. > > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) Looks good to me, with the one minor nit I commented on and Albert's suggestions. src/hotspot/share/gc/shared/cardTableRS.cpp line 43: > 41: inline bool ClearNoncleanCardWrapper::clear_card(CardValue* entry) { > 42: CardValue entry_val = *entry; > 43: assert(entry_val == CardTableRS::dirty_card_val(), Consider eliminating `entry_val` - just use `*entry` in the assert. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2354 From jiefu at openjdk.java.net Thu Feb 4 11:00:49 2021 From: jiefu at openjdk.java.net (Jie Fu) Date: Thu, 4 Feb 2021 11:00:49 GMT Subject: [jdk16] RFR: 8260473: [vector] ZGC: VectorReshape test produces incorrect results with ZGC enabled In-Reply-To: References: <5OfnHC5N00VVv3pWcU9gsAHa23RbAAX7ReEw9Ct6eug=.4f095083-7050-487d-94e0-3befce6744c5@github.com> <_Wm-fi9j4TZ41F0G_92f7ioKQeDNgZiOEMmLkZ0lvvE=.0a9beba5-5089-4368-b4bc-73faf9d5e858@github.com> <226iFOsl1hXrEoSe9uzgBb1Z75wxQEv5azlJIfzCO4k=.69d5ed3a-7337-472d-b106-1ce2e5d361bf@github.com> Message-ID: On Tue, 2 Feb 2021 01:58:56 GMT, Jie Fu wrote: > Good. Please, file a follow-up RFE to improve the test. The RFE has been filed here: https://bugs.openjdk.java.net/browse/JDK-8261152 Thanks. 
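[Editor's note] To make the "conc_scan" remark in the CardTable thread above more concrete: the flag matters because a card table that is scanned concurrently (as CMS used to do during precleaning) needs the reference store to be visible before the card is dirtied, which forces extra ordering into the generated post-barrier. A rough, hedged sketch follows; the names are illustrative and this is not the exact barrier-set code.

// Simplified pseudocode of a card-table post-barrier, showing where
// CardTable::scanned_concurrently() changes the generated code.
void card_table_post_barrier(CardTable* ct, void* field) {
  volatile CardTable::CardValue* card = ct->byte_for(field);
  if (ct->scanned_concurrently()) {
    // A concurrent scanner may act on the dirty card immediately, so the
    // reference store must not be reordered past the card mark.
    OrderAccess::storestore();
  }
  *card = CardTable::dirty_card_val();   // a plain store suffices for STW-only scanning
}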
------------- PR: https://git.openjdk.java.net/jdk16/pull/139 From tschatzl at openjdk.java.net Thu Feb 4 13:50:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Feb 2021 13:50:41 GMT Subject: RFR: 8261023: Document why memory pretouch must be a store [v2] In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 11:29:40 GMT, Aleksey Shipilev wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> shade review > > Marked as reviewed by shade (Reviewer). Thanks @shipilev @walulyai for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2373 From tschatzl at openjdk.java.net Thu Feb 4 13:50:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Feb 2021 13:50:43 GMT Subject: Integrated: 8261023: Document why memory pretouch must be a store In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 09:47:04 GMT, Thomas Schatzl wrote: > Hi all, > > may I have reviews for this additional comment that explains why `os::pretouch_memory` needs to use a store and must not use a read which would be more convenient? > > Basically on some (all?) OSes memory pages are only actually backed with physical memory on a store to that page. Before that a common "zero page" may be used to satisfy reads. This is not what is intended here. > > A previous comment (that has been removed long ago) seems to have been a bit confused about the actual issue: > > - // Note the use of a write here; originally we tried just a read, but > - // since the value read was unused, the optimizer removed the read. > - // If we ever have a concurrent touchahead thread, we'll want to use > - // a read, to avoid the potential of overwriting data (if a mutator > - // thread beats the touchahead thread to a page). There are various > - // ways of making sure this read is not optimized away: for example, > - // generating the code for a read procedure at runtime. > > It indicates that the reason for using a store has been that the compiler would optimize away the reads (which begs the question why a `volatile` read has not been used). > > Maybe these zero page optimizations came later than that original implementation though. > > Testing: local compilation - it's adding a comment only, really. > > Thanks, > Thomas This pull request has now been integrated. Changeset: be772ffa Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/be772ffa Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8261023: Document why memory pretouch must be a store Reviewed-by: shade, iwalulya ------------- PR: https://git.openjdk.java.net/jdk/pull/2373 From tschatzl at openjdk.java.net Thu Feb 4 13:56:58 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Feb 2021 13:56:58 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal [v2] In-Reply-To: References: Message-ID: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8250941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. 
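[Editor's note] Returning to the pretouch discussion above, the point is easiest to see in code. This is a minimal, hedged sketch of the idea only, not the actual os::pretouch_memory, which also handles alignment and platform details.

#include <cstddef>

// Touch one word per page with a *store* so the OS must back every page with
// real memory. A read is not sufficient: the kernel may satisfy reads from a
// shared zero page without committing anything. Assumes the range was freshly
// committed and zero-filled, so storing 0 cannot clobber live data.
static void pretouch(char* start, char* end, size_t page_size) {
  for (volatile char* p = start; p < end; p += page_size) {
    *p = 0;
  }
}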
> > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: kimbarret, albertnetymk review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2354/files - new: https://git.openjdk.java.net/jdk/pull/2354/files/5aa23d74..849c79bb Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2354&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2354&range=00-01 Stats: 11 lines in 4 files changed: 0 ins; 6 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2354.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2354/head:pull/2354 PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Thu Feb 4 13:56:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 4 Feb 2021 13:56:59 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal [v2] In-Reply-To: References: Message-ID: <33XHcZDMFLFqOngnBQUpiuaQ_VlxfZ9HPhinJoDGIYY=.838ade60-1bc8-43c7-98d9-9d8c21ba3d26@github.com> On Thu, 4 Feb 2021 10:29:18 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> kimbarret, albertnetymk review > > Looks good to me, with the one minor nit I commented on and Albert's suggestions. All fixed as suggested. Still compiles. ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From github.com+71722661+earthling-amzn at openjdk.java.net Thu Feb 4 18:20:40 2021 From: github.com+71722661+earthling-amzn at openjdk.java.net (earthling-amzn) Date: Thu, 4 Feb 2021 18:20:40 GMT Subject: RFR: 8259647: Add support for JFR event ObjectCountAfterGC to Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:05:33 GMT, Zhengyu Gu wrote: > Please review this patch that adds JFR ObjectCountAfterGC event support. > > AFAICT, the event is off by default. If it is enabled, it distorts Shenandoah pause characteristics, since it performs heap walk during final mark pause. > > When event is disabled: > `[191.033s][info][gc,stats] Pause Init Mark (G) 454 us` > `[191.033s][info][gc,stats] Pause Init Mark (N) 13 us` > > When event is enabled: > `[396.631s][info][gc,stats] Pause Final Mark (G) 43199 us` > `[396.631s][info][gc,stats] Pause Final Mark (N) 42982 us` > > Test: > - [x] hotspot_gc_shenandoah That certainly is bad news for pause times. Do you think it'd be feasible to "piggyback" the object count calculation on concurrent marking? Might address https://bugs.openjdk.java.net/browse/JDK-8258431 also. ------------- PR: https://git.openjdk.java.net/jdk/pull/2386 From rkennke at openjdk.java.net Thu Feb 4 18:46:41 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 4 Feb 2021 18:46:41 GMT Subject: RFR: 8259647: Add support for JFR event ObjectCountAfterGC to Shenandoah In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 18:17:46 GMT, earthling-amzn wrote: > That certainly is bad news for pause times. Do you think it'd be feasible to "piggyback" the object count calculation on concurrent marking? Might address https://bugs.openjdk.java.net/browse/JDK-8258431 also. This could certainly be done, in a similar fashion as liveness counting. However, it would have to be done such that it only actually counts objects when JFR is requesting it, and otherwise stays out of the way, because this costs marking performance. 
Which means doubling the number of mark-loops, and select the correct loop based on whether or not we need object counts. ------------- PR: https://git.openjdk.java.net/jdk/pull/2386 From zgu at openjdk.java.net Thu Feb 4 19:22:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 4 Feb 2021 19:22:39 GMT Subject: RFR: 8259647: Add support for JFR event ObjectCountAfterGC to Shenandoah In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 18:44:24 GMT, Roman Kennke wrote: > That certainly is bad news for pause times. Do you think it'd be feasible to "piggyback" the object count calculation on concurrent marking? Might address https://bugs.openjdk.java.net/browse/JDK-8258431 also. It dose not just count number of objects, but number of objects by type, much more than liveness counting. Just add a branch in hot marking loop, I can foresee negative impact on performance. ------------- PR: https://git.openjdk.java.net/jdk/pull/2386 From github.com+71722661+earthling-amzn at openjdk.java.net Thu Feb 4 19:45:41 2021 From: github.com+71722661+earthling-amzn at openjdk.java.net (earthling-amzn) Date: Thu, 4 Feb 2021 19:45:41 GMT Subject: RFR: 8259647: Add support for JFR event ObjectCountAfterGC to Shenandoah In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 19:19:35 GMT, Zhengyu Gu wrote: >>> That certainly is bad news for pause times. Do you think it'd be feasible to "piggyback" the object count calculation on concurrent marking? Might address https://bugs.openjdk.java.net/browse/JDK-8258431 also. >> >> This could certainly be done, in a similar fashion as liveness counting. However, it would have to be done such that it only actually counts objects when JFR is requesting it, and otherwise stays out of the way, because this costs marking performance. Which means doubling the number of mark-loops, and select the correct loop based on whether or not we need object counts. > >> That certainly is bad news for pause times. Do you think it'd be feasible to "piggyback" the object count calculation on concurrent marking? Might address https://bugs.openjdk.java.net/browse/JDK-8258431 also. > > It dose not just count number of objects, but number of objects by type, much more than liveness counting. Just add a branch in hot marking loop, I can foresee negative impact on performance. Would it be possible to combine the object _counting_ closure and the object _marking_ closure into one aggregate closure and complete both calculations in one pass over the live objects? Of course, only do this when the JFR event is enabled (and even then, perhaps only do it periodically). ------------- PR: https://git.openjdk.java.net/jdk/pull/2386 From rkennke at openjdk.java.net Thu Feb 4 19:45:42 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 4 Feb 2021 19:45:42 GMT Subject: RFR: 8259647: Add support for JFR event ObjectCountAfterGC to Shenandoah In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 19:19:35 GMT, Zhengyu Gu wrote: > > That certainly is bad news for pause times. Do you think it'd be feasible to "piggyback" the object count calculation on concurrent marking? Might address https://bugs.openjdk.java.net/browse/JDK-8258431 also. > > It dose not just count number of objects, but number of objects by type, much more than liveness counting. Just add a branch in hot marking loop, I can foresee negative impact on performance. Yes, as I suggested earlier, I'd only turn it on when requested by JFR, and otherwise leave it off. It definitely will impact performance. 
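[Editor's note] A minimal sketch of what such a JFR-gated, template-specialized mark loop could look like; the names and types are hypothetical, not Shenandoah's actual closures.

// Specialize the hot loop on a compile-time flag so the common, non-counting
// path pays nothing; the choice is made once, outside the loop, based on
// whether the ObjectCountAfterGC event is enabled.
template <bool COUNT_OBJECTS>
void mark_loop(MarkStack& stack, KlassCountTable* counts) {
  oop obj;
  while (stack.pop(&obj)) {
    if (COUNT_OBJECTS) {                    // constant-folded away when false
      counts->record(obj->klass(), obj->size());
    }
    mark_and_push_references(obj, stack);   // the normal marking work
  }
}

void run_marking(MarkStack& stack, KlassCountTable* counts, bool count_objects) {
  if (count_objects) {
    mark_loop<true>(stack, counts);
  } else {
    mark_loop<false>(stack, nullptr);
  }
}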
That means another set of mark loops that we need to generate at compile-time. ------------- PR: https://git.openjdk.java.net/jdk/pull/2386 From kbarrett at openjdk.java.net Fri Feb 5 07:07:43 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 5 Feb 2021 07:07:43 GMT Subject: RFR: 8259862: MutableSpace's end should be atomic In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 12:37:50 GMT, Albert Mingkun Yang wrote: >> Please review this change to MutableSpace, making its _end member volatile >> and using Atomic operations to access the _top and _end members. Some >> unused accessor functions that would otherwise need updating are removed. >> >> Testing: >> mach5 tier1 > > src/hotspot/share/gc/parallel/mutableSpace.hpp line 62: > >> 60: HeapWord* _bottom; >> 61: HeapWord* volatile _top; >> 62: HeapWord* volatile _end; > > Maybe add some comments explaining how `_top` and `_end` are used in the concurrent setting. I've added some comments describing `_bottom`, `_top`, and `_end`. ------------- PR: https://git.openjdk.java.net/jdk/pull/2323 From kbarrett at openjdk.java.net Fri Feb 5 07:27:58 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 5 Feb 2021 07:27:58 GMT Subject: RFR: 8259862: MutableSpace's end should be atomic [v2] In-Reply-To: References: Message-ID: <8ATJD3ux-1gG56DPWs2XwN_C-aEpNGMxGS1rQvcN9cA=.aa1f3e41-4943-4b06-8838-a96243f56d8c@github.com> > Please review this change to MutableSpace, making its _end member volatile > and using Atomic operations to access the _top and _end members. Some > unused accessor functions that would otherwise need updating are removed. > > Testing: > mach5 tier1 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - describe _top and _end - reinstate end_addr() after JDK-8259778 - Merge branch 'master' into atomic_end - make _end volatile and use atomic access ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2323/files - new: https://git.openjdk.java.net/jdk/pull/2323/files/a091498c..823879a0 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2323&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2323&range=00-01 Stats: 13119 lines in 648 files changed: 7935 ins; 2932 del; 2252 mod Patch: https://git.openjdk.java.net/jdk/pull/2323.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2323/head:pull/2323 PR: https://git.openjdk.java.net/jdk/pull/2323 From kbarrett at openjdk.java.net Fri Feb 5 07:27:59 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 5 Feb 2021 07:27:59 GMT Subject: Integrated: 8259862: MutableSpace's end should be atomic In-Reply-To: References: Message-ID: On Sat, 30 Jan 2021 05:51:38 GMT, Kim Barrett wrote: > Please review this change to MutableSpace, making its _end member volatile > and using Atomic operations to access the _top and _end members. Some > unused accessor functions that would otherwise need updating are removed. > > Testing: > mach5 tier1 This pull request has now been integrated. 
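[Editor's note] For context on why MutableSpace's _top and _end want atomic access: the space is bump-allocated by multiple GC threads using a CAS, along the lines of the hedged sketch below. This is simplified and is not the exact MutableSpace code.

// Lock-free bump-pointer allocation over a space whose _top and _end are
// volatile and accessed through Atomic. _end can move under concurrent
// expansion, so it is loaded explicitly rather than read as a plain field.
HeapWord* cas_allocate(size_t word_size) {
  while (true) {
    HeapWord* obj = Atomic::load(&_top);
    HeapWord* end = Atomic::load(&_end);
    if (obj >= end || pointer_delta(end, obj) < word_size) {
      return nullptr;                                  // does not fit
    }
    HeapWord* new_top = obj + word_size;
    if (Atomic::cmpxchg(&_top, obj, new_top) == obj) {
      return obj;                                      // won the race
    }
    // Lost the race to another GC thread; retry with the updated _top.
  }
}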
Changeset: 1e0a1013 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/1e0a1013 Stats: 27 lines in 4 files changed: 7 ins; 12 del; 8 mod 8259862: MutableSpace's end should be atomic Make _end volatile and use atomic access Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2323 From tschatzl at openjdk.java.net Fri Feb 5 08:36:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 08:36:40 GMT Subject: RFR: 8234534: Simplify CardTable code after CMS removal [v2] In-Reply-To: References: Message-ID: On Thu, 4 Feb 2021 10:29:18 GMT, Kim Barrett wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> kimbarret, albertnetymk review > > Looks good to me, with the one minor nit I commented on and Albert's suggestions. Thanks @kimbarrett @albertnetymk for your reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From tschatzl at openjdk.java.net Fri Feb 5 08:36:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 08:36:41 GMT Subject: Integrated: 8234534: Simplify CardTable code after CMS removal In-Reply-To: References: Message-ID: On Tue, 2 Feb 2021 15:13:38 GMT, Thomas Schatzl wrote: > Hi, > > can I have reviews for this cleanup that removes CMS specific code from `CardTable/CardTableRS`? > > Note that there is still this "conc_scan" parameter passed to the card table that affects barrier code generation, for some reason also G1 barrier code generation although it should not as `G1CardTable::scanned_concurrently()` is only used for the "normal" card table. Initial attempts showed that removing this is not straightforward, causing crashes and so I left it out for [JDK-8250941](https://bugs.openjdk.java.net/browse/JDK-8260941) so that this change is solely about removing unused code. > > Testing: tier1-4, some tier1-5 runs earlier (before some removal of hunks for files only containing copyright updates or newline changes) This pull request has now been integrated. Changeset: 78b0d327 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/78b0d327 Stats: 205 lines in 9 files changed: 0 ins; 191 del; 14 mod 8234534: Simplify CardTable code after CMS removal Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2354 From manc at google.com Fri Feb 5 08:47:10 2021 From: manc at google.com (Man Cao) Date: Fri, 5 Feb 2021 00:47:10 -0800 Subject: State of "simplified barriers" for G1 In-Reply-To: References: Message-ID: Hi All, My apology for postponing this. I've been busy rolling out JDK 11 to all our production servers for the last year. The current state is that the OpenJDK GC team and we have determined to implement https://bugs.openjdk.java.net/browse/JDK-8226731 first, before committing the simplified write barrier. We'd like to get rid of the storeload fence even with Conc Refine enabled. Note that JDK-8230187 contains the most up-to-date description for the proposed simplified write barrier; JDK-8226197 is a bit outdated. I am targeting both JDK-8226731 and JDK-8230187 for JDK 17. I'll send a separate email for JDK-8226731, as there are still some challenges there. Yude, thanks for sharing the idea and results! I think it is best to open a new RFE for further improvement after JDK-8230187 is implemented. If I understand correctly, the proposed approach avoids dirtying the cards for old-to-old reference stores in young-only phases. That's a nice idea.
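[Editor's note] For readers following this thread, the code below is roughly the current G1 post-write barrier under discussion, in hedged pseudocode; the real barrier is emitted by the compilers and the names here are illustrative. It shows the filtering, the StoreLoad fence, and the refinement enqueue that the simplified-barrier work aims to cut down.

// Simplified pseudocode of the G1 post-write barrier for "field = new_val".
void g1_post_write_barrier(oop* field, oop new_val, Thread* thread) {
  if (same_region(field, new_val)) return;          // cross-region filter
  if (new_val == nullptr) return;                   // null filter
  CardValue* card = card_for(field);
  if (*card == g1_young_card_val()) return;         // stores into young need no remset
  OrderAccess::storeload();                         // the fence this thread wants to remove
  if (*card != dirty_card_val()) {
    *card = dirty_card_val();
    enqueue_dirty_card(thread, card);               // hand the card to concurrent refinement
  }
}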
Are the results comparing the two types of simplified write barriers? Or is for comparing the default barrier with the storeload fence, vs your simplified write barrier that filters untracked regions? -Man On Tue, Dec 22, 2020 at 2:31 AM ??? wrote: > Hi All, > > We are also interested in any follow-ups on this topic. If I recall > correctly, when this was discussed in JDK-8226197, one of the TODOs was > that the storeload fence can be skipped when Conc Refine is turned off. > Regarding this, I'd like to share an idea we have been experimenting in the > last couple of months. We took "skipping the fence" a little further and > tried to improve the throughput with less harm to pause time. > > This is from the observation that many card dirtying operations can go > away without concurrent refine. More specifically, writes that produce a > reference OldObj1.foo->OldObj2 need not dirty the card corresponding to > OldObj1 during young-gc-only phase. Currently, with Conc Refine, this > operation will dirty that card, then the card will be refined (thrown away) > by the refinement thread, because it discovers that the reference points to > an Old region, which is "untracked" during young-gc-only phase. > > The refinement thread does this concurrently so that GC doesn't have to do > it during a pause. But we (~lmao) realized that we can use a flag to > indicate whether a region is tracked, and discard the card dirtying > operation immediately in the barrier (after testing against the flag). We > can do it without any atomics/fences, just ~5 instructions in the barrier. > This way, we get rid of the storeload mem barrier, with Conc Refine turned > off, while still getting the same pause time guarantee in young-gc-only > phase. But as you can see, Mixed GCs still suffer from having no concurrent > refinement. > > We saw improvements on Alibaba JDK11u across the benchmarks we used > (positive number means better): > Dacapo: cases vary from -3.3% to +5.1%, on average +0.3% > specjbb2015 on 96x2.50GHz, 16 GC threads, 24g mem: critical-jOPS +1.9%, > max-jOPS +2.8% > specjbb2015 on 8x2.50GHz, 8 GC threads, 16g mem (observed more Mixed GCs): > critical-jOPS +0.1%, max-jOPS +5.7% > specjvm2008: cases vary from -0.7% to +23.4%, on average +3.1% > Extremem: cases vary from -2.1% to +7.8%, on average +1.0% > I'd love to hear any feedbacks, comments, what problems you can see in > this approach, conceptually or practically, and back to the topic, whether > this idea can be incorporated into your future work/plan of creating a > simplified barrier. > > Yude Lin > > > ------------------------------------------------------------------ > ????Gerhard Hueller > ?????2020?12?21?(???) 03:19 > ????hotspot-gc-dev at openjdk.java.net > ? ??State of "simplified barriers" for G1 > > Hi, > > I remember a slide deck talking about the improvements to G1 since JDK8/9 > and one bullet point on the todo-list was simplified barriers for G1. > > I wonder what happened to this improvement, has it been already > implemented? Is this the non-concurrent refinement option implemented by > google some time ago? > Improvements in this area would be really great, CMS still provides better > throughput for most workloads - with the only real advantage of G1 does > offer are avoiding those degenerated STW full GCs. 
> > Thanks, Gerhard From yude.lyd at alibaba-inc.com Fri Feb 5 09:39:15 2021 From: yude.lyd at alibaba-inc.com (=?UTF-8?B?5p6X6IKy5b63?=) Date: Fri, 05 Feb 2021 17:39:15 +0800 Subject: =?UTF-8?B?UmU6IFN0YXRlIG9mICJzaW1wbGlmaWVkIGJhcnJpZXJzIiBmb3IgRzE=?= In-Reply-To: References: , Message-ID: <1005b1ba-e5f9-401c-887c-6f607c9db5f6.yude.lyd@alibaba-inc.com> Thanks Man, I'm glad to hear the updates. I will follow JDK-8230187 closely. I think it is best to open a new RFE for further improvement after JDK-8230187 is implemented. I will take this approach. If I understand correctly, the proposed approach avoids dirtying the cards for old-to-old reference stores in young-only phases. That is correct. Are the results comparing the two types of simplified write barriers? Or is for comparing the default barrier with the storeload fence, vs your simplified write barrier that filters untracked regions? We compared the default barrier (with storeload fence, concurrent refine on) vs untracked region filter (with no storeload fence, concurrent refine off). Yude ------------------------------------------------------------------ From:Man Cao Send Time:2021?2?5?(???) 16:47 To:hotspot-gc-dev at openjdk.java.net Cc:???(??) Subject:Re: State of "simplified barriers" for G1 Hi All, My apology for postponing this. I've been busy rolling out JDK 11 to all our production servers for the last year. The current state is that the OpenJDK GC team and us have determined to implement https://bugs.openjdk.java.net/browse/JDK-8226731 first, before committing the simplified write barrier. We'd like to get rid of the storeload fence even with Conc Refine enabled. Note that JDK-8230187 contains the most up-to-date description for the proposed simplified writer barrier, JDK-8226197 is a bit outdated. I target to get both JDK-8226731 and JDK-8230187 in JDK 17. I'll send a separate email for JDK-8226731, as there are still some challenges there. Yude, thanks for sharing the ideal and results! I think it is best to open a new RFE for further improvement after JDK-8230187 is implemented. If I understand correctly, the proposed approach avoids dirtying the cards for old-to-old reference stores in young-only phases. That's a nice idea. Are the results comparing the two types of simplified write barriers? Or is for comparing the default barrier with the storeload fence, vs your simplified write barrier that filters untracked regions? -Man On Tue, Dec 22, 2020 at 2:31 AM ??? wrote: Hi All, We are also interested in any follow-ups on this topic. If I recall correctly, when this was discussed in JDK-8226197, one of the TODOs was that the storeload fence can be skipped when Conc Refine is turned off. Regarding this, I'd like to share an idea we have been experimenting in the last couple of months. We took "skipping the fence" a little further and tried to improve the throughput with less harm to pause time. This is from the observation that many card dirtying operations can go away without concurrent refine. More specifically, writes that produce a reference OldObj1.foo->OldObj2 need not dirty the card corresponding to OldObj1 during young-gc-only phase. Currently, with Conc Refine, this operation will dirty that card, then the card will be refined (thrown away) by the refinement thread, because it discovers that the reference points to an Old region, which is "untracked" during young-gc-only phase. The refinement thread does this concurrently so that GC doesn't have to do it during a pause. 
But we (~lmao) realized that we can use a flag to indicate whether a region is tracked, and discard the card dirtying operation immediately in the barrier (after testing against the flag). We can do it without any atomics/fences, just ~5 instructions in the barrier. This way, we get rid of the storeload mem barrier, with Conc Refine turned off, while still getting the same pause time guarantee in young-gc-only phase. But as you can see, Mixed GCs still suffer from having no concurrent refinement. We saw improvements on Alibaba JDK11u across the benchmarks we used (positive number means better): Dacapo: cases vary from -3.3% to +5.1%, on average +0.3% specjbb2015 on 96x2.50GHz, 16 GC threads, 24g mem: critical-jOPS +1.9%, max-jOPS +2.8% specjbb2015 on 8x2.50GHz, 8 GC threads, 16g mem (observed more Mixed GCs): critical-jOPS +0.1%, max-jOPS +5.7% specjvm2008: cases vary from -0.7% to +23.4%, on average +3.1% Extremem: cases vary from -2.1% to +7.8%, on average +1.0% I'd love to hear any feedbacks, comments, what problems you can see in this approach, conceptually or practically, and back to the topic, whether this idea can be incorporated into your future work/plan of creating a simplified barrier. Yude Lin ------------------------------------------------------------------ ????Gerhard Hueller ?????2020?12?21?(???) 03:19 ????hotspot-gc-dev at openjdk.java.net ????State of "simplified barriers" for G1 Hi, I remember a slide deck talking about the improvements to G1 since JDK8/9 and one bullet point on the todo-list was simplified barriers for G1. I wonder what happened to this improvement, has it been already implemented? Is this the non-concurrent refinement option implemented by google some time ago? Improvements in this area would be really great, CMS still provides better throughput for most workloads - with the only real advantage of G1 does offer are avoiding those degenerated STW full GCs. Thanks, Gerhard From kbarrett at openjdk.java.net Fri Feb 5 10:14:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 5 Feb 2021 10:14:01 GMT Subject: RFR: 8261213: [BACKOUT] MutableSpace's end should be atomic Message-ID: This reverts commit 1e0a1013efcb3983d277134f04f5e38f687e88c5. Please review this backout of JDK-8259862: MutableSpace's end should be atomic With that change: gc/TestVerifyDuringStartup.java with -XX:+UseParallelGC -XX:-UseNUMA fails with: # guarantee(false) failed: inline contiguous allocation not supported Testing: Locally (linux-x64) verified the failure is reproducible. Locally (linux-x64) verified no failure with the backout applied. 
Locally (linux-x64) hotspot:tier1 with -XX:+UseParallelGC -XX:-UseNUMA is fine except for two serviceability tests that always fail locally with UseParallelGC mach5 tier1-2 (in progress) ------------- Commit messages: - Revert "8259862: MutableSpace's end should be atomic" Changes: https://git.openjdk.java.net/jdk/pull/2426/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2426&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261213 Stats: 27 lines in 4 files changed: 12 ins; 7 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2426.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2426/head:pull/2426 PR: https://git.openjdk.java.net/jdk/pull/2426 From tschatzl at openjdk.java.net Fri Feb 5 10:14:01 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 10:14:01 GMT Subject: RFR: 8261213: [BACKOUT] MutableSpace's end should be atomic In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 10:03:56 GMT, Kim Barrett wrote: > This reverts commit 1e0a1013efcb3983d277134f04f5e38f687e88c5. > > Please review this backout of > JDK-8259862: MutableSpace's end should be atomic > > With that change: > gc/TestVerifyDuringStartup.java with -XX:+UseParallelGC -XX:-UseNUMA fails with: > # guarantee(false) failed: inline contiguous allocation not supported > > Testing: > Locally (linux-x64) verified the failure is reproducible. > Locally (linux-x64) verified no failure with the backout applied. > Locally (linux-x64) hotspot:tier1 with -XX:+UseParallelGC -XX:-UseNUMA > is fine except for two serviceability tests that always fail locally with > UseParallelGC > mach5 tier1-2 (in progress) Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2426 From ayang at openjdk.java.net Fri Feb 5 10:14:01 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 5 Feb 2021 10:14:01 GMT Subject: RFR: 8261213: [BACKOUT] MutableSpace's end should be atomic In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 10:03:56 GMT, Kim Barrett wrote: > This reverts commit 1e0a1013efcb3983d277134f04f5e38f687e88c5. > > Please review this backout of > JDK-8259862: MutableSpace's end should be atomic > > With that change: > gc/TestVerifyDuringStartup.java with -XX:+UseParallelGC -XX:-UseNUMA fails with: > # guarantee(false) failed: inline contiguous allocation not supported > > Testing: > Locally (linux-x64) verified the failure is reproducible. > Locally (linux-x64) verified no failure with the backout applied. > Locally (linux-x64) hotspot:tier1 with -XX:+UseParallelGC -XX:-UseNUMA > is fine except for two serviceability tests that always fail locally with > UseParallelGC > mach5 tier1-2 (in progress) Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2426 From tschatzl at openjdk.java.net Fri Feb 5 10:14:02 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 5 Feb 2021 10:14:02 GMT Subject: RFR: 8261213: [BACKOUT] MutableSpace's end should be atomic In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 10:10:25 GMT, Albert Mingkun Yang wrote: >> This reverts commit 1e0a1013efcb3983d277134f04f5e38f687e88c5. >> >> Please review this backout of >> JDK-8259862: MutableSpace's end should be atomic >> >> With that change: >> gc/TestVerifyDuringStartup.java with -XX:+UseParallelGC -XX:-UseNUMA fails with: >> # guarantee(false) failed: inline contiguous allocation not supported >> >> Testing: >> Locally (linux-x64) verified the failure is reproducible. 
>> Locally (linux-x64) verified no failure with the backout applied. >> Locally (linux-x64) hotspot:tier1 with -XX:+UseParallelGC -XX:-UseNUMA >> is fine except for two serviceability tests that always fail locally with >> UseParallelGC >> mach5 tier1-2 (in progress) > > Marked as reviewed by ayang (Author). Since this looks like a clean backout, please push asap. ------------- PR: https://git.openjdk.java.net/jdk/pull/2426 From kbarrett at openjdk.java.net Fri Feb 5 10:21:42 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 5 Feb 2021 10:21:42 GMT Subject: Integrated: 8261213: [BACKOUT] MutableSpace's end should be atomic In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 10:03:56 GMT, Kim Barrett wrote: > This reverts commit 1e0a1013efcb3983d277134f04f5e38f687e88c5. > > Please review this backout of > JDK-8259862: MutableSpace's end should be atomic > > With that change: > gc/TestVerifyDuringStartup.java with -XX:+UseParallelGC -XX:-UseNUMA fails with: > # guarantee(false) failed: inline contiguous allocation not supported > > Testing: > Locally (linux-x64) verified the failure is reproducible. > Locally (linux-x64) verified no failure with the backout applied. > Locally (linux-x64) hotspot:tier1 with -XX:+UseParallelGC -XX:-UseNUMA > is fine except for two serviceability tests that always fail locally with > UseParallelGC > mach5 tier1-2 (in progress) This pull request has now been integrated. Changeset: 224c166c Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/224c166c Stats: 27 lines in 4 files changed: 12 ins; 7 del; 8 mod 8261213: [BACKOUT] MutableSpace's end should be atomic Reviewed-by: tschatzl, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/2426 From thomas.schatzl at oracle.com Fri Feb 5 12:18:18 2021 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 5 Feb 2021 13:18:18 +0100 Subject: State of "simplified barriers" for G1 In-Reply-To: References: Message-ID: <156634fe-ae7e-2eda-8fc5-51b288369e9d@oracle.com> Hi all, sorry for chiming in so late answer, due to holidays and email server move that email thread got lost. On 05.02.21 09:47, Man Cao wrote: > Hi All, > > My apology for postponing this. I've been busy rolling out JDK 11 to all > our production servers for the last year. [...] > and JDK-8230187 in JDK 17. I'll send a separate email for JDK-8226731, as > there are still some challenges there. Great to hear! > > Yude, thanks for sharing the ideal and results! I think it is best to open > a new RFE for further improvement after JDK-8230187 is implemented. > If I understand correctly, the proposed approach avoids dirtying the cards > for old-to-old reference stores in young-only phases. That's a nice idea. > Are the results comparing the two types of simplified write barriers? Or is > for comparing the default barrier with the storeload fence, vs your > simplified write barrier that filters untracked regions? > > -Man > > > On Tue, Dec 22, 2020 at 2:31 AM ??? wrote: > >> Hi All, >> >> We are also interested in any follow-ups on this topic. If I recall >> correctly, when this was discussed in JDK-8226197, one of the TODOs was >> that the storeload fence can be skipped when Conc Refine is turned off. >> Regarding this, I'd like to share an idea we have been experimenting in the >> last couple of months. We took "skipping the fence" a little further and >> tried to improve the throughput with less harm to pause time. 
>> >> This is from the observation that many card dirtying operations can go >> away without concurrent refine. More specifically, writes that produce a >> reference OldObj1.foo->OldObj2 need not dirty the card corresponding to >> OldObj1 during young-gc-only phase. Currently, with Conc Refine, this >> operation will dirty that card, then the card will be refined (thrown away) >> by the refinement thread, because it discovers that the reference points to >> an Old region, which is "untracked" during young-gc-only phase. >> >> The refinement thread does this concurrently so that GC doesn't have to do >> it during a pause. But we (~lmao) realized that we can use a flag to >> indicate whether a region is tracked, and discard the card dirtying >> operation immediately in the barrier (after testing against the flag). We >> can do it without any atomics/fences, just ~5 instructions in the barrier. >> This way, we get rid of the storeload mem barrier, with Conc Refine turned >> off, while still getting the same pause time guarantee in young-gc-only >> phase. But as you can see, Mixed GCs still suffer from having no concurrent >> refinement. >> >> We saw improvements on Alibaba JDK11u across the benchmarks we used >> (positive number means better): >> Dacapo: cases vary from -3.3% to +5.1%, on average +0.3% >> specjbb2015 on 96x2.50GHz, 16 GC threads, 24g mem: critical-jOPS +1.9%, >> max-jOPS +2.8% >> specjbb2015 on 8x2.50GHz, 8 GC threads, 16g mem (observed more Mixed GCs): >> critical-jOPS +0.1%, max-jOPS +5.7% >> specjvm2008: cases vary from -0.7% to +23.4%, on average +3.1% >> Extremem: cases vary from -2.1% to +7.8%, on average +1.0% >> I'd love to hear any feedbacks, comments, what problems you can see in >> this approach, conceptually or practically, and back to the topic, whether >> this idea can be incorporated into your future work/plan of creating a >> simplified barrier. Fwiw, this sounds what I was trying when I was working on remembered sets and barriers for something like G1. From what I remember these changes yielded mixed results (for DaCapo and other small benchmarks with contemporary desktop machines) similar to yours so it has been dropped at that time (and the comparison point you gave is not clear, and I do not remember what I compared exactly). Basically there has been a table containing a word whether we track outgoing (i.e. what the "young" marks on the card table currently do) or incoming references (i.e. whether the region needs remembered set updates), which sounds very similar to what you have done. If concurrent refinement is turned off you do not need the storeload - but then it can be advantageous to avoid dirtying cards as much as possible to decrease work during gc, this is correct. Also, as you might have noticed from CRs being filed we are actively thinking about improving the current barriers wrt to code size (e.g. JDK-8256279, JDK-8256282, ... not sure if everything has been filed yet what we thought of) and general footprint (e.g. 
refactoring the PtrQueues, dropping some TLS data to make room for other data to decrease code size) >> >> Yude Lin >> Thanks, Thomas From shade at openjdk.java.net Fri Feb 5 13:27:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 5 Feb 2021 13:27:43 GMT Subject: RFR: 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families [v2] In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 21:25:54 GMT, Zhengyu Gu wrote: >> 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Added back vmThread.hpp Looks fine. But please pull from recent master to see if other `#include` work breaks these. ------------- Marked as reviewed by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2339 From zgu at openjdk.java.net Fri Feb 5 14:15:56 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 5 Feb 2021 14:15:56 GMT Subject: RFR: 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families [v3] In-Reply-To: References: Message-ID: > 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge master - Added back vmThread.hpp - update - Merge branch 'master' into JDK-8260736-cleanup-includes-gc - update - init ------------- Changes: https://git.openjdk.java.net/jdk/pull/2339/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2339&range=02 Stats: 35 lines in 10 files changed: 2 ins; 28 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2339.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2339/head:pull/2339 PR: https://git.openjdk.java.net/jdk/pull/2339 From yude.lyd at alibaba-inc.com Fri Feb 5 15:18:59 2021 From: yude.lyd at alibaba-inc.com (=?UTF-8?B?5p6X6IKy5b63?=) Date: Fri, 05 Feb 2021 23:18:59 +0800 Subject: =?UTF-8?B?UmU6IFN0YXRlIG9mICJzaW1wbGlmaWVkIGJhcnJpZXJzIiBmb3IgRzE=?= In-Reply-To: <156634fe-ae7e-2eda-8fc5-51b288369e9d@oracle.com> References: , <156634fe-ae7e-2eda-8fc5-51b288369e9d@oracle.com> Message-ID: Hi Thomas, I think we are talking about very similar idea. Thanks for the feedbacks. I understand completely if we are talking about code size. We recently did an experiment where we found barrier code size is the major reason behind the performance gap between G1 and CMS (Flink on Nexmark just fyi). That's why we use a technique similar to JDK-8245464 to reduce the "filter" code to just ~17 bytes for C2. For C1, it can be put in the stub. But yeah it can still be too much especially when compared to a CMS-style barrier. Yude Lin ------------------------------------------------------------------ From:Thomas Schatzl Send Time:2021?2?5?(???) 20:19 To:hotspot-gc-dev Subject:Re: State of "simplified barriers" for G1 Hi all, sorry for chiming in so late answer, due to holidays and email server move that email thread got lost. On 05.02.21 09:47, Man Cao wrote: > Hi All, > > My apology for postponing this. I've been busy rolling out JDK 11 to all > our production servers for the last year. [...] > and JDK-8230187 in JDK 17. I'll send a separate email for JDK-8226731, as > there are still some challenges there. Great to hear! > > Yude, thanks for sharing the ideal and results! I think it is best to open > a new RFE for further improvement after JDK-8230187 is implemented. 
> If I understand correctly, the proposed approach avoids dirtying the cards > for old-to-old reference stores in young-only phases. That's a nice idea. > Are the results comparing the two types of simplified write barriers? Or is > for comparing the default barrier with the storeload fence, vs your > simplified write barrier that filters untracked regions? > > -Man > > > On Tue, Dec 22, 2020 at 2:31 AM ??? wrote: > >> Hi All, >> >> We are also interested in any follow-ups on this topic. If I recall >> correctly, when this was discussed in JDK-8226197, one of the TODOs was >> that the storeload fence can be skipped when Conc Refine is turned off. >> Regarding this, I'd like to share an idea we have been experimenting in the >> last couple of months. We took "skipping the fence" a little further and >> tried to improve the throughput with less harm to pause time. >> >> This is from the observation that many card dirtying operations can go >> away without concurrent refine. More specifically, writes that produce a >> reference OldObj1.foo->OldObj2 need not dirty the card corresponding to >> OldObj1 during young-gc-only phase. Currently, with Conc Refine, this >> operation will dirty that card, then the card will be refined (thrown away) >> by the refinement thread, because it discovers that the reference points to >> an Old region, which is "untracked" during young-gc-only phase. >> >> The refinement thread does this concurrently so that GC doesn't have to do >> it during a pause. But we (~lmao) realized that we can use a flag to >> indicate whether a region is tracked, and discard the card dirtying >> operation immediately in the barrier (after testing against the flag). We >> can do it without any atomics/fences, just ~5 instructions in the barrier. >> This way, we get rid of the storeload mem barrier, with Conc Refine turned >> off, while still getting the same pause time guarantee in young-gc-only >> phase. But as you can see, Mixed GCs still suffer from having no concurrent >> refinement. >> >> We saw improvements on Alibaba JDK11u across the benchmarks we used >> (positive number means better): >> Dacapo: cases vary from -3.3% to +5.1%, on average +0.3% >> specjbb2015 on 96x2.50GHz, 16 GC threads, 24g mem: critical-jOPS +1.9%, >> max-jOPS +2.8% >> specjbb2015 on 8x2.50GHz, 8 GC threads, 16g mem (observed more Mixed GCs): >> critical-jOPS +0.1%, max-jOPS +5.7% >> specjvm2008: cases vary from -0.7% to +23.4%, on average +3.1% >> Extremem: cases vary from -2.1% to +7.8%, on average +1.0% >> I'd love to hear any feedbacks, comments, what problems you can see in >> this approach, conceptually or practically, and back to the topic, whether >> this idea can be incorporated into your future work/plan of creating a >> simplified barrier. Fwiw, this sounds what I was trying when I was working on remembered sets and barriers for something like G1. From what I remember these changes yielded mixed results (for DaCapo and other small benchmarks with contemporary desktop machines) similar to yours so it has been dropped at that time (and the comparison point you gave is not clear, and I do not remember what I compared exactly). Basically there has been a table containing a word whether we track outgoing (i.e. what the "young" marks on the card table currently do) or incoming references (i.e. whether the region needs remembered set updates), which sounds very similar to what you have done. 
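[Editor's note] A hedged sketch of the "tracked/untracked" filter being described here; the names are hypothetical and details such as where the flag lives differ between the experiments. With concurrent refinement off, the barrier consults a per-region flag for the referenced region and skips the card dirtying entirely when that region collects no remembered-set entries, for example old regions during the young-only phase.

// Illustrative only: post-barrier with an "untracked region" filter and no
// StoreLoad fence (concurrent refinement disabled). Remaining dirty cards are
// scanned during the pause instead of being refined concurrently.
void filtered_post_write_barrier(oop* field, oop new_val) {
  if (new_val == nullptr) return;
  if (same_region(field, new_val)) return;
  if (!region_of(new_val)->rem_set_is_tracked()) {  // e.g. old target in the young-only phase
    return;                                         // roughly a load, a test and a branch
  }
  CardValue* card = card_for(field);
  if (*card != dirty_card_val()) {
    *card = dirty_card_val();                       // no fence, no refinement enqueue
  }
}

The trade-off noted in the thread remains: mixed collections pay for the missing concurrent refinement during the pause.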
If concurrent refinement is turned off you do not need the storeload - but then it can be advantageous to avoid dirtying cards as much as possible to decrease work during gc, this is correct. Also, as you might have noticed from CRs being filed we are actively thinking about improving the current barriers wrt to code size (e.g. JDK-8256279, JDK-8256282, ... not sure if everything has been filed yet what we thought of) and general footprint (e.g. refactoring the PtrQueues, dropping some TLS data to make room for other data to decrease code size) >> >> Yude Lin >> Thanks, Thomas From rkennke at openjdk.java.net Fri Feb 5 15:49:44 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 5 Feb 2021 15:49:44 GMT Subject: RFR: 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families [v3] In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 14:15:56 GMT, Zhengyu Gu wrote: >> 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge master > - Added back vmThread.hpp > - update > - Merge branch 'master' into JDK-8260736-cleanup-includes-gc > - update > - init Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2339 From rkennke at openjdk.java.net Fri Feb 5 18:26:54 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 5 Feb 2021 18:26:54 GMT Subject: RFR: 8261251: Shenandoah: Use object size for full GC humongous compaction Message-ID: Currently, copying objects in full GC humongous object comaction copies the full region. We can limit that to copying only the object size and save some wasted cycles. Also, this fixes a test failure with loom where object copy checks that the given size matches the object size. - [x] hotspot_gc_shenandoah - [ ] tier1(+Shenandoah) ------------- Commit messages: - 8261251: Shenandoah: Use object size for full GC humongous compaction Changes: https://git.openjdk.java.net/jdk/pull/2433/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2433&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261251 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2433.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2433/head:pull/2433 PR: https://git.openjdk.java.net/jdk/pull/2433 From zgu at openjdk.java.net Fri Feb 5 19:33:44 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 5 Feb 2021 19:33:44 GMT Subject: Integrated: 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families In-Reply-To: References: Message-ID: On Mon, 1 Feb 2021 18:55:03 GMT, Zhengyu Gu wrote: > 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families This pull request has now been integrated. 
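[Editor's note] A hedged sketch of the effect of the humongous-compaction change described in the 8261251 request above; this is illustrative, not the actual one-line patch. The point is to copy obj->size() words rather than the full region backing the object.

// Slide a humongous object to its new location during full GC compaction.
// Copying is bounded by the object's own size; the tail of the region that
// the object does not occupy is never touched.
void move_humongous(oop obj, HeapWord* old_start, HeapWord* new_start) {
  size_t copy_words = obj->size();                       // object size, not region size
  Copy::aligned_conjoint_words(old_start, new_start, copy_words);
  // ...followed by the usual post-copy fixups (mark word, region metadata).
}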
Changeset: 7a6c1768 Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/7a6c1768 Stats: 35 lines in 10 files changed: 2 ins; 28 del; 5 mod 8260736: Shenandoah: Cleanup includes in ShenandoahGC and families Reviewed-by: shade, rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2339 From aph at openjdk.java.net Sat Feb 6 10:37:41 2021 From: aph at openjdk.java.net (Andrew Haley) Date: Sat, 6 Feb 2021 10:37:41 GMT Subject: RFR: 8261251: Shenandoah: Use object size for full GC humongous compaction In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 18:21:55 GMT, Roman Kennke wrote: > Currently, copying objects in full GC humongous object comaction copies the full region. We can limit that to copying only the object size and save some wasted cycles. Also, this fixes a test failure with loom where object copy checks that the given size matches the object size. > > - [x] hotspot_gc_shenandoah > - [x] tier1(+Shenandoah) Marked as reviewed by aph (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2433 From shade at openjdk.java.net Mon Feb 8 07:31:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 8 Feb 2021 07:31:45 GMT Subject: RFR: 8261251: Shenandoah: Use object size for full GC humongous compaction In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 18:21:55 GMT, Roman Kennke wrote: > Currently, copying objects in full GC humongous object comaction copies the full region. We can limit that to copying only the object size and save some wasted cycles. Also, this fixes a test failure with loom where object copy checks that the given size matches the object size. > > - [x] hotspot_gc_shenandoah > - [x] tier1(+Shenandoah) Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2433 From rkennke at openjdk.java.net Mon Feb 8 08:04:43 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 8 Feb 2021 08:04:43 GMT Subject: Integrated: 8261251: Shenandoah: Use object size for full GC humongous compaction In-Reply-To: References: Message-ID: On Fri, 5 Feb 2021 18:21:55 GMT, Roman Kennke wrote: > Currently, copying objects in full GC humongous object comaction copies the full region. We can limit that to copying only the object size and save some wasted cycles. Also, this fixes a test failure with loom where object copy checks that the given size matches the object size. > > - [x] hotspot_gc_shenandoah > - [x] tier1(+Shenandoah) This pull request has now been integrated. Changeset: deb0544f Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/deb0544f Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8261251: Shenandoah: Use object size for full GC humongous compaction Reviewed-by: aph, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/2433 From aph at redhat.com Mon Feb 8 18:14:19 2021 From: aph at redhat.com (Andrew Haley) Date: Mon, 8 Feb 2021 18:14:19 +0000 Subject: Atomic operations: your thoughts are welocme Message-ID: I've been looking at the hottest Atomic operations in HotSpot, with a view to finding out if the default memory_order_conservative (which is very expensive on some architectures) can be weakened to something less. It's impossible to fix all of them, but perhaps we can fix some of the most frequent. These are the hottest compare-and-swap uses in HotSpot, with the count at the end of each line. : :: = 16406757 This one is already memory_order_relaxed, so no problem. 
::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this need to be memory_order_conservative, or would something weaker do? Even acq_rel or seq_cst would be better. : :: = 2376632 : :: = 2003895 I can't imagine that either of these actually need memory_order_conservative, they're just reference counts. : :: = 1719614 BitMap::par_set_bit again. , (MEMFLAGS)5>*)+432>: :: = 1617659 This one is GenericTaskQueue::pop_global calling cmpxchg_age(). Again, do we need conservative here? There is, I suppose, always a possibility that some code somewhere is taking advantage of the memory serializing properties of adjusting refcounts, I suppose. Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ayang at openjdk.java.net Mon Feb 8 18:45:55 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 8 Feb 2021 18:45:55 GMT Subject: RFR: 8261356: Clean up enum G1Mark Message-ID: After removing the effectively dead entry in `G1Mark`, the whole enum could be turned into a bool. The call-chain is updated and existing comments are revised. ------------- Commit messages: - bool Changes: https://git.openjdk.java.net/jdk/pull/2461/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2461&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261356 Stats: 29 lines in 4 files changed: 1 ins; 11 del; 17 mod Patch: https://git.openjdk.java.net/jdk/pull/2461.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2461/head:pull/2461 PR: https://git.openjdk.java.net/jdk/pull/2461 From rkennke at redhat.com Mon Feb 8 19:37:09 2021 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 8 Feb 2021 20:37:09 +0100 Subject: Where do obj-array-element-classes get checked for marking? Message-ID: <00ea5816-9af2-9075-bd0f-13cff9a0cbf2@redhat.com> Hello friends, I need your help: We have a testcase that is failing only very rarely, and hard to reproduce, but we have caught a hs_err file: https://bugs.openjdk.java.net/browse/JDK-8261341 as far as we can tell, we see an object which has it's Klass* damaged because the class has been unloaded earlier. The testcase generates lots of Class and then lots of empty (!) arrays of that type. Which means that the only way that the class is referenced is via the element-type of objArrayKlass. array->objArrayKlass->element_type->_java_mirror Now I'm wondering how this is supposed to be found during marking. For example, the ObjArrayKlass::oop_oop_iterate() only checks the array-klass, but not the element-klass: template void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { assert (obj->is_array(), "obj must be array"); objArrayOop a = objArrayOop(obj); if (Devirtualizer::do_metadata(closure)) { Devirtualizer::do_klass(closure, obj->klass()); } oop_oop_iterate_elements(a, closure); } And lacking any actual elements, the _element_klass seems the only way we could reach those classes. do_klass() only fetches the CLD and marks through that, but that wouldn't reach the element-klass either. 
Do we need something like: template void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { assert (obj->is_array(), "obj must be array"); objArrayOop a = objArrayOop(obj); if (Devirtualizer::do_metadata(closure)) { Devirtualizer::do_klass(closure, obj->klass()); Devirtualizer::do_klass(closure, a->element_klass()); <-- check element-klass here? } oop_oop_iterate_elements(a, closure); } What do you think? Thanks, Roman From stefan.karlsson at oracle.com Mon Feb 8 20:36:01 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 8 Feb 2021 21:36:01 +0100 Subject: Where do obj-array-element-classes get checked for marking? In-Reply-To: <00ea5816-9af2-9075-bd0f-13cff9a0cbf2@redhat.com> References: <00ea5816-9af2-9075-bd0f-13cff9a0cbf2@redhat.com> Message-ID: Hi Roman, On 2021-02-08 20:37, Roman Kennke wrote: > Hello friends, > > I need your help: > > We have a testcase that is failing only very rarely, and hard to > reproduce, but we have caught a hs_err file: > > https://bugs.openjdk.java.net/browse/JDK-8261341 > > as far as we can tell, we see an object which has it's Klass* damaged > because the class has been unloaded earlier. > > The testcase generates lots of Class and then > lots of empty (!) arrays of that type. Which means that the only way > that the class is referenced is via the element-type of objArrayKlass. > > array->objArrayKlass->element_type->_java_mirror > > Now I'm wondering how this is supposed to be found during marking. For > example, the ObjArrayKlass::oop_oop_iterate() only checks the > array-klass, but not the element-klass: > > template > void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { > ? assert (obj->is_array(), "obj must be array"); > ? objArrayOop a = objArrayOop(obj); > > ? if (Devirtualizer::do_metadata(closure)) { > ??? Devirtualizer::do_klass(closure, obj->klass()); > ? } > > ? oop_oop_iterate_elements(a, closure); > } > > And lacking any actual elements, the _element_klass seems the only way > we could reach those classes. do_klass() only fetches the CLD and > marks through that, but that wouldn't reach the element-klass either. > > Do we need something like: > > template > void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { > ? assert (obj->is_array(), "obj must be array"); > ? objArrayOop a = objArrayOop(obj); > > ? if (Devirtualizer::do_metadata(closure)) { > ??? Devirtualizer::do_klass(closure, obj->klass()); > ??? Devirtualizer::do_klass(closure, a->element_klass()); <-- check > element-klass here? > ? } > > ? oop_oop_iterate_elements(a, closure); > } > > What do you think? They are supposed to be found through: array->objArrayKlass->_class_loader_data->_handles The "handles block" contains a bunch of references to objects that are being kept alive by the class loader. On of those oops should be the java mirror of the element klass. There also seems to be a reference from the mirror to its "component mirror" that gets installed here: java_lang_Class::create_mirror ... ????? // Two-way link between the array klass and its component mirror: ????? // (array_klass) k -> mirror -> component_mirror -> array_klass -> k ????? 
set_component_mirror(mirror(), comp_mirror()); I think this should make it possible to also find the component mirror from: array->objArrayKlass->_java_mirror->_component_mirror HTH, StefanK > > Thanks, > Roman > From rkennke at redhat.com Mon Feb 8 22:03:54 2021 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 8 Feb 2021 23:03:54 +0100 Subject: Where do obj-array-element-classes get checked for marking? In-Reply-To: References: <00ea5816-9af2-9075-bd0f-13cff9a0cbf2@redhat.com> Message-ID: >> I need your help: >> >> We have a testcase that is failing only very rarely, and hard to >> reproduce, but we have caught a hs_err file: >> >> https://bugs.openjdk.java.net/browse/JDK-8261341 >> >> as far as we can tell, we see an object which has it's Klass* damaged >> because the class has been unloaded earlier. >> >> The testcase generates lots of Class and then >> lots of empty (!) arrays of that type. Which means that the only way >> that the class is referenced is via the element-type of objArrayKlass. >> >> array->objArrayKlass->element_type->_java_mirror >> >> Now I'm wondering how this is supposed to be found during marking. For >> example, the ObjArrayKlass::oop_oop_iterate() only checks the >> array-klass, but not the element-klass: >> >> template >> void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >> ? assert (obj->is_array(), "obj must be array"); >> ? objArrayOop a = objArrayOop(obj); >> >> ? if (Devirtualizer::do_metadata(closure)) { >> ??? Devirtualizer::do_klass(closure, obj->klass()); >> ? } >> >> ? oop_oop_iterate_elements(a, closure); >> } >> >> And lacking any actual elements, the _element_klass seems the only way >> we could reach those classes. do_klass() only fetches the CLD and >> marks through that, but that wouldn't reach the element-klass either. >> >> Do we need something like: >> >> template >> void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >> ? assert (obj->is_array(), "obj must be array"); >> ? objArrayOop a = objArrayOop(obj); >> >> ? if (Devirtualizer::do_metadata(closure)) { >> ??? Devirtualizer::do_klass(closure, obj->klass()); >> ??? Devirtualizer::do_klass(closure, a->element_klass()); <-- check >> element-klass here? >> ? } >> >> ? oop_oop_iterate_elements(a, closure); >> } >> >> What do you think? > > They are supposed to be found through: > array->objArrayKlass->_class_loader_data->_handles > > The "handles block" contains a bunch of references to objects that are > being kept alive by the class loader. On of those oops should be the > java mirror of the element klass. > > There also seems to be a reference from the mirror to its "component > mirror" that gets installed here: > java_lang_Class::create_mirror > ... > ????? // Two-way link between the array klass and its component mirror: > ????? // (array_klass) k -> mirror -> component_mirror -> array_klass -> k > ????? set_component_mirror(mirror(), comp_mirror()); > > I think this should make it possible to also find the component mirror > from: > array->objArrayKlass->_java_mirror->_component_mirror Ok, thanks! That explains it. 
Cheers, Roman From kbarrett at openjdk.java.net Mon Feb 8 23:46:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 8 Feb 2021 23:46:01 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v3] In-Reply-To: References: Message-ID: > Please review this change to ParallelGC to avoid unnecessary full GCs when > concurrent threads attempt oldgen allocations during evacuation. > > When a GC thread fails an oldgen allocation it expands the heap and retries > the allocation. If the second allocation attempt fails then allocation > failure is reported to the caller, which can lead to a full GC. But the > retried allocation could fail because, after expansion, some other thread > allocated enough of the available space that the retry fails. This can > happen even though there is plenty of space available, if only that retry > were to perform another expansion. > > Rather than trying to combine the allocation retry with the expansion (it's > not clear there's a way to do so without breaking invariants), we instead > simply loop on the allocation attempt + expand, until either the allocation > succeeds or the expand fails. If some other thread "steals" space from the > expanding thread and causes its next allocation attempt to fail and do > another expansion, that's functionally no different from the expanding > thread succeeding and causing the other thread to fail allocation and do the > expand instead. > > This change includes modifying PSOldGen::expand_to_reserved to return false > when there is no space available, where it previously returned true. It's > not clear why it returned true; that seems wrong, but was harmless. But it > must not do so with the new looping behavior for allocation, else it would > never terminate. > > Testing: > mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into retry_alloc - avoid expand storms - Merge branch 'master' into retry_alloc - require non-zero expand size - retry failed allocation if expand succeeds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2309/files - new: https://git.openjdk.java.net/jdk/pull/2309/files/d67d5e20..72431d39 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2309&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2309&range=01-02 Stats: 30850 lines in 869 files changed: 15950 ins; 10988 del; 3912 mod Patch: https://git.openjdk.java.net/jdk/pull/2309.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2309/head:pull/2309 PR: https://git.openjdk.java.net/jdk/pull/2309 From kbarrett at openjdk.java.net Mon Feb 8 23:46:01 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 8 Feb 2021 23:46:01 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: Message-ID: <9lEZeYegJZuAopZU3sz-Precwre1lN51Twq220f3-XA=.ebfb6885-652f-41a7-a78a-5b4a3e12e501@github.com> On Mon, 1 Feb 2021 09:55:09 GMT, Stefan Johansson wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> require non-zero expand size > > Looks good! 
The problem being addressed here is closely related to the "expand storm" problem from JDK-8260045. I thought this one could be addressed separately first, but now think not. Consider if we do an expand with excess here that uses the remainder of the permitted space. If another thread was blocked waiting to expand, its expand attempt will fail. With the old code, there would still be another allocation attempt, but now the failing expand won't do that. Reversing the order of fixes doesn't work very well either, as avoiding the expand storm needs the same sort of infrastructure for retrying a failed allocation after (optional) expansion. New commit "avoid expand storms" adds that fix. It's a little bit kludgy because of JDK-8261284, adding a function to MutableSpace for use only by PSOldGen. It's not the only weird function or behavior in MutableSpace. Testing: mach5 tier1-3, tier5 (tiers with common tests run with ParallelGC) ------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From sjohanss at openjdk.java.net Tue Feb 9 08:19:45 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 9 Feb 2021 08:19:45 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v3] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 23:46:01 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC to avoid unnecessary full GCs when >> concurrent threads attempt oldgen allocations during evacuation. >> >> When a GC thread fails an oldgen allocation it expands the heap and retries >> the allocation. If the second allocation attempt fails then allocation >> failure is reported to the caller, which can lead to a full GC. But the >> retried allocation could fail because, after expansion, some other thread >> allocated enough of the available space that the retry fails. This can >> happen even though there is plenty of space available, if only that retry >> were to perform another expansion. >> >> Rather than trying to combine the allocation retry with the expansion (it's >> not clear there's a way to do so without breaking invariants), we instead >> simply loop on the allocation attempt + expand, until either the allocation >> succeeds or the expand fails. If some other thread "steals" space from the >> expanding thread and causes its next allocation attempt to fail and do >> another expansion, that's functionally no different from the expanding >> thread succeeding and causing the other thread to fail allocation and do the >> expand instead. >> >> This change includes modifying PSOldGen::expand_to_reserved to return false >> when there is no space available, where it previously returned true. It's >> not clear why it returned true; that seems wrong, but was harmless. But it >> must not do so with the new looping behavior for allocation, else it would >> never terminate. >> >> Testing: >> mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into retry_alloc > - avoid expand storms > - Merge branch 'master' into retry_alloc > - require non-zero expand size > - retry failed allocation if expand succeeds Marked as reviewed by sjohanss (Reviewer). 
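For readers skimming the thread, the retry shape described in the summary above boils down to the following loop. This is only a standalone sketch of the control flow: try_allocate() and expand_heap() are hypothetical stand-ins, not the actual PSOldGen methods, and locking/invariant details are omitted.

#include <cstddef>

// Hypothetical stand-ins for the real allocation and expansion routines.
void* try_allocate(std::size_t word_size);
bool  expand_heap(std::size_t word_size);

// Loop on "attempt allocation, then expand" until the allocation succeeds
// or expansion itself fails.
void* allocate_with_expansion(std::size_t word_size) {
  while (true) {
    void* result = try_allocate(word_size);
    if (result != nullptr) {
      return result;
    }
    if (!expand_heap(word_size)) {
      return nullptr;  // no room left to expand: report allocation failure
    }
    // Another thread may have consumed the space we just expanded;
    // that is fine, loop and try again.
  }
}

If another thread "steals" the freshly expanded space, this thread simply expands again on the next iteration, which is functionally the same as the other thread having performed the expansion itself.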
------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From sjohanss at openjdk.java.net Tue Feb 9 10:27:30 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 9 Feb 2021 10:27:30 GMT Subject: RFR: 8261356: Clean up enum G1Mark In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 18:41:13 GMT, Albert Mingkun Yang wrote: > After removing the effectively dead entry in `G1Mark`, the whole enum could be turned into a bool. The call-chain is updated and existing comments are revised. Nice cleanup. Just a minor nit :) src/hotspot/share/gc/g1/g1RootClosures.cpp line 53: > 51: // The treatment of "weak" roots is selectable through the template parameter, > 52: // this is usually used to control unloading of classes and interned strings. > 53: template I think it could make sense to name this parameter `should_mark_weak` to be more consistent. But if you don't agree just leave it. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2461 From iwalulya at openjdk.java.net Tue Feb 9 11:03:43 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Tue, 9 Feb 2021 11:03:43 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v3] In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 23:46:01 GMT, Kim Barrett wrote: >> Please review this change to ParallelGC to avoid unnecessary full GCs when >> concurrent threads attempt oldgen allocations during evacuation. >> >> When a GC thread fails an oldgen allocation it expands the heap and retries >> the allocation. If the second allocation attempt fails then allocation >> failure is reported to the caller, which can lead to a full GC. But the >> retried allocation could fail because, after expansion, some other thread >> allocated enough of the available space that the retry fails. This can >> happen even though there is plenty of space available, if only that retry >> were to perform another expansion. >> >> Rather than trying to combine the allocation retry with the expansion (it's >> not clear there's a way to do so without breaking invariants), we instead >> simply loop on the allocation attempt + expand, until either the allocation >> succeeds or the expand fails. If some other thread "steals" space from the >> expanding thread and causes its next allocation attempt to fail and do >> another expansion, that's functionally no different from the expanding >> thread succeeding and causing the other thread to fail allocation and do the >> expand instead. >> >> This change includes modifying PSOldGen::expand_to_reserved to return false >> when there is no space available, where it previously returned true. It's >> not clear why it returned true; that seems wrong, but was harmless. But it >> must not do so with the new looping behavior for allocation, else it would >> never terminate. >> >> Testing: >> mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into retry_alloc > - avoid expand storms > - Merge branch 'master' into retry_alloc > - require non-zero expand size > - retry failed allocation if expand succeeds Marked as reviewed by iwalulya (Committer). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From rkennke at openjdk.java.net Tue Feb 9 12:03:46 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 9 Feb 2021 12:03:46 GMT Subject: RFR: 8261413: Shenandoah: Disable class-unloading in I-U mode Message-ID: JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. Testing: - [ ] hotspot_gc_shenandoah - [ ] tier1 (+UseShenandoahGC +IU) - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure ------------- Commit messages: - 8261413: Shenandoah: Disable class-unloading in I-U mode Changes: https://git.openjdk.java.net/jdk/pull/2477/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2477&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261413 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2477.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2477/head:pull/2477 PR: https://git.openjdk.java.net/jdk/pull/2477 From shade at openjdk.java.net Tue Feb 9 12:41:31 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 9 Feb 2021 12:41:31 GMT Subject: RFR: 8261413: Shenandoah: Disable class-unloading in I-U mode In-Reply-To: References: Message-ID: <8FGC9ajeya9Jh4D4rL7W6kUqiV1m8TBdHvms8JOBLkU=.42a84b4b-08e8-4e6c-8dce-e9660f4506b7@github.com> On Tue, 9 Feb 2021 11:58:58 GMT, Roman Kennke wrote: > JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. > > Testing: > - [ ] hotspot_gc_shenandoah > - [ ] tier1 (+UseShenandoahGC +IU) > - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure Changes requested by shade (Reviewer). src/hotspot/share/gc/shenandoah/mode/shenandoahIUMode.cpp line 37: > 35: > 36: void ShenandoahIUMode::initialize_flags() const { > 37: // See: https://bugs.openjdk.java.net/browse/JDK-8261341 No need for this comment, as the message prints it out. src/hotspot/share/gc/shenandoah/mode/shenandoahIUMode.cpp line 39: > 37: // See: https://bugs.openjdk.java.net/browse/JDK-8261341 > 38: if (FLAG_IS_CMDLINE(ClassUnloading) && ClassUnloading) { > 39: log_warning(gc)("Shenandoah I-U mode forces -XX:-ClassUnloading, for decails, see https://bugs.openjdk.java.net/browse/JDK-8261341"); "Shenandoah I-U mode sets -XX:-ClassUnloading; see JDK-8261341 for details" ------------- PR: https://git.openjdk.java.net/jdk/pull/2477 From ayang at openjdk.java.net Tue Feb 9 12:45:51 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 9 Feb 2021 12:45:51 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v2] In-Reply-To: References: Message-ID: > After removing the effectively dead entry in `G1Mark`, the whole enum could be turned into a bool. The call-chain is updated and existing comments are revised. 
Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2461/files - new: https://git.openjdk.java.net/jdk/pull/2461/files/946660de..1a17ae19 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2461&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2461&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2461.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2461/head:pull/2461 PR: https://git.openjdk.java.net/jdk/pull/2461 From ayang at openjdk.java.net Tue Feb 9 12:45:52 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 9 Feb 2021 12:45:52 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 10:18:17 GMT, Stefan Johansson wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/gc/g1/g1RootClosures.cpp line 53: > >> 51: // The treatment of "weak" roots is selectable through the template parameter, >> 52: // this is usually used to control unloading of classes and interned strings. >> 53: template > > I think it could make sense to name this parameter `should_mark_weak` to be more consistent. But if you don't agree just leave it. Fixed. ------------- PR: https://git.openjdk.java.net/jdk/pull/2461 From sjohanss at openjdk.java.net Tue Feb 9 12:58:32 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 9 Feb 2021 12:58:32 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 12:45:51 GMT, Albert Mingkun Yang wrote: >> After removing the effectively dead entry in `G1Mark`, the whole enum could be turned into a bool. The call-chain is updated and existing comments are revised. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/gc/g1/g1RootClosures.cpp line 56: > 54: class G1ConcurrentStartMarkClosures : public G1EvacuationRootClosures { > 55: G1SharedClosures _strong; > 56: G1SharedClosures _weak; Sorry for being picky, but now the alignment of the variables are off. I would prefer: Suggestion: G1SharedClosures _strong; G1SharedClosures _weak; ------------- PR: https://git.openjdk.java.net/jdk/pull/2461 From rkennke at openjdk.java.net Tue Feb 9 13:13:43 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 9 Feb 2021 13:13:43 GMT Subject: RFR: 8261413: Shenandoah: Disable class-unloading in I-U mode [v2] In-Reply-To: References: Message-ID: > JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. 
> > Testing: > - [ ] hotspot_gc_shenandoah > - [ ] tier1 (+UseShenandoahGC +IU) > - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Some comment and output changes as requested by Aleksey ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2477/files - new: https://git.openjdk.java.net/jdk/pull/2477/files/491c41c1..e3c1b459 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2477&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2477&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2477.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2477/head:pull/2477 PR: https://git.openjdk.java.net/jdk/pull/2477 From shade at openjdk.java.net Tue Feb 9 13:13:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 9 Feb 2021 13:13:43 GMT Subject: RFR: 8261413: Shenandoah: Disable class-unloading in I-U mode [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 13:10:55 GMT, Roman Kennke wrote: >> JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. >> >> Testing: >> - [ ] hotspot_gc_shenandoah >> - [ ] tier1 (+UseShenandoahGC +IU) >> - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Some comment and output changes as requested by Aleksey Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2477 From zgu at openjdk.java.net Tue Feb 9 13:21:31 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 9 Feb 2021 13:21:31 GMT Subject: RFR: 8261413: Shenandoah: Disable class-unloading in I-U mode [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 13:13:43 GMT, Roman Kennke wrote: >> JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. >> >> Testing: >> - [ ] hotspot_gc_shenandoah >> - [ ] tier1 (+UseShenandoahGC +IU) >> - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Some comment and output changes as requested by Aleksey Marked as reviewed by zgu (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2477 From tschatzl at openjdk.java.net Tue Feb 9 13:30:11 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 9 Feb 2021 13:30:11 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 12:45:51 GMT, Albert Mingkun Yang wrote: >> After removing the effectively dead entry in `G1Mark`, the whole enum could be turned into a bool. The call-chain is updated and existing comments are revised. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Lgtm after fixing that indentation issue. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2461 From ayang at openjdk.java.net Tue Feb 9 13:36:58 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 9 Feb 2021 13:36:58 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v3] In-Reply-To: References: Message-ID: <2Yhz6lRmcuo078pqFPOpUCGEMWTuXtVD0Eae5Rdxfto=.4879e088-71be-49e5-89b0-2364c6e42d2d@github.com> > After removing the effectively dead entry in `G1Mark`, the whole enum could be turned into a bool. The call-chain is updated and existing comments are revised. Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/g1/g1RootClosures.cpp Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2461/files - new: https://git.openjdk.java.net/jdk/pull/2461/files/1a17ae19..bc69d56f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2461&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2461&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2461.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2461/head:pull/2461 PR: https://git.openjdk.java.net/jdk/pull/2461 From ayang at openjdk.java.net Tue Feb 9 13:39:40 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 9 Feb 2021 13:39:40 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v2] In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 13:27:48 GMT, Thomas Schatzl wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Lgtm after fixing that indentation issue. Thank you for the review. PS: Stefan's suggestion comes in a commit-table form that github just offers me a button to click, saving me from edit/commit/push. This is very neat. ------------- PR: https://git.openjdk.java.net/jdk/pull/2461 From rkennke at redhat.com Tue Feb 9 13:54:45 2021 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 9 Feb 2021 14:54:45 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? Message-ID: Hello all, When running StackWalker tests with 'aggressive' Shenandoah mode (i.e. run GCs all the time, even if there is no work), then I observe crashes like this: # Internal Error (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=549168, tid=549230 # assert(is_frame_safe(f)) failed: Frame must be safe Full hs_err: http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log I strongly suspect that this is happening because of StackWalker's use of StackWatermark which conflicts with the GC's own use of StackWalker. IOW, it asserts that the frame has been processed, but the GC is still on it. Are we missing some coordination between StackWalker and the GC here? It can be reproduced using: CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive" Thanks, Roman From rkennke at redhat.com Tue Feb 9 14:08:53 2021 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 9 Feb 2021 15:08:53 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? 
In-Reply-To: References: Message-ID: I am getting the same failure with ZGC: CONF=linux-x86_64-server-fastdebug make run-test TEST=java/lang/StackWalker TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC -XX:ZCollectionInterval=0.01" > Hello all, > > When running StackWalker tests with 'aggressive' Shenandoah mode (i.e. > run GCs all the time, even if there is no work), then I observe crashes > like this: > > #? Internal Error > (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), > pid=549168, tid=549230 > #? assert(is_frame_safe(f)) failed: Frame must be safe > > Full hs_err: > http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log > > I strongly suspect that this is happening because of StackWalker's use > of StackWatermark which conflicts with the GC's own use of StackWalker. > IOW, it asserts that the frame has been processed, but the GC is still > on it. > > Are we missing some coordination between StackWalker and the GC here? > > It can be reproduced using: > CONF=linux-x86_64-server-fastdebug make run-test > TEST=java/lang/StackWalker > TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC > -XX:ShenandoahGCHeuristics=aggressive" > > Thanks, > Roman From stefan.karlsson at oracle.com Tue Feb 9 14:45:52 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Feb 2021 15:45:52 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? In-Reply-To: References: Message-ID: It's interesting that fetchNextBatch process the entire stack in preparation for filling in the information about the frames: ??? // If we have to get back here for even more frames, then 1) the user did not supply ??? // an accurate hint suggesting the depth of the stack walk, and 2) we are not just ??? // peeking? at a few frames. Take the cost of flushing out any pending deferred GC ??? // processing of the stack. ??? StackWatermarkSet::finish_processing(jt, NULL /* context */, StackWatermarkKind::gc); but further down in fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe, we perform object allocation, which could safepoint for a GC that would reset the watermark. After leaving that safepoint we will have processed the top-most frames, but we won't have processed down the the current frame the StackWalker is looking at. This is my guess of what's happening, but I haven't been able to reproduce the problem, so it's a bit hard to verify that this is what's happening. StefanK On 2021-02-09 15:08, Roman Kennke wrote: > I am getting the same failure with ZGC: > > CONF=linux-x86_64-server-fastdebug make run-test > TEST=java/lang/StackWalker > TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC > -XX:ZCollectionInterval=0.01" > > >> Hello all, >> >> When running StackWalker tests with 'aggressive' Shenandoah mode >> (i.e. run GCs all the time, even if there is no work), then I observe >> crashes like this: >> >> #? Internal Error >> (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), >> pid=549168, tid=549230 >> #? assert(is_frame_safe(f)) failed: Frame must be safe >> >> Full hs_err: >> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log >> >> I strongly suspect that this is happening because of StackWalker's >> use of StackWatermark which conflicts with the GC's own use of >> StackWalker. IOW, it asserts that the frame has been processed, but >> the GC is still on it. >> >> Are we missing some coordination between StackWalker and the GC here? 
>> >> It can be reproduced using: >> CONF=linux-x86_64-server-fastdebug make run-test >> TEST=java/lang/StackWalker >> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC >> -XX:ShenandoahGCHeuristics=aggressive" >> >> Thanks, >> Roman > From rkennke at redhat.com Tue Feb 9 15:08:24 2021 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 9 Feb 2021 16:08:24 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? In-Reply-To: References: Message-ID: Hi Stefan, > It's interesting that fetchNextBatch process the entire stack in > preparation for filling in the information about the frames: > > ??? // If we have to get back here for even more frames, then 1) the > user did not supply > ??? // an accurate hint suggesting the depth of the stack walk, and 2) > we are not just > ??? // peeking? at a few frames. Take the cost of flushing out any > pending deferred GC > ??? // processing of the stack. > ??? StackWatermarkSet::finish_processing(jt, NULL /* context */, > StackWatermarkKind::gc); > > but further down in fill_in_frames => LiveFrameStream::fill_frame => > fill_live_stackframe, we perform object allocation, which could > safepoint for a GC that would reset the watermark. After leaving that > safepoint we will have processed the top-most frames, but we won't have > processed down the the current frame the StackWalker is looking at. This > is my guess of what's happening, but I haven't been able to reproduce > the problem, so it's a bit hard to verify that this is what's happening. That sounds plausible. What would be a way out of this? Scan the stack and collect all relevant information without allocating any Java objects yet, and fill in the Java frames array after the stack scan, maybe? Roman > StefanK > > On 2021-02-09 15:08, Roman Kennke wrote: >> I am getting the same failure with ZGC: >> >> CONF=linux-x86_64-server-fastdebug make run-test >> TEST=java/lang/StackWalker >> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC >> -XX:ZCollectionInterval=0.01" >> >> >>> Hello all, >>> >>> When running StackWalker tests with 'aggressive' Shenandoah mode >>> (i.e. run GCs all the time, even if there is no work), then I observe >>> crashes like this: >>> >>> #? Internal Error >>> (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), >>> pid=549168, tid=549230 >>> #? assert(is_frame_safe(f)) failed: Frame must be safe >>> >>> Full hs_err: >>> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log >>> >>> I strongly suspect that this is happening because of StackWalker's >>> use of StackWatermark which conflicts with the GC's own use of >>> StackWalker. IOW, it asserts that the frame has been processed, but >>> the GC is still on it. >>> >>> Are we missing some coordination between StackWalker and the GC here? >>> >>> It can be reproduced using: >>> CONF=linux-x86_64-server-fastdebug make run-test >>> TEST=java/lang/StackWalker >>> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC >>> -XX:ShenandoahGCHeuristics=aggressive" >>> >>> Thanks, >>> Roman >> > From sjohanss at openjdk.java.net Tue Feb 9 15:28:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 9 Feb 2021 15:28:38 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v2] In-Reply-To: References: Message-ID: <2_EEtW9nNiIT_VYSfXDE5UgLy_CsiKT_KomO15ZhX2U=.d79bb726-5267-4710-95bb-abe8cfe499c2@github.com> On Tue, 9 Feb 2021 13:37:07 GMT, Albert Mingkun Yang wrote: > Thank you for the review. 
> > PS: Stefan's suggestion comes in a commit-table form that github just offers me a button to click, saving me from edit/commit/push. This is very neat. I hoped you would make use of that feature =) It really is neat ? Perfect for those small things at the end of a review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2461 From rkennke at redhat.com Tue Feb 9 15:30:23 2021 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 9 Feb 2021 16:30:23 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? In-Reply-To: References: Message-ID: Tracking this here: https://bugs.openjdk.java.net/browse/JDK-8261448 Roman > Hi Stefan, > >> It's interesting that fetchNextBatch process the entire stack in >> preparation for filling in the information about the frames: >> >> ???? // If we have to get back here for even more frames, then 1) the >> user did not supply >> ???? // an accurate hint suggesting the depth of the stack walk, and >> 2) we are not just >> ???? // peeking? at a few frames. Take the cost of flushing out any >> pending deferred GC >> ???? // processing of the stack. >> ???? StackWatermarkSet::finish_processing(jt, NULL /* context */, >> StackWatermarkKind::gc); >> >> but further down in fill_in_frames => LiveFrameStream::fill_frame => >> fill_live_stackframe, we perform object allocation, which could >> safepoint for a GC that would reset the watermark. After leaving that >> safepoint we will have processed the top-most frames, but we won't >> have processed down the the current frame the StackWalker is looking >> at. This is my guess of what's happening, but I haven't been able to >> reproduce the problem, so it's a bit hard to verify that this is >> what's happening. > > That sounds plausible. > > What would be a way out of this? Scan the stack and collect all relevant > information without allocating any Java objects yet, and fill in the > Java frames array after the stack scan, maybe? > > Roman > > >> StefanK >> >> On 2021-02-09 15:08, Roman Kennke wrote: >>> I am getting the same failure with ZGC: >>> >>> CONF=linux-x86_64-server-fastdebug make run-test >>> TEST=java/lang/StackWalker >>> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC >>> -XX:ZCollectionInterval=0.01" >>> >>> >>>> Hello all, >>>> >>>> When running StackWalker tests with 'aggressive' Shenandoah mode >>>> (i.e. run GCs all the time, even if there is no work), then I >>>> observe crashes like this: >>>> >>>> #? Internal Error >>>> (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), >>>> pid=549168, tid=549230 >>>> #? assert(is_frame_safe(f)) failed: Frame must be safe >>>> >>>> Full hs_err: >>>> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log >>>> >>>> I strongly suspect that this is happening because of StackWalker's >>>> use of StackWatermark which conflicts with the GC's own use of >>>> StackWalker. IOW, it asserts that the frame has been processed, but >>>> the GC is still on it. >>>> >>>> Are we missing some coordination between StackWalker and the GC here? 
>>>> >>>> It can be reproduced using: >>>> CONF=linux-x86_64-server-fastdebug make run-test >>>> TEST=java/lang/StackWalker >>>> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC >>>> -XX:ShenandoahGCHeuristics=aggressive" >>>> >>>> Thanks, >>>> Roman >>> >> From ayang at openjdk.java.net Tue Feb 9 16:13:35 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 9 Feb 2021 16:13:35 GMT Subject: RFR: 8261356: Clean up enum G1Mark [v2] In-Reply-To: <2_EEtW9nNiIT_VYSfXDE5UgLy_CsiKT_KomO15ZhX2U=.d79bb726-5267-4710-95bb-abe8cfe499c2@github.com> References: <2_EEtW9nNiIT_VYSfXDE5UgLy_CsiKT_KomO15ZhX2U=.d79bb726-5267-4710-95bb-abe8cfe499c2@github.com> Message-ID: On Tue, 9 Feb 2021 15:26:19 GMT, Stefan Johansson wrote: > I hoped you would make use of that feature I didn't know that feature before. Would try using it in future reviews. The CI was green before the last commit (rename only), so I am integrating it without re-running the CI. ------------- PR: https://git.openjdk.java.net/jdk/pull/2461 From jaroslav.bachorik at datadoghq.com Tue Feb 9 16:31:11 2021 From: jaroslav.bachorik at datadoghq.com (=?UTF-8?Q?Jaroslav_Bachor=C3=ADk?=) Date: Tue, 9 Feb 2021 17:31:11 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? Message-ID: Hello, In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I am trying to figure out whether providing a cheap estimation of live set size is something actually achievable across various GC implementations. What I am looking at is piggy-backing on a concurrent mark task to get the summary size of live objects - using the 'straight-forward' heap-inspection like approach is prohibitively expensive. Thanks and regards, -JB- From ayang at openjdk.java.net Tue Feb 9 17:43:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 9 Feb 2021 17:43:39 GMT Subject: Integrated: 8261356: Clean up enum G1Mark In-Reply-To: References: Message-ID: On Mon, 8 Feb 2021 18:41:13 GMT, Albert Mingkun Yang wrote: > After removing the effectively dead entry in `G1Mark`, the whole enum could be turned into a bool. The call-chain is updated and existing comments are revised. This pull request has now been integrated. Changeset: a00b1305 Author: Albert Mingkun Yang Committer: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/a00b1305 Stats: 29 lines in 4 files changed: 1 ins; 11 del; 17 mod 8261356: Clean up enum G1Mark Reviewed-by: sjohanss, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2461 From sjohanss at openjdk.java.net Tue Feb 9 19:47:49 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Tue, 9 Feb 2021 19:47:49 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places Message-ID: The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. 
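Roughly, the shape of the fix is to feed the tracing code the page size the reservation actually ended up with rather than the size that was requested. The sketch below uses stand-in declarations; the names follow the description above, but the types and signatures are illustrative and not copied from the real HotSpot headers.

#include <cstddef>

// Stand-ins for the pieces named above; illustrative only.
struct ReservedSpaceLike { char* base; std::size_t size; };
std::size_t actual_reserved_page_size(const ReservedSpaceLike& rs);  // what we actually got
void trace_pages(const char* what, std::size_t page_size, const ReservedSpaceLike& rs);

void trace_heap_reservation(const ReservedSpaceLike& rs, std::size_t requested_page_size) {
  // The bug: the requested page size was passed to the tracing call, assuming the
  // request is always honored. The fix: ask the reservation what it really uses.
  (void)requested_page_size;
  trace_pages("Heap", actual_reserved_page_size(rs), rs);
}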
------------- Commit messages: - 8261230-test-fix - 8261230: GC tracing of page sizes are wrong in a few places Changes: https://git.openjdk.java.net/jdk/pull/2486/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2486&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261230 Stats: 33 lines in 4 files changed: 26 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2486.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2486/head:pull/2486 PR: https://git.openjdk.java.net/jdk/pull/2486 From stefan.karlsson at oracle.com Tue Feb 9 22:44:24 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Tue, 9 Feb 2021 23:44:24 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? In-Reply-To: References: Message-ID: <89b7c7e8-ca73-b4b3-ecde-a084a27645ec@oracle.com> On 2021-02-09 16:08, Roman Kennke wrote: > Hi Stefan, > >> It's interesting that fetchNextBatch process the entire stack in >> preparation for filling in the information about the frames: >> >> ???? // If we have to get back here for even more frames, then 1) the >> user did not supply >> ???? // an accurate hint suggesting the depth of the stack walk, and >> 2) we are not just >> ???? // peeking? at a few frames. Take the cost of flushing out any >> pending deferred GC >> ???? // processing of the stack. >> ???? StackWatermarkSet::finish_processing(jt, NULL /* context */, >> StackWatermarkKind::gc); >> >> but further down in fill_in_frames => LiveFrameStream::fill_frame => >> fill_live_stackframe, we perform object allocation, which could >> safepoint for a GC that would reset the watermark. After leaving that >> safepoint we will have processed the top-most frames, but we won't >> have processed down the the current frame the StackWalker is looking >> at. This is my guess of what's happening, but I haven't been able to >> reproduce the problem, so it's a bit hard to verify that this is >> what's happening. > > That sounds plausible. > > What would be a way out of this? Scan the stack and collect all > relevant information without allocating any Java objects yet, and fill > in the Java frames array after the stack scan, maybe? We have a way to deal with similar situations: // Use this class to mark a remote thread you are currently interested // in examining the entire stack, without it slipping into an unprocessed // state at safepoint polls. class KeepStackGCProcessedMark : public StackObj { It installs a link to the other thread, and whenever we hit a safepoint that entire stack is processed. See: void StackWatermark::on_safepoint() { ? start_processing(); ? StackWatermark* linked_watermark = _linked_watermark; ? if (linked_watermark != NULL) { ??? linked_watermark->finish_processing(NULL /* context */); ? } } KeepStackGCProcessedMark isn't reentrant, so we would have to watch out for that. StefanK > > Roman > > >> StefanK >> >> On 2021-02-09 15:08, Roman Kennke wrote: >>> I am getting the same failure with ZGC: >>> >>> CONF=linux-x86_64-server-fastdebug make run-test >>> TEST=java/lang/StackWalker >>> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseZGC >>> -XX:ZCollectionInterval=0.01" >>> >>> >>>> Hello all, >>>> >>>> When running StackWalker tests with 'aggressive' Shenandoah mode >>>> (i.e. run GCs all the time, even if there is no work), then I >>>> observe crashes like this: >>>> >>>> #? Internal Error >>>> (/home/rkennke/src/openjdk/jdk/src/hotspot/share/runtime/stackWatermark.cpp:178), >>>> pid=549168, tid=549230 >>>> #? 
assert(is_frame_safe(f)) failed: Frame must be safe >>>> >>>> Full hs_err: >>>> http://cr.openjdk.java.net/~rkennke/hs_err_pid549168.log >>>> >>>> I strongly suspect that this is happening because of StackWalker's >>>> use of StackWatermark which conflicts with the GC's own use of >>>> StackWalker. IOW, it asserts that the frame has been processed, but >>>> the GC is still on it. >>>> >>>> Are we missing some coordination between StackWalker and the GC here? >>>> >>>> It can be reproduced using: >>>> CONF=linux-x86_64-server-fastdebug make run-test >>>> TEST=java/lang/StackWalker >>>> TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC >>>> -XX:ShenandoahGCHeuristics=aggressive" >>>> >>>> Thanks, >>>> Roman >>> >> > From rkennke at redhat.com Tue Feb 9 23:23:13 2021 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 10 Feb 2021 00:23:13 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? In-Reply-To: <89b7c7e8-ca73-b4b3-ecde-a084a27645ec@oracle.com> References: <89b7c7e8-ca73-b4b3-ecde-a084a27645ec@oracle.com> Message-ID: <8095f4e3-254d-a953-c4c7-92372adb6c93@redhat.com> >>> It's interesting that fetchNextBatch process the entire stack in >>> preparation for filling in the information about the frames: >>> >>> ???? // If we have to get back here for even more frames, then 1) the >>> user did not supply >>> ???? // an accurate hint suggesting the depth of the stack walk, and >>> 2) we are not just >>> ???? // peeking? at a few frames. Take the cost of flushing out any >>> pending deferred GC >>> ???? // processing of the stack. >>> ???? StackWatermarkSet::finish_processing(jt, NULL /* context */, >>> StackWatermarkKind::gc); >>> >>> but further down in fill_in_frames => LiveFrameStream::fill_frame => >>> fill_live_stackframe, we perform object allocation, which could >>> safepoint for a GC that would reset the watermark. After leaving that >>> safepoint we will have processed the top-most frames, but we won't >>> have processed down the the current frame the StackWalker is looking >>> at. This is my guess of what's happening, but I haven't been able to >>> reproduce the problem, so it's a bit hard to verify that this is >>> what's happening. >> >> That sounds plausible. >> >> What would be a way out of this? Scan the stack and collect all >> relevant information without allocating any Java objects yet, and fill >> in the Java frames array after the stack scan, maybe? > > We have a way to deal with similar situations: > > // Use this class to mark a remote thread you are currently interested > // in examining the entire stack, without it slipping into an unprocessed > // state at safepoint polls. > class KeepStackGCProcessedMark : public StackObj { > > It installs a link to the other thread, and whenever we hit a safepoint > that entire stack is processed. See: > > void StackWatermark::on_safepoint() { > ? start_processing(); > ? StackWatermark* linked_watermark = _linked_watermark; > ? if (linked_watermark != NULL) { > ??? linked_watermark->finish_processing(NULL /* context */); > ? } > } > > KeepStackGCProcessedMark isn't reentrant, so we would have to watch out > for that. Wow, this is very useful! 
I was almost done with separating stack scanning and setting up the Java stack frame info objects, but using the KeepStackGCProcessedMark it is much simpler: This seems to work perfectly fine and fix the bug: https://gist.github.com/rkennke/553b0ac024d6d094ff0784fa56c85fb0 I'll look at it some more and do more testing, and will file a PR (unless you disagree). Thanks! Roman From ioi.lam at oracle.com Wed Feb 10 06:44:46 2021 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 9 Feb 2021 22:44:46 -0800 Subject: Atomic operations: your thoughts are welocme In-Reply-To: References: Message-ID: Just curious, which benchmark is this? Thanks - Ioi On 2/8/21 10:14 AM, Andrew Haley wrote: > I've been looking at the hottest Atomic operations in HotSpot, with a view to > finding out if the default memory_order_conservative (which is very expensive > on some architectures) can be weakened to something less. It's impossible to > fix all of them, but perhaps we can fix some of the most frequent. > > These are the hottest compare-and-swap uses in HotSpot, with the count > at the end of each line. > > : :: = 16406757 > > This one is already memory_order_relaxed, so no problem. > > ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 > > This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this > need to be memory_order_conservative, or would something weaker do? Even > acq_rel or seq_cst would be better. > > : :: = 2376632 > : :: = 2003895 > > I can't imagine that either of these actually need memory_order_conservative, > they're just reference counts. > > : :: = 1719614 > > BitMap::par_set_bit again. > > , (MEMFLAGS)5>*)+432>: :: = 1617659 > > This one is GenericTaskQueue::pop_global calling cmpxchg_age(). > Again, do we need conservative here? > > There is, I suppose, always a possibility that some code somewhere is taking > advantage of the memory serializing properties of adjusting refcounts, I suppose. > > Thanks, > From stefan.karlsson at oracle.com Wed Feb 10 08:04:23 2021 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 10 Feb 2021 09:04:23 +0100 Subject: Conflicting use of StackWatermark in StackWalker vs GC? In-Reply-To: <8095f4e3-254d-a953-c4c7-92372adb6c93@redhat.com> References: <89b7c7e8-ca73-b4b3-ecde-a084a27645ec@oracle.com> <8095f4e3-254d-a953-c4c7-92372adb6c93@redhat.com> Message-ID: On 2021-02-10 00:23, Roman Kennke wrote: > > >>>> It's interesting that fetchNextBatch process the entire stack in >>>> preparation for filling in the information about the frames: >>>> >>>> ???? // If we have to get back here for even more frames, then 1) >>>> the user did not supply >>>> ???? // an accurate hint suggesting the depth of the stack walk, >>>> and 2) we are not just >>>> ???? // peeking? at a few frames. Take the cost of flushing out any >>>> pending deferred GC >>>> ???? // processing of the stack. >>>> ???? StackWatermarkSet::finish_processing(jt, NULL /* context */, >>>> StackWatermarkKind::gc); >>>> >>>> but further down in fill_in_frames => LiveFrameStream::fill_frame >>>> => fill_live_stackframe, we perform object allocation, which could >>>> safepoint for a GC that would reset the watermark. After leaving >>>> that safepoint we will have processed the top-most frames, but we >>>> won't have processed down the the current frame the StackWalker is >>>> looking at. This is my guess of what's happening, but I haven't >>>> been able to reproduce the problem, so it's a bit hard to verify >>>> that this is what's happening. 
>>> >>> That sounds plausible. >>> >>> What would be a way out of this? Scan the stack and collect all >>> relevant information without allocating any Java objects yet, and >>> fill in the Java frames array after the stack scan, maybe? >> >> We have a way to deal with similar situations: >> >> // Use this class to mark a remote thread you are currently interested >> // in examining the entire stack, without it slipping into an >> unprocessed >> // state at safepoint polls. >> class KeepStackGCProcessedMark : public StackObj { >> >> It installs a link to the other thread, and whenever we hit a >> safepoint that entire stack is processed. See: >> >> void StackWatermark::on_safepoint() { >> ?? start_processing(); >> ?? StackWatermark* linked_watermark = _linked_watermark; >> ?? if (linked_watermark != NULL) { >> ???? linked_watermark->finish_processing(NULL /* context */); >> ?? } >> } >> >> KeepStackGCProcessedMark isn't reentrant, so we would have to watch >> out for that. > > Wow, this is very useful! I was almost done with separating stack > scanning and setting up the Java stack frame info objects, but using > the KeepStackGCProcessedMark it is much simpler: > > This seems to work perfectly fine and fix the bug: > > https://urldefense.com/v3/__https://gist.github.com/rkennke/553b0ac024d6d094ff0784fa56c85fb0__;!!GqivPVa7Brio!JmXVRlaquMtM5x6DZKv2vBX0ldOTtH_YglZnpL0ogEw1DmsUfW9yl1toV-H2Zfro0ZLj$ > > I'll look at it some more and do more testing, and will file a PR > (unless you disagree). Yes, I think that looks good. I haven't looked too closely at StackWalk::fetchFirstBatch, but that one might have to be handled as well. Would be good to get a second opinion from Erik. Thanks, StefanK > > Thanks! > Roman > From rkennke at openjdk.java.net Wed Feb 10 10:12:49 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 10 Feb 2021 10:12:49 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk Message-ID: I am observing the following assert: # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 # assert(is_frame_safe(f)) failed: Frame must be safe (see issue for full hs_err) In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. 
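The shape of the change, per the gist linked earlier in the thread, is sketched below. Treat it as an outline rather than the exact patch; in particular, the KeepStackGCProcessedMark constructor is assumed here to take the target JavaThread.

// Sketch of the relevant part of StackWalk::fetchNextBatch().
// Instead of a one-shot:
//   StackWatermarkSet::finish_processing(jt, NULL /* context */, StackWatermarkKind::gc);
// keep the stack processed for the whole batch:
KeepStackGCProcessedMark keep_stack_gc_processed(jt);
// ... fill_in_frames() may allocate and hit safepoints here; the mark keeps
// jt's stack GC-processed across those safepoints, so the frames being
// walked stay safe to touch ...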
Testing: - [x] StackWalk tests with Shenandoah/aggressive - [x] StackWalk tests with ZGC/aggressive - [ ] tier1 (+Shenandoah/ZGC) - [ ] tier2 (+Shenandoah/ZGC) ------------- Commit messages: - 8261448: Preserve GC stack watermark across safepoints in StackWalk Changes: https://git.openjdk.java.net/jdk/pull/2500/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261448 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 From shade at openjdk.java.net Wed Feb 10 10:13:54 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 10:13:54 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering Message-ID: Shenandoah currently uses its own marking bitmap (added by JDK-8254315). It accesses the marking bitmap with "acquire" for reads and "conservative" for updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah marking bitmap updates, and "release" is enough. I think both are actually excessive for marking bitmap accesses: we do not piggyback object updates on it, the atomics there are only to guarantee the access atomicity and CAS updates to bits. So we might as well use "relaxed" modes for both loads and updates. Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64: # Baseline [135.357s][info][gc,stats] Concurrent Marking = 38.795 s (a = 146951 us) (n = 264) (lvls, us = 172, 1719, 150391, 275391, 348305) # Patched [130.475s][info][gc,stats] Concurrent Marking = 34.874 s (a = 120672 us) (n = 289) (lvls, us = 178, 1777, 132812, 222656, 323957) Average time goes down, the number of GC cycles go up, since the cycles are shorter. Additional testing: - [x] Linux x86_64 `hotspot_gc_shenandoah` - [x] Linux AArch64 `hotspot_gc_shenandoah` - [x] Linux AArch64 `tier1` with Shenandoah ------------- Commit messages: - 8261493: Shenandoah: reconsider bitmap access memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2497/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2497&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261493 Stats: 18 lines in 2 files changed: 0 ins; 14 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2497.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2497/head:pull/2497 PR: https://git.openjdk.java.net/jdk/pull/2497 From rkennke at openjdk.java.net Wed Feb 10 10:27:39 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 10 Feb 2021 10:27:39 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 09:32:18 GMT, Aleksey Shipilev wrote: > Shenandoah currently uses its own marking bitmap (added by JDK-8254315). It accesses the marking bitmap with "acquire" for reads and "conservative" for updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah marking bitmap updates, and "release" is enough. 
> > I think both are actually excessive for marking bitmap accesses: we do not piggyback object updates on it, the atomics there are only to guarantee the access atomicity and CAS updates to bits. So we might as well use "relaxed" modes for both loads and updates. > > Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64: > > # Baseline > [135.357s][info][gc,stats] Concurrent Marking = 38.795 s (a = 146951 us) (n = 264) > (lvls, us = 172, 1719, 150391, 275391, 348305) > > # Patched > [130.475s][info][gc,stats] Concurrent Marking = 34.874 s (a = 120672 us) (n = 289) > (lvls, us = 178, 1777, 132812, 222656, 323957) > > Average time goes down, the number of GC cycles go up, since the cycles are shorter. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah Nice improvement! I think that makes sense. Patch looks good to me! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2497 From rkennke at redhat.com Wed Feb 10 10:34:18 2021 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 10 Feb 2021 11:34:18 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: References: Message-ID: Hello Jaroslav, > In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I > am trying to figure out whether providing a cheap estimation of live > set size is something actually achievable across various GC > implementations. > > What I am looking at is piggy-backing on a concurrent mark task to get > the summary size of live objects - using the 'straight-forward' > heap-inspection like approach is prohibitively expensive. In Shenandoah, this information is already collected during concurrent marking. We currently don't print it directly, but we could certainly do that. I'll look into implementing it. I'll also look into exposing liveness info via JMX. I'm not quite sure about G1: that information would only be collected during mixed or full collections. I am not sure if G1 prints it, though. ZGC prints this under -Xlog:gc+heap: [6,502s][info][gc,heap ] GC(0) Mark Start Mark End Relocate Start Relocate End High Low [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) 834M (10%) [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) 6896M (86%) [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) 600M (8%) [6,502s][info][gc,heap ] GC(0) Live: - 195M (2%) 195M (2%) 195M (2%) - - [6,502s][info][gc,heap ] GC(0) Allocated: - 242M (3%) 270M (3%) 380M (5%) - - [6,502s][info][gc,heap ] GC(0) Garbage: - 638M (8%) 606M (8%) 24M (0%) - - [6,502s][info][gc,heap ] GC(0) Reclaimed: - - 32M (0%) 614M (8%) - - I hope that is useful? Thanks, Roman From cgo at openjdk.java.net Wed Feb 10 12:17:43 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Wed, 10 Feb 2021 12:17:43 GMT Subject: RFR: 8261505: Test test/hotspot/jtreg/gc/parallel/TestDynShrinkHeap.java killed by Linux OOM Killer Message-ID: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> On memory constrained devices, the test might get killed by the linux kernel OOM Killer. Executing the test with the JTreg test harness makes the test fail and get killed by the OOM Killer. 
Executing the test manually, by using the JTreg provided "rerun" command line, the test succeeds. This happened on a Raspberry PI 2, which has only 1G of memory available. I added an "os.maxMemory" requirement, so the test gets skipped. ------------- Commit messages: - Adds os.maxMemory requirement. Changes: https://git.openjdk.java.net/jdk/pull/2507/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2507&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261505 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2507.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2507/head:pull/2507 PR: https://git.openjdk.java.net/jdk/pull/2507 From rkennke at openjdk.java.net Wed Feb 10 12:40:38 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 10 Feb 2021 12:40:38 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk In-Reply-To: References: Message-ID: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com> On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke wrote: > I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From shade at openjdk.java.net Wed Feb 10 12:42:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 12:42:45 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering Message-ID: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah update references code, and "relaxed" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). 
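To make the shape of this code concrete, here is a minimal standalone sketch of such a reference update (plain std::atomic with invented names, not the actual patch); the CAS is only there because a mutator may race on the same slot:

```
#include <atomic>
#include <cstdint>

// Toy heap slot holding a (compressed) reference.
using narrow_ref = uint32_t;

// GC-side update: swing the slot from the old location to the new copy,
// unless a mutator has already stored something else there. 'order' is
// the ordering under discussion: conservative today, weaker proposed.
inline void update_ref_slot(std::atomic<narrow_ref>* slot,
                            narrow_ref from, narrow_ref to,
                            std::memory_order order) {
  narrow_ref expected = from;
  // If the CAS fails, a mutator already wrote a newer value and the GC
  // must not overwrite it; no retry is needed.
  slot->compare_exchange_strong(expected, to, order,
                                std::memory_order_relaxed);
}

int main() {
  std::atomic<narrow_ref> slot{0x1000};   // pretend 0x1000 is the old copy
  update_ref_slot(&slot, 0x1000, 0x2000, std::memory_order_relaxed);
  return slot.load(std::memory_order_relaxed) == 0x2000 ? 0 : 1;
}
```

Under the conservative default, a CAS of this shape is bracketed by two-way fences on AArch64/PPC64; the question is whether any of that ordering is actually needed for this path.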
Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64: # Baseline [135.065s][info][gc,stats] Concurrent Update Refs = 73.685 s (a = 295924 us) (n = 249) (lvls, us = 354, 3418, 349609, 564453, 715405) # Patched [127.649s][info][gc,stats] Concurrent Update Refs = 54.389 s (a = 169437 us) (n = 321) (lvls, us = 324, 2188, 183594, 322266, 394495) Average time goes down, the number of GC cycles goes up, since the cycles are shorter. Additional testing: - [x] Linux x86_64 hotspot_gc_shenandoah - [x] Linux AArch64 hotspot_gc_shenandoah - [x] Linux AArch64 tier1 with Shenandoah ------------- Commit messages: - 8261495: Shenandoah: reconsider update references memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2498/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261495 Stats: 15 lines in 5 files changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.java.net/jdk/pull/2498.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498 PR: https://git.openjdk.java.net/jdk/pull/2498 From zgu at openjdk.java.net Wed Feb 10 13:11:52 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 13:11:52 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint support Message-ID: Please review this patch that adds breakpoint support for Shenandoah, which allows Shenandoah to run a few tests: gc/TestConcurrentGCBreakpoints.java gc/TestJNIWeak/TestJNIWeak.java gc/TestReferenceClearDuringMarking.java gc/TestReferenceClearDuringReferenceProcessing.java gc/TestReferenceRefersTo.java The drawback is that the above tests cannot run in passive mode, which can cause the tests to hang, as breakpoints only apply to concurrent GC. Test: - [x] hotspot_gc_shenandoah - [x] tier1 with Shenandoah ------------- Commit messages: - update - init update Changes: https://git.openjdk.java.net/jdk/pull/2489/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2489&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261473 Stats: 170 lines in 8 files changed: 158 ins; 2 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2489.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2489/head:pull/2489 PR: https://git.openjdk.java.net/jdk/pull/2489 From zgu at openjdk.java.net Wed Feb 10 13:40:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 13:40:39 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering In-Reply-To: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> Message-ID: On Wed, 10 Feb 2021 09:52:11 GMT, Aleksey Shipilev wrote: > Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This seems to be excessive for Shenandoah update references code, and "relaxed" is enough.
> > Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64: > > # Baseline > [135.065s][info][gc,stats] Concurrent Update Refs = 73.685 s (a = 295924 us) (n = 249) > (lvls, us = 354, 3418, 349609, 564453, 715405) > > # Patched > [127.649s][info][gc,stats] Concurrent Update Refs = 54.389 s (a = 169437 us) (n = 321) > (lvls, us = 324, 2188, 183594, 322266, 394495) > > Average time goes down, the number of GC cycles go up, since the cycles are shorter. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 149: > 147: assert(is_aligned(addr, sizeof(narrowOop)), "Address should be aligned: " PTR_FORMAT, p2i(addr)); > 148: narrowOop val = CompressedOops::encode(n); > 149: return CompressedOops::decode(Atomic::cmpxchg(addr, c, val, memory_order_relaxed)); Are you sure it is sufficient? I would think it needs acq/rel pair, otherwise, read side can see incomplete oop ... ------------- PR: https://git.openjdk.java.net/jdk/pull/2498 From shade at openjdk.java.net Wed Feb 10 15:13:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 15:13:47 GMT Subject: RFR: 8261503: Shenandoah: reconsider verifier memory ordering Message-ID: Shenandoah verifier uses lots of atomic operations. Unfortunately, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. In most cases, that is excessive for verifier, and "relaxed" would do. Additional testing: - [x] Linux x86_64 hotspot_gc_shenandoah - [x] Linux AArch64 hotspot_gc_shenandoah - [x] Linux AArch64 tier1 with Shenandoah ------------- Commit messages: - 8261503: Shenandoah: reconsider verifier memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2505/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2505&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261503 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2505.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2505/head:pull/2505 PR: https://git.openjdk.java.net/jdk/pull/2505 From shade at openjdk.java.net Wed Feb 10 15:14:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 15:14:47 GMT Subject: RFR: 8261496: Shenandoah: reconsider pacing updates memory ordering Message-ID: <_BlnOgWoSTjE1myt9WfuiZpM9hiIP7sGp38IJmzuyYg=.8a578dda-dbf7-4780-bc74-cf3710609005@github.com> Shenandoah pacer uses atomic operations to update budget, progress, allocations seen. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This is excessive for pacing, as we do not piggyback memory effects on it. All pacing updates can use "relaxed". 
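For reference, the pacing updates are plain counter arithmetic on a shared budget; a minimal standalone sketch of what "relaxed" means here (std::atomic stand-ins and invented names, not the actual ShenandoahPacer code):

```
#include <atomic>
#include <cstdint>

// Toy stand-in for pacer state: a budget that mutators claim from and
// the GC replenishes as it reports progress. Nothing establishes
// happens-before through these counters, so relaxed ordering suffices.
struct ToyPacer {
  std::atomic<int64_t> budget{0};
  std::atomic<int64_t> progress{0};

  // Mutator path: try to claim 'words' from the budget.
  bool claim_for_alloc(int64_t words) {
    int64_t cur = budget.load(std::memory_order_relaxed);
    while (cur >= words) {
      if (budget.compare_exchange_weak(cur, cur - words,
                                       std::memory_order_relaxed)) {
        return true;
      }
      // On failure 'cur' is refreshed and the budget is re-checked.
    }
    return false;   // not enough budget; caller would pace/stall
  }

  // GC path: report progress and replenish the budget.
  void report_progress(int64_t words) {
    progress.fetch_add(words, std::memory_order_relaxed);
    budget.fetch_add(words, std::memory_order_relaxed);
  }
};

int main() {
  ToyPacer p;
  p.report_progress(100);
  return p.claim_for_alloc(40) ? 0 : 1;
}
```

Since nobody piggybacks memory effects on these counters, atomicity of the add/CAS is the only property needed.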
Additional testing: - [x] Linux x86_64 hotspot_gc_shenandoah - [x] Linux AArch64 hotspot_gc_shenandoah - [x] Linux AArch64 tier1 with Shenandoah ------------- Commit messages: - 8261496: Shenandoah: reconsider pacing updates memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2501/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2501&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261496 Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2501.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2501/head:pull/2501 PR: https://git.openjdk.java.net/jdk/pull/2501 From zgu at openjdk.java.net Wed Feb 10 15:25:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 15:25:39 GMT Subject: RFR: 8261496: Shenandoah: reconsider pacing updates memory ordering In-Reply-To: <_BlnOgWoSTjE1myt9WfuiZpM9hiIP7sGp38IJmzuyYg=.8a578dda-dbf7-4780-bc74-cf3710609005@github.com> References: <_BlnOgWoSTjE1myt9WfuiZpM9hiIP7sGp38IJmzuyYg=.8a578dda-dbf7-4780-bc74-cf3710609005@github.com> Message-ID: On Wed, 10 Feb 2021 10:13:47 GMT, Aleksey Shipilev wrote: > Shenandoah pacer uses atomic operations to update budget, progress, allocations seen. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This is excessive for pacing, as we do not piggyback memory effects on it. All pacing updates can use "relaxed". > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Looks good to me ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2501 From zgu at openjdk.java.net Wed Feb 10 15:27:44 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 15:27:44 GMT Subject: RFR: 8261503: Shenandoah: reconsider verifier memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 11:41:45 GMT, Aleksey Shipilev wrote: > Shenandoah verifier uses lots of atomic operations. Unfortunately, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > In most cases, that is excessive for verifier, and "relaxed" would do. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Looks good. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2505 From shade at openjdk.java.net Wed Feb 10 15:28:39 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 15:28:39 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering In-Reply-To: References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> Message-ID: <-XbC4UcEc8lhp2-6w1hq2sOHrX2R-x7nfdgMuUWTxwg=.b38923c7-ca15-4f17-804d-e44942f71621@github.com> On Wed, 10 Feb 2021 13:37:59 GMT, Zhengyu Gu wrote: >> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. >> >> This seems to be excessive for Shenandoah update references code, and "relaxed" is enough. 
We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). >> >> Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64: >> >> # Baseline >> [135.065s][info][gc,stats] Concurrent Update Refs = 73.685 s (a = 295924 us) (n = 249) >> (lvls, us = 354, 3418, 349609, 564453, 715405) >> >> # Patched >> [127.649s][info][gc,stats] Concurrent Update Refs = 54.389 s (a = 169437 us) (n = 321) >> (lvls, us = 324, 2188, 183594, 322266, 394495) >> >> Average time goes down, the number of GC cycles go up, since the cycles are shorter. >> >> Additional testing: >> - [x] Linux x86_64 hotspot_gc_shenandoah >> - [x] Linux AArch64 hotspot_gc_shenandoah >> - [x] Linux AArch64 tier1 with Shenandoah > > src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 149: > >> 147: assert(is_aligned(addr, sizeof(narrowOop)), "Address should be aligned: " PTR_FORMAT, p2i(addr)); >> 148: narrowOop val = CompressedOops::encode(n); >> 149: return CompressedOops::decode(Atomic::cmpxchg(addr, c, val, memory_order_relaxed)); > > Are you sure it is sufficient? I would think it needs acq/rel pair, otherwise, read side can see incomplete oop ... Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it. ------------- PR: https://git.openjdk.java.net/jdk/pull/2498 From aph at redhat.com Wed Feb 10 16:07:16 2021 From: aph at redhat.com (Andrew Haley) Date: Wed, 10 Feb 2021 16:07:16 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: References: Message-ID: Oh, sorry. This is my favourite benchmark, javac all of java.base. I'm mostly using that because it's easy to run without any external dependencies, and it loads a lot of classes. It's no better or worse than any other random program. On 2/10/21 6:44 AM, Ioi Lam wrote: > Just curious, which benchmark is this? > > Thanks > - Ioi > > On 2/8/21 10:14 AM, Andrew Haley wrote: >> I've been looking at the hottest Atomic operations in HotSpot, with a view to >> finding out if the default memory_order_conservative (which is very expensive >> on some architectures) can be weakened to something less. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at openjdk.java.net Wed Feb 10 16:14:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 16:14:45 GMT Subject: RFR: 8261501: Shenandoah: reconsider heap statistics memory ordering Message-ID: <0O1tXXs991770rhrpYioXIWr6m-OhDFMZINDiQ_UXc4=.92460035-468e-4bf5-97cb-bff58d1a2ede@github.com> ShenandoahHeap collects heap-wide statistics (used, committed, etc). It does so by atomically updating them with default CASes. Unfortunately, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This is excessive for statistics gathering, and "relaxed" should be just as good. 
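A minimal standalone sketch of the kind of counter updates involved (plain std::atomic, invented names, not the actual ShenandoahHeap code):

```
#include <atomic>
#include <cstddef>

// Toy heap-wide statistics: they feed logging/monitoring only, nothing
// synchronizes on them, so relaxed atomic updates are sufficient.
class ToyHeapStats {
  std::atomic<size_t> _used{0};
  std::atomic<size_t> _committed{0};

public:
  void increase_used(size_t bytes) {
    _used.fetch_add(bytes, std::memory_order_relaxed);
  }
  void decrease_used(size_t bytes) {
    _used.fetch_sub(bytes, std::memory_order_relaxed);
  }
  void set_committed(size_t bytes) {
    _committed.store(bytes, std::memory_order_relaxed);
  }
  size_t used() const {
    return _used.load(std::memory_order_relaxed);
  }
};

int main() {
  ToyHeapStats stats;
  stats.increase_used(1024);
  stats.decrease_used(256);
  return stats.used() == 768 ? 0 : 1;
}
```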
Additional testing: - [x] Linux x86_64 hotspot_gc_shenandoah - [x] Linux AArch64 hotspot_gc_shenandoah - [x] Linux AArch64 tier1 with Shenandoah ------------- Commit messages: - 8261501: Shenandoah: reconsider heap statistics memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2504/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2504&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261501 Stats: 9 lines in 1 file changed: 0 ins; 1 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2504.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2504/head:pull/2504 PR: https://git.openjdk.java.net/jdk/pull/2504 From shade at openjdk.java.net Wed Feb 10 17:43:47 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 17:43:47 GMT Subject: RFR: 8261500: Shenandoah: reconsider region live data memory ordering Message-ID: Current Shenandoah region live data tracking uses default CAS updates to achieve atomicity of updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for live data tracking, and "relaxed" could be used instead. The only serious user of that data is collection set chooser, which runs at safepoint and so everything should be quiescent when that happens. Additional testing: - [x] Linux x86_64 hotspot_gc_shenandoah - [x] Linux AArch64 hotspot_gc_shenandoah - [x] Linux AArch64 tier1 with Shenandoah ------------- Commit messages: - 8261500: Shenandoah: reconsider region live data memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2503/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2503&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261500 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2503.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2503/head:pull/2503 PR: https://git.openjdk.java.net/jdk/pull/2503 From kevinw at openjdk.java.net Wed Feb 10 17:56:39 2021 From: kevinw at openjdk.java.net (Kevin Walls) Date: Wed, 10 Feb 2021 17:56:39 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes In-Reply-To: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: On Sun, 17 Jan 2021 03:57:59 GMT, Chris Plummer wrote: > See the bug for most details. A few notes here about some implementation details: > > In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: > > ` getTLAB().printOn(tty); // includes "\n" ` > > That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. > > I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. 
> > The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: > > var dso = loadObjectContainingPC(addr); > if (dso == null) { > return ptrLoc.toString(); > } > var sym = dso.closestSymbolToPC(addr); > if (sym != null) { > return sym.name + '+' + sym.offset; > } > And now you'll see something similar in the PointerFinder code: > > loc.loadObject = cdbg.loadObjectContainingPC(a); > if (loc.loadObject != null) { > loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); > return loc; > } > Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) Looks good, thanks. (Comment in PointerLocation.java, treat as you see fit.) src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/PointerLocation.java line 247: > 245: stackThread.getStackBase(), stackThread.lastSPDbg(), > 246: stackThread.getStackBase().addOffsetTo(-stackThread.getStackSize()), > 247: stackThread); When we print a JavaThread, in the verbose block, the final argument to tty.format in line 247, I wonder what that prints? We then call printThreadInfoOn() which will first print the quoted thread name, so maybe we don't need that item. Or maybe we want the JavaThread.toString()? ------------- Marked as reviewed by kevinw (Committer). PR: https://git.openjdk.java.net/jdk/pull/2111 From zgu at openjdk.java.net Wed Feb 10 17:58:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 17:58:39 GMT Subject: RFR: 8261500: Shenandoah: reconsider region live data memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 10:40:26 GMT, Aleksey Shipilev wrote: > Current Shenandoah region live data tracking uses default CAS updates to achieve atomicity of updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This seems to be excessive for live data tracking, and "relaxed" could be used instead. The only serious user of that data is collection set chooser, which runs at safepoint and so everything should be quiescent when that happens. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Looks good ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2503 From shade at openjdk.java.net Wed Feb 10 19:07:37 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 19:07:37 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2] In-Reply-To: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> Message-ID: > Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. 
> > This seems to be excessive for Shenandoah update references code, and "acq_rel" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing. > > Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64: > > # Baseline > [135.065s][info][gc,stats] Concurrent Update Refs = 73.685 s (a = 295924 us) (n = 249) > (lvls, us = 354, 3418, 349609, 564453, 715405) > > # Patched > [127.649s][info][gc,stats] Concurrent Update Refs = 54.389 s (a = 169437 us) (n = 321) > (lvls, us = 324, 2188, 183594, 322266, 394495) > > Average time goes down, the number of GC cycles go up, since the cycles are shorter. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Do acq_rel instead ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2498/files - new: https://git.openjdk.java.net/jdk/pull/2498/files/d83b9af4..87a609f4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=00-01 Stats: 41 lines in 1 file changed: 38 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2498.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498 PR: https://git.openjdk.java.net/jdk/pull/2498 From shade at openjdk.java.net Wed Feb 10 19:07:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 19:07:38 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2] In-Reply-To: <-XbC4UcEc8lhp2-6w1hq2sOHrX2R-x7nfdgMuUWTxwg=.b38923c7-ca15-4f17-804d-e44942f71621@github.com> References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> <-XbC4UcEc8lhp2-6w1hq2sOHrX2R-x7nfdgMuUWTxwg=.b38923c7-ca15-4f17-804d-e44942f71621@github.com> Message-ID: On Wed, 10 Feb 2021 15:25:33 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 149: >> >>> 147: assert(is_aligned(addr, sizeof(narrowOop)), "Address should be aligned: " PTR_FORMAT, p2i(addr)); >>> 148: narrowOop val = CompressedOops::encode(n); >>> 149: return CompressedOops::decode(Atomic::cmpxchg(addr, c, val, memory_order_relaxed)); >> >> Are you sure it is sufficient? I would think it needs acq/rel pair, otherwise, read side can see incomplete oop ... > > Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it. I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that what you were concerned about? ------------- PR: https://git.openjdk.java.net/jdk/pull/2498 From shade at openjdk.java.net Wed Feb 10 19:10:43 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 10 Feb 2021 19:10:43 GMT Subject: RFR: 8261504: Shenandoah: reconsider ShenandoahJavaThreadsIterator::claim memory ordering Message-ID: JDK-8256298 added the thread iterator for thread roots, and I don't think we need the Hotspot's default memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. The simple "relaxed" should do. 
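The claim operation only hands out indices into a thread array that is built before the workers start; a standalone sketch of that shape (std::atomic, invented names, not the actual iterator code):

```
#include <atomic>
#include <cstddef>
#include <vector>

// Toy work-claiming iterator: workers grab disjoint chunks of a fixed
// array by bumping a shared index. The array is built and published
// before the workers start, so the claim itself only needs atomicity,
// not ordering; hence relaxed.
class ToyThreadsIterator {
  std::vector<int> _threads;            // stands in for the thread list
  std::atomic<size_t> _claimed{0};
  static constexpr size_t kChunk = 16;

public:
  explicit ToyThreadsIterator(size_t n) : _threads(n, 0) {}

  // Returns the next unclaimed [begin, end) chunk; false when exhausted.
  bool claim(size_t& begin, size_t& end) {
    size_t from = _claimed.fetch_add(kChunk, std::memory_order_relaxed);
    if (from >= _threads.size()) return false;
    begin = from;
    end = (from + kChunk < _threads.size()) ? from + kChunk : _threads.size();
    return true;
  }
};

int main() {
  ToyThreadsIterator it(40);
  size_t b, e;
  int chunks = 0;
  while (it.claim(b, e)) chunks++;
  return chunks == 3 ? 0 : 1;   // 40 threads in chunks of 16 -> 3 chunks
}
```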
Additional testing: - [x] Linux x86_64 hotspot_gc_shenandoah - [x] Linux AArch64 hotspot_gc_shenandoah - [x] Linux AArch64 tier1 with Shenandoah ------------- Commit messages: - 8261504: Shenandoah: reconsider ShenandoahJavaThreadsIterator::claim memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2506/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2506&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261504 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2506.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2506/head:pull/2506 PR: https://git.openjdk.java.net/jdk/pull/2506 From zgu at openjdk.java.net Wed Feb 10 19:23:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 19:23:39 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2] In-Reply-To: References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> <-XbC4UcEc8lhp2-6w1hq2sOHrX2R-x7nfdgMuUWTxwg=.b38923c7-ca15-4f17-804d-e44942f71621@github.com> Message-ID: On Wed, 10 Feb 2021 18:59:42 GMT, Aleksey Shipilev wrote: >> Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it. > > I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that what you were concerned about? Actually, what I meant is that CAS here should use memory_order_release and load barrier needs memory_order_acquire. I am not sure memory_order_acq_rel is sufficient, C++ states _memory_order_acq_rel_: A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and **the modification is visible in other threads that acquire the same atomic variable.** ------------- PR: https://git.openjdk.java.net/jdk/pull/2498 From zgu at openjdk.java.net Wed Feb 10 19:25:38 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 19:25:38 GMT Subject: RFR: 8261504: Shenandoah: reconsider ShenandoahJavaThreadsIterator::claim memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 12:00:38 GMT, Aleksey Shipilev wrote: > JDK-8256298 added the thread iterator for thread roots, and I don't think we need the Hotspot's default memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. The simple "relaxed" should do. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Looks good. ------------- Marked as reviewed by zgu (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2506 From zgu at openjdk.java.net Wed Feb 10 20:13:52 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 20:13:52 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint support [v2] In-Reply-To: References: Message-ID: <9gXmTI0gU9zTr-HffSqSsVEVjmUED0rNINulpr_mjQM=.b353278c-a4f1-45f5-bf8d-ed1e33fbb0c9@github.com> > Please review this patch that adds breakpoint support for Shenandoah, which allows Shenandoah to run a few tests: > > gc/TestConcurrentGCBreakpoints.java > gc/TestJNIWeak/TestJNIWeak.java > gc/TestReferenceClearDuringMarking.java > gc/TestReferenceClearDuringReferenceProcessing.java > gc/TestReferenceRefersTo.java > > The drawback is that the above tests cannot run in passive mode, which can cause the tests to hang, as breakpoints only apply to concurrent GC. > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with Shenandoah Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Merge - update - init update ------------- Changes: https://git.openjdk.java.net/jdk/pull/2489/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2489&range=01 Stats: 170 lines in 8 files changed: 158 ins; 2 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2489.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2489/head:pull/2489 PR: https://git.openjdk.java.net/jdk/pull/2489 From zgu at openjdk.java.net Wed Feb 10 20:47:37 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 10 Feb 2021 20:47:37 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2] In-Reply-To: References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> <-XbC4UcEc8lhp2-6w1hq2sOHrX2R-x7nfdgMuUWTxwg=.b38923c7-ca15-4f17-804d-e44942f71621@github.com> Message-ID: On Wed, 10 Feb 2021 18:59:42 GMT, Aleksey Shipilev wrote: >> Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it. > > I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that is what you were concerned about? Actually, what I meant is that the CAS here should use memory_order_release and the load barrier needs memory_order_acquire. I am not sure memory_order_acq_rel is sufficient; C++ defines _memory_order_acq_rel_ as: A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and **the modification is visible in other threads that acquire the same atomic variable.** ------------- PR: https://git.openjdk.java.net/jdk/pull/2498 From cjplummer at openjdk.java.net Wed Feb 10 21:14:43 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Wed, 10 Feb 2021 21:14:43 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes In-Reply-To: References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: <3oj2UVlWuk4yylfNEKxWcKSqUAw7p0oG9C9QsGxYidc=.6ddd8f76-bba7-43b4-b508-c55ab893c7d1@github.com> On Wed, 10 Feb 2021 17:52:59 GMT, Kevin Walls wrote: >> See the bug for most details.
A few notes here about some implementation details: >> >> In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: >> >> ` getTLAB().printOn(tty); // includes "\n" ` >> >> That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. >> >> I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. >> >> The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: >> >> var dso = loadObjectContainingPC(addr); >> if (dso == null) { >> return ptrLoc.toString(); >> } >> var sym = dso.closestSymbolToPC(addr); >> if (sym != null) { >> return sym.name + '+' + sym.offset; >> } >> And now you'll see something similar in the PointerFinder code: >> >> loc.loadObject = cdbg.loadObjectContainingPC(a); >> if (loc.loadObject != null) { >> loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); >> return loc; >> } >> Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/PointerLocation.java line 247: > >> 245: stackThread.getStackBase(), stackThread.lastSPDbg(), >> 246: stackThread.getStackBase().addOffsetTo(-stackThread.getStackSize()), >> 247: stackThread); > > When we print a JavaThread, in the verbose block, > the final argument to tty.format in line 247, I wonder what that prints? > > We then call printThreadInfoOn() which will first print the quoted thread name, > so maybe we don't need that item. > Or maybe we want the JavaThread.toString()? `stackThread.toString()` ends up in `VMObject.toString()`: public String toString() { return getClass().getName() + "@" + addr; } And here's an example output: hsdb> + findpc 0x0000152f45df6000 Address 0x0000152f45df6000: In java stack [0x0000152f45df8000,0x0000152f45df6580,0x0000152f45cf7000] for thread sun.jvm.hotspot.runtime.JavaThread at 0x0000152f3c026f70: "main" #1 prio=5 tid=0x0000152f3c026f70 nid=0x308e waiting on condition [0x0000152f45df6000] java.lang.Thread.State: TIMED_WAITING (sleeping) JavaThread state: _thread_blocked So I think the `stackThread` argument is doing what was intended, and there is no duplication in the output. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2111 From duke at openjdk.java.net Wed Feb 10 21:41:41 2021 From: duke at openjdk.java.net (duke) Date: Wed, 10 Feb 2021 21:41:41 GMT Subject: Withdrawn: 8257774: G1: Trigger collect when free region count drops below threshold to prevent evacuation failures In-Reply-To: References: Message-ID: On Sun, 6 Dec 2020 17:39:54 GMT, Charlie Gracie wrote: > Bursts of short lived Humongous object allocations can cause GCs to be initiated with 0 free regions. When these GCs happen they take significantly longer to complete. No objects are evacuated so there is a large amount of time spent in reversing self forwarded pointers and the only memory recovered is from the short lived humongous objects. My proposal is to add a check to the slow allocation path which will force a GC to happen if the number of free regions drops below the amount that would be required to complete the GC if it happened at that moment. The threshold will be based on the survival rates from Eden and survivor spaces along with the space required for Tenure space evacuations. > > The goal is to resolve the issue with bursts of short lived humongous objects without impacting other workloads negatively. I would appreciate reviews and any feedback that you might have. Thanks. > > Here are the links to the threads on the mailing list where I initially discussion the issue and my idea to resolve it: > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-November/032189.html > https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2020-December/032677.html This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/1650 From cjplummer at openjdk.java.net Thu Feb 11 00:06:48 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Thu, 11 Feb 2021 00:06:48 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes [v2] In-Reply-To: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: > See the bug for most details. A few notes here about some implementation details: > > In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: > > ` getTLAB().printOn(tty); // includes "\n" ` > > That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. > > I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. > > The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. 
The `whatis` code did this with the following: > > var dso = loadObjectContainingPC(addr); > if (dso == null) { > return ptrLoc.toString(); > } > var sym = dso.closestSymbolToPC(addr); > if (sym != null) { > return sym.name + '+' + sym.offset; > } > And now you'll see something similar in the PointerFinder code: > > loc.loadObject = cdbg.loadObjectContainingPC(a); > if (loc.loadObject != null) { > loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); > return loc; > } > Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) Chris Plummer has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge master - Improvements for PointerFinder and findpc command. ------------- Changes: https://git.openjdk.java.net/jdk/pull/2111/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2111&range=01 Stats: 291 lines in 5 files changed: 237 ins; 8 del; 46 mod Patch: https://git.openjdk.java.net/jdk/pull/2111.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2111/head:pull/2111 PR: https://git.openjdk.java.net/jdk/pull/2111 From kim.barrett at oracle.com Thu Feb 11 03:59:27 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 10 Feb 2021 22:59:27 -0500 Subject: Atomic operations: your thoughts are welocme In-Reply-To: References: Message-ID: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> > On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: > > I've been looking at the hottest Atomic operations in HotSpot, with a view to > finding out if the default memory_order_conservative (which is very expensive > on some architectures) can be weakened to something less. It's impossible to > fix all of them, but perhaps we can fix some of the most frequent. Is there any information about the possible performance improvement from such changes? 1.5-3M occurrences doesn't mean much without context. We don't presently have support for sequentially consistent semantics, only "conservative". My recollection is that this is in part because there might be code that is assuming the possibly stronger "conservative" semantics, and in part because there are different and incompatible approaches to implementing sequentially consistent semantics on some hardware platforms and we didn't want to make assumptions there. We also don't presently have any cmpxchg implementation that really supports anything between conservative and relaxed, nor do we support different order constraints for the success vs failure cases. Things can be complicated enough as is; while we *could* fill some of that in, I'm not sure we should. > These are the hottest compare-and-swap uses in HotSpot, with the count > at the end of each line. > > : :: = 16406757 > > This one is already memory_order_relaxed, so no problem. Right. Although I?m now wondering why this doesn?t need to do anything on the failure side, similar to what is needed in the similar place in ParallelGC when that was changed to use a relaxed cmpxchg. > ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 > > This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this > need to be memory_order_conservative, or would something weaker do? Even > acq_rel or seq_cst would be better. 
I think for setting bits in a bitmap the thing to do would be to identify places that are safe and useful (impacts performance) to do so first. Then add a weaker variant for use in those places, assuming any are found. > : :: = 2376632 > : :: = 2003895 > > I can't imagine that either of these actually need memory_order_conservative, > they're just reference counts. The "usual" refcount implementation involves relaxed increment and stronger ordering for decrement. (If I'm remembering correctly, dec-acquire and a release fence on the zero value path before deleting. But I've not thought about what one might want for this CAS-based variant that handles boundary cases specially.) And as you say, whether any of these could be weakened depends on whether there is any code surrounding a use that depends on the stronger ordering semantics. At a guess I suspect increment could be changed to relaxed, but I've not looked. This one is probably a question for runtime folks. > : :: = 1719614 > > BitMap::par_set_bit again. > > , (MEMFLAGS)5>*)+432>: :: = 1617659 > > This one is GenericTaskQueue::pop_global calling cmpxchg_age(). > Again, do we need conservative here? This needs at least sequentially consistent semantics on the success path. See the referenced paper by Le, et al. There is also a cmpxchg_age in pop_local_slow. The Le, et al paper doesn't deal with that path. But it's also not in your list, which is good since this is supposed to be infrequently taken. From kim.barrett at oracle.com Thu Feb 11 04:09:56 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 10 Feb 2021 23:09:56 -0500 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> Message-ID: <74CD1B2A-E99A-4A97-BBE3-3DF6ED506A11@oracle.com> > On Feb 10, 2021, at 10:59 PM, Kim Barrett wrote: > We also don't presently have any cmpxchg implementation that really supports > anything between conservative and relaxed, nor do we support different order > constraints for the success vs failure cases. Things can be complicated > enough as is; while we *could* fill some of that in, I'm not sure we should. I forgot that the linux-ppc port tries harder in this area. This was so a release-cmpxchg could be used in ParallelGC's PSPromotionManager::copy_to_survivor_space and benefit from that. The initial proposal was to use relaxed-cmpxchg, but that was shown to be insufficient. From shade at openjdk.java.net Thu Feb 11 06:37:56 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 11 Feb 2021 06:37:56 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v3] In-Reply-To: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> Message-ID: > Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This seems to be excessive for Shenandoah update references code, and "acq_rel" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing. 
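(To make the self-healing interplay concrete: the evacuating thread copies the object first and then publishes the new location, and a reader that observes the new location must also observe the copied payload. A standalone release/acquire sketch with toy types and invented names, not the actual Shenandoah code:)

```
#include <atomic>

struct ToyObj {
  int payload[4];
};

// The forwarding slot that publishes where the object now lives.
std::atomic<ToyObj*> forwardee{nullptr};

// Evacuating thread: copy first, then publish with "release" so that
// anyone who observes the new address also observes the copied payload.
void evacuate(ToyObj* from, ToyObj* to) {
  *to = *from;                                     // copy the object
  ToyObj* expected = nullptr;
  forwardee.compare_exchange_strong(expected, to,
                                    std::memory_order_release,
                                    std::memory_order_relaxed);
  // On failure another thread won the race; its copy is the published one.
}

// Reading thread: "acquire" pairs with the release above, so seeing the
// forwardee implies seeing its contents.
int read_payload() {
  ToyObj* fwd = forwardee.load(std::memory_order_acquire);
  return fwd ? fwd->payload[0] : -1;
}

int main() {
  static ToyObj from_space = {{7, 0, 0, 0}};
  static ToyObj to_space;
  evacuate(&from_space, &to_space);
  return read_payload() == 7 ? 0 : 1;
}
```

With relaxed ordering on both sides the reader could observe the new address but a stale payload; the release on the publishing CAS plus acquire on the reading load is what rules that out.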
> > Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64: > > # Baseline > [135.065s][info][gc,stats] Concurrent Update Refs = 73.685 s (a = 295924 us) (n = 249) > (lvls, us = 354, 3418, 349609, 564453, 715405) > > # Patched > [127.649s][info][gc,stats] Concurrent Update Refs = 54.389 s (a = 169437 us) (n = 321) > (lvls, us = 324, 2188, 183594, 322266, 394495) > > Average time goes down, the number of GC cycles go up, since the cycles are shorter. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Use release only ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2498/files - new: https://git.openjdk.java.net/jdk/pull/2498/files/87a609f4..36bee3a9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2498.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498 PR: https://git.openjdk.java.net/jdk/pull/2498 From kevinw at openjdk.java.net Thu Feb 11 09:00:44 2021 From: kevinw at openjdk.java.net (Kevin Walls) Date: Thu, 11 Feb 2021 09:00:44 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes [v2] In-Reply-To: <3oj2UVlWuk4yylfNEKxWcKSqUAw7p0oG9C9QsGxYidc=.6ddd8f76-bba7-43b4-b508-c55ab893c7d1@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> <3oj2UVlWuk4yylfNEKxWcKSqUAw7p0oG9C9QsGxYidc=.6ddd8f76-bba7-43b4-b508-c55ab893c7d1@github.com> Message-ID: On Wed, 10 Feb 2021 21:12:19 GMT, Chris Plummer wrote: >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/PointerLocation.java line 247: >> >>> 245: stackThread.getStackBase(), stackThread.lastSPDbg(), >>> 246: stackThread.getStackBase().addOffsetTo(-stackThread.getStackSize()), >>> 247: stackThread); >> >> When we print a JavaThread, in the verbose block, >> the final argument to tty.format in line 247, I wonder what that prints? >> >> We then call printThreadInfoOn() which will first print the quoted thread name, >> so maybe we don't need that item. >> Or maybe we want the JavaThread.toString()? > > `stackThread.toString()` ends up in `VMObject.toString()`: > > public String toString() { > return getClass().getName() + "@" + addr; > } > And here's an example output: > hsdb> + findpc 0x0000152f45df6000 > Address 0x0000152f45df6000: In java stack [0x0000152f45df8000,0x0000152f45df6580,0x0000152f45cf7000] for thread sun.jvm.hotspot.runtime.JavaThread at 0x0000152f3c026f70: > "main" #1 prio=5 tid=0x0000152f3c026f70 nid=0x308e waiting on condition [0x0000152f45df6000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > JavaThread state: _thread_blocked > So I think the `stackThread` argument is doing what was intended, and there is no duplication in the output. Great, thanks. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2111 From shade at redhat.com Thu Feb 11 13:13:39 2021 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 11 Feb 2021 14:13:39 +0100 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> Message-ID: On 2/11/21 4:59 AM, Kim Barrett wrote: >> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: >> >> I've been looking at the hottest Atomic operations in HotSpot, with a view to >> finding out if the default memory_order_conservative (which is very expensive >> on some architectures) can be weakened to something less. It's impossible to >> fix all of them, but perhaps we can fix some of the most frequent. > > Is there any information about the possible performance improvement from > such changes? 1.5-3M occurrences doesn't mean much without context. I am going through the exercise of relaxing some of the memory orders in Shenandoah code, and AArch64 benefits greatly from it (= two-way barriers are bad in hot code). There are obvious things like relaxing counter updates: JDK-8261503: Shenandoah: reconsider verifier memory ordering JDK-8261501: Shenandoah: reconsider heap statistics memory ordering JDK-8261500: Shenandoah: reconsider region live data memory ordering JDK-8261496: Shenandoah: reconsider pacing updates memory ordering There are more interesting things like relaxing accesses to marking bitmap (which is a large counter array in disguise) -- which effectively implies a CAS (and thus two FULL_MEM_BARRIER-s on AArch64) per marked object: JDK-8261493: Shenandoah: reconsider bitmap access memory ordering These five relaxations above cut down marking phase time on AArch64 for about 10..15%. And there is more advanced stuff where relaxed is not enough, but conservative is too conservative. There, acq/rel should be enough -- but we cannot yet test it, because AArch64 cmpxchg does not do anything except relaxed/conservative (JDK-8261579): JDK-8261492: Shenandoah: reconsider forwardee accesses memory ordering JDK-8261495: Shenandoah: reconsider update references memory ordering These two (along with experimental 8261579 fix) cut down evacuation and update-references phase times for about 25..30% and 10..15%, respectively. All in all, this cuts down Shenandoah GC cycle times on AArch64 for about 15..20%! So, I believe this shows enough benefit to invest our time. Heavy-duty GC code is where I expect the most benefit. -- Thanks, -Aleksey From zgu at openjdk.java.net Thu Feb 11 13:30:43 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 13:30:43 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v3] In-Reply-To: References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> <-XbC4UcEc8lhp2-6w1hq2sOHrX2R-x7nfdgMuUWTxwg=.b38923c7-ca15-4f17-804d-e44942f71621@github.com> Message-ID: On Wed, 10 Feb 2021 20:44:34 GMT, Zhengyu Gu wrote: >> I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that what you were concerned about? > > Actually, what I meant is that CAS here should use memory_order_release and load barrier needs memory_order_acquire. > I am not sure memory_order_acq_rel is sufficient, C++ states _memory_order_acq_rel_: > > A read-modify-write operation with this memory order is both an acquire operation and a release operation. 
No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and **the modification is visible in other threads that acquire the same atomic variable.** > > For atomic_update_oop(), the leading acquire is probably unnecessary, since the oop is either evacuated by the current thread or there is a safepoint in between. My concern is a missing read barrier on the read side: e.g. an oop is evacuated and updated by a mutator, then a second Java thread follows the fwdptr and loads the oop; without a read barrier on that load, it may see an incomplete, newly evacuated oop. Ah, I missed JDK-8261495, which is the counterpart of this change. I think memory_order_release is good. ------------- PR: https://git.openjdk.java.net/jdk/pull/2498 From zgu at openjdk.java.net Thu Feb 11 13:30:42 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 13:30:42 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v3] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 06:37:56 GMT, Aleksey Shipilev wrote: >> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. >> >> This seems to be excessive for Shenandoah update references code, and "release" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing. >> >> Average time goes down, the number of GC cycles goes up, since the cycles are shorter. >> >> Additional testing: >> - [x] Linux x86_64 hotspot_gc_shenandoah >> - [x] Linux AArch64 hotspot_gc_shenandoah >> - [x] Linux AArch64 tier1 with Shenandoah > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Use release only Looks good to me. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2498 From aph at redhat.com Thu Feb 11 13:33:30 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 11 Feb 2021 13:33:30 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> Message-ID: <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> On 11/02/2021 03:59, Kim Barrett wrote: >> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: >> >> I've been looking at the hottest Atomic operations in HotSpot, with a view to >> finding out if the default memory_order_conservative (which is very expensive >> on some architectures) can be weakened to something less. It's impossible to >> fix all of them, but perhaps we can fix some of the most frequent. > > Is there any information about the possible performance improvement from > such changes? 1.5-3M occurrences doesn't mean much without context. > > We don't presently have support for sequentially consistent semantics, only > "conservative".
My recollection is that this is in part because there might > be code that is assuming the possibly stronger "conservative" semantics, and > in part because there are different and incompatible approaches to > implementing sequentially consistent semantics on some hardware platforms > and we didn't want to make assumptions there. > > We also don't presently have any cmpxchg implementation that really supports > anything between conservative and relaxed, nor do we support different order > constraints for the success vs failure cases. Things can be complicated > enough as is; while we *could* fill some of that in, I'm not sure we should. OK. However, even though we don't implement any of them, we do have an API that includes acq, rel, and seq_cst. The fact that we don't have anything behind them is, I thought, To Be Done rather than Won't Do. >> ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 >> >> This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this >> need to be memory_order_conservative, or would something weaker do? Even >> acq_rel or seq_cst would be better. > > I think for setting bits in a bitmap the thing to do would be to identify > places that are safe and useful (impacts performance) to do so first. Then > add a weaker variant for use in those places, assuming any are found. I see. I'm assuming that frequency of use is a useful proxy for impact. Aleksey has already, very helpfully, measured how significant these are for Shenandoah, and I suspect all concurrent GCs would benefit in a similar fashion. >> : :: = 2376632 >> : :: = 2003895 >> >> I can't imagine that either of these actually need memory_order_conservative, >> they're just reference counts. > > The "usual" refcount implementation involves relaxed increment and stronger > ordering for decrement. (If I'm remembering correctly, dec-acquire and a > release fence on the zero value path before deleting. But I've not thought > about what one might want for this CAS-based variant that handles boundary > cases specially.) And as you say, whether any of these could be weakened > depends on whether there is any code surrounding a use that depends on the > stronger ordering semantics. At a guess I suspect increment could be changed > to relaxed, but I've not looked. This one is probably a question for runtime > folks. OK, this makes sense. I'm thinking of the long road to getting this stuff documented so that we can see what side effects of atomic operations are actually required. >> : :: = 1719614 >> >> BitMap::par_set_bit again. >> >> , (MEMFLAGS)5>*)+432>: :: = 1617659 >> >> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). >> Again, do we need conservative here? > > This needs at least sequentially consistent semantics on the success path. Yep. That's easy, it's the full barrier in the failure path that I'd love to eliminate. > See the referenced paper by Le, et al. > > There is also a cmpxchg_age in pop_local_slow. The Le, et al paper doesn't > deal with that path. But it's also not in your list, which is good since > this is supposed to be infrequently taken. Right. I'm trying to concentrate on the low-hanging fruit. Thank you for the very detailed and informative reply. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ayang at openjdk.java.net Thu Feb 11 15:54:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 11 Feb 2021 15:54:41 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 19:42:15 GMT, Stefan Johansson wrote: > The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. > > In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. In the test, there's some string matching to detect if large page is properly set up. I think it's best to include an excerpt of the log showing both the success and failure modes in the comments. This way even readers who are not intimately familiar with the gc-logs output could still feel fairly confident that the output parsing part is indeed correct. ------------- Changes requested by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/2486 From sjohanss at openjdk.java.net Thu Feb 11 16:22:42 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 16:22:42 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 19:42:15 GMT, Stefan Johansson wrote: > The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. > > In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 59: > 57: static void checkSize(OutputAnalyzer output, long expectedSize, String pattern) { > 58: // First check if there is a large page failure associated with > 59: // the data structure being checked. Are you thinking something like this @albertnetymk? Suggestion: // First check if there is a large page failure associated with // the data structure being checked. In case of a large page // allocation failure the output will include logs like this for // the affected data structure: // [0.048s][debug][gc,heap,coops] Reserve regular memory without large pages // [0.048s][info ][pagesize ] Next Bitmap: ... page_size=2M ... 
------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From ayang at openjdk.java.net Thu Feb 11 16:32:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 11 Feb 2021 16:32:39 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places In-Reply-To: References: Message-ID: <4iQZOAzPHRx26UBskJ6u6TY_gGyr0QwcsbqD9KCNiiU=.4bcea2f1-d5a7-4880-81e8-611bb78dba5c@github.com> On Thu, 11 Feb 2021 16:20:07 GMT, Stefan Johansson wrote: >> The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. >> >> In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. > > test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 59: > >> 57: static void checkSize(OutputAnalyzer output, long expectedSize, String pattern) { >> 58: // First check if there is a large page failure associated with >> 59: // the data structure being checked. > > Are you thinking something like this @albertnetymk? > Suggestion: > > // First check if there is a large page failure associated with > // the data structure being checked. In case of a large page > // allocation failure the output will include logs like this for > // the affected data structure: > // [0.048s][debug][gc,heap,coops] Reserve regular memory without large pages > // [0.048s][info ][pagesize ] Next Bitmap: ... page_size=4K ... Yes, and also for `checkLargePagesEnabled`. It's not obvious to me why we parse the output in those two places, one looking for the failure mode, and the other looking for the success mode. That's why I asked for an sample of the "expected" log output. ------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From jaroslav.bachorik at datadoghq.com Thu Feb 11 17:29:32 2021 From: jaroslav.bachorik at datadoghq.com (=?UTF-8?Q?Jaroslav_Bachor=C3=ADk?=) Date: Thu, 11 Feb 2021 18:29:32 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: References: Message-ID: Hi Roman, Thanks for your response. I checked ZGC implementation and, indeed, it is very easy to get the liveness information just by extending `ZStatHeap` class to report the last valid value of `_at_mark_end.live`. I am also able to get this info from Shenandoah, although my first attempt still involves a safepointing VM operation since I need to iterate over regions to get the liveness info for each of them and sum it up. I think it is still an acceptable trade-off, though. The next one in the queue is the Serial GC. My assumptions, based on reading the code, are that for young gen 'live = used' at the end of DefNewGeneration::collect() method and for old gen 'live = used - slack' (slack is the cumulative size of objects considered to be alive for the purpose of compaction although they are really dead - see CompactibleSpace::scan_and_forward()). Does this sound reasonable? I will post my findings for Parallel GC and G1 GC later. 
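For illustration, here is a minimal C++ sketch of the per-region summation described above. It assumes Shenandoah's region accessors keep their current names (ShenandoahHeap::heap(), num_regions(), get_region(), ShenandoahHeapRegion::get_live_data_bytes()); those names and include paths are taken from current sources and may differ between releases, and a real implementation would still have to fix the point (e.g. final mark) at which the sum is meaningful:

#include "gc/shenandoah/shenandoahHeap.hpp"
#include "gc/shenandoah/shenandoahHeapRegion.hpp"

// Sketch only: walks all regions and sums their live data, as described above.
static size_t estimated_live_bytes() {
  ShenandoahHeap* heap = ShenandoahHeap::heap();
  size_t live = 0;
  for (size_t i = 0; i < heap->num_regions(); i++) {
    live += heap->get_region(i)->get_live_data_bytes();
  }
  return live;
}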
Cheers, -JB- On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke wrote: > > Hello Jaroslav, > > > In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I > > am trying to figure out whether providing a cheap estimation of live > > set size is something actually achievable across various GC > > implementations. > > > > What I am looking at is piggy-backing on a concurrent mark task to get > > the summary size of live objects - using the 'straight-forward' > > heap-inspection like approach is prohibitively expensive. > > In Shenandoah, this information is already collected during concurrent > marking. We currently don't print it directly, but we could certainly do > that. I'll look into implementing it. I'll also look into exposing > liveness info via JMX. > > I'm not quite sure about G1: that information would only be collected > during mixed or full collections. I am not sure if G1 prints it, though. > > ZGC prints this under -Xlog:gc+heap: > > [6,502s][info][gc,heap ] GC(0) Mark Start > Mark End Relocate Start Relocate End High > Low > [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) > 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) > 834M (10%) > [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) > 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) > 6896M (86%) > [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) > 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) > 600M (8%) > [6,502s][info][gc,heap ] GC(0) Live: - > 195M (2%) 195M (2%) 195M (2%) - > - > [6,502s][info][gc,heap ] GC(0) Allocated: - > 242M (3%) 270M (3%) 380M (5%) - > - > [6,502s][info][gc,heap ] GC(0) Garbage: - > 638M (8%) 606M (8%) 24M (0%) - > - > [6,502s][info][gc,heap ] GC(0) Reclaimed: - > - 32M (0%) 614M (8%) - > - > > I hope that is useful? > > Thanks, > Roman > From stuefe at openjdk.java.net Thu Feb 11 17:34:38 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Thu, 11 Feb 2021 17:34:38 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 19:42:15 GMT, Stefan Johansson wrote: > The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. > > In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. Looks good. `os::trace_page_sizes_for_requested_size` is not easy to understand, especially with the alignment vs preferred_page_size semantic. Not sure what alignment has to do with preferred page size. We could prevent these kind of errors and make the code more readable by introducing a page size enum. We only have a handful of valid values anyway. ------------- Marked as reviewed by stuefe (Reviewer). 
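As a purely hypothetical illustration of the page-size enum idea suggested above (no such type exists in HotSpot), a distinct type would keep page-size and alignment arguments from being swapped silently at call sites; the names and the 4K/2M/1G values below are assumptions, not existing code:

#include <cstddef>

// Hypothetical sketch of the suggestion above, not existing HotSpot code.
enum class PageSize : size_t {
  Base    = 4 * 1024,           // regular 4K pages (common Linux default)
  Large2M = 2 * 1024 * 1024,    // 2M large pages
  Large1G = 1024 * 1024 * 1024  // 1G large pages
};

inline size_t page_size_in_bytes(PageSize p) { return static_cast<size_t>(p); }

// A trace helper that took a PageSize here could no longer be handed the
// alignment argument by accident.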
PR: https://git.openjdk.java.net/jdk/pull/2486 From sjohanss at openjdk.java.net Thu Feb 11 17:34:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 17:34:40 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places In-Reply-To: <4iQZOAzPHRx26UBskJ6u6TY_gGyr0QwcsbqD9KCNiiU=.4bcea2f1-d5a7-4880-81e8-611bb78dba5c@github.com> References: <4iQZOAzPHRx26UBskJ6u6TY_gGyr0QwcsbqD9KCNiiU=.4bcea2f1-d5a7-4880-81e8-611bb78dba5c@github.com> Message-ID: On Thu, 11 Feb 2021 16:29:53 GMT, Albert Mingkun Yang wrote: >> test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 59: >> >>> 57: static void checkSize(OutputAnalyzer output, long expectedSize, String pattern) { >>> 58: // First check if there is a large page failure associated with >>> 59: // the data structure being checked. >> >> Are you thinking something like this @albertnetymk? >> Suggestion: >> >> // First check if there is a large page failure associated with >> // the data structure being checked. In case of a large page >> // allocation failure the output will include logs like this for >> // the affected data structure: >> // [0.048s][debug][gc,heap,coops] Reserve regular memory without large pages >> // [0.048s][info ][pagesize ] Next Bitmap: ... page_size=4K ... > > Yes, and also for `checkLargePagesEnabled`. It's not obvious to me why we parse the output in those two places, one looking for the failure mode, and the other looking for the success mode. That's why I asked for an sample of the "expected" log output. I think the check if large pages are enable is pretty straight forward. We should never expect large page sizes in the output unless large pages are enabled. I do however agree that this check is a bit clunky. Would it help to extract it to a separate function? Something like `largePagesAllocationFailure(pattern)`, I could also change the name of the function above to just be `largePagesEnabled()` then the code reads even better? ------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From rkennke at redhat.com Thu Feb 11 17:55:24 2021 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 11 Feb 2021 18:55:24 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: References: Message-ID: Notice that liveness information is only somewhat reliable right after marking. In Shenandoah, this is in the final-mark pause, and then the program is at a safepoint already. This is where you'd want to emit a JMX event or something similar. You can't simply query a counter and assume it represents current liveness in the middle or outside of GC cycle. This should be true for all GCs. For Serial and Parallel I am not sure at all that you can do this. AFAIK, they don't count liveness at all. Roman > Hi Roman, > > Thanks for your response. I checked ZGC implementation and, indeed, it > is very easy to get the liveness information just by extending > `ZStatHeap` class to report the last valid value of > `_at_mark_end.live`. > > I am also able to get this info from Shenandoah, although my first > attempt still involves a safepointing VM operation since I need to > iterate over regions to get the liveness info for each of them and sum > it up. I think it is still an acceptable trade-off, though. > > The next one in the queue is the Serial GC. 
My assumptions, based on > reading the code, are that for young gen 'live = used' at the end of > DefNewGeneration::collect() method and for old gen 'live = used - > slack' (slack is the cumulative size of objects considered to be alive > for the purpose of compaction although they are really dead - see > CompactibleSpace::scan_and_forward()). Does this sound reasonable? > > I will post my findings for Parallel GC and G1 GC later. > > Cheers, > > -JB- > > On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke wrote: >> >> Hello Jaroslav, >> >>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I >>> am trying to figure out whether providing a cheap estimation of live >>> set size is something actually achievable across various GC >>> implementations. >>> >>> What I am looking at is piggy-backing on a concurrent mark task to get >>> the summary size of live objects - using the 'straight-forward' >>> heap-inspection like approach is prohibitively expensive. >> >> In Shenandoah, this information is already collected during concurrent >> marking. We currently don't print it directly, but we could certainly do >> that. I'll look into implementing it. I'll also look into exposing >> liveness info via JMX. >> >> I'm not quite sure about G1: that information would only be collected >> during mixed or full collections. I am not sure if G1 prints it, though. >> >> ZGC prints this under -Xlog:gc+heap: >> >> [6,502s][info][gc,heap ] GC(0) Mark Start >> Mark End Relocate Start Relocate End High >> Low >> [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) >> 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) >> 834M (10%) >> [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) >> 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) >> 6896M (86%) >> [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) >> 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) >> 600M (8%) >> [6,502s][info][gc,heap ] GC(0) Live: - >> 195M (2%) 195M (2%) 195M (2%) - >> - >> [6,502s][info][gc,heap ] GC(0) Allocated: - >> 242M (3%) 270M (3%) 380M (5%) - >> - >> [6,502s][info][gc,heap ] GC(0) Garbage: - >> 638M (8%) 606M (8%) 24M (0%) - >> - >> [6,502s][info][gc,heap ] GC(0) Reclaimed: - >> - 32M (0%) 614M (8%) - >> - >> >> I hope that is useful? >> >> Thanks, >> Roman >> > From jaroslav.bachorik at datadoghq.com Thu Feb 11 18:09:38 2021 From: jaroslav.bachorik at datadoghq.com (=?UTF-8?Q?Jaroslav_Bachor=C3=ADk?=) Date: Thu, 11 Feb 2021 19:09:38 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: References: Message-ID: On Thu, Feb 11, 2021 at 6:55 PM Roman Kennke wrote: > > Notice that liveness information is only somewhat reliable right after > marking. In Shenandoah, this is in the final-mark pause, and then the Yes, I understand this. What I am looking at is to have something like 'last known liveness' value - captured at a well defined point and providing an estimate within the bounds of GC implementation. > program is at a safepoint already. This is where you'd want to emit a > JMX event or something similar. You can't simply query a counter and > assume it represents current liveness in the middle or outside of GC > cycle. This should be true for all GCs. > > For Serial and Parallel I am not sure at all that you can do this. > AFAIK, they don't count liveness at all. > > Roman > > > Hi Roman, > > > > Thanks for your response. 
I checked ZGC implementation and, indeed, it > > is very easy to get the liveness information just by extending > > `ZStatHeap` class to report the last valid value of > > `_at_mark_end.live`. > > > > I am also able to get this info from Shenandoah, although my first > > attempt still involves a safepointing VM operation since I need to > > iterate over regions to get the liveness info for each of them and sum > > it up. I think it is still an acceptable trade-off, though. > > > > The next one in the queue is the Serial GC. My assumptions, based on > > reading the code, are that for young gen 'live = used' at the end of > > DefNewGeneration::collect() method and for old gen 'live = used - > > slack' (slack is the cumulative size of objects considered to be alive > > for the purpose of compaction although they are really dead - see > > CompactibleSpace::scan_and_forward()). Does this sound reasonable? > > > > I will post my findings for Parallel GC and G1 GC later. > > > > Cheers, > > > > -JB- > > > > On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke wrote: > >> > >> Hello Jaroslav, > >> > >>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I > >>> am trying to figure out whether providing a cheap estimation of live > >>> set size is something actually achievable across various GC > >>> implementations. > >>> > >>> What I am looking at is piggy-backing on a concurrent mark task to get > >>> the summary size of live objects - using the 'straight-forward' > >>> heap-inspection like approach is prohibitively expensive. > >> > >> In Shenandoah, this information is already collected during concurrent > >> marking. We currently don't print it directly, but we could certainly do > >> that. I'll look into implementing it. I'll also look into exposing > >> liveness info via JMX. > >> > >> I'm not quite sure about G1: that information would only be collected > >> during mixed or full collections. I am not sure if G1 prints it, though. > >> > >> ZGC prints this under -Xlog:gc+heap: > >> > >> [6,502s][info][gc,heap ] GC(0) Mark Start > >> Mark End Relocate Start Relocate End High > >> Low > >> [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) > >> 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) > >> 834M (10%) > >> [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) > >> 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) > >> 6896M (86%) > >> [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) > >> 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) > >> 600M (8%) > >> [6,502s][info][gc,heap ] GC(0) Live: - > >> 195M (2%) 195M (2%) 195M (2%) - > >> - > >> [6,502s][info][gc,heap ] GC(0) Allocated: - > >> 242M (3%) 270M (3%) 380M (5%) - > >> - > >> [6,502s][info][gc,heap ] GC(0) Garbage: - > >> 638M (8%) 606M (8%) 24M (0%) - > >> - > >> [6,502s][info][gc,heap ] GC(0) Reclaimed: - > >> - 32M (0%) 614M (8%) - > >> - > >> > >> I hope that is useful? > >> > >> Thanks, > >> Roman > >> > > > From zgu at openjdk.java.net Thu Feb 11 18:10:41 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 11 Feb 2021 18:10:41 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 09:32:18 GMT, Aleksey Shipilev wrote: > Shenandoah currently uses its own marking bitmap (added by JDK-8254315). It accesses the marking bitmap with "acquire" for reads and "conservative" for updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. 
> > I think both are actually excessive for marking bitmap accesses: we do not piggyback object updates on it, the atomics there are only to guarantee the access atomicity and CAS updates to bits. It seems "relaxed" is enough for marking bitmap accesses. > > Sample run with "compact" (frequent GC cycles) on SPECjvm2008:compiler.sunflow on AArch64: > > # Baseline > # Baseline > [146.028s][info][gc,stats] Concurrent Marking = 50.315 s (a = 258024 us) (n = 195) (lvls, us = 31836, 230469, 273438, 306641, 464255) > [141.458s][info][gc,stats] Concurrent Marking = 47.819 s (a = 242737 us) (n = 197) (lvls, us = 42773, 197266, 267578, 287109, 433948) > [144.108s][info][gc,stats] Concurrent Marking = 49.806 s (a = 250283 us) (n = 199) (lvls, us = 32227, 201172, 267578, 296875, 448549) > > # Patched > [144.238s][info][gc,stats] Concurrent Marking = 46.627 s (a = 220981 us) (n = 211) (lvls, us = 24414, 197266, 238281, 259766, 345112) > [138.406s][info][gc,stats] Concurrent Marking = 45.022 s (a = 227383 us) (n = 198) (lvls, us = 20508, 205078, 244141, 271484, 427658) > [140.950s][info][gc,stats] Concurrent Marking = 45.073 s (a = 222036 us) (n = 203) (lvls, us = 21680, 181641, 240234, 265625, 375750) > > Average time goes down, total marking time goes down. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah Now, you need load barrier on read side (e.g. ShenandoahMarkBitMap::at()). Although, it is not a correctness issue, but seeing stale value means extra unnecessary work. ------------- Changes requested by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2497 From sjohanss at openjdk.java.net Thu Feb 11 18:10:57 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 18:10:57 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places [v2] In-Reply-To: References: Message-ID: > The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. > > In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: Albert review Renamed helper to improve how the code read. Also extracted the failure check into a separate function. 
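Returning to the marking-bitmap ordering question from the 8261493 review above, here is a standard-C++ sketch (not the actual BitMap/ShenandoahMarkBitMap code) of why a relaxed CAS is enough when nothing piggybacks on the bit for ordering -- the loop only needs atomicity of the bit update itself:

#include <atomic>
#include <cstdint>

// Illustration only: set one bit in a word with relaxed ordering.
bool par_set_bit_relaxed(std::atomic<uintptr_t>& word, unsigned bit) {
  const uintptr_t mask = uintptr_t(1) << bit;
  uintptr_t old = word.load(std::memory_order_relaxed);
  while ((old & mask) == 0) {
    if (word.compare_exchange_weak(old, old | mask,
                                   std::memory_order_relaxed,
                                   std::memory_order_relaxed)) {
      return true;   // this thread set the bit
    }
    // compare_exchange_weak reloaded 'old' on failure; re-check it.
  }
  return false;      // bit was already set by another thread
}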
------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2486/files - new: https://git.openjdk.java.net/jdk/pull/2486/files/76faa2e4..b46f6a75 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2486&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2486&range=00-01 Stats: 22 lines in 1 file changed: 15 ins; 1 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2486.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2486/head:pull/2486 PR: https://git.openjdk.java.net/jdk/pull/2486 From martin.doerr at sap.com Thu Feb 11 18:29:42 2021 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 11 Feb 2021 18:29:42 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> Message-ID: Hi, I appreciate this investigation. PPC64 has optimized versions for _relaxed, _acquire, _release and _acq_rel which are substantially faster than the other memory order modes. So we should be able to observe performance improvements when any of these ones are used in hot code. Best regards, Martin > -----Original Message----- > From: hotspot-dev On Behalf Of > Andrew Haley > Sent: Donnerstag, 11. Februar 2021 14:34 > To: Kim Barrett > Cc: hotspot-gc-dev openjdk.java.net ; > hotspot-dev at openjdk.java.net > Subject: Re: Atomic operations: your thoughts are welocme > > On 11/02/2021 03:59, Kim Barrett wrote: > >> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: > >> > >> I've been looking at the hottest Atomic operations in HotSpot, with a view > to > >> finding out if the default memory_order_conservative (which is very > expensive > >> on some architectures) can be weakened to something less. It's > impossible to > >> fix all of them, but perhaps we can fix some of the most frequent. > > > > Is there any information about the possible performance improvement > from > > such changes? 1.5-3M occurrences doesn't mean much without context. > > > > We don't presently have support for sequentially consistent semantics, > only > > "conservative". My recollection is that this is in part because there might > > be code that is assuming the possibly stronger "conservative" semantics, > and > > in part because there are different and incompatible approaches to > > implementing sequentially consistent semantics on some hardware > platforms > > and we didn't want to make assumptions there. > > > > We also don't presently have any cmpxchg implementation that really > supports > > anything between conservative and relaxed, nor do we support different > order > > constraints for the success vs failure cases. Things can be complicated > > enough as is; while we *could* fill some of that in, I'm not sure we should. > > OK. However, even though we don't implement any of them, we do have an > API that includes acq, rel, and seq_cst. The fact that we don't have > anything behind them is, I thought, To Be Done rather than Won't Do. > > >> > ::Table::oop_oop_iterate anceKlass, narrowOop>(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = > 3903178 > >> > >> This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does > this > >> need to be memory_order_conservative, or would something weaker > do? Even > >> acq_rel or seq_cst would be better. > > > > I think for setting bits in a bitmap the thing to do would be to identify > > places that are safe and useful (impacts performance) to do so first. 
Then > > add a weaker variant for use in those places, assuming any are found. > > I see. I'm assuming that frequency of use is a useful proxy for impact. > Aleksey has already, very helpfully, measured how significant these are > for Shenandoah, and I suspect all concurrent GCs would benefit in a > similar fashion. > > >> : :: = 2376632 > >> : :: = 2003895 > >> > >> I can't imagine that either of these actually need > memory_order_conservative, > >> they're just reference counts. > > > > The "usual" refcount implementation involves relaxed increment and > stronger > > ordering for decrement. (If I'm remembering correctly, dec-acquire and a > > release fence on the zero value path before deleting. But I've not thought > > about what one might want for this CAS-based variant that handles > boundary > > cases specially.) And as you say, whether any of these could be weakened > > depends on whether there is any code surrounding a use that depends on > the > > stronger ordering semantics. At a guess I suspect increment could be > changed > > to relaxed, but I've not looked. This one is probably a question for runtime > > folks. > > OK, this makes sense. I'm thinking of the long road to getting this stuff > documented so that we can see what side effects of atomic operations are > actually required. > > >> : :: = > 1719614 > >> > >> BitMap::par_set_bit again. > >> > >> > erflowTaskQueue, > (MEMFLAGS)5>*)+432>: :: = 1617659 > >> > >> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). > >> Again, do we need conservative here? > > > > This needs at least sequentially consistent semantics on the success path. > > Yep. That's easy, it's the full barrier in the failure path that > I'd love to eliminate. > > > See the referenced paper by Le, et al. > > > > There is also a cmpxchg_age in pop_local_slow. The Le, et al paper doesn't > > deal with that path. But it's also not in your list, which is good since > > this is supposed to be infrequently taken. > > Right. I'm trying to concentrate on the low-hanging fruit. > > Thank you for the very detailed and informative reply. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From sjohanss at openjdk.java.net Thu Feb 11 18:30:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 18:30:38 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 17:31:57 GMT, Thomas Stuefe wrote: > Looks good. `os::trace_page_sizes_for_requested_size` is not easy to understand, especially with the alignment vs preferred_page_size semantic. Not sure what alignment has to do with preferred page size. Thanks. I agree that it is not that straight forward but I think the intention here is to pass in both page size and alignment to make it easier to understand why the actual size might be large than the requested size. For example in cases like this: [debug][gc,heap,coops] Reserve regular memory without large pages [info ][pagesize ] Next Bitmap: req_size=32000K base=0x00007fa4ef400000 page_size=4K alignment=2M size=32M > We could prevent these kind of errors and make the code more readable by introducing a page size enum. We only have a handful of valid values anyway. 
Yes, there is certainly room for improvement in this area =) ------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From sjohanss at openjdk.java.net Thu Feb 11 19:42:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Thu, 11 Feb 2021 19:42:40 GMT Subject: RFR: 8261505: Test test/hotspot/jtreg/gc/parallel/TestDynShrinkHeap.java killed by Linux OOM Killer In-Reply-To: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> References: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> Message-ID: On Wed, 10 Feb 2021 12:13:54 GMT, Christoph G?ttschkes wrote: > On memory constrained devices, the test might get killed by the linux kernel OOM Killer. > > Executing the test with the JTreg test harness makes the test fail and get killed by the OOM Killer. > Executing the test manually, by using the JTreg provided "rerun" command line, the test succeeds. > This happened on a Raspberry PI 2, which has only 1G of memory available. > > I added an "os.maxMemory" requirement, so the test gets skipped. Marked as reviewed by sjohanss (Reviewer). Hi Christoph, This looks good. The test is setting a 1GB max heap so it seems reasonable to require the system to have at least that. Another thing to look at when tests are getting killed by the OOM killer is the number of concurrent test jobs. For a system with 1GB of memory that should be 1, so in your case you can't go lower. To be certain you only run one test at a time you could run `configure` with `--with-test-jobs=1 `, but according to `doc/testing.md` this should be the default for your system: The test concurrency (`-concurrency`). Defaults to TEST_JOBS (if set by `--with-test-jobs=`), otherwise it defaults to JOBS, except for Hotspot, where the default is *number of CPU cores/2*, but never more than *memory size in GB/2*. The reason that the rerun succeeds is most likely because then you don't have the JTREG process running along side the test and consuming resources. ------------- PR: https://git.openjdk.java.net/jdk/pull/2507 From cjplummer at openjdk.java.net Thu Feb 11 23:52:58 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Thu, 11 Feb 2021 23:52:58 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes [v3] In-Reply-To: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: > See the bug for most details. A few notes here about some implementation details: > > In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: > > ` getTLAB().printOn(tty); // includes "\n" ` > > That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. > > I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. 
The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. > > The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: > > var dso = loadObjectContainingPC(addr); > if (dso == null) { > return ptrLoc.toString(); > } > var sym = dso.closestSymbolToPC(addr); > if (sym != null) { > return sym.name + '+' + sym.offset; > } > And now you'll see something similar in the PointerFinder code: > > loc.loadObject = cdbg.loadObjectContainingPC(a); > if (loc.loadObject != null) { > loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); > return loc; > } > Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: Fix issue with parsing 'examine' output when there is unexecptected output due to CDS logging or -Xcheck:jni warnings. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2111/files - new: https://git.openjdk.java.net/jdk/pull/2111/files/69a8ae59..79cb1080 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2111&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2111&range=01-02 Stats: 5 lines in 1 file changed: 2 ins; 1 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2111.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2111/head:pull/2111 PR: https://git.openjdk.java.net/jdk/pull/2111 From cjplummer at openjdk.java.net Fri Feb 12 00:05:39 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Fri, 12 Feb 2021 00:05:39 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes [v3] In-Reply-To: <-09XRqbxFbZGkzqDVewiXrJjVNjuLMdZqfxjnxJf3Oc=.2da660b7-a5c1-40e1-81af-8dc814e199ca@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> <-09XRqbxFbZGkzqDVewiXrJjVNjuLMdZqfxjnxJf3Oc=.2da660b7-a5c1-40e1-81af-8dc814e199ca@github.com> Message-ID: On Tue, 2 Feb 2021 23:21:50 GMT, Yasumasa Suenaga wrote: >> Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix issue with parsing 'examine' output when there is unexecptected output due to CDS logging or -Xcheck:jni warnings. > > LGTM @YaSuenag and @kevinjwalls I had to make a minor fix to the test. Can you please review it. The issued turned up when I ran some higher test tiers, one of which enabled CDS with some tracing and the other enabled `-Xcheck:jni`, which produced output due to [JDK-8261607](https://bugs.openjdk.java.net/browse/JDK-8261607). Both caused extra output that resulted in improperly parsing the `examine` output and not actually finding that address that it produced. This was because there were lines of output before even issuing the `examine` command that matched the pattern being looked for. I made the pattern more specific by including the tid of the thread. 
I also cleaned up the comments around that code a bit. thanks. ------------- PR: https://git.openjdk.java.net/jdk/pull/2111 From kbarrett at openjdk.java.net Fri Feb 12 03:50:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 12 Feb 2021 03:50:41 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: Message-ID: On Sun, 31 Jan 2021 17:10:55 GMT, Thomas Schatzl wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> require non-zero expand size > > Lgtm. Thanks. Thanks @tschatzl , @kstefanj , and @walulyai for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From shade at openjdk.java.net Fri Feb 12 07:34:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 12 Feb 2021 07:34:38 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 18:07:58 GMT, Zhengyu Gu wrote: > Now, you need load barrier on read side (e.g. ShenandoahMarkBitMap::at()). Although, it is not a correctness issue, but seeing stale value means extra unnecessary work. I don't see why. Adding load barriers would not affect promptness of seeing the memory updates to the bitmap itself. It might affect the promptness of seeing the object contents that we are reading after asking `is_marked` -- but that would be a race either way, because we do not use mark bitmap for memory ordering at all (i.e. there is no "release" on bitmap update). ------------- PR: https://git.openjdk.java.net/jdk/pull/2497 From cgo at openjdk.java.net Fri Feb 12 07:41:38 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Fri, 12 Feb 2021 07:41:38 GMT Subject: RFR: 8261505: Test test/hotspot/jtreg/gc/parallel/TestDynShrinkHeap.java killed by Linux OOM Killer In-Reply-To: References: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> Message-ID: On Thu, 11 Feb 2021 19:39:23 GMT, Stefan Johansson wrote: >> On memory constrained devices, the test might get killed by the linux kernel OOM Killer. >> >> Executing the test with the JTreg test harness makes the test fail and get killed by the OOM Killer. >> Executing the test manually, by using the JTreg provided "rerun" command line, the test succeeds. >> This happened on a Raspberry PI 2, which has only 1G of memory available. >> >> I added an "os.maxMemory" requirement, so the test gets skipped. > > Marked as reviewed by sjohanss (Reviewer). Hi Stefan, thanks for the review. I am aware of the concurrency feature of the JTreg runner and am always using a concurrency of 1 on embedded devices. Even if they are more powerful, since it makes the test execution less reliable. I found some more tests with the same problem, but will file a single bug and fix all in one go, as soon as I have time for that. ------------- PR: https://git.openjdk.java.net/jdk/pull/2507 From kbarrett at openjdk.java.net Fri Feb 12 08:24:59 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 12 Feb 2021 08:24:59 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v4] In-Reply-To: References: Message-ID: > Please review this change to ParallelGC to avoid unnecessary full GCs when > concurrent threads attempt oldgen allocations during evacuation. 
> > When a GC thread fails an oldgen allocation it expands the heap and retries > the allocation. If the second allocation attempt fails then allocation > failure is reported to the caller, which can lead to a full GC. But the > retried allocation could fail because, after expansion, some other thread > allocated enough of the available space that the retry fails. This can > happen even though there is plenty of space available, if only that retry > were to perform another expansion. > > Rather than trying to combine the allocation retry with the expansion (it's > not clear there's a way to do so without breaking invariants), we instead > simply loop on the allocation attempt + expand, until either the allocation > succeeds or the expand fails. If some other thread "steals" space from the > expanding thread and causes its next allocation attempt to fail and do > another expansion, that's functionally no different from the expanding > thread succeeding and causing the other thread to fail allocation and do the > expand instead. > > This change includes modifying PSOldGen::expand_to_reserved to return false > when there is no space available, where it previously returned true. It's > not clear why it returned true; that seems wrong, but was harmless. But it > must not do so with the new looping behavior for allocation, else it would > never terminate. > > Testing: > mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'master' into retry_alloc - Merge branch 'master' into retry_alloc - avoid expand storms - Merge branch 'master' into retry_alloc - require non-zero expand size - retry failed allocation if expand succeeds ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2309/files - new: https://git.openjdk.java.net/jdk/pull/2309/files/72431d39..d463925f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2309&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2309&range=02-03 Stats: 8695 lines in 369 files changed: 4618 ins; 2020 del; 2057 mod Patch: https://git.openjdk.java.net/jdk/pull/2309.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2309/head:pull/2309 PR: https://git.openjdk.java.net/jdk/pull/2309 From kbarrett at openjdk.java.net Fri Feb 12 08:25:00 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 12 Feb 2021 08:25:00 GMT Subject: Integrated: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc In-Reply-To: References: Message-ID: <6tGBh9-yA3lFIiEsjOhL7Me8t0vlXB9unieHrubSIiU=.bc444e47-0595-4f8b-ae3e-d3c335ddc4bf@github.com> On Fri, 29 Jan 2021 08:24:13 GMT, Kim Barrett wrote: > Please review this change to ParallelGC to avoid unnecessary full GCs when > concurrent threads attempt oldgen allocations during evacuation. > > When a GC thread fails an oldgen allocation it expands the heap and retries > the allocation. If the second allocation attempt fails then allocation > failure is reported to the caller, which can lead to a full GC. But the > retried allocation could fail because, after expansion, some other thread > allocated enough of the available space that the retry fails. This can > happen even though there is plenty of space available, if only that retry > were to perform another expansion. 
> > Rather than trying to combine the allocation retry with the expansion (it's > not clear there's a way to do so without breaking invariants), we instead > simply loop on the allocation attempt + expand, until either the allocation > succeeds or the expand fails. If some other thread "steals" space from the > expanding thread and causes its next allocation attempt to fail and do > another expansion, that's functionally no different from the expanding > thread succeeding and causing the other thread to fail allocation and do the > expand instead. > > This change includes modifying PSOldGen::expand_to_reserved to return false > when there is no space available, where it previously returned true. It's > not clear why it returned true; that seems wrong, but was harmless. But it > must not do so with the new looping behavior for allocation, else it would > never terminate. > > Testing: > mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing) This pull request has now been integrated. Changeset: 6a84ec68 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/6a84ec68 Stats: 57 lines in 4 files changed: 33 ins; 6 del; 18 mod 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc 8260045: Parallel GC: Waiting on ExpandHeap_lock may cause "expansion storm" Loop to retry allocation if expand succeeds. Treat space available after obtaining expand lock as expand success. Reviewed-by: tschatzl, iwalulya, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From sjohanss at openjdk.java.net Fri Feb 12 08:51:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 12 Feb 2021 08:51:40 GMT Subject: RFR: 8261505: Test test/hotspot/jtreg/gc/parallel/TestDynShrinkHeap.java killed by Linux OOM Killer In-Reply-To: References: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> Message-ID: On Fri, 12 Feb 2021 07:38:38 GMT, Christoph G?ttschkes wrote: >> Marked as reviewed by sjohanss (Reviewer). > > Hi Stefan, > > thanks for the review. I am aware of the concurrency feature of the JTreg runner and am always using a concurrency of 1 on embedded devices. Even if they are more powerful, since it makes the test execution less reliable. > > I found some more tests with the same problem, but will file a single bug and fix all in one go, as soon as I have time for that. I'll wait for a second reviewer before sponsoring this, just in case anyone has a different view on how to handle this. ------------- PR: https://git.openjdk.java.net/jdk/pull/2507 From tschatzl at openjdk.java.net Fri Feb 12 09:37:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 12 Feb 2021 09:37:40 GMT Subject: RFR: 8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 03:47:36 GMT, Kim Barrett wrote: >> Lgtm. Thanks. > > Thanks @tschatzl , @kstefanj , and @walulyai for reviews. Still looks good. Thanks. (I am aware this change has already been integrated). 
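For readers following the 8260044 thread above, a schematic of the allocate-then-expand retry loop being reviewed; the types and helper names below (OldGen, try_allocate, try_expand_for) are illustrative stand-ins, not the actual PSOldGen interface:

#include <cstddef>

// Illustrative shape only -- not the real ParallelGC code.
struct OldGen {
  void* try_allocate(size_t word_size);    // assumed: returns null on failure
  bool  try_expand_for(size_t word_size);  // assumed: false once the reserve is exhausted
};

void* allocate_with_expansion(OldGen* gen, size_t word_size) {
  while (true) {
    void* result = gen->try_allocate(word_size);
    if (result != nullptr) {
      return result;                       // allocation succeeded
    }
    if (!gen->try_expand_for(word_size)) {
      return nullptr;                      // cannot expand further: report failure
    }
    // Expansion succeeded (perhaps on behalf of a competing thread): retry the
    // allocation rather than reporting failure and risking a full GC.
  }
}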
------------- PR: https://git.openjdk.java.net/jdk/pull/2309 From ysuenaga at openjdk.java.net Fri Feb 12 09:38:46 2021 From: ysuenaga at openjdk.java.net (Yasumasa Suenaga) Date: Fri, 12 Feb 2021 09:38:46 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes [v3] In-Reply-To: References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: On Thu, 11 Feb 2021 23:52:58 GMT, Chris Plummer wrote: >> See the bug for most details. A few notes here about some implementation details: >> >> In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: >> >> ` getTLAB().printOn(tty); // includes "\n" ` >> >> That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. >> >> I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. >> >> The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: >> >> var dso = loadObjectContainingPC(addr); >> if (dso == null) { >> return ptrLoc.toString(); >> } >> var sym = dso.closestSymbolToPC(addr); >> if (sym != null) { >> return sym.name + '+' + sym.offset; >> } >> And now you'll see something similar in the PointerFinder code: >> >> loc.loadObject = cdbg.loadObjectContainingPC(a); >> if (loc.loadObject != null) { >> loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); >> return loc; >> } >> Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) > > Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: > > Fix issue with parsing 'examine' output when there is unexecptected output due to CDS logging or -Xcheck:jni warnings. LGTM We may be able to use regex to collect any addresses from jstack output, but I'm not sure it makes the test code simpler... ------------- Marked as reviewed by ysuenaga (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2111 From kevinw at openjdk.java.net Fri Feb 12 09:38:46 2021 From: kevinw at openjdk.java.net (Kevin Walls) Date: Fri, 12 Feb 2021 09:38:46 GMT Subject: RFR: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes [v3] In-Reply-To: References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: On Thu, 11 Feb 2021 23:52:58 GMT, Chris Plummer wrote: >> See the bug for most details. 
A few notes here about some implementation details: >> >> In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: >> >> ` getTLAB().printOn(tty); // includes "\n" ` >> >> That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. >> >> I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. >> >> The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: >> >> var dso = loadObjectContainingPC(addr); >> if (dso == null) { >> return ptrLoc.toString(); >> } >> var sym = dso.closestSymbolToPC(addr); >> if (sym != null) { >> return sym.name + '+' + sym.offset; >> } >> And now you'll see something similar in the PointerFinder code: >> >> loc.loadObject = cdbg.loadObjectContainingPC(a); >> if (loc.loadObject != null) { >> loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); >> return loc; >> } >> Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) > > Chris Plummer has updated the pull request incrementally with one additional commit since the last revision: > > Fix issue with parsing 'examine' output when there is unexecptected output due to CDS logging or -Xcheck:jni warnings. Yes, joy of text processing. Looks good. ------------- Marked as reviewed by kevinw (Committer). PR: https://git.openjdk.java.net/jdk/pull/2111 From tschatzl at openjdk.java.net Fri Feb 12 09:39:38 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 12 Feb 2021 09:39:38 GMT Subject: RFR: 8261505: Test test/hotspot/jtreg/gc/parallel/TestDynShrinkHeap.java killed by Linux OOM Killer In-Reply-To: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> References: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> Message-ID: <-gM9imm9AV0u9JQOKkDJ5MpM23p31Tvipg1iP7-26u0=.fbac4b67-194f-4bea-8382-609b7b2092da@github.com> On Wed, 10 Feb 2021 12:13:54 GMT, Christoph G?ttschkes wrote: > On memory constrained devices, the test might get killed by the linux kernel OOM Killer. > > Executing the test with the JTreg test harness makes the test fail and get killed by the OOM Killer. > Executing the test manually, by using the JTreg provided "rerun" command line, the test succeeds. > This happened on a Raspberry PI 2, which has only 1G of memory available. > > I added an "os.maxMemory" requirement, so the test gets skipped. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2507 From cgo at openjdk.java.net Fri Feb 12 09:44:39 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Fri, 12 Feb 2021 09:44:39 GMT Subject: Integrated: 8261505: Test test/hotspot/jtreg/gc/parallel/TestDynShrinkHeap.java killed by Linux OOM Killer In-Reply-To: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> References: <3qpvJUngDwJhM4n1g-8LcHrKmxzS45welhYUokK9u9o=.1b0db34b-c388-4bea-9f59-2e82dc4e36ce@github.com> Message-ID: On Wed, 10 Feb 2021 12:13:54 GMT, Christoph G?ttschkes wrote: > On memory constrained devices, the test might get killed by the linux kernel OOM Killer. > > Executing the test with the JTreg test harness makes the test fail and get killed by the OOM Killer. > Executing the test manually, by using the JTreg provided "rerun" command line, the test succeeds. > This happened on a Raspberry PI 2, which has only 1G of memory available. > > I added an "os.maxMemory" requirement, so the test gets skipped. This pull request has now been integrated. Changeset: ebaa58d9 Author: Christoph G?ttschkes Committer: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/ebaa58d9 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8261505: Test test/hotspot/jtreg/gc/parallel/TestDynShrinkHeap.java killed by Linux OOM Killer Reviewed-by: sjohanss, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2507 From kim.barrett at oracle.com Fri Feb 12 09:58:37 2021 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 12 Feb 2021 09:58:37 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> Message-ID: <032D7C47-8862-4FDE-9B88-CE209D64C46F@oracle.com> > On Feb 11, 2021, at 8:33 AM, Andrew Haley wrote: > > On 11/02/2021 03:59, Kim Barrett wrote: >>> On Feb 8, 2021, at 1:14 PM, Andrew Haley wrote: >>> >>> I've been looking at the hottest Atomic operations in HotSpot, with a view to >>> finding out if the default memory_order_conservative (which is very expensive >>> on some architectures) can be weakened to something less. It's impossible to >>> fix all of them, but perhaps we can fix some of the most frequent. >> >> Is there any information about the possible performance improvement from >> such changes? 1.5-3M occurrences doesn't mean much without context. >> >> We don't presently have support for sequentially consistent semantics, only >> "conservative". My recollection is that this is in part because there might >> be code that is assuming the possibly stronger "conservative" semantics, and >> in part because there are different and incompatible approaches to >> implementing sequentially consistent semantics on some hardware platforms >> and we didn't want to make assumptions there. >> >> We also don't presently have any cmpxchg implementation that really supports >> anything between conservative and relaxed, nor do we support different order >> constraints for the success vs failure cases. Things can be complicated >> enough as is; while we *could* fill some of that in, I'm not sure we should. > > OK. However, even though we don't implement any of them, we do have an > API that includes acq, rel, and seq_cst. The fact that we don't have > anything behind them is, I thought, To Be Done rather than Won't Do. My inclination is to be pretty conservative in this area. 
(No pun intended.) I'm not eager to have a lot of reviews like that for JDK-8154736. (And in looking back at that, I see we ended up not addressing non-ppc platforms, even though there was specific concern at the time that by not dealing with them (particularly arm/aarch64) that we might be fobbing off some really hard debugging on some poor future person.) >>> ::Table::oop_oop_iterate(G1CMOopClosure*, oopDesc*, Klass*)+336>: :: = 3903178 >>> >>> This is actually MarkBitMap::par_mark calling BitMap::par_set_bit. Does this >>> need to be memory_order_conservative, or would something weaker do? Even >>> acq_rel or seq_cst would be better. >> >> I think for setting bits in a bitmap the thing to do would be to identify >> places that are safe and useful (impacts performance) to do so first. Then >> add a weaker variant for use in those places, assuming any are found. > > I see. I'm assuming that frequency of use is a useful proxy for impact. > Aleksey has already, very helpfully, measured how significant these are > for Shenandoah, and I suspect all concurrent GCs would benefit in a > similar fashion. Absolute counts don't say much without context. So what if there are a million of these, if they are swamped by the 100 bazillion not-these? Aleksey's measurements turned out to be less informative to me than they seemed at first reading. Many of the proposed changes involve simple counters or accumulators. Changing such to use relaxed atomic addition operations is likely an easy improvement. But even that can suffer badly from contention. If one is serious about reducing the cost of multi-threaded accumulators, much better would be something like http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0261r4.html >>> , (MEMFLAGS)5>*)+432>: :: = 1617659 >>> >>> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). >>> Again, do we need conservative here? >> >> This needs at least sequentially consistent semantics on the success path. > > Yep. That's easy, it's the full barrier in the failure path that > I'd love to eliminate. Why does the failure path matter here? It should be rare [*], since it only fails when either there is contention between a thief and the owner for the sole entry in the queue, or there is contention between multiple thieves. The former should be rare because non-empty queues usually contain more than one element. The latter should be rare because of the random selection of queues the steal from. And in both cases a losing thief will look for a new queue to steal from. [*] The age/top (where pop_global takes from) and bottom (where push adds and pop_local takes from) used to be adjacent members, so local operations might induce false-sharing failures for the age/top CAS. These members were separated in JDK 15. 
From aph at redhat.com Fri Feb 12 10:25:42 2021 From: aph at redhat.com (Andrew Haley) Date: Fri, 12 Feb 2021 10:25:42 +0000 Subject: Atomic operations: your thoughts are welocme In-Reply-To: <032D7C47-8862-4FDE-9B88-CE209D64C46F@oracle.com> References: <448C638F-D688-4913-875C-5D8BA9235126@oracle.com> <49d0408a-13f9-ddc8-06e3-e0eb27a708dd@redhat.com> <032D7C47-8862-4FDE-9B88-CE209D64C46F@oracle.com> Message-ID: <28181731-2880-5f85-80db-354881440295@redhat.com> On 12/02/2021 09:58, Kim Barrett wrote: >> On Feb 11, 2021, at 8:33 AM, Andrew Haley wrote: >> >> On 11/02/2021 03:59, Kim Barrett wrote: >>> >>> We also don't presently have any cmpxchg implementation that really supports >>> anything between conservative and relaxed, nor do we support different order >>> constraints for the success vs failure cases. Things can be complicated >>> enough as is; while we *could* fill some of that in, I'm not sure we should. >> >> OK. However, even though we don't implement any of them, we do have an >> API that includes acq, rel, and seq_cst. The fact that we don't have >> anything behind them is, I thought, To Be Done rather than Won't Do. > > My inclination is to be pretty conservative in this area. (No pun intended.) > I'm not eager to have a lot of reviews like that for JDK-8154736. (And in > looking back at that, I see we ended up not addressing non-ppc platforms, > even though there was specific concern at the time that by not dealing with > them (particularly arm/aarch64) that we might be fobbing off some really > hard debugging on some poor future person.) Sure, and as you are probably aware I've had to do that, more than once, on dusty old GC code that didn't follow the memory model. IMVHO, there are not many places where seq_cst won't be adequate. >> I see. I'm assuming that frequency of use is a useful proxy for impact. >> Aleksey has already, very helpfully, measured how significant these are >> for Shenandoah, and I suspect all concurrent GCs would benefit in a >> similar fashion. > > Absolute counts don't say much without context. So what if there are a > million of these, if they are swamped by the 100 bazillion not-these? > > Aleksey's measurements turned out to be less informative to me than they > seemed at first reading. Many of the proposed changes involve simple > counters or accumulators. Changing such to use relaxed atomic addition > operations is likely an easy improvement. But even that can suffer badly > from contention. If one is serious about reducing the cost of multi-threaded > accumulators, much better would be something like > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0261r4.html I very strongly disagree. Aleksey managed to prove a substantial gain with only a couple of hours' work. We're talking about low- hanging fruit here. >>>> , (MEMFLAGS)5>*)+432>: :: = 1617659 >>>> >>>> This one is GenericTaskQueue::pop_global calling cmpxchg_age(). >>>> Again, do we need conservative here? >>> >>> This needs at least sequentially consistent semantics on the success path. >> >> Yep. That's easy, it's the full barrier in the failure path that >> I'd love to eliminate. > > Why does the failure path matter here? > > It should be rare [*], since it only fails when either there is contention > between a thief and the owner for the sole entry in the queue, or there is > contention between multiple thieves. OK, so that's useful guidance for an implementer: full barriers for CAS failures should be wrapped in a conditional. 
That is a pain, because it complexifies the code, but OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ayang at openjdk.java.net Fri Feb 12 12:27:45 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 12 Feb 2021 12:27:45 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places [v2] In-Reply-To: References: Message-ID: On Thu, 11 Feb 2021 18:10:57 GMT, Stefan Johansson wrote: >> The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. >> >> In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. > > Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: > > Albert review > > Renamed helper to improve how the code read. Also extracted the failure check into a separate function. Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From ayang at openjdk.java.net Fri Feb 12 12:27:45 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 12 Feb 2021 12:27:45 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places [v2] In-Reply-To: References: <4iQZOAzPHRx26UBskJ6u6TY_gGyr0QwcsbqD9KCNiiU=.4bcea2f1-d5a7-4880-81e8-611bb78dba5c@github.com> Message-ID: On Thu, 11 Feb 2021 17:29:34 GMT, Stefan Johansson wrote: >> Yes, and also for `checkLargePagesEnabled`. It's not obvious to me why we parse the output in those two places, one looking for the failure mode, and the other looking for the success mode. That's why I asked for an sample of the "expected" log output. > > I think the check if large pages are enable is pretty straight forward. We should never expect large page sizes in the output unless large pages are enabled. I do however agree that this check is a bit clunky. Would it help to extract it to a separate function? Something like `largePagesAllocationFailure(pattern)`, I could also change the name of the function above to just be `largePagesEnabled()` then the code reads even better? I overlooked the fact that `largePagesAllocationFailure` is pattern/data structure specific. I am happy with the current patch. Thank you. ------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From zgu at openjdk.java.net Fri Feb 12 13:18:41 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 12 Feb 2021 13:18:41 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 07:31:34 GMT, Aleksey Shipilev wrote: > > Now, you need load barrier on read side (e.g. ShenandoahMarkBitMap::at()). Although, it is not a correctness issue, but seeing stale value means extra unnecessary work. > > I don't see why. Adding load barriers would not affect promptness of seeing the memory updates to the bitmap itself. 
It might affect the promptness of seeing the object contents that we are reading after asking `is_marked` -- but that would be a race either way, because we do not use mark bitmap for memory ordering at all (i.e. there is no "release" on bitmap update). The `load barrier` I were thinking, is something that can prompt seeing the updating of bitmap. I did a little digging, it does not seem we have something like that. We rarely call is_marked() and variants during mark phase, except SATB filtering. I wonder if adding a leading fence in SH::requires_marking() can accomplish that. Although, it is still a race, but I think 1) small price to pay compares to the work of enqueuing a marked oop 2) might help the termination by not enqueuing marked oops. ------------- PR: https://git.openjdk.java.net/jdk/pull/2497 From sjohanss at openjdk.java.net Fri Feb 12 14:59:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 12 Feb 2021 14:59:38 GMT Subject: Integrated: 8261230: GC tracing of page sizes are wrong in a few places In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 19:42:15 GMT, Stefan Johansson wrote: > The usage of `os::trace_page_sizes()` and friends are wrongly assuming that we always get the page size requested and needs to be updated. This is done by using the helper `ReservedSpace::actual_reserved_page_size()` instead of blindly trusting we get what we ask for. I have plans for the future to get rid of this helper and instead record the page size used in the `ReservedSpace`, but for now the helper is good enough. > > In G1 we used the helper but switched the order of the page size and the alignment parameter, which in turn helped the test to pass since the alignment will match the page size we expect in the test. The test had to be improved to recognize mapping failures. This pull request has now been integrated. Changeset: 9f81ca81 Author: Stefan Johansson URL: https://git.openjdk.java.net/jdk/commit/9f81ca81 Stats: 47 lines in 4 files changed: 40 ins; 1 del; 6 mod 8261230: GC tracing of page sizes are wrong in a few places Reviewed-by: ayang, stuefe ------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From sjohanss at openjdk.java.net Fri Feb 12 15:19:40 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 12 Feb 2021 15:19:40 GMT Subject: RFR: 8261230: GC tracing of page sizes are wrong in a few places [v2] In-Reply-To: References: Message-ID: <0v1p1uxvWd-13AOErjQP0Lri9b2nSJLbhqhvde802E4=.10c5cace-b517-4674-a0b3-2dbef10dca64@github.com> On Fri, 12 Feb 2021 12:24:29 GMT, Albert Mingkun Yang wrote: >> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision: >> >> Albert review >> >> Renamed helper to improve how the code read. Also extracted the failure check into a separate function. > > Marked as reviewed by ayang (Author). Thanks for the reviews @albertnetymk and @tstuefe! ------------- PR: https://git.openjdk.java.net/jdk/pull/2486 From shade at openjdk.java.net Fri Feb 12 15:53:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 12 Feb 2021 15:53:38 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 13:16:03 GMT, Zhengyu Gu wrote: > The `load barrier` I were thinking, is something that can prompt seeing the updating of bitmap. I did a little digging, it does not seem we have something like that. 
> > We rarely call is_marked() and variants during mark phase, except SATB filtering. I wonder if adding a leading fence in SH::requires_marking() can accomplish that. Although, it is still a race, but I think 1) small price to pay compares to the work of enqueuing a marked oop 2) might help the termination by not enqueuing marked oops. I suspect it would not help much, mostly because hardware does not hoard mutable data on the timescales that are important for performance (they do it on timescales that are important for correctness though: data coming out of order "just" 100ps later is still out of order). I believe we would be paying barrier costs for a very little gain in promptness. Our current handshaking-before-final-mark and SATB locking provides enough of memory bashing, I think. ------------- PR: https://git.openjdk.java.net/jdk/pull/2497 From zgu at openjdk.java.net Fri Feb 12 19:03:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 12 Feb 2021 19:03:39 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 09:32:18 GMT, Aleksey Shipilev wrote: > Shenandoah currently uses its own marking bitmap (added by JDK-8254315). It accesses the marking bitmap with "acquire" for reads and "conservative" for updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > I think both are actually excessive for marking bitmap accesses: we do not piggyback object updates on it, the atomics there are only to guarantee the access atomicity and CAS updates to bits. It seems "relaxed" is enough for marking bitmap accesses. > > Sample run with "compact" (frequent GC cycles) on SPECjvm2008:compiler.sunflow on AArch64: > > # Baseline > # Baseline > [146.028s][info][gc,stats] Concurrent Marking = 50.315 s (a = 258024 us) (n = 195) (lvls, us = 31836, 230469, 273438, 306641, 464255) > [141.458s][info][gc,stats] Concurrent Marking = 47.819 s (a = 242737 us) (n = 197) (lvls, us = 42773, 197266, 267578, 287109, 433948) > [144.108s][info][gc,stats] Concurrent Marking = 49.806 s (a = 250283 us) (n = 199) (lvls, us = 32227, 201172, 267578, 296875, 448549) > > # Patched > [144.238s][info][gc,stats] Concurrent Marking = 46.627 s (a = 220981 us) (n = 211) (lvls, us = 24414, 197266, 238281, 259766, 345112) > [138.406s][info][gc,stats] Concurrent Marking = 45.022 s (a = 227383 us) (n = 198) (lvls, us = 20508, 205078, 244141, 271484, 427658) > [140.950s][info][gc,stats] Concurrent Marking = 45.073 s (a = 222036 us) (n = 203) (lvls, us = 21680, 181641, 240234, 265625, 375750) > > Average time goes down, total marking time goes down. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah Marked as reviewed by zgu (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2497 From zgu at openjdk.java.net Fri Feb 12 19:03:40 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Fri, 12 Feb 2021 19:03:40 GMT Subject: RFR: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: <-k9uEgpi0Kil5Jze_XXSAHngq5nuRTatgFxReISTyIQ=.15e7429b-9a2c-4e73-b8c9-80741bcc781d@github.com> On Fri, 12 Feb 2021 15:50:31 GMT, Aleksey Shipilev wrote: > > The `load barrier` I were thinking, is something that can prompt seeing the updating of bitmap. 
I did a little digging, it does not seem we have something like that. > > We rarely call is_marked() and variants during mark phase, except SATB filtering. I wonder if adding a leading fence in SH::requires_marking() can accomplish that. Although, it is still a race, but I think 1) small price to pay compares to the work of enqueuing a marked oop 2) might help the termination by not enqueuing marked oops. > > I suspect it would not help much, mostly because hardware does not hoard mutable data on the timescales that are important for performance (they do it on timescales that are important for correctness though: data coming out of order "just" 100ps later is still out of order). I believe we would be paying barrier costs for a very little gain in promptness. Our current handshaking-before-final-mark and SATB locking provides enough of memory bashing, I think. Okay, then. ------------- PR: https://git.openjdk.java.net/jdk/pull/2497 From dcubed at openjdk.java.net Fri Feb 12 19:59:52 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 19:59:52 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: <7Io5yLfZ2FY7XuBAzV5iLmU6CLhvCG-MrFtSKiO6FY4=.c05e6878-b879-4e87-87ac-319fd466b7c5@github.com> On Fri, 12 Feb 2021 19:53:23 GMT, Daniel D. Daugherty wrote: > A trivial fix to adjust a test to work with the fix from: > > https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 > > The idea for the fix came from @albertnetymk. Thanks! > The failure reproduces on my local MBP13 and does not reproduce > with this fix in place. @tstuefe and @zhengyu123 - Please check out this test adjustment that was needed due to the fix for: https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 ------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From dcubed at openjdk.java.net Fri Feb 12 19:59:51 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 19:59:51 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big Message-ID: A trivial fix to adjust a test to work with the fix from: https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 The idea for the fix came from @albertnetymk. Thanks! The failure reproduces on my local MBP13 and does not reproduce with this fix in place. 
------------- Commit messages: - 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big Changes: https://git.openjdk.java.net/jdk/pull/2557/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2557&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261661 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2557.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2557/head:pull/2557 PR: https://git.openjdk.java.net/jdk/pull/2557 From dcubed at openjdk.java.net Fri Feb 12 20:06:38 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 20:06:38 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: <7Io5yLfZ2FY7XuBAzV5iLmU6CLhvCG-MrFtSKiO6FY4=.c05e6878-b879-4e87-87ac-319fd466b7c5@github.com> References: <7Io5yLfZ2FY7XuBAzV5iLmU6CLhvCG-MrFtSKiO6FY4=.c05e6878-b879-4e87-87ac-319fd466b7c5@github.com> Message-ID: <0uGNzLkZbcRjLCmVFdxlj5vT-38Qwqhji8vtgp1F5iA=.ac86279e-9061-4488-92ae-70fc05d1e2ff@github.com> On Fri, 12 Feb 2021 19:56:37 GMT, Daniel D. Daugherty wrote: >> A trivial fix to adjust a test to work with the fix from: >> >> https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 >> >> The idea for the fix came from @albertnetymk. Thanks! >> The failure reproduces on my local MBP13 and does not reproduce >> with this fix in place. > > @tstuefe and @zhengyu123 - Please check out this test adjustment that was needed > due to the fix for: > https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 The reason I'm pursuing this fix is to reduce the noise in the JDK17 CI for the weekend. ------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From ayang at openjdk.java.net Fri Feb 12 20:21:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 12 Feb 2021 20:21:41 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 19:53:23 GMT, Daniel D. Daugherty wrote: > A trivial fix to adjust a test to work with the fix from: > > https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 > > The idea for the fix came from @albertnetymk. Thanks! > The failure reproduces on my local MBP13 and does not reproduce > with this fix in place. Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From dcubed at openjdk.java.net Fri Feb 12 20:25:39 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 20:25:39 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 20:18:41 GMT, Albert Mingkun Yang wrote: >> A trivial fix to adjust a test to work with the fix from: >> >> https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 >> >> The idea for the fix came from @albertnetymk. Thanks! >> The failure reproduces on my local MBP13 and does not reproduce >> with this fix in place. > > Marked as reviewed by ayang (Author). @albertnetymk - Thanks for the fast review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From rkennke at openjdk.java.net Fri Feb 12 21:45:57 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 12 Feb 2021 21:45:57 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: > I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2500/files - new: https://git.openjdk.java.net/jdk/pull/2500/files/72f20e13..6946499c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00-01 Stats: 12 lines in 4 files changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 From rkennke at openjdk.java.net Fri Feb 12 21:45:57 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 12 Feb 2021 21:45:57 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk In-Reply-To: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com> References: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com> Message-ID: On Wed, 10 Feb 2021 12:38:10 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. 
>> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though. ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From kbarrett at openjdk.java.net Fri Feb 12 21:51:38 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 12 Feb 2021 21:51:38 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 19:53:23 GMT, Daniel D. Daugherty wrote: > A trivial fix to adjust a test to work with the fix from: > > https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 > > The idea for the fix came from @albertnetymk. Thanks! > The failure reproduces on my local MBP13 and does not reproduce > with this fix in place. Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2557 From cjplummer at openjdk.java.net Fri Feb 12 22:04:41 2021 From: cjplummer at openjdk.java.net (Chris Plummer) Date: Fri, 12 Feb 2021 22:04:41 GMT Subject: Integrated: 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes In-Reply-To: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> References: <4YKNpyXQ9QGrLhR61tkh71Q3A7VvCj5Ete_4OvzAA-o=.28b7be8c-6f05-42d4-892b-87ebea907b24@github.com> Message-ID: On Sun, 17 Jan 2021 03:57:59 GMT, Chris Plummer wrote: > See the bug for most details. A few notes here about some implementation details: > > In the `PointerLocation` class, I added more consistency w.r.t. whether or not a newline is printed. It used to for some address types, but not others. Now it always does. And if you see a comment something like the following: > > ` getTLAB().printOn(tty); // includes "\n" ` > > That's just clarifying whether or not the `printOn()` method called will include the newline. Some do and some don't, and knowing what the various `printOn()` methods do makes getting the proper inclusion of the newline easier to understand. > > I added `verbose` and `printAddress` boolean arguments to `PointerLocation.printOn()`. Currently they are always `true`. 
The false arguments will be used when I complete [JDK-8250801](https://bugs.openjdk.java.net/browse/JDK-8250801), which will use `PointerFinder/Location` to show what each register points to. > > The CR mentions that the main motivation for this work is for eventual replacement of the old clhsdb `whatis` command, which was implemented in javascript. It used to resolve DSO symbols, whereas `findpc` did not. The `whatis` code did this with the following: > > var dso = loadObjectContainingPC(addr); > if (dso == null) { > return ptrLoc.toString(); > } > var sym = dso.closestSymbolToPC(addr); > if (sym != null) { > return sym.name + '+' + sym.offset; > } > And now you'll see something similar in the PointerFinder code: > > loc.loadObject = cdbg.loadObjectContainingPC(a); > if (loc.loadObject != null) { > loc.nativeSymbol = loc.loadObject.closestSymbolToPC(a); > return loc; > } > Note that now that `findpc` does everything that `whatis` used to (and more), we don't really need to add a java version of `whatis`, but I'll probably do so anyway just help out people who are used to using the `whatis` command. That will be done using [JDK-8244670](https://bugs.openjdk.java.net/browse/JDK-8244670) This pull request has now been integrated. Changeset: e29c560a Author: Chris Plummer URL: https://git.openjdk.java.net/jdk/commit/e29c560a Stats: 292 lines in 5 files changed: 238 ins; 8 del; 46 mod 8247514: Improve clhsdb 'findpc' ability to determine what an address points to by improving PointerFinder and PointerLocation classes Reviewed-by: ysuenaga, kevinw ------------- PR: https://git.openjdk.java.net/jdk/pull/2111 From dcubed at openjdk.java.net Fri Feb 12 22:08:39 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 22:08:39 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: <4sToElO22vfEpbWPzcab-lsik2ntERqa3XqgzeWZHmQ=.03f022fb-6f6f-4441-9995-56c0ed6c7f40@github.com> On Fri, 12 Feb 2021 21:48:28 GMT, Kim Barrett wrote: >> A trivial fix to adjust a test to work with the fix from: >> >> https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 >> >> The idea for the fix came from @albertnetymk. Thanks! >> The failure reproduces on my local MBP13 and does not reproduce >> with this fix in place. > > Looks good. @kimbarrett - Thanks for the review! Do you think we need to wait for a Runtime/NMT reviewer? ------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From lkorinth at openjdk.java.net Fri Feb 12 22:37:51 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Fri, 12 Feb 2021 22:37:51 GMT Subject: RFR: 8260414: Remove unused set_single_threaded_mode() method in task executor Message-ID: This code is not used any more. 
------------- Commit messages: - 8260414: Remove unused set_single_threaded_mode() method in task executor Changes: https://git.openjdk.java.net/jdk/pull/2558/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2558&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260414 Stats: 9 lines in 2 files changed: 0 ins; 8 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2558.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2558/head:pull/2558 PR: https://git.openjdk.java.net/jdk/pull/2558 From dcubed at openjdk.java.net Fri Feb 12 22:44:39 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 22:44:39 GMT Subject: Integrated: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 19:53:23 GMT, Daniel D. Daugherty wrote: > A trivial fix to adjust a test to work with the fix from: > > https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 > > The idea for the fix came from @albertnetymk. Thanks! > The failure reproduces on my local MBP13 and does not reproduce > with this fix in place. This pull request has now been integrated. Changeset: 735757f1 Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/735757f1 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big Co-authored-by: Albert Mingkun Yang Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From dcubed at openjdk.java.net Fri Feb 12 22:44:38 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Fri, 12 Feb 2021 22:44:38 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 21:48:28 GMT, Kim Barrett wrote: >> A trivial fix to adjust a test to work with the fix from: >> >> https://bugs.openjdk.java.net/browse/JDK-8261297 NMT: Final report should use scale 1 >> >> The idea for the fix came from @albertnetymk. Thanks! >> The failure reproduces on my local MBP13 and does not reproduce >> with this fix in place. > > Looks good. @kimbarrett and @coleenp just told me that I don't really need to wait for a Runtime/NMT review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From eosterlund at openjdk.java.net Fri Feb 12 23:17:39 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 12 Feb 2021 23:17:39 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk In-Reply-To: References: <2X3mb-VkqGf_YYSIeb3n9pxXmocT1GkUYDYI_C8cOZo=.3f2fab17-f8f6-4860-a6b4-0a6bb6a1256f@github.com> Message-ID: On Fri, 12 Feb 2021 21:43:20 GMT, Roman Kennke wrote: >> I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. > > I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. 
Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though. I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread. Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting? ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From eosterlund at openjdk.java.net Fri Feb 12 23:17:38 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 12 Feb 2021 23:17:38 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 21:45:57 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() Nesting code looks wrong. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2500 From kbarrett at openjdk.java.net Sat Feb 13 00:59:42 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 13 Feb 2021 00:59:42 GMT Subject: RFR: 8260414: Remove unused set_single_threaded_mode() method in task executor In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 22:32:41 GMT, Leo Korinth wrote: > This code is not used any more. Looks good, and trivial. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2558 From stuefe at openjdk.java.net Sat Feb 13 04:38:39 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Sat, 13 Feb 2021 04:38:39 GMT Subject: RFR: 8261661: gc/stress/TestReclaimStringsLeaksMemory.java fails because Reserved memory size is too big In-Reply-To: References: Message-ID: <_x78XJVdntIn9mTH3xs5vZZnwl0GAcAOL7IiMKReVI4=.860ccd6a-1dc3-4cdd-808c-beb89c28c804@github.com> On Fri, 12 Feb 2021 22:40:17 GMT, Daniel D. Daugherty wrote: >> Looks good. > > @kimbarrett and @coleenp just told me that I don't really need to wait for a Runtime/NMT review. This looks good. Sorry for the trouble. ------------- PR: https://git.openjdk.java.net/jdk/pull/2557 From ayang at openjdk.java.net Sat Feb 13 09:44:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 13 Feb 2021 09:44:39 GMT Subject: RFR: 8260414: Remove unused set_single_threaded_mode() method in task executor In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 22:32:41 GMT, Leo Korinth wrote: > This code is not used any more. Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2558 From shade at openjdk.java.net Mon Feb 15 08:44:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 15 Feb 2021 08:44:41 GMT Subject: Integrated: 8261503: Shenandoah: reconsider verifier memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 11:41:45 GMT, Aleksey Shipilev wrote: > Shenandoah verifier uses lots of atomic operations. Unfortunately, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > In most cases, that is excessive for verifier, and "relaxed" would do. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah This pull request has now been integrated. Changeset: 7c931591 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/7c931591 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod 8261503: Shenandoah: reconsider verifier memory ordering Reviewed-by: zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2505 From shade at openjdk.java.net Mon Feb 15 08:45:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 15 Feb 2021 08:45:40 GMT Subject: Integrated: 8261496: Shenandoah: reconsider pacing updates memory ordering In-Reply-To: <_BlnOgWoSTjE1myt9WfuiZpM9hiIP7sGp38IJmzuyYg=.8a578dda-dbf7-4780-bc74-cf3710609005@github.com> References: <_BlnOgWoSTjE1myt9WfuiZpM9hiIP7sGp38IJmzuyYg=.8a578dda-dbf7-4780-bc74-cf3710609005@github.com> Message-ID: On Wed, 10 Feb 2021 10:13:47 GMT, Aleksey Shipilev wrote: > Shenandoah pacer uses atomic operations to update budget, progress, allocations seen. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This is excessive for pacing, as we do not piggyback memory effects on it. All pacing updates can use "relaxed". > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah This pull request has now been integrated. 
Changeset: 4642730b Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/4642730b Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod 8261496: Shenandoah: reconsider pacing updates memory ordering Reviewed-by: zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2501 From shade at openjdk.java.net Mon Feb 15 08:46:45 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 15 Feb 2021 08:46:45 GMT Subject: RFR: 8261501: Shenandoah: reconsider heap statistics memory ordering In-Reply-To: <0O1tXXs991770rhrpYioXIWr6m-OhDFMZINDiQ_UXc4=.92460035-468e-4bf5-97cb-bff58d1a2ede@github.com> References: <0O1tXXs991770rhrpYioXIWr6m-OhDFMZINDiQ_UXc4=.92460035-468e-4bf5-97cb-bff58d1a2ede@github.com> Message-ID: On Wed, 10 Feb 2021 11:10:35 GMT, Aleksey Shipilev wrote: > ShenandoahHeap collects heap-wide statistics (used, committed, etc). It does so by atomically updating them with default CASes. Unfortunately, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This is excessive for statistics gathering, and "relaxed" should be just as good. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Friendly reminder. ------------- PR: https://git.openjdk.java.net/jdk/pull/2504 From shade at openjdk.java.net Mon Feb 15 08:47:44 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 15 Feb 2021 08:47:44 GMT Subject: Integrated: 8261493: Shenandoah: reconsider bitmap access memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 09:32:18 GMT, Aleksey Shipilev wrote: > Shenandoah currently uses its own marking bitmap (added by JDK-8254315). It accesses the marking bitmap with "acquire" for reads and "conservative" for updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > I think both are actually excessive for marking bitmap accesses: we do not piggyback object updates on it, the atomics there are only to guarantee the access atomicity and CAS updates to bits. It seems "relaxed" is enough for marking bitmap accesses. > > Sample run with "compact" (frequent GC cycles) on SPECjvm2008:compiler.sunflow on AArch64: > > # Baseline > # Baseline > [146.028s][info][gc,stats] Concurrent Marking = 50.315 s (a = 258024 us) (n = 195) (lvls, us = 31836, 230469, 273438, 306641, 464255) > [141.458s][info][gc,stats] Concurrent Marking = 47.819 s (a = 242737 us) (n = 197) (lvls, us = 42773, 197266, 267578, 287109, 433948) > [144.108s][info][gc,stats] Concurrent Marking = 49.806 s (a = 250283 us) (n = 199) (lvls, us = 32227, 201172, 267578, 296875, 448549) > > # Patched > [144.238s][info][gc,stats] Concurrent Marking = 46.627 s (a = 220981 us) (n = 211) (lvls, us = 24414, 197266, 238281, 259766, 345112) > [138.406s][info][gc,stats] Concurrent Marking = 45.022 s (a = 227383 us) (n = 198) (lvls, us = 20508, 205078, 244141, 271484, 427658) > [140.950s][info][gc,stats] Concurrent Marking = 45.073 s (a = 222036 us) (n = 203) (lvls, us = 21680, 181641, 240234, 265625, 375750) > > Average time goes down, total marking time goes down. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `hotspot_gc_shenandoah` > - [x] Linux AArch64 `tier1` with Shenandoah This pull request has now been integrated. 
Changeset: 745c0b91 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/745c0b91 Stats: 18 lines in 2 files changed: 0 ins; 14 del; 4 mod 8261493: Shenandoah: reconsider bitmap access memory ordering Reviewed-by: rkennke, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2497 From shade at openjdk.java.net Mon Feb 15 08:47:42 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 15 Feb 2021 08:47:42 GMT Subject: Integrated: 8261504: Shenandoah: reconsider ShenandoahJavaThreadsIterator::claim memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 12:00:38 GMT, Aleksey Shipilev wrote: > JDK-8256298 added the thread iterator for thread roots, and I don't think we need the Hotspot's default memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. The simple "relaxed" should do. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah This pull request has now been integrated. Changeset: df0897ea Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/df0897ea Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8261504: Shenandoah: reconsider ShenandoahJavaThreadsIterator::claim memory ordering Reviewed-by: zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2506 From shade at openjdk.java.net Mon Feb 15 08:47:44 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Mon, 15 Feb 2021 08:47:44 GMT Subject: Integrated: 8261500: Shenandoah: reconsider region live data memory ordering In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 10:40:26 GMT, Aleksey Shipilev wrote: > Current Shenandoah region live data tracking uses default CAS updates to achieve atomicity of updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This seems to be excessive for live data tracking, and "relaxed" could be used instead. The only serious user of that data is collection set chooser, which runs at safepoint and so everything should be quiescent when that happens. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah This pull request has now been integrated. Changeset: c6eedda8 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/c6eedda8 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8261500: Shenandoah: reconsider region live data memory ordering Reviewed-by: zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2503 From lkorinth at openjdk.java.net Mon Feb 15 08:55:39 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 15 Feb 2021 08:55:39 GMT Subject: RFR: 8260414: Remove unused set_single_threaded_mode() method in task executor In-Reply-To: References: Message-ID: On Sat, 13 Feb 2021 09:41:33 GMT, Albert Mingkun Yang wrote: >> This code is not used any more. > > Marked as reviewed by ayang (Author). Thanks Kim and Albert! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2558 From lkorinth at openjdk.java.net Mon Feb 15 08:55:40 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 15 Feb 2021 08:55:40 GMT Subject: Integrated: 8260414: Remove unused set_single_threaded_mode() method in task executor In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 22:32:41 GMT, Leo Korinth wrote: > This code is not used any more. This pull request has now been integrated. Changeset: 3882fda8 Author: Leo Korinth URL: https://git.openjdk.java.net/jdk/commit/3882fda8 Stats: 9 lines in 2 files changed: 0 ins; 8 del; 1 mod 8260414: Remove unused set_single_threaded_mode() method in task executor Reviewed-by: kbarrett, ayang ------------- PR: https://git.openjdk.java.net/jdk/pull/2558 From stefank at openjdk.java.net Mon Feb 15 09:28:41 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 15 Feb 2021 09:28:41 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: On Fri, 12 Feb 2021 23:14:47 GMT, Erik ?sterlund wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() > > Nesting code looks wrong. I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch: https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72 Maybe you could build a few more test based on this? ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From jaroslav.bachorik at datadoghq.com Mon Feb 15 09:44:04 2021 From: jaroslav.bachorik at datadoghq.com (=?UTF-8?Q?Jaroslav_Bachor=C3=ADk?=) Date: Mon, 15 Feb 2021 10:44:04 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: References: Message-ID: Hi again, I continued experimenting with Shenandoah and ZGC which already are tracking liveness. I am emitting a (partially filled) GCHeapSummary JFR event to capture used/live sizes. For Shenandoah the event is emitted at the very end of the `ShenandoahConcurrentGC::op_final_mark()` method and for ZGC it is the `ZMark::end()` method. The exact changes can be checked via branch comparison (https://github.com/openjdk/jdk/compare/master...DataDog:jb/live_set_1) but bear in mind that this is just an experimental code with no intention being checked in in its current form. Unfortunately, when I run an application on such modified JVM and collect a JFR recording the live set size numbers seem a bit 'low' - eg. on both ZGC and Shenandoah (using an already available liveness info) the reported liveness is ~50% of the reported usage. Is there a good explanation for this? Thanks! -JB- On Thu, Feb 11, 2021 at 7:09 PM Jaroslav Bachor?k wrote: > > On Thu, Feb 11, 2021 at 6:55 PM Roman Kennke wrote: > > > > Notice that liveness information is only somewhat reliable right after > > marking. In Shenandoah, this is in the final-mark pause, and then the > > Yes, I understand this. What I am looking at is to have something like > 'last known liveness' value - captured at a well defined point and > providing an estimate within the bounds of GC implementation. > > > program is at a safepoint already. This is where you'd want to emit a > > JMX event or something similar. 
You can't simply query a counter and > > assume it represents current liveness in the middle or outside of GC > > cycle. This should be true for all GCs. > > > > For Serial and Parallel I am not sure at all that you can do this. > > AFAIK, they don't count liveness at all. > > > > Roman > > > > > Hi Roman, > > > > > > Thanks for your response. I checked ZGC implementation and, indeed, it > > > is very easy to get the liveness information just by extending > > > `ZStatHeap` class to report the last valid value of > > > `_at_mark_end.live`. > > > > > > I am also able to get this info from Shenandoah, although my first > > > attempt still involves a safepointing VM operation since I need to > > > iterate over regions to get the liveness info for each of them and sum > > > it up. I think it is still an acceptable trade-off, though. > > > > > > The next one in the queue is the Serial GC. My assumptions, based on > > > reading the code, are that for young gen 'live = used' at the end of > > > DefNewGeneration::collect() method and for old gen 'live = used - > > > slack' (slack is the cumulative size of objects considered to be alive > > > for the purpose of compaction although they are really dead - see > > > CompactibleSpace::scan_and_forward()). Does this sound reasonable? > > > > > > I will post my findings for Parallel GC and G1 GC later. > > > > > > Cheers, > > > > > > -JB- > > > > > > On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke wrote: > > >> > > >> Hello Jaroslav, > > >> > > >>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I > > >>> am trying to figure out whether providing a cheap estimation of live > > >>> set size is something actually achievable across various GC > > >>> implementations. > > >>> > > >>> What I am looking at is piggy-backing on a concurrent mark task to get > > >>> the summary size of live objects - using the 'straight-forward' > > >>> heap-inspection like approach is prohibitively expensive. > > >> > > >> In Shenandoah, this information is already collected during concurrent > > >> marking. We currently don't print it directly, but we could certainly do > > >> that. I'll look into implementing it. I'll also look into exposing > > >> liveness info via JMX. > > >> > > >> I'm not quite sure about G1: that information would only be collected > > >> during mixed or full collections. I am not sure if G1 prints it, though. > > >> > > >> ZGC prints this under -Xlog:gc+heap: > > >> > > >> [6,502s][info][gc,heap ] GC(0) Mark Start > > >> Mark End Relocate Start Relocate End High > > >> Low > > >> [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) > > >> 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) > > >> 834M (10%) > > >> [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) > > >> 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) > > >> 6896M (86%) > > >> [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) > > >> 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) > > >> 600M (8%) > > >> [6,502s][info][gc,heap ] GC(0) Live: - > > >> 195M (2%) 195M (2%) 195M (2%) - > > >> - > > >> [6,502s][info][gc,heap ] GC(0) Allocated: - > > >> 242M (3%) 270M (3%) 380M (5%) - > > >> - > > >> [6,502s][info][gc,heap ] GC(0) Garbage: - > > >> 638M (8%) 606M (8%) 24M (0%) - > > >> - > > >> [6,502s][info][gc,heap ] GC(0) Reclaimed: - > > >> - 32M (0%) 614M (8%) - > > >> - > > >> > > >> I hope that is useful? 
> > >> > > >> Thanks, > > >> Roman > > >> > > > > > From per.liden at oracle.com Mon Feb 15 10:24:03 2021 From: per.liden at oracle.com (Per Liden) Date: Mon, 15 Feb 2021 11:24:03 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: References: Message-ID: <440f220b-c1ae-574b-741f-c52bdb1230e2@oracle.com> Hi, On 2/15/21 10:44 AM, Jaroslav Bachor?k wrote: > Hi again, > > I continued experimenting with Shenandoah and ZGC which already are > tracking liveness. I am emitting a (partially filled) GCHeapSummary > JFR event to capture used/live sizes. > For Shenandoah the event is emitted at the very end of the > `ShenandoahConcurrentGC::op_final_mark()` method and for ZGC it is the > `ZMark::end()` method. The exact changes can be checked via branch > comparison (https://github.com/openjdk/jdk/compare/master...DataDog:jb/live_set_1) > but bear in mind that this is just an experimental code with no > intention being checked in in its current form. > > Unfortunately, when I run an application on such modified JVM and > collect a JFR recording the live set size numbers seem a bit 'low' - > eg. on both ZGC and Shenandoah (using an already available liveness > info) the reported liveness is ~50% of the reported usage. Is there a > good explanation for this? When you create the GCHeapSummary, the "live" value reflects what was live after marking, while the "used" value reflects the usage when the GC cycle ended. So, after marking ended, some amount of garbage was likely reclaimed, but then new objects were also allocated. For ZGC (don't know if Shenandoah shows this), you can see details of how much was reclaimed and how much was allocated in the GC log. /Per > > Thanks! > > -JB- > > On Thu, Feb 11, 2021 at 7:09 PM Jaroslav Bachor?k > wrote: >> >> On Thu, Feb 11, 2021 at 6:55 PM Roman Kennke wrote: >>> >>> Notice that liveness information is only somewhat reliable right after >>> marking. In Shenandoah, this is in the final-mark pause, and then the >> >> Yes, I understand this. What I am looking at is to have something like >> 'last known liveness' value - captured at a well defined point and >> providing an estimate within the bounds of GC implementation. >> >>> program is at a safepoint already. This is where you'd want to emit a >>> JMX event or something similar. You can't simply query a counter and >>> assume it represents current liveness in the middle or outside of GC >>> cycle. This should be true for all GCs. >>> >>> For Serial and Parallel I am not sure at all that you can do this. >>> AFAIK, they don't count liveness at all. >>> >>> Roman >>> >>>> Hi Roman, >>>> >>>> Thanks for your response. I checked ZGC implementation and, indeed, it >>>> is very easy to get the liveness information just by extending >>>> `ZStatHeap` class to report the last valid value of >>>> `_at_mark_end.live`. >>>> >>>> I am also able to get this info from Shenandoah, although my first >>>> attempt still involves a safepointing VM operation since I need to >>>> iterate over regions to get the liveness info for each of them and sum >>>> it up. I think it is still an acceptable trade-off, though. >>>> >>>> The next one in the queue is the Serial GC. 
My assumptions, based on >>>> reading the code, are that for young gen 'live = used' at the end of >>>> DefNewGeneration::collect() method and for old gen 'live = used - >>>> slack' (slack is the cumulative size of objects considered to be alive >>>> for the purpose of compaction although they are really dead - see >>>> CompactibleSpace::scan_and_forward()). Does this sound reasonable? >>>> >>>> I will post my findings for Parallel GC and G1 GC later. >>>> >>>> Cheers, >>>> >>>> -JB- >>>> >>>> On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke wrote: >>>>> >>>>> Hello Jaroslav, >>>>> >>>>>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I >>>>>> am trying to figure out whether providing a cheap estimation of live >>>>>> set size is something actually achievable across various GC >>>>>> implementations. >>>>>> >>>>>> What I am looking at is piggy-backing on a concurrent mark task to get >>>>>> the summary size of live objects - using the 'straight-forward' >>>>>> heap-inspection like approach is prohibitively expensive. >>>>> >>>>> In Shenandoah, this information is already collected during concurrent >>>>> marking. We currently don't print it directly, but we could certainly do >>>>> that. I'll look into implementing it. I'll also look into exposing >>>>> liveness info via JMX. >>>>> >>>>> I'm not quite sure about G1: that information would only be collected >>>>> during mixed or full collections. I am not sure if G1 prints it, though. >>>>> >>>>> ZGC prints this under -Xlog:gc+heap: >>>>> >>>>> [6,502s][info][gc,heap ] GC(0) Mark Start >>>>> Mark End Relocate Start Relocate End High >>>>> Low >>>>> [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) >>>>> 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) >>>>> 834M (10%) >>>>> [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) >>>>> 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) >>>>> 6896M (86%) >>>>> [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) >>>>> 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) >>>>> 600M (8%) >>>>> [6,502s][info][gc,heap ] GC(0) Live: - >>>>> 195M (2%) 195M (2%) 195M (2%) - >>>>> - >>>>> [6,502s][info][gc,heap ] GC(0) Allocated: - >>>>> 242M (3%) 270M (3%) 380M (5%) - >>>>> - >>>>> [6,502s][info][gc,heap ] GC(0) Garbage: - >>>>> 638M (8%) 606M (8%) 24M (0%) - >>>>> - >>>>> [6,502s][info][gc,heap ] GC(0) Reclaimed: - >>>>> - 32M (0%) 614M (8%) - >>>>> - >>>>> >>>>> I hope that is useful? >>>>> >>>>> Thanks, >>>>> Roman >>>>> >>>> >>> From jaroslav.bachorik at datadoghq.com Mon Feb 15 10:47:01 2021 From: jaroslav.bachorik at datadoghq.com (=?UTF-8?Q?Jaroslav_Bachor=C3=ADk?=) Date: Mon, 15 Feb 2021 11:47:01 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: <440f220b-c1ae-574b-741f-c52bdb1230e2@oracle.com> References: <440f220b-c1ae-574b-741f-c52bdb1230e2@oracle.com> Message-ID: On Mon, Feb 15, 2021 at 11:24 AM Per Liden wrote: > > Hi, > > On 2/15/21 10:44 AM, Jaroslav Bachor?k wrote: > > Hi again, > > > > I continued experimenting with Shenandoah and ZGC which already are > > tracking liveness. I am emitting a (partially filled) GCHeapSummary > > JFR event to capture used/live sizes. > > For Shenandoah the event is emitted at the very end of the > > `ShenandoahConcurrentGC::op_final_mark()` method and for ZGC it is the > > `ZMark::end()` method. 
The exact changes can be checked via branch > > comparison (https://github.com/openjdk/jdk/compare/master...DataDog:jb/live_set_1) > > but bear in mind that this is just an experimental code with no > > intention being checked in in its current form. > > > > Unfortunately, when I run an application on such modified JVM and > > collect a JFR recording the live set size numbers seem a bit 'low' - > > eg. on both ZGC and Shenandoah (using an already available liveness > > info) the reported liveness is ~50% of the reported usage. Is there a > > good explanation for this? > > When you create the GCHeapSummary, the "live" value reflects what was > live after marking, while the "used" value reflects the usage when the > GC cycle ended. So, after marking ended, some amount of garbage was > likely reclaimed, but then new objects were also allocated. For ZGC > (don't know if Shenandoah shows this), you can see details of how much > was reclaimed and how much was allocated in the GC log. Definitely - it's just that a diff of >100MB (eg. for ZGC 350MB used vs. 170MB live) struck me as a bit suspicious. But maybe it is expected. -JB- > > /Per > > > > > Thanks! > > > > -JB- > > > > On Thu, Feb 11, 2021 at 7:09 PM Jaroslav Bachor?k > > wrote: > >> > >> On Thu, Feb 11, 2021 at 6:55 PM Roman Kennke wrote: > >>> > >>> Notice that liveness information is only somewhat reliable right after > >>> marking. In Shenandoah, this is in the final-mark pause, and then the > >> > >> Yes, I understand this. What I am looking at is to have something like > >> 'last known liveness' value - captured at a well defined point and > >> providing an estimate within the bounds of GC implementation. > >> > >>> program is at a safepoint already. This is where you'd want to emit a > >>> JMX event or something similar. You can't simply query a counter and > >>> assume it represents current liveness in the middle or outside of GC > >>> cycle. This should be true for all GCs. > >>> > >>> For Serial and Parallel I am not sure at all that you can do this. > >>> AFAIK, they don't count liveness at all. > >>> > >>> Roman > >>> > >>>> Hi Roman, > >>>> > >>>> Thanks for your response. I checked ZGC implementation and, indeed, it > >>>> is very easy to get the liveness information just by extending > >>>> `ZStatHeap` class to report the last valid value of > >>>> `_at_mark_end.live`. > >>>> > >>>> I am also able to get this info from Shenandoah, although my first > >>>> attempt still involves a safepointing VM operation since I need to > >>>> iterate over regions to get the liveness info for each of them and sum > >>>> it up. I think it is still an acceptable trade-off, though. > >>>> > >>>> The next one in the queue is the Serial GC. My assumptions, based on > >>>> reading the code, are that for young gen 'live = used' at the end of > >>>> DefNewGeneration::collect() method and for old gen 'live = used - > >>>> slack' (slack is the cumulative size of objects considered to be alive > >>>> for the purpose of compaction although they are really dead - see > >>>> CompactibleSpace::scan_and_forward()). Does this sound reasonable? > >>>> > >>>> I will post my findings for Parallel GC and G1 GC later. 
> >>>> > >>>> Cheers, > >>>> > >>>> -JB- > >>>> > >>>> On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke wrote: > >>>>> > >>>>> Hello Jaroslav, > >>>>> > >>>>>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I > >>>>>> am trying to figure out whether providing a cheap estimation of live > >>>>>> set size is something actually achievable across various GC > >>>>>> implementations. > >>>>>> > >>>>>> What I am looking at is piggy-backing on a concurrent mark task to get > >>>>>> the summary size of live objects - using the 'straight-forward' > >>>>>> heap-inspection like approach is prohibitively expensive. > >>>>> > >>>>> In Shenandoah, this information is already collected during concurrent > >>>>> marking. We currently don't print it directly, but we could certainly do > >>>>> that. I'll look into implementing it. I'll also look into exposing > >>>>> liveness info via JMX. > >>>>> > >>>>> I'm not quite sure about G1: that information would only be collected > >>>>> during mixed or full collections. I am not sure if G1 prints it, though. > >>>>> > >>>>> ZGC prints this under -Xlog:gc+heap: > >>>>> > >>>>> [6,502s][info][gc,heap ] GC(0) Mark Start > >>>>> Mark End Relocate Start Relocate End High > >>>>> Low > >>>>> [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) > >>>>> 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) > >>>>> 834M (10%) > >>>>> [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) > >>>>> 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) > >>>>> 6896M (86%) > >>>>> [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) > >>>>> 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) > >>>>> 600M (8%) > >>>>> [6,502s][info][gc,heap ] GC(0) Live: - > >>>>> 195M (2%) 195M (2%) 195M (2%) - > >>>>> - > >>>>> [6,502s][info][gc,heap ] GC(0) Allocated: - > >>>>> 242M (3%) 270M (3%) 380M (5%) - > >>>>> - > >>>>> [6,502s][info][gc,heap ] GC(0) Garbage: - > >>>>> 638M (8%) 606M (8%) 24M (0%) - > >>>>> - > >>>>> [6,502s][info][gc,heap ] GC(0) Reclaimed: - > >>>>> - 32M (0%) 614M (8%) - > >>>>> - > >>>>> > >>>>> I hope that is useful? > >>>>> > >>>>> Thanks, > >>>>> Roman > >>>>> > >>>> > >>> From per.liden at oracle.com Mon Feb 15 10:58:52 2021 From: per.liden at oracle.com (Per Liden) Date: Mon, 15 Feb 2021 11:58:52 +0100 Subject: Can GC implementations provide a cheap estimation of live set size? In-Reply-To: References: <440f220b-c1ae-574b-741f-c52bdb1230e2@oracle.com> Message-ID: <3496985f-7041-c792-7d5b-d7d569836437@oracle.com> On 2/15/21 11:47 AM, Jaroslav Bachor?k wrote: > On Mon, Feb 15, 2021 at 11:24 AM Per Liden wrote: >> >> Hi, >> >> On 2/15/21 10:44 AM, Jaroslav Bachor?k wrote: >>> Hi again, >>> >>> I continued experimenting with Shenandoah and ZGC which already are >>> tracking liveness. I am emitting a (partially filled) GCHeapSummary >>> JFR event to capture used/live sizes. >>> For Shenandoah the event is emitted at the very end of the >>> `ShenandoahConcurrentGC::op_final_mark()` method and for ZGC it is the >>> `ZMark::end()` method. The exact changes can be checked via branch >>> comparison (https://github.com/openjdk/jdk/compare/master...DataDog:jb/live_set_1) >>> but bear in mind that this is just an experimental code with no >>> intention being checked in in its current form. >>> >>> Unfortunately, when I run an application on such modified JVM and >>> collect a JFR recording the live set size numbers seem a bit 'low' - >>> eg. on both ZGC and Shenandoah (using an already available liveness >>> info) the reported liveness is ~50% of the reported usage. 
Is there a >>> good explanation for this? >> >> When you create the GCHeapSummary, the "live" value reflects what was >> live after marking, while the "used" value reflects the usage when the >> GC cycle ended. So, after marking ended, some amount of garbage was >> likely reclaimed, but then new objects were also allocated. For ZGC >> (don't know if Shenandoah shows this), you can see details of how much >> was reclaimed and how much was allocated in the GC log. > > Definitely - it's just that a diff of >100MB (eg. for ZGC 350MB used > vs. 170MB live) struck me as a bit suspicious. But maybe it is > expected. It's impossible to say if it's expected or not, without knowing what the application is doing, it's allocation rate, etc. The application could be allocating several gigabytes per second, in which case the diff could be large. However, if the application is just idling and isn't allocating anything, then live is expected to be equal (or close to equal) to used. /Per > > -JB- > >> >> /Per >> >>> >>> Thanks! >>> >>> -JB- >>> >>> On Thu, Feb 11, 2021 at 7:09 PM Jaroslav Bachor?k >>> wrote: >>>> >>>> On Thu, Feb 11, 2021 at 6:55 PM Roman Kennke wrote: >>>>> >>>>> Notice that liveness information is only somewhat reliable right after >>>>> marking. In Shenandoah, this is in the final-mark pause, and then the >>>> >>>> Yes, I understand this. What I am looking at is to have something like >>>> 'last known liveness' value - captured at a well defined point and >>>> providing an estimate within the bounds of GC implementation. >>>> >>>>> program is at a safepoint already. This is where you'd want to emit a >>>>> JMX event or something similar. You can't simply query a counter and >>>>> assume it represents current liveness in the middle or outside of GC >>>>> cycle. This should be true for all GCs. >>>>> >>>>> For Serial and Parallel I am not sure at all that you can do this. >>>>> AFAIK, they don't count liveness at all. >>>>> >>>>> Roman >>>>> >>>>>> Hi Roman, >>>>>> >>>>>> Thanks for your response. I checked ZGC implementation and, indeed, it >>>>>> is very easy to get the liveness information just by extending >>>>>> `ZStatHeap` class to report the last valid value of >>>>>> `_at_mark_end.live`. >>>>>> >>>>>> I am also able to get this info from Shenandoah, although my first >>>>>> attempt still involves a safepointing VM operation since I need to >>>>>> iterate over regions to get the liveness info for each of them and sum >>>>>> it up. I think it is still an acceptable trade-off, though. >>>>>> >>>>>> The next one in the queue is the Serial GC. My assumptions, based on >>>>>> reading the code, are that for young gen 'live = used' at the end of >>>>>> DefNewGeneration::collect() method and for old gen 'live = used - >>>>>> slack' (slack is the cumulative size of objects considered to be alive >>>>>> for the purpose of compaction although they are really dead - see >>>>>> CompactibleSpace::scan_and_forward()). Does this sound reasonable? >>>>>> >>>>>> I will post my findings for Parallel GC and G1 GC later. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> -JB- >>>>>> >>>>>> On Wed, Feb 10, 2021 at 11:34 AM Roman Kennke wrote: >>>>>>> >>>>>>> Hello Jaroslav, >>>>>>> >>>>>>>> In connection with https://bugs.openjdk.java.net/browse/JDK-8258431 I >>>>>>>> am trying to figure out whether providing a cheap estimation of live >>>>>>>> set size is something actually achievable across various GC >>>>>>>> implementations. 
>>>>>>>> >>>>>>>> What I am looking at is piggy-backing on a concurrent mark task to get >>>>>>>> the summary size of live objects - using the 'straight-forward' >>>>>>>> heap-inspection like approach is prohibitively expensive. >>>>>>> >>>>>>> In Shenandoah, this information is already collected during concurrent >>>>>>> marking. We currently don't print it directly, but we could certainly do >>>>>>> that. I'll look into implementing it. I'll also look into exposing >>>>>>> liveness info via JMX. >>>>>>> >>>>>>> I'm not quite sure about G1: that information would only be collected >>>>>>> during mixed or full collections. I am not sure if G1 prints it, though. >>>>>>> >>>>>>> ZGC prints this under -Xlog:gc+heap: >>>>>>> >>>>>>> [6,502s][info][gc,heap ] GC(0) Mark Start >>>>>>> Mark End Relocate Start Relocate End High >>>>>>> Low >>>>>>> [6,502s][info][gc,heap ] GC(0) Capacity: 834M (10%) >>>>>>> 1076M (13%) 1092M (14%) 1092M (14%) 1092M (14%) >>>>>>> 834M (10%) >>>>>>> [6,502s][info][gc,heap ] GC(0) Free: 7154M (90%) >>>>>>> 6912M (87%) 6916M (87%) 7388M (92%) 7388M (92%) >>>>>>> 6896M (86%) >>>>>>> [6,502s][info][gc,heap ] GC(0) Used: 834M (10%) >>>>>>> 1076M (13%) 1072M (13%) 600M (8%) 1092M (14%) >>>>>>> 600M (8%) >>>>>>> [6,502s][info][gc,heap ] GC(0) Live: - >>>>>>> 195M (2%) 195M (2%) 195M (2%) - >>>>>>> - >>>>>>> [6,502s][info][gc,heap ] GC(0) Allocated: - >>>>>>> 242M (3%) 270M (3%) 380M (5%) - >>>>>>> - >>>>>>> [6,502s][info][gc,heap ] GC(0) Garbage: - >>>>>>> 638M (8%) 606M (8%) 24M (0%) - >>>>>>> - >>>>>>> [6,502s][info][gc,heap ] GC(0) Reclaimed: - >>>>>>> - 32M (0%) 614M (8%) - >>>>>>> - >>>>>>> >>>>>>> I hope that is useful? >>>>>>> >>>>>>> Thanks, >>>>>>> Roman >>>>>>> >>>>>> >>>>> From cgo at openjdk.java.net Mon Feb 15 13:29:53 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Mon, 15 Feb 2021 13:29:53 GMT Subject: RFR: 8261752: Multiple GC test are missing memory requirements Message-ID: I used systemd to figure out which memory requirement makes sense for which test: $ systemd-run --user --scope -p MemoryMax=768M -p MemorySwapMax=0 /usr/bin/make TEST="..." test Tests succeeding with `768M` of MemoryMax got a requirement of 1G, all others got 2G and succeeded with a MemoryMax of 1536M. ------------- Commit messages: - Adds memory requirements. 
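In practice the fix boils down to adding a memory requirement line to each affected test header, along these lines (illustrative only, assuming the usual jtreg @requires syntax; the exact tests and thresholds are in the PR diff):

```
 * @requires os.maxMemory >= 1G
```

with `>= 2G` for the tests that needed more than 768M to pass.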
Changes: https://git.openjdk.java.net/jdk/pull/2575/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2575&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261752 Stats: 7 lines in 7 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2575.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2575/head:pull/2575 PR: https://git.openjdk.java.net/jdk/pull/2575 From rkennke at openjdk.java.net Mon Feb 15 15:20:58 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 15 Feb 2021 15:20:58 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: > I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make KeepStackGCProcessedMark non-reentrant again ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2500/files - new: https://git.openjdk.java.net/jdk/pull/2500/files/6946499c..345f78b4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01-02 Stats: 11 lines in 3 files changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 From rkennke at openjdk.java.net Mon Feb 15 15:20:59 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 15 Feb 2021 15:20:59 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2] In-Reply-To: References: Message-ID: <2KGNm2sghEHT4velRWjE5yMCU5lBdvYhk2UkPUZktV8=.01c29e94-e5e6-47b1-815b-e327076d8c74@github.com> On Mon, 15 Feb 2021 09:26:03 GMT, Stefan Karlsson wrote: >> Nesting code looks wrong. > > I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch: > https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72 > > Maybe you could build a few more test based on this? > I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread. 
> Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting? I just realized that the reentrancy comes from the Java call lower in fetchFirstBatch(). The problem can be easily avoided by putting the KeepStackGCProcessedMark in sensible scope that excludes the call. ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From cgo at openjdk.java.net Mon Feb 15 15:25:54 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Mon, 15 Feb 2021 15:25:54 GMT Subject: RFR: 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize Message-ID: Adds an explicit -Xms to one part of the test case, to not rely on ergonomics to detect the correct InitialHeapSize. It looks like one part of the whole test case implicitly relied on the fact, that `InitialHeapSize` == `MaxHeapSize`. Since the `MaxHeapSize` is very small (32M), this is almost always true. But if the test device has less than 2G of memory, the ergonomics configure the `InitialHeapSize` to be smaller than the `MaxHeapSize`. ------------- Commit messages: - Fixes indention. - Adds -Xms = -Xmx to not rely on ergonomics. Changes: https://git.openjdk.java.net/jdk/pull/2577/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2577&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261758 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2577.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2577/head:pull/2577 PR: https://git.openjdk.java.net/jdk/pull/2577 From lkorinth at openjdk.java.net Mon Feb 15 15:56:53 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 15 Feb 2021 15:56:53 GMT Subject: RFR: 8260415: Remove unused class ReferenceProcessorMTProcMutator Message-ID: ReferenceProcessorMTProcMutator is not used. ReferenceProcessorMTDiscoveryMutator seems to do the same and is still being used. ------------- Commit messages: - 8260415 Changes: https://git.openjdk.java.net/jdk/pull/2578/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2578&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260415 Stats: 22 lines in 1 file changed: 0 ins; 22 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2578.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2578/head:pull/2578 PR: https://git.openjdk.java.net/jdk/pull/2578 From ayang at openjdk.java.net Mon Feb 15 16:02:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 15 Feb 2021 16:02:38 GMT Subject: RFR: 8260415: Remove unused class ReferenceProcessorMTProcMutator In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:51:30 GMT, Leo Korinth wrote: > ReferenceProcessorMTProcMutator is not used. ReferenceProcessorMTDiscoveryMutator seems to do the same and is still being used. Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2578 From sjohanss at openjdk.java.net Mon Feb 15 20:41:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Mon, 15 Feb 2021 20:41:38 GMT Subject: RFR: 8260415: Remove unused class ReferenceProcessorMTProcMutator In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:51:30 GMT, Leo Korinth wrote: > ReferenceProcessorMTProcMutator is not used. ReferenceProcessorMTDiscoveryMutator seems to do the same and is still being used. Looks good and trivial. ------------- Marked as reviewed by sjohanss (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2578 From rkennke at openjdk.java.net Mon Feb 15 21:12:53 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 15 Feb 2021 21:12:53 GMT Subject: RFR: 8261413: Shenandoah: Disable class-unloading in I-U mode [v3] In-Reply-To: References: Message-ID: <5rxb-j7jLWGsanoowSjvLIzFwCtlH_FgBHo-GM7fkyQ=.4cf0989c-f784-48ad-b60c-e4613f35270d@github.com> > JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. > > Testing: > - [ ] hotspot_gc_shenandoah > - [ ] tier1 (+UseShenandoahGC +IU) > - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Don't disable all class-unloading with I-U, disabling concurrent class-unloading is sufficient ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2477/files - new: https://git.openjdk.java.net/jdk/pull/2477/files/e3c1b459..6e99cc98 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2477&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2477&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2477.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2477/head:pull/2477 PR: https://git.openjdk.java.net/jdk/pull/2477 From shade at openjdk.java.net Tue Feb 16 07:26:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 16 Feb 2021 07:26:40 GMT Subject: RFR: 8261413: Shenandoah: Disable class-unloading in I-U mode [v3] In-Reply-To: <5rxb-j7jLWGsanoowSjvLIzFwCtlH_FgBHo-GM7fkyQ=.4cf0989c-f784-48ad-b60c-e4613f35270d@github.com> References: <5rxb-j7jLWGsanoowSjvLIzFwCtlH_FgBHo-GM7fkyQ=.4cf0989c-f784-48ad-b60c-e4613f35270d@github.com> Message-ID: On Mon, 15 Feb 2021 21:12:53 GMT, Roman Kennke wrote: >> JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. >> >> Testing: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 (+UseShenandoahGC +IU) >> - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Don't disable all class-unloading with I-U, disabling concurrent class-unloading is sufficient Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2477 From rkennke at openjdk.java.net Tue Feb 16 08:20:39 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 16 Feb 2021 08:20:39 GMT Subject: Integrated: 8261413: Shenandoah: Disable class-unloading in I-U mode In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 11:58:58 GMT, Roman Kennke wrote: > JDK-8261341 describes a serious problem with I-U mode and class-unloading. Let's disable class-unloading in I-U for now as a workaround. > > Testing: > - [x] hotspot_gc_shenandoah > - [x] tier1 (+UseShenandoahGC +IU) > - [x] runtime/CreateMirror/ArraysNewInstanceBug.java (+UseShenandoahGC +IU +aggressive) many times in a row w/o failure This pull request has now been integrated. 
Changeset: e2d52ae2 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/e2d52ae2 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod 8261413: Shenandoah: Disable class-unloading in I-U mode Reviewed-by: shade, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2477 From tschatzl at openjdk.java.net Tue Feb 16 08:53:42 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 16 Feb 2021 08:53:42 GMT Subject: RFR: 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize In-Reply-To: References: Message-ID: <4GPMkO2QkdP7_JXlyDsFYP_BBUEWCaS0VVDSs3Go7aE=.7538081c-4662-4158-a8c4-36e9a15f5d0d@github.com> On Mon, 15 Feb 2021 15:20:56 GMT, Christoph G?ttschkes wrote: > Adds an explicit -Xms to one part of the test case, to not rely on ergonomics to detect the correct InitialHeapSize. > > It looks like one part of the whole test case implicitly relied on the fact, that `InitialHeapSize` == `MaxHeapSize`. Since the `MaxHeapSize` is very small (32M), this is almost always true. But if the test device has less than 2G of memory, the ergonomics configure the `InitialHeapSize` to be smaller than the `MaxHeapSize`. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2577 From tschatzl at openjdk.java.net Tue Feb 16 08:55:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 16 Feb 2021 08:55:40 GMT Subject: RFR: 8261752: Multiple GC test are missing memory requirements In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 13:24:57 GMT, Christoph G?ttschkes wrote: > I used systemd to figure out which memory requirement makes sense for which test: > > $ systemd-run --user --scope -p MemoryMax=768M -p MemorySwapMax=0 /usr/bin/make TEST="..." test > > Tests succeeding with `768M` of MemoryMax got a requirement of 1G, all others got 2G and succeeded with a MemoryMax of 1536M. Thanks. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2575 From rkennke at openjdk.java.net Tue Feb 16 10:14:40 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 16 Feb 2021 10:14:40 GMT Subject: RFR: 8261501: Shenandoah: reconsider heap statistics memory ordering In-Reply-To: <0O1tXXs991770rhrpYioXIWr6m-OhDFMZINDiQ_UXc4=.92460035-468e-4bf5-97cb-bff58d1a2ede@github.com> References: <0O1tXXs991770rhrpYioXIWr6m-OhDFMZINDiQ_UXc4=.92460035-468e-4bf5-97cb-bff58d1a2ede@github.com> Message-ID: <3w0bggGwSikPsnaGTFPIMjsNNLUNu1vxVCLraAf6nhA=.8f35a639-9325-4c18-9b09-7b62e67b8dd8@github.com> On Wed, 10 Feb 2021 11:10:35 GMT, Aleksey Shipilev wrote: > ShenandoahHeap collects heap-wide statistics (used, committed, etc). It does so by atomically updating them with default CASes. Unfortunately, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This is excessive for statistics gathering, and "relaxed" should be just as good. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Looks good to me! Thanks! ------------- Marked as reviewed by rkennke (Reviewer). 
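For readers following along, the change amounts to passing an explicit memory order to the existing Atomic calls. A minimal sketch (not the actual patch, the counter name is illustrative):

```
#include "runtime/atomic.hpp"

// Heap statistics only need atomic updates, not ordering guarantees, so the
// relaxed variant avoids the two-way fences implied by memory_order_conservative.
static void increase_counter(volatile size_t* counter, size_t bytes) {
  Atomic::add(counter, bytes, memory_order_relaxed);
}
```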
PR: https://git.openjdk.java.net/jdk/pull/2504 From shade at openjdk.java.net Tue Feb 16 11:35:51 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 16 Feb 2021 11:35:51 GMT Subject: Integrated: 8261501: Shenandoah: reconsider heap statistics memory ordering In-Reply-To: <0O1tXXs991770rhrpYioXIWr6m-OhDFMZINDiQ_UXc4=.92460035-468e-4bf5-97cb-bff58d1a2ede@github.com> References: <0O1tXXs991770rhrpYioXIWr6m-OhDFMZINDiQ_UXc4=.92460035-468e-4bf5-97cb-bff58d1a2ede@github.com> Message-ID: On Wed, 10 Feb 2021 11:10:35 GMT, Aleksey Shipilev wrote: > ShenandoahHeap collects heap-wide statistics (used, committed, etc). It does so by atomically updating them with default CASes. Unfortunately, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This is excessive for statistics gathering, and "relaxed" should be just as good. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah This pull request has now been integrated. Changeset: 3f8819c6 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/3f8819c6 Stats: 9 lines in 1 file changed: 0 ins; 1 del; 8 mod 8261501: Shenandoah: reconsider heap statistics memory ordering Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2504 From shade at openjdk.java.net Tue Feb 16 13:21:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 16 Feb 2021 13:21:00 GMT Subject: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v4] In-Reply-To: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> References: <4RLKvcdaWu0Cu6owC3yGoVY1KVEsYjBZEFJhfdwnhWg=.65fbeae1-58f6-48d3-a2ed-981858ef7da9@github.com> Message-ID: > Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This seems to be excessive for Shenandoah update references code, and "release" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing. > > Average time goes down, the number of GC cycles go up, since the cycles are shorter. > > Additional testing: > - [x] Linux x86_64 hotspot_gc_shenandoah > - [x] Linux AArch64 hotspot_gc_shenandoah > - [x] Linux AArch64 tier1 with Shenandoah Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
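To illustrate the ordering argument, a simplified sketch (uncompressed oops, no forwarding-pointer details, not the actual Shenandoah code):

```
#include "oops/oop.hpp"
#include "runtime/atomic.hpp"

// The updater only needs to publish an already-initialized forwardee, so a
// release CAS suffices; racing mutator updates (self-healing) provide their
// own ordering on the load/CAS side.
static void update_reference(oop volatile* p, oop expected, oop forwardee) {
  Atomic::cmpxchg(p, expected, forwardee, memory_order_release);
}
```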
The pull request contains 10 additional commits since the last revision: - Comment touchup - Specialize out witness-checking methods, drop acquire again - Even more explanation - Move the comment - Also handle clearing the oops - Minor touchups to the comment - Merge branch 'master' into JDK-8261495-shenandoah-updaterefs-memord - Use release only - Do acq_rel instead - 8261495: Shenandoah: reconsider update references memory ordering ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2498/files - new: https://git.openjdk.java.net/jdk/pull/2498/files/36bee3a9..0d299968 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=02-03 Stats: 12253 lines in 405 files changed: 6246 ins; 3773 del; 2234 mod Patch: https://git.openjdk.java.net/jdk/pull/2498.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498 PR: https://git.openjdk.java.net/jdk/pull/2498 From github.com+168222+mgkwill at openjdk.java.net Tue Feb 16 16:32:56 2021 From: github.com+168222+mgkwill at openjdk.java.net (Marcus G K Williams) Date: Tue, 16 Feb 2021 16:32:56 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: > When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using > Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). > > This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). > > In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into pull/1153 - kstefanj update Signed-off-by: Marcus G K Williams - Merge branch 'master' into update_hlp - Merge branch 'master' into update_hlp - Remove extraneous ' from warning Signed-off-by: Marcus G K Williams - Merge branch 'master' into update_hlp - Merge branch 'master' into update_hlp - Merge branch 'master' into update_hlp - Fix os::large_page_size() in last update Signed-off-by: Marcus G K Williams - Ivan W. Requested Changes Removed os::Linux::select_large_page_size and use os::page_size_for_region instead Removed Linux::find_large_page_size and use register_large_page_sizes. Streamlined Linux::setup_large_page_size Signed-off-by: Marcus G K Williams - ... 
and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 ------------- Changes: https://git.openjdk.java.net/jdk/pull/1153/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=15 Stats: 71 lines in 2 files changed: 32 ins; 10 del; 29 mod Patch: https://git.openjdk.java.net/jdk/pull/1153.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153 PR: https://git.openjdk.java.net/jdk/pull/1153 From lkorinth at openjdk.java.net Tue Feb 16 18:32:38 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Tue, 16 Feb 2021 18:32:38 GMT Subject: RFR: 8260415: Remove unused class ReferenceProcessorMTProcMutator In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 20:39:16 GMT, Stefan Johansson wrote: >> ReferenceProcessorMTProcMutator is not used. ReferenceProcessorMTDiscoveryMutator seems to do the same and is still being used. > > Looks good and trivial. Thanks Albert and Stefan! ------------- PR: https://git.openjdk.java.net/jdk/pull/2578 From lkorinth at openjdk.java.net Tue Feb 16 18:32:39 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Tue, 16 Feb 2021 18:32:39 GMT Subject: Integrated: 8260415: Remove unused class ReferenceProcessorMTProcMutator In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:51:30 GMT, Leo Korinth wrote: > ReferenceProcessorMTProcMutator is not used. ReferenceProcessorMTDiscoveryMutator seems to do the same and is still being used. This pull request has now been integrated. Changeset: 61a659f4 Author: Leo Korinth URL: https://git.openjdk.java.net/jdk/commit/61a659f4 Stats: 22 lines in 1 file changed: 0 ins; 22 del; 0 mod 8260415: Remove unused class ReferenceProcessorMTProcMutator Reviewed-by: ayang, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2578 From lkorinth at openjdk.java.net Tue Feb 16 18:59:51 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Tue, 16 Feb 2021 18:59:51 GMT Subject: RFR: 8260416: Remove unused method ReferenceProcessor::is_mt_processing_set_up() Message-ID: Code is not used. ------------- Commit messages: - 8260416: Remove unused method ReferenceProcessor::is_mt_processing_set_up() Changes: https://git.openjdk.java.net/jdk/pull/2591/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2591&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8260416 Stats: 7 lines in 2 files changed: 0 ins; 7 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2591.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2591/head:pull/2591 PR: https://git.openjdk.java.net/jdk/pull/2591 From shade at openjdk.java.net Tue Feb 16 19:17:53 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 16 Feb 2021 19:17:53 GMT Subject: RFR: 8261842: Shenandoah: cleanup ShenandoahHeapRegionSet Message-ID: There are a couple of stale/unused methods in ShenandoahHeapRegionSet that we can eliminate instead of improving them, for example in JDK-8261838. 
Additional testing: - [x] Linux x86_64 `hotspot_gc_shenandoah` ------------- Commit messages: - Update - 8261842: Shenandoah: cleanup ShenandoahHeapRegionSet Changes: https://git.openjdk.java.net/jdk/pull/2592/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2592&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261842 Stats: 88 lines in 3 files changed: 0 ins; 82 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2592.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2592/head:pull/2592 PR: https://git.openjdk.java.net/jdk/pull/2592 From shade at openjdk.java.net Tue Feb 16 19:19:00 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Tue, 16 Feb 2021 19:19:00 GMT Subject: RFR: 8261838: Shenandoah: reconsider heap region iterators memory ordering Message-ID: We use CASes to distributed workers between regions. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for region distribution code, and "relaxed" is enough, since we don't piggyback memory ordering on these. This also calls for some refactoring in the code itself. Additional testing: - [x] `hotspot_gc_shenandoah` - [ ] Ad-hoc performance runs ------------- Commit messages: - 8261838: Shenandoah: reconsider heap region iterators memory ordering Changes: https://git.openjdk.java.net/jdk/pull/2593/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2593&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261838 Stats: 24 lines in 4 files changed: 2 ins; 3 del; 19 mod Patch: https://git.openjdk.java.net/jdk/pull/2593.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2593/head:pull/2593 PR: https://git.openjdk.java.net/jdk/pull/2593 From rkennke at openjdk.java.net Tue Feb 16 19:28:40 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 16 Feb 2021 19:28:40 GMT Subject: RFR: 8261842: Shenandoah: cleanup ShenandoahHeapRegionSet In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 19:11:27 GMT, Aleksey Shipilev wrote: > There are a couple of stale/unused methods in ShenandoahHeapRegionSet that we can eliminate instead of improving them, for example in JDK-8261838. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` Ok! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2592 From rkennke at openjdk.java.net Tue Feb 16 19:32:40 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 16 Feb 2021 19:32:40 GMT Subject: RFR: 8261838: Shenandoah: reconsider heap region iterators memory ordering In-Reply-To: References: Message-ID: <5iWqnhftllWZb8XoOXKQcGTgR1pbf3odYvN8BGW6Xwg=.3856984a-aeae-44f9-beba-ed476d9f6e22@github.com> On Tue, 16 Feb 2021 19:13:03 GMT, Aleksey Shipilev wrote: > We use CASes to distributed workers between regions. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This seems to be excessive for region distribution code, and "relaxed" is enough, since we don't piggyback memory ordering on these. > > This also calls for some refactoring in the code itself. > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [ ] Ad-hoc performance runs Looks good! ------------- Marked as reviewed by rkennke (Reviewer). 
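For context, the claiming pattern under discussion is essentially the following (sketch with made-up names, not the actual iterator code):

```
#include "runtime/atomic.hpp"

// Workers race with a CAS to take the next region index. The counter only
// partitions work, nothing is published through it, so relaxed ordering is
// sufficient and the conservative default's two-way fences buy nothing.
static size_t claim_next_index(volatile size_t* next, size_t limit) {
  size_t cur = Atomic::load(next);
  while (cur < limit) {
    size_t witness = Atomic::cmpxchg(next, cur, cur + 1, memory_order_relaxed);
    if (witness == cur) {
      return cur;    // we own index 'cur'
    }
    cur = witness;   // another worker advanced the counter, retry
  }
  return limit;      // nothing left to claim
}
```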
PR: https://git.openjdk.java.net/jdk/pull/2593 From ayang at openjdk.java.net Tue Feb 16 19:38:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 16 Feb 2021 19:38:41 GMT Subject: RFR: 8260416: Remove unused method ReferenceProcessor::is_mt_processing_set_up() In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 18:53:54 GMT, Leo Korinth wrote: > Code is not used. Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2591 From kbarrett at openjdk.java.net Wed Feb 17 06:43:47 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 17 Feb 2021 06:43:47 GMT Subject: RFR: 8260416: Remove unused method ReferenceProcessor::is_mt_processing_set_up() In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 18:53:54 GMT, Leo Korinth wrote: > Code is not used. Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2591 From shade at openjdk.java.net Wed Feb 17 07:00:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Wed, 17 Feb 2021 07:00:40 GMT Subject: Integrated: 8261842: Shenandoah: cleanup ShenandoahHeapRegionSet In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 19:11:27 GMT, Aleksey Shipilev wrote: > There are a couple of stale/unused methods in ShenandoahHeapRegionSet that we can eliminate instead of improving them, for example in JDK-8261838. > > Additional testing: > - [x] Linux x86_64 `hotspot_gc_shenandoah` This pull request has now been integrated. Changeset: d1950335 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/d1950335 Stats: 88 lines in 3 files changed: 0 ins; 82 del; 6 mod 8261842: Shenandoah: cleanup ShenandoahHeapRegionSet Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2592 From ayang at openjdk.java.net Wed Feb 17 08:05:01 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 17 Feb 2021 08:05:01 GMT Subject: RFR: 8228748: Remove GCLocker::_doing_gc Message-ID: Some refactoring in `GCLocker` and more comments in `jni_lock` on how the synchronization works there. ------------- Commit messages: - lock Changes: https://git.openjdk.java.net/jdk/pull/2602/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2602&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8228748 Stats: 15 lines in 2 files changed: 4 ins; 5 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2602.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2602/head:pull/2602 PR: https://git.openjdk.java.net/jdk/pull/2602 From cgo at openjdk.java.net Wed Feb 17 08:12:38 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Wed, 17 Feb 2021 08:12:38 GMT Subject: RFR: 8261752: Multiple GC test are missing memory requirements In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 08:53:21 GMT, Thomas Schatzl wrote: >> I used systemd to figure out which memory requirement makes sense for which test: >> >> $ systemd-run --user --scope -p MemoryMax=768M -p MemorySwapMax=0 /usr/bin/make TEST="..." test >> >> Tests succeeding with `768M` of MemoryMax got a requirement of 1G, all others got 2G and succeeded with a MemoryMax of 1536M. > > Thanks. Lgtm. Thanks for the review. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2575 From cgo at openjdk.java.net Wed Feb 17 08:13:45 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Wed, 17 Feb 2021 08:13:45 GMT Subject: RFR: 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize In-Reply-To: <4GPMkO2QkdP7_JXlyDsFYP_BBUEWCaS0VVDSs3Go7aE=.7538081c-4662-4158-a8c4-36e9a15f5d0d@github.com> References: <4GPMkO2QkdP7_JXlyDsFYP_BBUEWCaS0VVDSs3Go7aE=.7538081c-4662-4158-a8c4-36e9a15f5d0d@github.com> Message-ID: On Tue, 16 Feb 2021 08:50:48 GMT, Thomas Schatzl wrote: >> Adds an explicit -Xms to one part of the test case, to not rely on ergonomics to detect the correct InitialHeapSize. >> >> It looks like one part of the whole test case implicitly relied on the fact, that `InitialHeapSize` == `MaxHeapSize`. Since the `MaxHeapSize` is very small (32M), this is almost always true. But if the test device has less than 2G of memory, the ergonomics configure the `InitialHeapSize` to be smaller than the `MaxHeapSize`. > > Marked as reviewed by tschatzl (Reviewer). Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2577 From sjohanss at openjdk.java.net Wed Feb 17 09:06:44 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 09:06:44 GMT Subject: RFR: 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:20:56 GMT, Christoph G?ttschkes wrote: > Adds an explicit -Xms to one part of the test case, to not rely on ergonomics to detect the correct InitialHeapSize. > > It looks like one part of the whole test case implicitly relied on the fact, that `InitialHeapSize` == `MaxHeapSize`. Since the `MaxHeapSize` is very small (32M), this is almost always true. But if the test device has less than 2G of memory, the ergonomics configure the `InitialHeapSize` to be smaller than the `MaxHeapSize`. Marked as reviewed by sjohanss (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2577 From sjohanss at openjdk.java.net Wed Feb 17 09:09:41 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 17 Feb 2021 09:09:41 GMT Subject: RFR: 8261752: Multiple GC test are missing memory requirements In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 13:24:57 GMT, Christoph G?ttschkes wrote: > I used systemd to figure out which memory requirement makes sense for which test: > > $ systemd-run --user --scope -p MemoryMax=768M -p MemorySwapMax=0 /usr/bin/make TEST="..." test > > Tests succeeding with `768M` of MemoryMax got a requirement of 1G, all others got 2G and succeeded with a MemoryMax of 1536M. Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2575 From cgo at openjdk.java.net Wed Feb 17 10:44:51 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Wed, 17 Feb 2021 10:44:51 GMT Subject: Integrated: 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:20:56 GMT, Christoph G?ttschkes wrote: > Adds an explicit -Xms to one part of the test case, to not rely on ergonomics to detect the correct InitialHeapSize. > > It looks like one part of the whole test case implicitly relied on the fact, that `InitialHeapSize` == `MaxHeapSize`. 
Since the `MaxHeapSize` is very small (32M), this is almost always true. But if the test device has less than 2G of memory, the ergonomics configure the `InitialHeapSize` to be smaller than the `MaxHeapSize`. This pull request has now been integrated. Changeset: c7885eb1 Author: Christoph G?ttschkes Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/c7885eb1 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8261758: [TESTBUG] gc/g1/TestGCLogMessages.java fails if ergonomics detect too small InitialHeapSize Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2577 From cgo at openjdk.java.net Wed Feb 17 10:44:38 2021 From: cgo at openjdk.java.net (Christoph =?UTF-8?B?R8O2dHRzY2hrZXM=?=) Date: Wed, 17 Feb 2021 10:44:38 GMT Subject: Integrated: 8261752: Multiple GC test are missing memory requirements In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 13:24:57 GMT, Christoph G?ttschkes wrote: > I used systemd to figure out which memory requirement makes sense for which test: > > $ systemd-run --user --scope -p MemoryMax=768M -p MemorySwapMax=0 /usr/bin/make TEST="..." test > > Tests succeeding with `768M` of MemoryMax got a requirement of 1G, all others got 2G and succeeded with a MemoryMax of 1536M. This pull request has now been integrated. Changeset: 2e18b52a Author: Christoph G?ttschkes Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/2e18b52a Stats: 7 lines in 7 files changed: 2 ins; 0 del; 5 mod 8261752: Multiple GC test are missing memory requirements Reviewed-by: tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2575 From kbarrett at openjdk.java.net Wed Feb 17 15:23:54 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 17 Feb 2021 15:23:54 GMT Subject: RFR: 8261905: Move implementation of OopStorage num_dead related functions Message-ID: Please review this trivial change which just moves several functions to a different location in the same file. The old location is in the middle of some unrelated functionality. Testing: mach5 tier1 ------------- Commit messages: - move num_dead functions Changes: https://git.openjdk.java.net/jdk/pull/2608/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2608&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261905 Stats: 30 lines in 1 file changed: 15 ins; 15 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2608.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2608/head:pull/2608 PR: https://git.openjdk.java.net/jdk/pull/2608 From ayang at openjdk.java.net Wed Feb 17 15:41:49 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 17 Feb 2021 15:41:49 GMT Subject: RFR: 8261905: Move implementation of OopStorage num_dead related functions In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 15:18:40 GMT, Kim Barrett wrote: > Please review this trivial change which just moves several functions to a > different location in the same file. The old location is in the middle of > some unrelated functionality. > > Testing: > mach5 tier1 Marked as reviewed by ayang (Author). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2608 From rkennke at openjdk.java.net Wed Feb 17 17:36:45 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Wed, 17 Feb 2021 17:36:45 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint suppoprt [v2] In-Reply-To: <9gXmTI0gU9zTr-HffSqSsVEVjmUED0rNINulpr_mjQM=.b353278c-a4f1-45f5-bf8d-ed1e33fbb0c9@github.com> References: <9gXmTI0gU9zTr-HffSqSsVEVjmUED0rNINulpr_mjQM=.b353278c-a4f1-45f5-bf8d-ed1e33fbb0c9@github.com> Message-ID: On Wed, 10 Feb 2021 20:13:52 GMT, Zhengyu Gu wrote: >> Please review this patch that adds breakpoint support for Shenandoah, that allows Shenandoah to access a few tests: >> >> gc/TestConcurrentGCBreakpoints.java >> gc/TestJNIWeak/TestJNIWeak.java >> gc/TestReferenceClearDuringMarking.java >> gc/TestReferenceClearDuringReferenceProcessing.java >> gc/TestReferenceRefersTo.java >> >> The drawback is that above tests can not run with passive mode, which can result tests to hang, as breakpoints only apply to concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 with Shenandoah > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge > - update > - init update The change looks good to me, thanks! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2489 From zgu at openjdk.java.net Wed Feb 17 18:35:40 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Wed, 17 Feb 2021 18:35:40 GMT Subject: Withdrawn: 8259647: Add support for JFR event ObjectCountAfterGC to Shenandoah In-Reply-To: References: Message-ID: On Wed, 3 Feb 2021 20:05:33 GMT, Zhengyu Gu wrote: > Please review this patch that adds JFR ObjectCountAfterGC event support. > > AFAICT, the event is off by default. If it is enabled, it distorts Shenandoah pause characteristics, since it performs heap walk during final mark pause. > > When event is disabled: > `[191.033s][info][gc,stats] Pause Init Mark (G) 454 us` > `[191.033s][info][gc,stats] Pause Init Mark (N) 13 us` > > When event is enabled: > `[396.631s][info][gc,stats] Pause Final Mark (G) 43199 us` > `[396.631s][info][gc,stats] Pause Final Mark (N) 42982 us` > > Test: > - [x] hotspot_gc_shenandoah This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.java.net/jdk/pull/2386 From tschatzl at openjdk.java.net Wed Feb 17 21:29:39 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 17 Feb 2021 21:29:39 GMT Subject: RFR: 8261905: Move implementation of OopStorage num_dead related functions In-Reply-To: References: Message-ID: <5pirmUU8MAKjn9tKXjnVa7bgmqtbGkn3tJ_DatXPMy4=.af779ba6-9828-48b7-95c0-aa908294a4c2@github.com> On Wed, 17 Feb 2021 15:18:40 GMT, Kim Barrett wrote: > Please review this trivial change which just moves several functions to a > different location in the same file. The old location is in the middle of > some unrelated functionality. > > Testing: > mach5 tier1 Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2608 From samuel_thomas at brown.edu Wed Feb 17 21:55:18 2021 From: samuel_thomas at brown.edu (Sam Thomas) Date: Wed, 17 Feb 2021 16:55:18 -0500 Subject: Flush Cache from Within Hotspot Message-ID: <22f38dda-f4b3-6dd8-09b7-487df03c761d@brown.edu> Hello all, I am working on a project that explores different architecture simulations using JDK14, and I am trying to flush the cache from the source code. 
We are working with an aarch64 target and I have tried the following to no avail: 1. Use the clflush C++ built-in instruction (this is an x86 instruction) 2. Dynamically allocate a large (2 MB) block of memory, and assign random numbers to the array (Hotspot assertions do not allow for dynamic memory allocation) 3. Allocate a large (2 MB) block of memory on the stack, and assign random numbers to the array (this properly compiles, but the stack overflows and a segmentation fault is raised) Has anyone done projects like this in the past, or is there a standard for how to perform such a procedure? Thank you for your help! Best, Sam From aver.shining at gmail.com Thu Feb 18 01:42:28 2021 From: aver.shining at gmail.com (=?UTF-8?B?0JDQvdC00YDQtdC5INCS0LXRgNGI0LjQvdC40L0=?=) Date: Thu, 18 Feb 2021 04:42:28 +0300 Subject: RFR: 8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522 Message-ID: Hello everybody, My name is Andrey Vershinin, I've chosen to fix JDK-8254239 as my first contribution to the project. I'm a Java developer, with some C++ knowledge (in the process of improving it). My goal is to gain a deeper understanding of the inner workings of the platform I'm interested in, its concepts and the code itself, and contribute to the best of my ability. The bug I've selected is a 'starter' one, to get involved in the process. The patch is attached below. Thanks, Andrey =================================================================== diff --git a/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp b/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp --- a/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp (revision 06348dfcae0b6b82970e8c56391396affd311f90) +++ b/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp (revision e635cee530e414503d6e84261ce636123d282ee9) @@ -50,10 +50,6 @@ class G1SurvivorRegions; class ThreadClosure; -PRAGMA_DIAG_PUSH -// warning C4522: multiple assignment operators specified -PRAGMA_DISABLE_MSVC_WARNING(4522) - // This is a container class for either an oop or a continuation address for // mark stack entries. Both are pushed onto the mark stack. class G1TaskQueueEntry { @@ -89,8 +85,6 @@ bool is_null() const { return _holder == NULL; } }; -PRAGMA_DIAG_POP - typedef GenericTaskQueue G1CMTaskQueue; typedef GenericTaskQueueSet G1CMTaskQueueSet; From jbachorik at openjdk.java.net Thu Feb 18 10:09:02 2021 From: jbachorik at openjdk.java.net (Jaroslav Bachorik) Date: Thu, 18 Feb 2021 10:09:02 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate Message-ID: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. ## Introducing new JFR event While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. 
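To make the proposal concrete, here is a rough sketch of what such a periodic emission could look like. `TRACE_REQUEST_FUNC`, `EventHeapUsageSummary` and `set_heapLive()` appear in the patch hunks quoted later in this thread; the capacity/used setter names are assumptions for illustration only.

```c++
// Sketch only -- not the actual patch. Run by the JFR periodic framework,
// independent of GC cycles, so the data is present even in chunks with no GC.
TRACE_REQUEST_FUNC(HeapUsageSummary) {
  CollectedHeap* heap = Universe::heap();
  EventHeapUsageSummary event;
  event.set_heapCapacity(heap->capacity()); // assumed setter name
  event.set_heapUsed(heap->used());         // assumed setter name
  event.set_heapLive(heap->live());         // setter seen in the quoted patch
  event.commit();
}
```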
## Implementation

The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is the `size_t live() const` method added to the `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet, the implementation will default to returning the 'used' value.

The implementations are based on my (rather shallow) knowledge of the inner workings of the respective GC engines and I am open to suggestions to make them better/correct.

### Epsilon GC

Trivial implementation - just return `used()` instead.

### Serial GC

Here we utilize the fact that the mark-copy phase is naturally compacting, so the number of bytes after copy is 'live', and that the mark-sweep implementation keeps internal info about objects being 'dead' but excluded from the compaction effort, so we can use these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects).

### Parallel GC

For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK).

### G1 GC

Using the `G1ConcurrentMark::remark()` method, the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in the G1 implementation chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application.

### Shenandoah

In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one, so it would be great to run it in an already safe-pointed context. This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()`, where at the end of the marking process the liveness info is summarized and set to the `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code.

### ZGC

`ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via the `ZCollectedHeap::live()` method.

-------------

Commit messages:
- 8258431: Provide a JFR event with live set size estimate

Changes: https://git.openjdk.java.net/jdk/pull/2579/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2579&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8258431
Stats: 177 lines in 33 files changed: 172 ins; 1 del; 4 mod
Patch: https://git.openjdk.java.net/jdk/pull/2579.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2579/head:pull/2579

PR: https://git.openjdk.java.net/jdk/pull/2579

From shade at openjdk.java.net Thu Feb 18 10:29:41 2021
From: shade at openjdk.java.net (Aleksey Shipilev)
Date: Thu, 18 Feb 2021 10:29:41 GMT
Subject: RFR: 8258431: Provide a JFR event with live set size estimate
In-Reply-To:
References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com>
Message-ID:

On Thu, 18 Feb 2021 10:23:37 GMT, Aleksey Shipilev wrote:

>> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event.
>> >> ## Introducing new JFR event >> >> While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. >> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. >> >> ## Implementation >> >> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. >> >> The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. >> >> ### Epsilon GC >> >> Trivial implementation - just return `used()` instead. >> >> ### Serial GC >> >> Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). >> >> ### Parallel GC >> >> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). >> >> ### G1 GC >> >> Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. >> >> ### Shenandoah >> >> In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. >> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. >> >> ### ZGC >> >> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. > > src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 627: > >> 625: >> 626: size_t ShenandoahHeap::live() const { >> 627: size_t live = Atomic::load_acquire(&_live); > > I understand you copy-pasted from the same file. We have removed `_acquire` with #2504. Do `Atomic::load` here. ...which also means you want to merge from master to get recent changes? 
------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From shade at openjdk.java.net Thu Feb 18 10:29:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 10:29:40 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. > > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. 
> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. Interesting! Cursory review follows. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4578: > 4576: > 4577: void G1CollectedHeap::set_live(size_t bytes) { > 4578: Atomic::release_store(&_live_size, bytes); I don't think this requires `release_store`, regular `store` would be enough. G1 folks can say for sure. src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 100: > 98: HeapWord* mem_allocate_old_gen(size_t size); > 99: > 100: Excess newline? src/hotspot/share/gc/shared/collectedHeap.hpp line 217: > 215: virtual size_t capacity() const = 0; > 216: virtual size_t used() const = 0; > 217: // a best-effort estimate of the live set size Suggestion: // Returns the estimate of live set size. Because live set changes over time, // this is a best-effort estimate by each of the implementations. These usually // are most precise right after the GC cycle. src/hotspot/share/gc/shared/genCollectedHeap.cpp line 1144: > 1142: _old_gen->prepare_for_compaction(&cp); > 1143: _young_gen->prepare_for_compaction(&cp); > 1144: Stray newline? src/hotspot/share/gc/shared/genCollectedHeap.hpp line 183: > 181: size_t live = _live_size; > 182: return live > 0 ? live : used(); > 183: }; I think the implementation belongs to `genCollectedHeap.cpp`. src/hotspot/share/gc/shared/generation.hpp line 140: > 138: virtual size_t used() const = 0; // The number of used bytes in the gen. > 139: virtual size_t free() const = 0; // The number of free bytes in the gen. > 140: virtual size_t live() const = 0; Needs a comment to match the lines above? Say, `// The estimate of live bytes in the gen.` src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 579: > 577: event.set_heapLive(heap->live()); > 578: event.commit(); > 579: } On the first sight, this belongs in `ShenandoahConcurrentMark::finish_mark()`. Placing the event here would fire the event when concurrent GC is cancelled, which is not what you want. src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 265: > 263: ShenandoahHeap* const heap = ShenandoahHeap::heap(); > 264: heap->set_concurrent_mark_in_progress(false); > 265: heap->mark_finished(); Let's not rename this method. Introduce a new method, `ShenandoahHeap::update_live`, and call it every time after `mark_complete_marking_context()` is called. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 627: > 625: > 626: size_t ShenandoahHeap::live() const { > 627: size_t live = Atomic::load_acquire(&_live); I understand you copy-pasted from the same file. We have removed `_acquire` with #2504. Do `Atomic::load` here. src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 655: > 653: > 654: void ShenandoahHeap::set_live(size_t bytes) { > 655: Atomic::release_store_fence(&_live, bytes); Same, do `Atomic::store` here. src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 494: > 492: mark_complete_marking_context(); > 493: > 494: class ShenandoahCollectLiveSizeClosure : public ShenandoahHeapRegionClosure { We don't usually use the in-method declarations like these, pull it out of the method. 
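For clarity, a minimal sketch of the quoted getter with the suggested plain load (only this accessor is shown; nothing else in the class changes):

```c++
// Plain load is sufficient here; no memory ordering is piggybacked on this read.
size_t ShenandoahHeap::live() const {
  return Atomic::load(&_live);
}
```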
src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 511: > 509: > 510: ShenandoahCollectLiveSizeClosure cl; > 511: heap_region_iterate(&cl); I think you want `parallel_heap_region_iterate` on this path, and do `Atomic::add(&_live, r->get_live_data_bytes())` in the closure. We shall see if this makes sense to make fully concurrently... src/hotspot/share/gc/epsilon/epsilonHeap.hpp line 80: > 78: virtual size_t capacity() const { return _virtual_space.committed_size(); } > 79: virtual size_t used() const { return _space->used(); } > 80: virtual size_t live() const { return used(); } I'd prefer to call `_space->used()` directly here. Minor optimization, I know. ------------- Changes requested by shade (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2579 From aph at redhat.com Thu Feb 18 11:35:54 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 18 Feb 2021 11:35:54 +0000 Subject: Flush Cache from Within Hotspot In-Reply-To: <22f38dda-f4b3-6dd8-09b7-487df03c761d@brown.edu> References: <22f38dda-f4b3-6dd8-09b7-487df03c761d@brown.edu> Message-ID: <22855e6a-9eb7-84a7-7148-e72d0b2f0127@redhat.com> On 17/02/2021 21:55, Sam Thomas wrote: > I am working on a project that explores different architecture > simulations using JDK14, and I am trying to flush the cache Which cache? There is more than one. > from the > source code. We are working with an aarch64 target and I have tried the > following to no avail: > > 1. Use the clflush C++ built-in instruction (this is an x86 instruction) > > 2. Dynamically allocate a large (2 MB) block of memory, and assign > random numbers to the array (Hotspot assertions do not allow for dynamic > memory allocation) > > 3. Allocate a large (2 MB) block of memory on the stack, and assign > random numbers to the array (this properly compiles, but the stack > overflows and a segmentation fault is raised) > > Has anyone done projects like this in the past, or is there a standard > for how to perform such a procedure? Thank you for your help! GCC (and LLVM) have __builtin___clear_cache (char *BEGIN, char *END) ... which clears all caches in the address space to the point of unification. Windows has FlushInstructionCache which does the same. I don't think there's any builtin that flushes the data caches without flushing the instruction caches too, to do that you need to use assembly language. DC CVAU is what you need. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Feb 18 11:37:21 2021 From: aph at redhat.com (Andrew Haley) Date: Thu, 18 Feb 2021 11:37:21 +0000 Subject: Flush Cache from Within Hotspot In-Reply-To: <22855e6a-9eb7-84a7-7148-e72d0b2f0127@redhat.com> References: <22f38dda-f4b3-6dd8-09b7-487df03c761d@brown.edu> <22855e6a-9eb7-84a7-7148-e72d0b2f0127@redhat.com> Message-ID: <384d9768-4bc4-85fa-6211-8d3e2ad17cbb@redhat.com> On 18/02/2021 11:35, Andrew Haley wrote: > GCC (and LLVM) have > __builtin___clear_cache (char *BEGIN, char *END) > > ... which clears all caches in the address ... region between BEGIN and END. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From lkorinth at openjdk.java.net Thu Feb 18 11:46:40 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 18 Feb 2021 11:46:40 GMT Subject: RFR: 8260416: Remove unused method ReferenceProcessor::is_mt_processing_set_up() In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 06:41:04 GMT, Kim Barrett wrote: >> Code is not used. > > Marked as reviewed by kbarrett (Reviewer). Thanks Albert and Kim! ------------- PR: https://git.openjdk.java.net/jdk/pull/2591 From lkorinth at openjdk.java.net Thu Feb 18 11:46:41 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 18 Feb 2021 11:46:41 GMT Subject: Integrated: 8260416: Remove unused method ReferenceProcessor::is_mt_processing_set_up() In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 18:53:54 GMT, Leo Korinth wrote: > Code is not used. This pull request has now been integrated. Changeset: 1a7adc86 Author: Leo Korinth URL: https://git.openjdk.java.net/jdk/commit/1a7adc86 Stats: 7 lines in 2 files changed: 0 ins; 7 del; 0 mod 8260416: Remove unused method ReferenceProcessor::is_mt_processing_set_up() Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2591 From shade at openjdk.java.net Thu Feb 18 13:23:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 13:23:40 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint suppoprt [v2] In-Reply-To: <9gXmTI0gU9zTr-HffSqSsVEVjmUED0rNINulpr_mjQM=.b353278c-a4f1-45f5-bf8d-ed1e33fbb0c9@github.com> References: <9gXmTI0gU9zTr-HffSqSsVEVjmUED0rNINulpr_mjQM=.b353278c-a4f1-45f5-bf8d-ed1e33fbb0c9@github.com> Message-ID: On Wed, 10 Feb 2021 20:13:52 GMT, Zhengyu Gu wrote: >> Please review this patch that adds breakpoint support for Shenandoah, that allows Shenandoah to access a few tests: >> >> gc/TestConcurrentGCBreakpoints.java >> gc/TestJNIWeak/TestJNIWeak.java >> gc/TestReferenceClearDuringMarking.java >> gc/TestReferenceClearDuringReferenceProcessing.java >> gc/TestReferenceRefersTo.java >> >> The drawback is that above tests can not run with passive mode, which can result tests to hang, as breakpoints only apply to concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 with Shenandoah > > Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Merge > - update > - init update Looks fine, minor nits. src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 52: > 50: > 51: // Breakpoint support > 52: class ShenandoahConcurrentGCScope : public StackObj { Let's call these `ShenandoahGCBreakpointScope` and `ShenandoahMarkBreakpointScope`? src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 476: > 474: bool ShenandoahControlThread::is_async_gc(GCCause::Cause cause) const { > 475: return cause == GCCause::_wb_breakpoint; > 476: } Do we really need this method? What is "async gc" anyway? I think you can just inline the method at its only use. ------------- Marked as reviewed by shade (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/2489 From zgu at openjdk.java.net Thu Feb 18 14:06:56 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Feb 2021 14:06:56 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint suppoprt [v2] In-Reply-To: References: <9gXmTI0gU9zTr-HffSqSsVEVjmUED0rNINulpr_mjQM=.b353278c-a4f1-45f5-bf8d-ed1e33fbb0c9@github.com> Message-ID: On Thu, 18 Feb 2021 13:19:18 GMT, Aleksey Shipilev wrote: >> Zhengyu Gu has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: >> >> - Merge >> - update >> - init update > > src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 52: > >> 50: >> 51: // Breakpoint support >> 52: class ShenandoahConcurrentGCScope : public StackObj { > > Let's call these `ShenandoahGCBreakpointScope` and `ShenandoahMarkBreakpointScope`? Done > src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 476: > >> 474: bool ShenandoahControlThread::is_async_gc(GCCause::Cause cause) const { >> 475: return cause == GCCause::_wb_breakpoint; >> 476: } > > Do we really need this method? What is "async gc" anyway? I think you can just inline the method at its only use. Done ------------- PR: https://git.openjdk.java.net/jdk/pull/2489 From zgu at openjdk.java.net Thu Feb 18 14:06:54 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Feb 2021 14:06:54 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint suppoprt [v3] In-Reply-To: References: Message-ID: > Please review this patch that adds breakpoint support for Shenandoah, that allows Shenandoah to access a few tests: > > gc/TestConcurrentGCBreakpoints.java > gc/TestJNIWeak/TestJNIWeak.java > gc/TestReferenceClearDuringMarking.java > gc/TestReferenceClearDuringReferenceProcessing.java > gc/TestReferenceRefersTo.java > > The drawback is that above tests can not run with passive mode, which can result tests to hang, as breakpoints only apply to concurrent GC. > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with Shenandoah Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: Shade's comments and fixing a merge error ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2489/files - new: https://git.openjdk.java.net/jdk/pull/2489/files/2b88f7a6..4f19c4cd Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2489&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2489&range=01-02 Stats: 18 lines in 3 files changed: 2 ins; 7 del; 9 mod Patch: https://git.openjdk.java.net/jdk/pull/2489.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2489/head:pull/2489 PR: https://git.openjdk.java.net/jdk/pull/2489 From shade at openjdk.java.net Thu Feb 18 15:32:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 15:32:41 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint suppoprt [v3] In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 14:06:54 GMT, Zhengyu Gu wrote: >> Please review this patch that adds breakpoint support for Shenandoah, that allows Shenandoah to access a few tests: >> >> gc/TestConcurrentGCBreakpoints.java >> gc/TestJNIWeak/TestJNIWeak.java >> gc/TestReferenceClearDuringMarking.java >> gc/TestReferenceClearDuringReferenceProcessing.java >> gc/TestReferenceRefersTo.java >> >> The drawback is that above tests can not run with passive mode, which can result tests to hang, as breakpoints only apply to concurrent GC. 
>> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 with Shenandoah > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > Shade's comments and fixing a merge error Marked as reviewed by shade (Reviewer). src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 52: > 50: > 51: // Breakpoint support > 52: class ShenandoahBreakpointScope : public StackObj { Should probably be `ShenandoahBreakpointGCScope` to match that other "MarkScope". ------------- PR: https://git.openjdk.java.net/jdk/pull/2489 From lkorinth at openjdk.java.net Thu Feb 18 15:32:53 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 18 Feb 2021 15:32:53 GMT Subject: RFR: 8261799: Remove unnecessary cast in psParallelCompact.hpp Message-ID: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> Unnecessary casts confuses me. ------------- Commit messages: - 8261799: Remove unnessesary cast in psParallelCompact.hpp Changes: https://git.openjdk.java.net/jdk/pull/2628/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2628&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261799 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2628.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2628/head:pull/2628 PR: https://git.openjdk.java.net/jdk/pull/2628 From lkorinth at openjdk.java.net Thu Feb 18 15:42:46 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 18 Feb 2021 15:42:46 GMT Subject: RFR: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor Message-ID: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor ------------- Commit messages: - 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor Changes: https://git.openjdk.java.net/jdk/pull/2629/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2629&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261803 Stats: 5 lines in 2 files changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2629.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2629/head:pull/2629 PR: https://git.openjdk.java.net/jdk/pull/2629 From ayang at openjdk.java.net Thu Feb 18 15:43:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 18 Feb 2021 15:43:39 GMT Subject: RFR: 8261799: Remove unnecessary cast in psParallelCompact.hpp In-Reply-To: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> References: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> Message-ID: On Thu, 18 Feb 2021 15:26:38 GMT, Leo Korinth wrote: > Unnecessary casts confuses me. Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2628 From ayang at openjdk.java.net Thu Feb 18 15:46:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 18 Feb 2021 15:46:39 GMT Subject: RFR: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 15:37:30 GMT, Leo Korinth wrote: > 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor Marked as reviewed by ayang (Author). 
------------- PR: https://git.openjdk.java.net/jdk/pull/2629 From zgu at openjdk.java.net Thu Feb 18 15:52:56 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Feb 2021 15:52:56 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint suppoprt [v4] In-Reply-To: References: Message-ID: > Please review this patch that adds breakpoint support for Shenandoah, that allows Shenandoah to access a few tests: > > gc/TestConcurrentGCBreakpoints.java > gc/TestJNIWeak/TestJNIWeak.java > gc/TestReferenceClearDuringMarking.java > gc/TestReferenceClearDuringReferenceProcessing.java > gc/TestReferenceRefersTo.java > > The drawback is that above tests can not run with passive mode, which can result tests to hang, as breakpoints only apply to concurrent GC. > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with Shenandoah Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: More renaming per @shade ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2489/files - new: https://git.openjdk.java.net/jdk/pull/2489/files/4f19c4cd..a9629c8f Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2489&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2489&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2489.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2489/head:pull/2489 PR: https://git.openjdk.java.net/jdk/pull/2489 From shade at openjdk.java.net Thu Feb 18 15:53:41 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 15:53:41 GMT Subject: Integrated: 8261838: Shenandoah: reconsider heap region iterators memory ordering In-Reply-To: References: Message-ID: <62us8rPYIjJz0Im-fzSBkSgAdtRZUaB0UzvBAdVPMNA=.2127c9ef-271e-4d4e-b962-914ee54f906b@github.com> On Tue, 16 Feb 2021 19:13:03 GMT, Aleksey Shipilev wrote: > We use CASes to distributed workers between regions. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. > > This seems to be excessive for region distribution code, and "relaxed" is enough, since we don't piggyback memory ordering on these. > > This also calls for some refactoring in the code itself. > > Additional testing: > - [x] `hotspot_gc_shenandoah` > - [ ] Ad-hoc performance runs This pull request has now been integrated. 
Changeset: fd098e71 Author: Aleksey Shipilev URL: https://git.openjdk.java.net/jdk/commit/fd098e71 Stats: 24 lines in 4 files changed: 2 ins; 3 del; 19 mod 8261838: Shenandoah: reconsider heap region iterators memory ordering Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2593 From shade at openjdk.java.net Thu Feb 18 15:55:40 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Thu, 18 Feb 2021 15:55:40 GMT Subject: RFR: 8261473: Shenandoah: Add breakpoint suppoprt [v4] In-Reply-To: References: Message-ID: <0DbP2gE5bq0tJ2vwI6sJfWfjBd_5HmyVSdRsaHyhoMg=.b832313e-a0b7-47ef-9d0c-58579ea3e7b1@github.com> On Thu, 18 Feb 2021 15:52:56 GMT, Zhengyu Gu wrote: >> Please review this patch that adds breakpoint support for Shenandoah, that allows Shenandoah to access a few tests: >> >> gc/TestConcurrentGCBreakpoints.java >> gc/TestJNIWeak/TestJNIWeak.java >> gc/TestReferenceClearDuringMarking.java >> gc/TestReferenceClearDuringReferenceProcessing.java >> gc/TestReferenceRefersTo.java >> >> The drawback is that above tests can not run with passive mode, which can result tests to hang, as breakpoints only apply to concurrent GC. >> >> Test: >> - [x] hotspot_gc_shenandoah >> - [x] tier1 with Shenandoah > > Zhengyu Gu has updated the pull request incrementally with one additional commit since the last revision: > > More renaming per @shade Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2489 From zgu at openjdk.java.net Thu Feb 18 18:34:44 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Feb 2021 18:34:44 GMT Subject: Integrated: 8261473: Shenandoah: Add breakpoint support In-Reply-To: References: Message-ID: On Tue, 9 Feb 2021 21:19:11 GMT, Zhengyu Gu wrote: > Please review this patch that adds breakpoint support for Shenandoah, that allows Shenandoah to access a few tests: > > gc/TestConcurrentGCBreakpoints.java > gc/TestJNIWeak/TestJNIWeak.java > gc/TestReferenceClearDuringMarking.java > gc/TestReferenceClearDuringReferenceProcessing.java > gc/TestReferenceRefersTo.java > > The drawback is that above tests can not run with passive mode, which can result tests to hang, as breakpoints only apply to concurrent GC. > > Test: > - [x] hotspot_gc_shenandoah > - [x] tier1 with Shenandoah This pull request has now been integrated. Changeset: 9cf4f90d Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/9cf4f90d Stats: 165 lines in 7 files changed: 153 ins; 2 del; 10 mod 8261473: Shenandoah: Add breakpoint support Reviewed-by: rkennke, shade ------------- PR: https://git.openjdk.java.net/jdk/pull/2489 From zgu at openjdk.java.net Thu Feb 18 21:08:46 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 18 Feb 2021 21:08:46 GMT Subject: RFR: 8261984: Shenandoah: Remove unused ShenandoahPushWorkerQueuesScope class Message-ID: Please review this trivial change that removes unused ShenandoahPushWorkerQueuesScope class. 
------------- Commit messages: - 8261984: Shenandoah: Remove unused ShenandoahPushWorkerQueuesScope class Changes: https://git.openjdk.java.net/jdk/pull/2632/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2632&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261984 Stats: 21 lines in 2 files changed: 0 ins; 19 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2632.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2632/head:pull/2632 PR: https://git.openjdk.java.net/jdk/pull/2632 From stefank at openjdk.java.net Thu Feb 18 21:27:44 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Thu, 18 Feb 2021 21:27:44 GMT Subject: RFR: 8261799: Remove unnecessary cast in psParallelCompact.hpp In-Reply-To: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> References: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> Message-ID: On Thu, 18 Feb 2021 15:26:38 GMT, Leo Korinth wrote: > Unnecessary casts confuses me. Marked as reviewed by stefank (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2628 From tschatzl at openjdk.java.net Thu Feb 18 23:06:38 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 18 Feb 2021 23:06:38 GMT Subject: RFR: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor In-Reply-To: References: Message-ID: <72rVWOBACSiwuThBbz6GlG2erbvhW3EsYthSuwbgFhY=.8a943c88-7a30-4eee-b776-e0ce07d4d821@github.com> On Thu, 18 Feb 2021 15:37:30 GMT, Leo Korinth wrote: > 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2629 From kbarrett at openjdk.java.net Fri Feb 19 02:53:56 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 19 Feb 2021 02:53:56 GMT Subject: RFR: 8261905: Move implementation of OopStorage num_dead related functions [v2] In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 15:39:02 GMT, Albert Mingkun Yang wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into move_num_dead >> - move num_dead functions > > Marked as reviewed by ayang (Author). Thanks @albertnetymk and @tschatzl for reviews. ------------- PR: https://git.openjdk.java.net/jdk/pull/2608 From kbarrett at openjdk.java.net Fri Feb 19 02:53:55 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 19 Feb 2021 02:53:55 GMT Subject: RFR: 8261905: Move implementation of OopStorage num_dead related functions [v2] In-Reply-To: References: Message-ID: > Please review this trivial change which just moves several functions to a > different location in the same file. The old location is in the middle of > some unrelated functionality. > > Testing: > mach5 tier1 Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into move_num_dead - move num_dead functions ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2608/files - new: https://git.openjdk.java.net/jdk/pull/2608/files/baa9f5d2..578009f4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2608&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2608&range=00-01 Stats: 947 lines in 49 files changed: 661 ins; 155 del; 131 mod Patch: https://git.openjdk.java.net/jdk/pull/2608.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2608/head:pull/2608 PR: https://git.openjdk.java.net/jdk/pull/2608 From kbarrett at openjdk.java.net Fri Feb 19 02:53:57 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 19 Feb 2021 02:53:57 GMT Subject: Integrated: 8261905: Move implementation of OopStorage num_dead related functions In-Reply-To: References: Message-ID: <-F3VzaS6AJrLtGZ52t6viHGKRzatvd0u5d_x8Smyp6I=.a78938fd-748a-4bda-ae57-e22c0422c934@github.com> On Wed, 17 Feb 2021 15:18:40 GMT, Kim Barrett wrote: > Please review this trivial change which just moves several functions to a > different location in the same file. The old location is in the middle of > some unrelated functionality. > > Testing: > mach5 tier1 This pull request has now been integrated. Changeset: 7e78c777 Author: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/7e78c777 Stats: 30 lines in 1 file changed: 15 ins; 15 del; 0 mod 8261905: Move implementation of OopStorage num_dead related functions Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2608 From shade at openjdk.java.net Fri Feb 19 06:13:38 2021 From: shade at openjdk.java.net (Aleksey Shipilev) Date: Fri, 19 Feb 2021 06:13:38 GMT Subject: RFR: 8261984: Shenandoah: Remove unused ShenandoahPushWorkerQueuesScope class In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 21:03:34 GMT, Zhengyu Gu wrote: > Please review this trivial change that removes unused ShenandoahPushWorkerQueuesScope class. Marked as reviewed by shade (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2632 From ayang at openjdk.java.net Fri Feb 19 08:36:40 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 19 Feb 2021 08:36:40 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. 
This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. > > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. > This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. Additionally, some test(s) on this new feature would be nice. Maybe you can add sth in `HeapSummaryEventAllGcs`? PS: I was looking into how to get periodic heap usage info just a few days ago, and settled for `MemProfiling` as a workaround. Thank you for the patch. src/hotspot/share/jfr/periodic/jfrPeriodic.cpp line 649: > 647: TRACE_REQUEST_FUNC(HeapUsageSummary) { > 648: EventHeapUsageSummary event; > 649: if (event.should_commit()) { I believe the `should_commit` check is not needed; the period check is handle by the caller. src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 79: > 77: size_t _young_live; > 78: size_t _eden_live; > 79: size_t _old_live; It's only the sum that's ever exposed, right? I wonder if it makes sense to merge them into one var to only track the sum. ------------- Changes requested by ayang (Author). 
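A rough sketch of that last suggestion, with illustrative names (not taken from the patch), keeping the "fall back to `used()`" behavior described in the proposal:

```c++
// Inside ParallelScavengeHeap (illustrative field/method names):
volatile size_t _live_bytes; // single sum, replaces _young_live / _eden_live / _old_live

void set_live(size_t bytes) { _live_bytes = bytes; } // set once after each GC cycle

size_t live() const {
  size_t live = _live_bytes;
  return live > 0 ? live : used(); // fall back to used() before the first cycle
}
```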
PR: https://git.openjdk.java.net/jdk/pull/2579 From sjohanss at openjdk.java.net Fri Feb 19 08:39:38 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 19 Feb 2021 08:39:38 GMT Subject: RFR: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 15:37:30 GMT, Leo Korinth wrote: > 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor Thanks for cleaning this up. ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2629 From github.com+779991+jaokim at openjdk.java.net Fri Feb 19 09:30:03 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Fri, 19 Feb 2021 09:30:03 GMT Subject: RFR: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions Message-ID: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> This fix adds a check for coarsened region in mutex guarded section, when adding a reference to a remembered set. Haven't been able to produce a testcase -- please advice on how to, or if not necessary. **Testing:** * hs-tier, hs-tier2 ------------- Commit messages: - Added check for coarsened region in mutex guarded section. Changes: https://git.openjdk.java.net/jdk/pull/2545/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2545&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8242032 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2545.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2545/head:pull/2545 PR: https://git.openjdk.java.net/jdk/pull/2545 From github.com+779991+jaokim at openjdk.java.net Fri Feb 19 11:26:17 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Fri, 19 Feb 2021 11:26:17 GMT Subject: RFR: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions [v2] In-Reply-To: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> References: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> Message-ID: > This fix adds a check for coarsened region in mutex guarded section, when adding a reference to a remembered set. > > Haven't been able to produce a testcase -- please advice on how to, or if not necessary. 
> > **Testing:** > * hs-tier, hs-tier2 Joakim Nordstr?m has updated the pull request incrementally with two additional commits since the last revision: - Clarified comment on re-checking coarsening - Minor typo ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2545/files - new: https://git.openjdk.java.net/jdk/pull/2545/files/7398c89b..4bc0f4b7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2545&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2545&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2545.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2545/head:pull/2545 PR: https://git.openjdk.java.net/jdk/pull/2545 From ayang at openjdk.java.net Fri Feb 19 11:26:18 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Fri, 19 Feb 2021 11:26:18 GMT Subject: RFR: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions [v2] In-Reply-To: References: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> Message-ID: On Fri, 19 Feb 2021 10:59:45 GMT, Joakim Nordstr?m wrote: >> This fix adds a check for coarsened region in mutex guarded section, when adding a reference to a remembered set. >> >> Haven't been able to produce a testcase -- please advice on how to, or if not necessary. >> >> **Testing:** >> * hs-tier, hs-tier2 > > Joakim Nordstr?m has updated the pull request incrementally with two additional commits since the last revision: > > - Clarified comment on re-checking coarsening > - Minor typo Marked as reviewed by ayang (Author). src/hotspot/share/gc/g1/heapRegionRemSet.cpp line 149: > 147: MutexLocker x(_m, Mutex::_no_safepoint_check_flag); > 148: > 149: // Make sure region hasn't been coarsened by other thread. Maybe mentioning this is an intentional re-check under lock, sth like, "Rechecking if the region is coarsened while holding the lock." ------------- PR: https://git.openjdk.java.net/jdk/pull/2545 From github.com+779991+jaokim at openjdk.java.net Fri Feb 19 11:26:19 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Fri, 19 Feb 2021 11:26:19 GMT Subject: RFR: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions [v2] In-Reply-To: References: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> Message-ID: On Fri, 19 Feb 2021 10:38:35 GMT, Albert Mingkun Yang wrote: >> Joakim Nordstr?m has updated the pull request incrementally with two additional commits since the last revision: >> >> - Clarified comment on re-checking coarsening >> - Minor typo > > src/hotspot/share/gc/g1/heapRegionRemSet.cpp line 149: > >> 147: MutexLocker x(_m, Mutex::_no_safepoint_check_flag); >> 148: >> 149: // Make sure region hasn't been coarsened by other thread. > > Maybe mentioning this is an intentional re-check under lock, sth like, "Rechecking if the region is coarsened while holding the lock." Yes, thank you! 
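The pattern being discussed, sketched roughly (the coarse-map query and the surrounding function are simplified; the helper name is illustrative rather than copied from the patch):

```c++
// Roughly what OtherRegionsTable::add_reference() does with the fix:
if (is_region_coarsened(from_hrm_ind)) {
  return; // already covered by a coarse-grained entry, nothing to add
}

MutexLocker x(_m, Mutex::_no_safepoint_check_flag);

// Re-check while holding the lock: another thread may have coarsened this
// region between the unlocked check above and acquiring _m.
if (is_region_coarsened(from_hrm_ind)) {
  return;
}
// ... proceed to add the fine-grained PRT entry ...
```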
-------------

PR: https://git.openjdk.java.net/jdk/pull/2545

From zgu at openjdk.java.net Fri Feb 19 13:46:39 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Fri, 19 Feb 2021 13:46:39 GMT
Subject: Integrated: 8261984: Shenandoah: Remove unused ShenandoahPushWorkerQueuesScope class
In-Reply-To:
References:
Message-ID:

On Thu, 18 Feb 2021 21:03:34 GMT, Zhengyu Gu wrote:

> Please review this trivial change that removes unused ShenandoahPushWorkerQueuesScope class.

This pull request has now been integrated.

Changeset: 55463b04
Author: Zhengyu Gu
URL: https://git.openjdk.java.net/jdk/commit/55463b04
Stats: 21 lines in 2 files changed: 0 ins; 19 del; 2 mod

8261984: Shenandoah: Remove unused ShenandoahPushWorkerQueuesScope class

Reviewed-by: shade

-------------

PR: https://git.openjdk.java.net/jdk/pull/2632

From zgu at openjdk.java.net Fri Feb 19 13:58:59 2021
From: zgu at openjdk.java.net (Zhengyu Gu)
Date: Fri, 19 Feb 2021 13:58:59 GMT
Subject: RFR: 8261973: Shenandoah: Cleanup/simplify root verifier
Message-ID:

Root processing has gone through significant changes. For example, we used to mark through weak roots when class unloading is off; that is no longer the case, and OopStorages also simplify the roots. The Shenandoah root verifier can be simplified into 2 cases: with and without class unloading.

- [x] hotspot_gc_shenandoah with -XX:+ShenandoahVerify

-------------

Commit messages:
- update
- 8261973

Changes: https://git.openjdk.java.net/jdk/pull/2643/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2643&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8261973
Stats: 198 lines in 4 files changed: 10 ins; 159 del; 29 mod
Patch: https://git.openjdk.java.net/jdk/pull/2643.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2643/head:pull/2643

PR: https://git.openjdk.java.net/jdk/pull/2643

From kim.barrett at oracle.com Fri Feb 19 14:01:14 2021
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 19 Feb 2021 14:01:14 +0000
Subject: RFR: 8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522
In-Reply-To:
References:
Message-ID:

> On Feb 17, 2021, at 8:42 PM, Andrey Vershinin wrote:
>
> Hello everybody,
>
> My name is Andrey Vershinin, I've chosen to fix JDK-8254239 as my first
> contribution to the project. I'm a Java developer, with some C++ knowledge
> (in the process of improving it).
> My goal is to gain a deeper understanding of the inner workings of the
> platform I'm interested in, its concepts and the code itself, and
> contribute to the best of my ability.
> The bug I've selected is a 'starter' one, to get involved in the process.
> The patch is attached below.

The normal way to make openjdk changes now is via github pull requests, rather than patches in email. But there are some preliminary steps as well. In particular, have you signed the OCA? I suggest you take a look here:

https://openjdk.java.net/guide/
https://openjdk.java.net/guide/#i-have-a-patch-what-do-i-do

When you get to the point of finding a sponsor, I can do that.
> Thanks, > Andrey > > =================================================================== > diff --git a/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp > b/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp > --- a/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp (revision > 06348dfcae0b6b82970e8c56391396affd311f90) > +++ b/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp (revision > e635cee530e414503d6e84261ce636123d282ee9) > @@ -50,10 +50,6 @@ > class G1SurvivorRegions; > class ThreadClosure; > > -PRAGMA_DIAG_PUSH > -// warning C4522: multiple assignment operators specified > -PRAGMA_DISABLE_MSVC_WARNING(4522) > - > // This is a container class for either an oop or a continuation address > for > // mark stack entries. Both are pushed onto the mark stack. > class G1TaskQueueEntry { > @@ -89,8 +85,6 @@ > bool is_null() const { return _holder == NULL; } > }; > > -PRAGMA_DIAG_POP > - > typedef GenericTaskQueue G1CMTaskQueue; > typedef GenericTaskQueueSet G1CMTaskQueueSet; From github.com+31506961+vshining at openjdk.java.net Fri Feb 19 15:09:47 2021 From: github.com+31506961+vshining at openjdk.java.net (Andrey Vershinin) Date: Fri, 19 Feb 2021 15:09:47 GMT Subject: RFR: JDK-8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522 Message-ID: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> This is a simple change removing disabling of MSVC++ warning 4522. Since it only affects build process, no tests were ran. ------------- Commit messages: - 8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522 Changes: https://git.openjdk.java.net/jdk/pull/2646/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2646&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8254239 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2646.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2646/head:pull/2646 PR: https://git.openjdk.java.net/jdk/pull/2646 From aver.shining at gmail.com Fri Feb 19 15:11:54 2021 From: aver.shining at gmail.com (=?UTF-8?B?0JDQvdC00YDQtdC5INCS0LXRgNGI0LjQvdC40L0=?=) Date: Fri, 19 Feb 2021 18:11:54 +0300 Subject: RFR: 8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522 In-Reply-To: References: Message-ID: Hello Kim, I've signed the OCA with my Github username written in. PR: https://github.com/openjdk/jdk/pull/2646 I would be grateful if you could sponsor this ??, 19 ????. 2021 ?. ? 17:01, Kim Barrett : > > > On Feb 17, 2021, at 8:42 PM, ?????? ???????? > wrote: > > > > Hello everybody, > > > > My name is Andrey Vershinin, I've chosen to fix JDK-8254239 as my first > > contribution to the project. I'm a Java developer, with some C++ > knowledge > > (in the process of improving it). > > My goal is to gain a deeper understanding of the inner workings of the > > platform I'm interested in, its concepts and the code itself, and > > contribute to the best of my ability. > > The bug I've selected is a 'starter' one, to get involved in the process. > > The patch is attached below. > > The normal way to make openjdk changes now is via github pull > requests, rather than patches in email. But there are some preliminary > steps as well. In particular, have you signed the OCA? I suggest you > take a look here: > > https://openjdk.java.net/guide/ > https://openjdk.java.net/guide/#i-have-a-patch-what-do-i-do > > When you get to the point of finding a sponser, I can do that. 
> > > Thanks, > > Andrey > > > > =================================================================== > > diff --git a/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp > > b/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp > > --- a/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp (revision > > 06348dfcae0b6b82970e8c56391396affd311f90) > > +++ b/src/hotspot/share/gc/g1/g1ConcurrentMark.hpp (revision > > e635cee530e414503d6e84261ce636123d282ee9) > > @@ -50,10 +50,6 @@ > > class G1SurvivorRegions; > > class ThreadClosure; > > > > -PRAGMA_DIAG_PUSH > > -// warning C4522: multiple assignment operators specified > > -PRAGMA_DISABLE_MSVC_WARNING(4522) > > - > > // This is a container class for either an oop or a continuation address > > for > > // mark stack entries. Both are pushed onto the mark stack. > > class G1TaskQueueEntry { > > @@ -89,8 +85,6 @@ > > bool is_null() const { return _holder == NULL; } > > }; > > > > -PRAGMA_DIAG_POP > > - > > typedef GenericTaskQueue G1CMTaskQueue; > > typedef GenericTaskQueueSet G1CMTaskQueueSet; > > From iklam at openjdk.java.net Fri Feb 19 17:21:48 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Fri, 19 Feb 2021 17:21:48 GMT Subject: RFR: JDK-8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522. In-Reply-To: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> References: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> Message-ID: On Fri, 19 Feb 2021 15:05:24 GMT, Andrey Vershinin wrote: > This is a simple change removing disabling of MSVC++ warning 4522. Since it only affects build process, no tests were ran. Pre-submit test was skipped because: > Testing is not configured > In order to run pre-submit tests, the source repository must be properly configured to allow test execution. See https://wiki.openjdk.java.net/display/SKARA/Testing for more information on how to configure this. Since this is a build change, please enable pre-submit testing to make sure it doesn't break anything. ------------- PR: https://git.openjdk.java.net/jdk/pull/2646 From rkennke at openjdk.java.net Fri Feb 19 19:53:52 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Fri, 19 Feb 2021 19:53:52 GMT Subject: RFR: 8262049: [TESTBUG] Shenandoah: Adjustments in TestReferenceRefersTo.java for IU mode Message-ID: Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. 
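To make the behavioural difference behind this adjustment concrete, here is a toy, self-contained sketch (illustrative only, not Shenandoah code) of why a referent read during concurrent marking is kept alive under a SATB-style keep-alive barrier but may still be cleared under an incremental-update (IU) scheme.

```
// Toy model of the difference discussed above: a SATB-style keep-alive
// barrier marks a weak referent when it is loaded during concurrent marking,
// so that cycle cannot clear the reference; an IU-style scheme makes no such
// promise for a plain load.
#include <cstdio>

enum class MarkingMode { SATB, IU };

struct Referent {
  bool marked_live = false;
};

// Models what the GC barrier does when Reference.get() is called while
// concurrent marking is running.
Referent* load_referent_during_marking(Referent* referent, MarkingMode mode) {
  if (mode == MarkingMode::SATB && referent != nullptr) {
    referent->marked_live = true;   // keep-alive: the loaded referent is treated as reachable
  }
  return referent;                  // IU: the load alone says nothing about liveness
}

int main() {
  Referent r_iu, r_satb;
  load_referent_during_marking(&r_iu, MarkingMode::IU);
  load_referent_during_marking(&r_satb, MarkingMode::SATB);
  std::printf("IU   referent marked live: %d (may still be cleared)\n", r_iu.marked_live);
  std::printf("SATB referent marked live: %d (will not be cleared by this cycle)\n", r_satb.marked_live);
  return 0;
}
```

This is why the shared test cannot unconditionally assert that an accessed referent is still present: the expected outcome depends on the marking mode, which is what splitting the test into a generic part and mode-specific variants addresses.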
Test: - [x] TestReferenceRefersTo.java + Shenandoah/IU - [x] TestReferenceRefersTo.java + Shenandoah/SATB - [x] TestReferenceRefersTo.java + G1 ------------- Commit messages: - 8262049: [TESTBUG] Shenandoah: Adjustments in TestReferenceRefersTo.java for IU mode Changes: https://git.openjdk.java.net/jdk/pull/2653/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262049 Stats: 18 lines in 1 file changed: 12 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2653/head:pull/2653 PR: https://git.openjdk.java.net/jdk/pull/2653 From kbarrett at openjdk.java.net Sat Feb 20 09:55:50 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 20 Feb 2021 09:55:50 GMT Subject: RFR: 8228748: Remove GCLocker::_doing_gc In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 08:00:04 GMT, Albert Mingkun Yang wrote: > Some refactoring in `GCLocker` and more comments in `jni_lock` on how the synchronization works there. Looks good. Copyright for gcLocker.hpp needs to be updated. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2602 From ayang at openjdk.java.net Sat Feb 20 10:27:04 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 20 Feb 2021 10:27:04 GMT Subject: RFR: 8228748: Remove GCLocker::_doing_gc [v2] In-Reply-To: References: Message-ID: > Some refactoring in `GCLocker` and more comments in `jni_lock` on how the synchronization works there. Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2602/files - new: https://git.openjdk.java.net/jdk/pull/2602/files/17ff21e7..41081d99 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2602&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2602&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2602.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2602/head:pull/2602 PR: https://git.openjdk.java.net/jdk/pull/2602 From iwalulya at openjdk.java.net Sat Feb 20 13:36:40 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Sat, 20 Feb 2021 13:36:40 GMT Subject: RFR: 8228748: Remove GCLocker::_doing_gc [v2] In-Reply-To: References: Message-ID: On Sat, 20 Feb 2021 10:27:04 GMT, Albert Mingkun Yang wrote: >> Some refactoring in `GCLocker` and more comments in `jni_lock` on how the synchronization works there. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/2602 From kbarrett at openjdk.java.net Sat Feb 20 14:04:39 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 20 Feb 2021 14:04:39 GMT Subject: RFR: 8262049: [TESTBUG] Shenandoah: Adjustments in TestReferenceRefersTo.java for IU mode In-Reply-To: References: Message-ID: <_U1Cx7873x8vv9p4mF8vpeEh6lxR7synbooaKqFoR1E=.f893fa98-bb06-4a69-8cb2-1d751f5ef650@github.com> On Fri, 19 Feb 2021 19:48:51 GMT, Roman Kennke wrote: > Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. 
> > Test: > - [x] TestReferenceRefersTo.java + Shenandoah/IU > - [x] TestReferenceRefersTo.java + Shenandoah/SATB > - [x] TestReferenceRefersTo.java + G1 Changes requested by kbarrett (Reviewer). test/hotspot/jtreg/gc/TestReferenceRefersTo.java line 166: > 164: > 165: private static boolean isShenandoahIUMode() { > 166: return WB.getBooleanVMFlag("UseShenandoahGC") && "iu".equals(WB.getStringVMFlag("ShenandoahGCMode")); This should be using sun.hotspot.gc.GC.Shenandoah.isSelected() test/hotspot/jtreg/gc/TestReferenceRefersTo.java line 211: > 209: } else { > 210: expectNotCleared(testWeak4, "testWeak4"); > 211: } I think I would prefer to keep this test program "generic", rather than having this Shenandoah IU mode intrusion. So remove the old check of testWeak4 state here, and remove the check of obj4 below. Instead, change the later check of testWeak4 being notified, where the new test is that either testWeak4 and obj4 are both null (IU and the like) or both are non-null (SATB and others). Then add a couple of tests programs for the specific clearing or not clearing expected behaviors, with appropriate `@requires` restrictions. ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From kbarrett at openjdk.java.net Sat Feb 20 14:08:50 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sat, 20 Feb 2021 14:08:50 GMT Subject: RFR: 8262049: [TESTBUG] Shenandoah: Adjustments in TestReferenceRefersTo.java for IU mode In-Reply-To: <_U1Cx7873x8vv9p4mF8vpeEh6lxR7synbooaKqFoR1E=.f893fa98-bb06-4a69-8cb2-1d751f5ef650@github.com> References: <_U1Cx7873x8vv9p4mF8vpeEh6lxR7synbooaKqFoR1E=.f893fa98-bb06-4a69-8cb2-1d751f5ef650@github.com> Message-ID: <-Tb0x43_7gBFH4oPVCoeC9ogfvU_TSBilaQ6jDdYPrE=.21639387-3f88-4ac6-8f5c-35564fc8f6fe@github.com> On Sat, 20 Feb 2021 14:01:51 GMT, Kim Barrett wrote: >> Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. >> >> Test: >> - [x] TestReferenceRefersTo.java + Shenandoah/IU >> - [x] TestReferenceRefersTo.java + Shenandoah/SATB >> - [x] TestReferenceRefersTo.java + G1 > > Changes requested by kbarrett (Reviewer). Because this is a shared test, I suggest renaming the bug to something like "[TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode", and remove the gc-shenandoah label. ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From ayang at openjdk.java.net Sat Feb 20 15:37:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sat, 20 Feb 2021 15:37:39 GMT Subject: RFR: 8228748: Remove GCLocker::_doing_gc [v2] In-Reply-To: References: Message-ID: On Sat, 20 Feb 2021 13:33:44 GMT, Ivan Walulya wrote: >> Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Lgtm! Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2602 From github.com+31506961+vshining at openjdk.java.net Sat Feb 20 16:47:39 2021 From: github.com+31506961+vshining at openjdk.java.net (Andrey Vershinin) Date: Sat, 20 Feb 2021 16:47:39 GMT Subject: RFR: JDK-8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522. In-Reply-To: References: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> Message-ID: On Fri, 19 Feb 2021 17:19:15 GMT, Ioi Lam wrote: >> This is a simple change removing disabling of MSVC++ warning 4522. Since it only affects build process, no tests were ran. 
> > Pre-submit test was skipped because: > >> Testing is not configured >> In order to run pre-submit tests, the source repository must be properly configured to allow test execution. See https://wiki.openjdk.java.net/display/SKARA/Testing for more information on how to configure this. > > Since this is a build change, please enable pre-submit testing to make sure it doesn't break anything. @iklam Thanks for the notice, the tests have passed now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2646 From iklam at openjdk.java.net Sat Feb 20 19:53:39 2021 From: iklam at openjdk.java.net (Ioi Lam) Date: Sat, 20 Feb 2021 19:53:39 GMT Subject: RFR: JDK-8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522. In-Reply-To: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> References: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> Message-ID: On Fri, 19 Feb 2021 15:05:24 GMT, Andrey Vershinin wrote: > This is a simple change removing disabling of MSVC++ warning 4522. Since it only affects build process, no tests were ran. Marked as reviewed by iklam (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2646 From kbarrett at openjdk.java.net Sun Feb 21 03:00:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Sun, 21 Feb 2021 03:00:41 GMT Subject: RFR: JDK-8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522. In-Reply-To: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> References: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> Message-ID: On Fri, 19 Feb 2021 15:05:24 GMT, Andrey Vershinin wrote: > This is a simple change removing disabling of MSVC++ warning 4522. Since it only affects build process, no tests were ran. Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2646 From ayang at openjdk.java.net Sun Feb 21 11:35:56 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Sun, 21 Feb 2021 11:35:56 GMT Subject: RFR: 8262087: Use atomic boolean type in G1FullGCAdjustTask Message-ID: Use atomic boolean type to make the intention clear. ------------- Commit messages: - atomic_bool Changes: https://git.openjdk.java.net/jdk/pull/2664/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2664&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262087 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/2664.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2664/head:pull/2664 PR: https://git.openjdk.java.net/jdk/pull/2664 From kbarrett at openjdk.java.net Mon Feb 22 05:55:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 22 Feb 2021 05:55:41 GMT Subject: RFR: 8262087: Use atomic boolean type in G1FullGCAdjustTask In-Reply-To: References: Message-ID: <3pSbtGhZgKwu29czJJnR8LyvGJOxtytSSu0ZqAZCtDE=.e4fc80d6-ea1d-4e93-b789-7cf5d0bfdda5@github.com> On Sun, 21 Feb 2021 11:30:52 GMT, Albert Mingkun Yang wrote: > Use atomic boolean type to make the intention clear. The change looks fine. I would hope though that in the future this flag will be eliminated and this can instead invoke parallel reference processing, rather than forcing it to be done single threaded. Doing anything about that is a task for after Leo's in-progress work on cleaning up reference processing tasking. 
------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2664 From stefank at openjdk.java.net Mon Feb 22 08:28:40 2021 From: stefank at openjdk.java.net (Stefan Karlsson) Date: Mon, 22 Feb 2021 08:28:40 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark non-reentrant again Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2500 From github.com+31506961+vshining at openjdk.java.net Mon Feb 22 08:34:52 2021 From: github.com+31506961+vshining at openjdk.java.net (Andrey Vershinin) Date: Mon, 22 Feb 2021 08:34:52 GMT Subject: Integrated: JDK-8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522. In-Reply-To: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> References: <5QLZk5Yxoc3tVFvm0sKBQZ4KGgEG-obeHlxRPOza0yI=.c53526e0-80cc-4bdd-a765-e8590988cf99@github.com> Message-ID: On Fri, 19 Feb 2021 15:05:24 GMT, Andrey Vershinin wrote: > This is a simple change removing disabling of MSVC++ warning 4522. Since it only affects build process, no tests were ran. This pull request has now been integrated. Changeset: 26c1db90 Author: Andrey Vershinin Committer: Kim Barrett URL: https://git.openjdk.java.net/jdk/commit/26c1db90 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod 8254239: G1ConcurrentMark.hpp unnecessarily disables MSVC++ warning 4522. 
Reviewed-by: iklam, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2646 From pliden at openjdk.java.net Mon Feb 22 08:47:39 2021 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 22 Feb 2021 08:47:39 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. > > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. 
> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. src/hotspot/share/gc/z/zStat.hpp line 549: > 547: static size_t used_at_mark_start(); > 548: static size_t used_at_relocate_end(); > 549: static size_t live(); Please call this `live_at_mark_end()` to match the names of the neighboring functions. ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From tschatzl at openjdk.java.net Mon Feb 22 08:54:49 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 08:54:49 GMT Subject: RFR: 8262087: Use atomic boolean type in G1FullGCAdjustTask In-Reply-To: References: Message-ID: On Sun, 21 Feb 2021 11:30:52 GMT, Albert Mingkun Yang wrote: > Use atomic boolean type to make the intention clear. Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2664 From rkennke at openjdk.java.net Mon Feb 22 09:35:46 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 09:35:46 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 08:26:19 GMT, Stefan Karlsson wrote: > Looks good. Thanks, Stefan! @fisk also good? ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From pliden at openjdk.java.net Mon Feb 22 09:39:44 2021 From: pliden at openjdk.java.net (Per Liden) Date: Mon, 22 Feb 2021 09:39:44 GMT Subject: RFR: 8228748: Remove GCLocker::_doing_gc [v2] In-Reply-To: References: Message-ID: On Sat, 20 Feb 2021 10:27:04 GMT, Albert Mingkun Yang wrote: >> Some refactoring in `GCLocker` and more comments in `jni_lock` on how the synchronization works there. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good. ------------- Marked as reviewed by pliden (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2602 From eosterlund at openjdk.java.net Mon Feb 22 09:42:40 2021 From: eosterlund at openjdk.java.net (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 22 Feb 2021 09:42:40 GMT Subject: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3] In-Reply-To: References: Message-ID: On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke wrote: >> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). 
StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark non-reentrant again Also good! ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2500 From tschatzl at openjdk.java.net Mon Feb 22 10:04:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 10:04:40 GMT Subject: RFR: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions [v2] In-Reply-To: References: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> Message-ID: On Fri, 19 Feb 2021 11:26:17 GMT, Joakim Nordstr?m wrote: >> This fix adds a check for coarsened region in mutex guarded section, when adding a reference to a remembered set. >> >> Haven't been able to produce a testcase -- please advice on how to, or if not necessary. >> >> **Testing:** >> * hs-tier, hs-tier2 > > Joakim Nordstr?m has updated the pull request incrementally with two additional commits since the last revision: > > - Clarified comment on re-checking coarsening > - Minor typo Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2545 From github.com+779991+jaokim at openjdk.java.net Mon Feb 22 10:08:40 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Mon, 22 Feb 2021 10:08:40 GMT Subject: RFR: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions [v2] In-Reply-To: References: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> Message-ID: On Fri, 19 Feb 2021 10:38:48 GMT, Albert Mingkun Yang wrote: >> Joakim Nordstr?m has updated the pull request incrementally with two additional commits since the last revision: >> >> - Clarified comment on re-checking coarsening >> - Minor typo > > Marked as reviewed by ayang (Author). Thank you @albertnetymk and @tschatzl for review! ------------- PR: https://git.openjdk.java.net/jdk/pull/2545 From tschatzl at openjdk.java.net Mon Feb 22 10:13:39 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 10:13:39 GMT Subject: RFR: 8228748: Remove GCLocker::_doing_gc [v2] In-Reply-To: References: Message-ID: On Sat, 20 Feb 2021 10:27:04 GMT, Albert Mingkun Yang wrote: >> Some refactoring in `GCLocker` and more comments in `jni_lock` on how the synchronization works there. > > Albert Mingkun Yang has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by tschatzl (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2602 From ayang at openjdk.java.net Mon Feb 22 10:13:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 22 Feb 2021 10:13:41 GMT Subject: Integrated: 8228748: Remove GCLocker::_doing_gc In-Reply-To: References: Message-ID: On Wed, 17 Feb 2021 08:00:04 GMT, Albert Mingkun Yang wrote: > Some refactoring in `GCLocker` and more comments in `jni_lock` on how the synchronization works there. This pull request has now been integrated. 
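The 8228748 change noted above is largely about documenting how the jni_lock / needs-GC synchronization works. The following is a deliberately simplified, hypothetical model (this is not the HotSpot GCLocker and it glosses over the real locking protocol): JNI critical regions hold off GC, and a GC requested while regions are active is deferred until the last thread leaves.

```
#include <condition_variable>
#include <cstdio>
#include <mutex>

// Toy model of the GCLocker idea: critical regions block GC; a GC request
// made while regions are active sets a needs-GC flag and is triggered by the
// last thread leaving its critical region.
class ToyGCLocker {
  std::mutex _lock;                 // plays the role of the JNI critical lock
  std::condition_variable _cv;
  int _critical = 0;                // threads currently inside a critical region
  bool _needs_gc = false;           // a GC was requested while regions were active
public:
  void enter_critical() {
    std::unique_lock<std::mutex> l(_lock);
    _cv.wait(l, [this] { return !_needs_gc; });  // hold new entries while a GC is pending
    ++_critical;
  }
  void exit_critical() {
    std::unique_lock<std::mutex> l(_lock);
    if (--_critical == 0 && _needs_gc) {
      _needs_gc = false;
      l.unlock();
      std::puts("last thread left its critical region: run the deferred GC here");
      _cv.notify_all();
    }
  }
  // Called by a thread that wants a GC; returns true if the GC must be deferred.
  bool defer_gc_if_locked() {
    std::lock_guard<std::mutex> g(_lock);
    if (_critical > 0) { _needs_gc = true; return true; }
    return false;
  }
};

int main() {
  ToyGCLocker gcl;
  gcl.enter_critical();
  std::printf("GC deferred: %d\n", gcl.defer_gc_if_locked());
  gcl.exit_critical();   // last exit triggers the deferred GC
  return 0;
}
```

The real synchronization is considerably more involved, since it also has to coordinate with safepoints; the comments added by the change itself are the authoritative description.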
Changeset: 6b7575bb Author: Albert Mingkun Yang Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/6b7575bb Stats: 16 lines in 2 files changed: 4 ins; 5 del; 7 mod 8228748: Remove GCLocker::_doing_gc Reviewed-by: kbarrett, iwalulya, pliden, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2602 From rkennke at openjdk.java.net Mon Feb 22 10:13:48 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 10:13:48 GMT Subject: Integrated: 8261448: Preserve GC stack watermark across safepoints in StackWalk In-Reply-To: References: Message-ID: On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke wrote: > I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [x] tier1 (+Shenandoah/ZGC) > - [x] tier2 (+Shenandoah/ZGC) This pull request has now been integrated. Changeset: c20fb5db Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/c20fb5db Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8261448: Preserve GC stack watermark across safepoints in StackWalk Reviewed-by: eosterlund, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 From ayang at openjdk.java.net Mon Feb 22 10:40:39 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Mon, 22 Feb 2021 10:40:39 GMT Subject: RFR: 8262087: Use atomic boolean type in G1FullGCAdjustTask In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 08:51:32 GMT, Thomas Schatzl wrote: >> Use atomic boolean type to make the intention clear. > > Marked as reviewed by tschatzl (Reviewer). Thanks for the review. ------------- PR: https://git.openjdk.java.net/jdk/pull/2664 From lkorinth at openjdk.java.net Mon Feb 22 11:31:43 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 22 Feb 2021 11:31:43 GMT Subject: RFR: 8261799: Remove unnecessary cast in psParallelCompact.hpp In-Reply-To: References: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> Message-ID: <5UFK8bDnB9CqgPe6TLi0tQDfCpxutaL51ZjsEVXBodQ=.310e9827-f4c1-4f3c-be02-04517bdac615@github.com> On Thu, 18 Feb 2021 21:25:10 GMT, Stefan Karlsson wrote: >> Unnecessary casts confuses me. > > Marked as reviewed by stefank (Reviewer). Thanks Stefan and Albert! 
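The 8261448 fix recorded above amounts to keeping the stack "GC processed" for the whole duration of the batch rather than only finishing processing once at its start. Below is a hypothetical, self-contained sketch of that RAII idea; the names are illustrative, while the actual change uses the KeepStackGCProcessedMark utility mentioned in the review thread.

```
#include <cstdio>

// Stand-in for the real VM thread type; the flag models "this thread's stack
// is kept in the GC-processed state".
struct JavaThreadStub {
  bool stack_kept_processed = false;
};

// RAII guard modeling the idea: while the guard is alive, safepoints that
// occur during the stack walk cannot reset the stack watermark underneath
// the walker.
class KeepStackProcessedScope {
  JavaThreadStub* _thread;
public:
  explicit KeepStackProcessedScope(JavaThreadStub* t) : _thread(t) {
    _thread->stack_kept_processed = true;   // the real VM achieves this through its stack watermark machinery
  }
  ~KeepStackProcessedScope() {
    _thread->stack_kept_processed = false;  // allow the watermark to move again
  }
};

static void fetch_next_batch(JavaThreadStub* t) {
  KeepStackProcessedScope keep(t);  // spans the whole batch, including allocations that may safepoint
  std::printf("walking frames, stack kept processed = %d\n", t->stack_kept_processed);
}

int main() {
  JavaThreadStub t;
  fetch_next_batch(&t);
  return 0;
}
```

The important property is simply that the guard's scope covers every allocation in the frame-filling loop that could safepoint, so a concurrent GC cannot move the watermark back underneath the walker.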
------------- PR: https://git.openjdk.java.net/jdk/pull/2628 From lkorinth at openjdk.java.net Mon Feb 22 11:34:40 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 22 Feb 2021 11:34:40 GMT Subject: Integrated: 8261799: Remove unnecessary cast in psParallelCompact.hpp In-Reply-To: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> References: <_FiQTtJC_l4VWbY5gIUZPcpPqCkaDfz4QHH3nylSPFM=.f636b4c3-c715-486e-a9a1-b14c7ef2fbd7@github.com> Message-ID: On Thu, 18 Feb 2021 15:26:38 GMT, Leo Korinth wrote: > Unnecessary casts confuses me. This pull request has now been integrated. Changeset: 011f5a54 Author: Leo Korinth URL: https://git.openjdk.java.net/jdk/commit/011f5a54 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8261799: Remove unnecessary cast in psParallelCompact.hpp Reviewed-by: ayang, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/2628 From lkorinth at openjdk.java.net Mon Feb 22 11:36:46 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 22 Feb 2021 11:36:46 GMT Subject: RFR: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 08:37:13 GMT, Stefan Johansson wrote: >> 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor > > Thanks for cleaning this up. Thanks Albert, Thomas and Stefan! ------------- PR: https://git.openjdk.java.net/jdk/pull/2629 From lkorinth at openjdk.java.net Mon Feb 22 11:36:47 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Mon, 22 Feb 2021 11:36:47 GMT Subject: Integrated: 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor In-Reply-To: References: Message-ID: On Thu, 18 Feb 2021 15:37:30 GMT, Leo Korinth wrote: > 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor This pull request has now been integrated. Changeset: 419717dd Author: Leo Korinth URL: https://git.openjdk.java.net/jdk/commit/419717dd Stats: 5 lines in 2 files changed: 0 ins; 2 del; 3 mod 8261803: Remove unused TaskTerminator in g1 full gc ref proc executor Reviewed-by: ayang, tschatzl, sjohanss ------------- PR: https://git.openjdk.java.net/jdk/pull/2629 From rkennke at openjdk.java.net Mon Feb 22 11:39:17 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 11:39:17 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v2] In-Reply-To: References: Message-ID: > Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. 
> > Test: > - [x] TestReferenceRefersTo.java + Shenandoah/IU > - [x] TestReferenceRefersTo.java + Shenandoah/SATB > - [x] TestReferenceRefersTo.java + G1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Split TestReferenceRefersTo test in generic and non-Shenandoah parts ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2653/files - new: https://git.openjdk.java.net/jdk/pull/2653/files/b1fffb02..9746bc85 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=00-01 Stats: 230 lines in 2 files changed: 206 ins; 20 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2653/head:pull/2653 PR: https://git.openjdk.java.net/jdk/pull/2653 From rkennke at openjdk.java.net Mon Feb 22 11:39:18 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 11:39:18 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v2] In-Reply-To: <_U1Cx7873x8vv9p4mF8vpeEh6lxR7synbooaKqFoR1E=.f893fa98-bb06-4a69-8cb2-1d751f5ef650@github.com> References: <_U1Cx7873x8vv9p4mF8vpeEh6lxR7synbooaKqFoR1E=.f893fa98-bb06-4a69-8cb2-1d751f5ef650@github.com> Message-ID: On Sat, 20 Feb 2021 13:55:33 GMT, Kim Barrett wrote: > This should be using sun.hotspot.gc.GC.Shenandoah.isSelected() Yes, but which int constant should be used there? Doesn't matter much, I'm not using this at all anymore, following your other suggestions. > test/hotspot/jtreg/gc/TestReferenceRefersTo.java line 211: > >> 209: } else { >> 210: expectNotCleared(testWeak4, "testWeak4"); >> 211: } > > I think I would prefer to keep this test program "generic", rather than having this Shenandoah IU mode intrusion. So remove the old check of testWeak4 state here, and remove the check of obj4 below. Instead, change the later check of testWeak4 being notified, where the new test is that either testWeak4 and obj4 are both null (IU and the like) or both are non-null (SATB and others). Then add a couple of tests programs for the specific clearing or not clearing expected behaviors, with appropriate `@requires` restrictions. Right, that is even better. I made the base test generic, extracted the offending parts into its own test, and will push the Shenandoah specific test under a different PR. Thank you! ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From rkennke at openjdk.java.net Mon Feb 22 11:57:56 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 11:57:56 GMT Subject: RFR: 8262122: [TESTBUG] Shenandoah-specific variant of TestReferenceRefersTo Message-ID: Before JDK-8262049, the test TestReferenceRefersTo.java has been failing with I-U mode, because it asserted that weak references would not be cleared when accessed during mark. JDK-8262049 split up the test into a generic part that removed the offending test, and a non-Shenandoah part that contains the test. I think it would be useful to add the full test with Shenandoah runners under gc/shenandoah to include it in hotspot_gc_shenandoah runs. 
Test: - [x] TestReferenceRefersToShenandoah.java - [ ] hotspot_gc_shenandoah ------------- Commit messages: - 8262122: [TESTBUG] Shenandoah-specific variant of TestReferenceRefersTo Changes: https://git.openjdk.java.net/jdk/pull/2674/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2674&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262122 Stats: 325 lines in 1 file changed: 325 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2674.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2674/head:pull/2674 PR: https://git.openjdk.java.net/jdk/pull/2674 From rkennke at openjdk.java.net Mon Feb 22 15:12:03 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 15:12:03 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v3] In-Reply-To: References: Message-ID: > Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. > > Test: > - [x] TestReferenceRefersTo.java + Shenandoah/IU > - [x] TestReferenceRefersTo.java + Shenandoah/SATB > - [x] TestReferenceRefersTo.java + G1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix compilation failures after renames ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2653/files - new: https://git.openjdk.java.net/jdk/pull/2653/files/9746bc85..54a027c9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=01-02 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.java.net/jdk/pull/2653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2653/head:pull/2653 PR: https://git.openjdk.java.net/jdk/pull/2653 From kbarrett at openjdk.java.net Mon Feb 22 15:34:42 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Mon, 22 Feb 2021 15:34:42 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v3] In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 15:12:03 GMT, Roman Kennke wrote: >> Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. >> >> Test: >> - [x] TestReferenceRefersTo.java + Shenandoah/IU >> - [x] TestReferenceRefersTo.java + Shenandoah/SATB >> - [x] TestReferenceRefersTo.java + G1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation failures after renames Changes requested by kbarrett (Reviewer). test/hotspot/jtreg/gc/TestReferenceRefersTo.java line 241: > 239: } > 240: if ((testWeak4 == null) != (obj4 == null)) { > 241: fail("either referent is cleared and we got notified, or neither of this happened"); It might be helpful if the failure reported which one was non-null. Not that it should ever fail... test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 96: > 94: } > 95: > 96: private static void expectCleared(Reference ref, Unused. test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 27: > 25: > 26: /* @test > 27: * @requires vm.gc != "Epsilon" I think this test "works" for Epsilon just as well as it does for Serial or Parallel or any other GC that doesn't support concurrent breakpoints, and either all should be excluded or none. 
test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 28: > 26: /* @test > 27: * @requires vm.gc != "Epsilon" > 28: * @requires vm.gc != "Shenandoah" I think this test works for Shenandoah so long as it's not in IU mode. Is that possible to exclude with another `@requires` constraint? test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 39: > 37: */ > 38: > 39: import java.lang.ref.PhantomReference; PhantomReference is unused. test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 145: > 143: > 144: progress("acquire control of concurrent cycles"); > 145: WB.concurrentGCAcquireControl(); I think this test could be made a lot smaller and more obvious if it was explicitly just testing the keep-alive behavior of Reference.get for most concurrent collectors, rather than being a trimmed down copy of the earlier test. ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From github.com+779991+jaokim at openjdk.java.net Mon Feb 22 16:19:41 2021 From: github.com+779991+jaokim at openjdk.java.net (Joakim =?UTF-8?B?Tm9yZHN0csO2bQ==?=) Date: Mon, 22 Feb 2021 16:19:41 GMT Subject: Integrated: 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions In-Reply-To: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> References: <4IlVuwhMhFGZty7e8RXnSx_E57BJ9tVAhgCpL6pc7ts=.c7457983-075d-4c7b-a6d5-81e48ce54f2a@github.com> Message-ID: On Fri, 12 Feb 2021 13:01:52 GMT, Joakim Nordstr?m wrote: > This fix adds a check for coarsened region in mutex guarded section, when adding a reference to a remembered set. > > Haven't been able to produce a testcase -- please advice on how to, or if not necessary. > > **Testing:** > * hs-tier, hs-tier2 This pull request has now been integrated. Changeset: a6a7e439 Author: Joakim Nordstr?m Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/a6a7e439 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod 8242032: G1 region remembered sets may contain non-coarse level PRTs for already coarsened regions Reviewed-by: ayang, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2545 From rkennke at openjdk.java.net Mon Feb 22 16:26:00 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 16:26:00 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v4] In-Reply-To: References: Message-ID: > Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. 
> > Test: > - [x] TestReferenceRefersTo.java + Shenandoah/IU > - [x] TestReferenceRefersTo.java + Shenandoah/SATB > - [x] TestReferenceRefersTo.java + G1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Some more trimming ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2653/files - new: https://git.openjdk.java.net/jdk/pull/2653/files/54a027c9..51f2d695 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=02-03 Stats: 90 lines in 2 files changed: 1 ins; 79 del; 10 mod Patch: https://git.openjdk.java.net/jdk/pull/2653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2653/head:pull/2653 PR: https://git.openjdk.java.net/jdk/pull/2653 From rkennke at openjdk.java.net Mon Feb 22 16:26:03 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 16:26:03 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v3] In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 15:25:46 GMT, Kim Barrett wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation failures after renames > > test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 27: > >> 25: >> 26: /* @test >> 27: * @requires vm.gc != "Epsilon" > > I think this test "works" for Epsilon just as well as it does for Serial or Parallel or any other GC that doesn't support concurrent breakpoints, and either all should be excluded or none. Right. I excluded none. > test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 28: > >> 26: /* @test >> 27: * @requires vm.gc != "Epsilon" >> 28: * @requires vm.gc != "Shenandoah" > > I think this test works for Shenandoah so long as it's not in IU mode. Is that possible to exclude with another `@requires` constraint? How would I do that? IU mode can only be distinguished by VM flag. > test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 145: > >> 143: >> 144: progress("acquire control of concurrent cycles"); >> 145: WB.concurrentGCAcquireControl(); > > I think this test could be made a lot smaller and more obvious if it was explicitly just testing the keep-alive behavior of Reference.get for most concurrent collectors, rather than being a trimmed down copy of the earlier test. Right. I trimmed it some more. I think it's cleaner now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From tschatzl at openjdk.java.net Mon Feb 22 17:23:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 17:23:43 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. 
for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. > > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. > > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. > This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. The change also misses liveness update after G1 Full GC: it should at least reset the internal liveness counter to 0 so that `used()` is used. I think there is the same issue for Parallel Full GC. Serial seems to be handled. src/hotspot/share/gc/shared/collectedHeap.hpp line 217: > 215: virtual size_t capacity() const = 0; > 216: virtual size_t used() const = 0; > 217: // a best-effort estimate of the live set size I would prefer @shipilev's comment. Also I would like to suggest to call this method `live_estimate()` to set the expectations right. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1114: > 1112: > 1113: _g1h->set_live(live_size * HeapWordSize); > 1114: This code is located in the wrong place. 
It will return only the live words for the areas that have been marked, not eden or objects allocated in old gen after the marking started. Further it iterates over all regions, which can be large compared to actually active regions. A better place is in `G1UpdateRemSetTrackingBeforeRebuild::do_heap_region()` after the last method call - at that point, `HeapRegion::live_bytes()` contains the per-region number of live data for all regions. `G1UpdateRemSetTrackingBeforeRebuild` is instantiated and then called by multiple threads. It's probably best that that `HeapClosure` locally sums up the live byte estimates and then in the caller `G1UpdateRemSetTrackingBeforeRebuildTask::work()` sums up the per thread results like is done for `G1UpdateRemSetTrackingBeforeRebuildTask::_total_selected_for_rebuild`, which is then set in the caller of the `G1UpdateRemSetTrackingBeforeRebuildTask`. src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1850: > 1848: size_t G1CollectedHeap::live() const { > 1849: size_t size = Atomic::load(&_live_size); > 1850: return size > 0 ? size : used(); note that `used()` is susceptible to fluttering due to memory ordering problems: since its result consists of multiple reads, you can get readings from very different situations. It is recommended to use `used_unlocked()` instead, which does not take allocation regions and archive regions into account, but at least it is not susceptible to jumping around when re-reading it in quick succession. src/hotspot/share/gc/parallel/parallelScavengeHeap.inline.hpp line 49: > 47: _young_live = young_gen()->used_in_bytes(); > 48: _eden_live = young_gen()->eden_space()->used_in_bytes(); > 49: _old_live = old_gen()->used_in_bytes(); `_young_live` already seems to contain `_eden_live` looking at the implementation of `PSYoungGen::used_in_bytes()`: I.e. `size_t PSYoungGen::used_in_bytes() const { return eden_space()->used_in_bytes() + from_space()->used_in_bytes(); // to_space() is only used during scavenge } ` but maybe I'm wrong here. src/hotspot/share/gc/shared/genCollectedHeap.cpp line 683: > 681: } > 682: // update the live size after last GC > 683: _live_size = _young_gen->live() + _old_gen->live(); I would prefer if that code were placed into `gc_epilogue`. src/hotspot/share/gc/shared/space.inline.hpp line 189: > 187: oop obj = oop(cur_obj); > 188: size_t obj_size = obj->size(); > 189: live_offset += obj_size; It seems more natural to me to put this counting into the `DeadSpacer` as this is what this change does. Also, the actual dead space "used" can be calculated from the difference between the `_allowed_deadspace_words` and the maximum (calculated in the constructor of `DeadSpacer`) afaict at the end of evacuation. So there is no need to incur per-object costs during evacuation at all. ------------- Changes requested by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2579 From tschatzl at openjdk.java.net Mon Feb 22 17:23:44 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 17:23:44 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Thu, 18 Feb 2021 10:15:37 GMT, Aleksey Shipilev wrote: >> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. 
>> >> ## Introducing new JFR event >> >> While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. >> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. >> >> ## Implementation >> >> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. >> >> The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. >> >> ### Epsilon GC >> >> Trivial implementation - just return `used()` instead. >> >> ### Serial GC >> >> Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). >> >> ### Parallel GC >> >> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). >> >> ### G1 GC >> >> Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. >> >> ### Shenandoah >> >> In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. >> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. >> >> ### ZGC >> >> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4578: > >> 4576: >> 4577: void G1CollectedHeap::set_live(size_t bytes) { >> 4578: Atomic::release_store(&_live_size, bytes); > > I don't think this requires `release_store`, regular `store` would be enough. G1 folks can say for sure. Not required. > src/hotspot/share/gc/shared/genCollectedHeap.hpp line 183: > >> 181: size_t live = _live_size; >> 182: return live > 0 ? live : used(); >> 183: }; > > I think the implementation belongs to `genCollectedHeap.cpp`. 
+1. Does not seem to be performance sensitive. ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From tschatzl at openjdk.java.net Mon Feb 22 17:23:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Mon, 22 Feb 2021 17:23:45 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Fri, 19 Feb 2021 08:22:56 GMT, Albert Mingkun Yang wrote: >> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. >> >> ## Introducing new JFR event >> >> While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. >> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. >> >> ## Implementation >> >> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. >> >> The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. >> >> ### Epsilon GC >> >> Trivial implementation - just return `used()` instead. >> >> ### Serial GC >> >> Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). >> >> ### Parallel GC >> >> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). >> >> ### G1 GC >> >> Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. >> >> ### Shenandoah >> >> In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. 
>> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. >> >> ### ZGC >> >> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. > > src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 79: > >> 77: size_t _young_live; >> 78: size_t _eden_live; >> 79: size_t _old_live; > > It's only the sum that's ever exposed, right? I wonder if it makes sense to merge them into one var to only track the sum. I agree because they seem to be always read and written at the same time. ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From rkennke at openjdk.java.net Mon Feb 22 18:56:39 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 18:56:39 GMT Subject: RFR: 8261973: Shenandoah: Cleanup/simplify root verifier In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 13:53:26 GMT, Zhengyu Gu wrote: > Root processing has gone through significant changes. For example, we used to mark through weak roots when class unloading is off, that is no long the case, OopStorages also simplify roots. > > Shenandoah root verifier can be simplified into 2 cases, with/without class unloading. > > - [x] hotspot_gc_shenandoah with -XX:+ShenandoahVerify Looks good! Thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2643 From rkennke at openjdk.java.net Mon Feb 22 19:03:06 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Mon, 22 Feb 2021 19:03:06 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v5] In-Reply-To: References: Message-ID: > Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. > > Test: > - [x] TestReferenceRefersTo.java + Shenandoah/IU > - [x] TestReferenceRefersTo.java + Shenandoah/SATB > - [x] TestReferenceRefersTo.java + G1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix another compilation failure ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2653/files - new: https://git.openjdk.java.net/jdk/pull/2653/files/51f2d695..44f99b86 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2653/head:pull/2653 PR: https://git.openjdk.java.net/jdk/pull/2653 From zgu at openjdk.java.net Mon Feb 22 19:16:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Mon, 22 Feb 2021 19:16:39 GMT Subject: Integrated: 8261973: Shenandoah: Cleanup/simplify root verifier In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 13:53:26 GMT, Zhengyu Gu wrote: > Root processing has gone through significant changes. For example, we used to mark through weak roots when class unloading is off, that is no long the case, OopStorages also simplify roots. > > Shenandoah root verifier can be simplified into 2 cases, with/without class unloading. 
> > - [x] hotspot_gc_shenandoah with -XX:+ShenandoahVerify This pull request has now been integrated. Changeset: 7b924d8a Author: Zhengyu Gu URL: https://git.openjdk.java.net/jdk/commit/7b924d8a Stats: 198 lines in 4 files changed: 10 ins; 159 del; 29 mod 8261973: Shenandoah: Cleanup/simplify root verifier Reviewed-by: rkennke ------------- PR: https://git.openjdk.java.net/jdk/pull/2643 From egahlin at openjdk.java.net Mon Feb 22 19:40:39 2021 From: egahlin at openjdk.java.net (Erik Gahlin) Date: Mon, 22 Feb 2021 19:40:39 GMT Subject: RFR: 8258431: Provide a JFR event with live set size estimate In-Reply-To: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> References: <7HVs4jngEbNIQIPQByuE6IRYAxdijfa82uhEFWHld5U=.a7784482-d7e1-4d59-88ee-455d8691631e@github.com> Message-ID: On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik wrote: > The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' is dependent on each particular GC implementation but in worst case 'at last full GC') in form of a periodically emitted JFR event. > > ## Introducing new JFR event > > While there is already 'GC Heap Summary' JFR event it does not fit the requirements as it is closely tied to GC cycle so eg. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk of not having the heap summary events being present in the JFR recording at all. > Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain the information abut the heap capacity, the used and live bytes. This information is available from all GC implementations and can be provided at literally any time. > > ## Implementation > > The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value. > > The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct. > > ### Epsilon GC > > Trivial implementation - just return `used()` instead. > > ### Serial GC > > Here we utilize the fact that mark-copy phase is naturally compacting so the number of bytes after copy is 'live' and that the mark-sweep implementation keeps an internal info about objects being 'dead' but excluded from the compaction effort and we can these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects). > > ### Parallel GC > > For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK). > > ### G1 GC > > Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application. 
> > ### Shenandoah > > In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context. > This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code. > > ### ZGC > > `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method. src/hotspot/share/jfr/metadata/metadata.xml line 205: > 203: > 204: > 205: I think it would be good to mention in the description that it is an estimate, i.e. "Estimate of live bytes ....". ------------- PR: https://git.openjdk.java.net/jdk/pull/2579 From kbarrett at openjdk.java.net Tue Feb 23 08:27:41 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 23 Feb 2021 08:27:41 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v3] In-Reply-To: References: Message-ID: On Mon, 22 Feb 2021 16:22:36 GMT, Roman Kennke wrote: >> test/hotspot/jtreg/gc/TestReferenceRefersToDuringConcMark.java line 28: >> >>> 26: /* @test >>> 27: * @requires vm.gc != "Epsilon" >>> 28: * @requires vm.gc != "Shenandoah" >> >> I think this test works for Shenandoah so long as it's not in IU mode. Is that possible to exclude with another `@requires` constraint? > > How would I do that? IU mode can only be distinguished by VM flag. I've not tried it, but this might work. `@requires vm.gc != "Shenandoah" | vm.opt.ShenandoahGCMode != "iu"` ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From rkennke at openjdk.java.net Tue Feb 23 08:44:58 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 23 Feb 2021 08:44:58 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v6] In-Reply-To: References: Message-ID: > Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. > > Test: > - [x] TestReferenceRefersTo.java + Shenandoah/IU > - [x] TestReferenceRefersTo.java + Shenandoah/SATB > - [x] TestReferenceRefersTo.java + G1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Allow Shenandoah for TestReferenceRefersToDuringConcMark test, except IU mode ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2653/files - new: https://git.openjdk.java.net/jdk/pull/2653/files/44f99b86..8f4d8606 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2653&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2653.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2653/head:pull/2653 PR: https://git.openjdk.java.net/jdk/pull/2653 From rkennke at openjdk.java.net Tue Feb 23 08:44:58 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 23 Feb 2021 08:44:58 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v3] In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 08:24:51 GMT, Kim Barrett wrote: >> How would I do that? IU mode can only be distinguished by VM flag. > > I've not tried it, but this might work. 
> `@requires vm.gc != "Shenandoah" | vm.opt.ShenandoahGCMode != "iu"` Indeed, it does. I changed the test requires accordingly. ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From ayang at openjdk.java.net Tue Feb 23 09:30:42 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 23 Feb 2021 09:30:42 GMT Subject: Integrated: 8262087: Use atomic boolean type in G1FullGCAdjustTask In-Reply-To: References: Message-ID: On Sun, 21 Feb 2021 11:30:52 GMT, Albert Mingkun Yang wrote: > Use atomic boolean type to make the intention clear. This pull request has now been integrated. Changeset: 12f6ba0d Author: Albert Mingkun Yang Committer: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/12f6ba0d Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod 8262087: Use atomic boolean type in G1FullGCAdjustTask Reviewed-by: kbarrett, tschatzl ------------- PR: https://git.openjdk.java.net/jdk/pull/2664 From tschatzl at openjdk.java.net Tue Feb 23 14:16:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 23 Feb 2021 14:16:00 GMT Subject: RFR: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code Message-ID: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> Hi all, can I have reviews for this change that fixes use of the wrong HeapRegionRemSet::contains_reference() method, causing a thread lock the same mutex again, resulting in problems like assertion failures? The code in question has been introduced in JDK-8242032: + // Rechecking if the region is coarsened, while holding the lock. + if (is_region_coarsened(from_hrm_ind)) { + assert(contains_reference(from), "We just found " PTR_FORMAT " in the Coarse table", p2i(from)); + return; + } The problem is the call to `contains_reference`, which locks the same lock that "we know we are already locking" per the comment above. Correct is using `contains_reference_locked` added for just this purpose. In the original change the PR already mentioned that the situation where this condition should hold could not be reproduced - now we know that it actually occurs ;) Testing: tier1. Trying to reproduce with the original some of the failing tests without luck - however the problematic line and the fix is very obvious. Thanks, Thomas ------------- Commit messages: - Use correct contains_reference_locked Changes: https://git.openjdk.java.net/jdk/pull/2690/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2690&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262197 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2690.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2690/head:pull/2690 PR: https://git.openjdk.java.net/jdk/pull/2690 From ayang at openjdk.java.net Tue Feb 23 15:28:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Tue, 23 Feb 2021 15:28:41 GMT Subject: RFR: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code In-Reply-To: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> References: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> Message-ID: On Tue, 23 Feb 2021 13:22:25 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that fixes use of the wrong HeapRegionRemSet::contains_reference() method, causing a thread lock the same mutex again, resulting in problems like assertion failures? 
> > The code in question has been introduced in JDK-8242032: > > > + // Rechecking if the region is coarsened, while holding the lock. > + if (is_region_coarsened(from_hrm_ind)) { > + assert(contains_reference(from), "We just found " PTR_FORMAT " in the Coarse table", p2i(from)); > + return; > + } > > The problem is the call to `contains_reference`, which locks the same lock that "we know we are already locking" per the comment above. Correct is using `contains_reference_locked` added for just this purpose. > > In the original change the PR already mentioned that the situation where this condition should hold could not be reproduced - now we know that it actually occurs ;) > > Testing: tier1. Trying to reproduce with the original some of the failing tests without luck - however the problematic line and the fix is very obvious. > > Thanks, > Thomas Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2690 From kbarrett at openjdk.java.net Tue Feb 23 15:48:39 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 23 Feb 2021 15:48:39 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v6] In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 08:44:58 GMT, Roman Kennke wrote: >> Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. >> >> Test: >> - [x] TestReferenceRefersTo.java + Shenandoah/IU >> - [x] TestReferenceRefersTo.java + Shenandoah/SATB >> - [x] TestReferenceRefersTo.java + G1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow Shenandoah for TestReferenceRefersToDuringConcMark test, except IU mode Marked as reviewed by kbarrett (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From kbarrett at openjdk.java.net Tue Feb 23 15:53:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 23 Feb 2021 15:53:40 GMT Subject: RFR: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code In-Reply-To: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> References: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> Message-ID: On Tue, 23 Feb 2021 13:22:25 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that fixes use of the wrong HeapRegionRemSet::contains_reference() method, causing a thread lock the same mutex again, resulting in problems like assertion failures? > > The code in question has been introduced in JDK-8242032: > > > + // Rechecking if the region is coarsened, while holding the lock. > + if (is_region_coarsened(from_hrm_ind)) { > + assert(contains_reference(from), "We just found " PTR_FORMAT " in the Coarse table", p2i(from)); > + return; > + } > > The problem is the call to `contains_reference`, which locks the same lock that "we know we are already locking" per the comment above. Correct is using `contains_reference_locked` added for just this purpose. > > In the original change the PR already mentioned that the situation where this condition should hold could not be reproduced - now we know that it actually occurs ;) > > Testing: tier1. Trying to reproduce with the original some of the failing tests without luck - however the problematic line and the fix is very obvious. > > Thanks, > Thomas Marked as reviewed by kbarrett (Reviewer). 
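For reference, the one-line fix under review (reconstructed from the snippet quoted above, not the verbatim patch) amounts to calling the _locked variant from the already-locked path:

```
// Rechecking if the region is coarsened, while holding the lock.
if (is_region_coarsened(from_hrm_ind)) {
  // The caller already holds the lock, so use contains_reference_locked();
  // plain contains_reference() would try to take the same mutex again.
  assert(contains_reference_locked(from), "We just found " PTR_FORMAT " in the Coarse table", p2i(from));
  return;
}
```

Everything else in the assertion stays as it was; only the lookup that would re-acquire the mutex changes.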
------------- PR: https://git.openjdk.java.net/jdk/pull/2690 From kbarrett at openjdk.java.net Tue Feb 23 15:53:40 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Tue, 23 Feb 2021 15:53:40 GMT Subject: RFR: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code In-Reply-To: References: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> Message-ID: On Tue, 23 Feb 2021 15:49:17 GMT, Kim Barrett wrote: >> Hi all, >> >> can I have reviews for this change that fixes use of the wrong HeapRegionRemSet::contains_reference() method, causing a thread lock the same mutex again, resulting in problems like assertion failures? >> >> The code in question has been introduced in JDK-8242032: >> >> >> + // Rechecking if the region is coarsened, while holding the lock. >> + if (is_region_coarsened(from_hrm_ind)) { >> + assert(contains_reference(from), "We just found " PTR_FORMAT " in the Coarse table", p2i(from)); >> + return; >> + } >> >> The problem is the call to `contains_reference`, which locks the same lock that "we know we are already locking" per the comment above. Correct is using `contains_reference_locked` added for just this purpose. >> >> In the original change the PR already mentioned that the situation where this condition should hold could not be reproduced - now we know that it actually occurs ;) >> >> Testing: tier1. Trying to reproduce with the original some of the failing tests without luck - however the problematic line and the fix is very obvious. >> >> Thanks, >> Thomas > > Marked as reviewed by kbarrett (Reviewer). As this is pretty simple, and seems likely to introduce a lot of testing noise, I'd be okay with pushing without waiting for 24h. ------------- PR: https://git.openjdk.java.net/jdk/pull/2690 From tschatzl at openjdk.java.net Tue Feb 23 15:57:41 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 23 Feb 2021 15:57:41 GMT Subject: RFR: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code In-Reply-To: References: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> Message-ID: On Tue, 23 Feb 2021 15:25:56 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> can I have reviews for this change that fixes use of the wrong HeapRegionRemSet::contains_reference() method, causing a thread lock the same mutex again, resulting in problems like assertion failures? >> >> The code in question has been introduced in JDK-8242032: >> >> >> + // Rechecking if the region is coarsened, while holding the lock. >> + if (is_region_coarsened(from_hrm_ind)) { >> + assert(contains_reference(from), "We just found " PTR_FORMAT " in the Coarse table", p2i(from)); >> + return; >> + } >> >> The problem is the call to `contains_reference`, which locks the same lock that "we know we are already locking" per the comment above. Correct is using `contains_reference_locked` added for just this purpose. >> >> In the original change the PR already mentioned that the situation where this condition should hold could not be reproduced - now we know that it actually occurs ;) >> >> Testing: tier1. Trying to reproduce with the original some of the failing tests without luck - however the problematic line and the fix is very obvious. >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Author). Thanks @albertnetymk @kimbarrett for the reviews. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2690 From tschatzl at openjdk.java.net Tue Feb 23 15:57:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Tue, 23 Feb 2021 15:57:43 GMT Subject: Integrated: 8262197: JDK-8242032 uses wrong contains_reference() in assertion code In-Reply-To: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> References: <66c7QAgFeccKl5T9du1SQOo8XsPazCW5kbnxlwf2HD0=.b34272fa-277b-42ee-a331-71f5da075504@github.com> Message-ID: On Tue, 23 Feb 2021 13:22:25 GMT, Thomas Schatzl wrote: > Hi all, > > can I have reviews for this change that fixes use of the wrong HeapRegionRemSet::contains_reference() method, causing a thread lock the same mutex again, resulting in problems like assertion failures? > > The code in question has been introduced in JDK-8242032: > > > + // Rechecking if the region is coarsened, while holding the lock. > + if (is_region_coarsened(from_hrm_ind)) { > + assert(contains_reference(from), "We just found " PTR_FORMAT " in the Coarse table", p2i(from)); > + return; > + } > > The problem is the call to `contains_reference`, which locks the same lock that "we know we are already locking" per the comment above. Correct is using `contains_reference_locked` added for just this purpose. > > In the original change the PR already mentioned that the situation where this condition should hold could not be reproduced - now we know that it actually occurs ;) > > Testing: tier1. Trying to reproduce with the original some of the failing tests without luck - however the problematic line and the fix is very obvious. > > Thanks, > Thomas This pull request has now been integrated. Changeset: 67762de6 Author: Thomas Schatzl URL: https://git.openjdk.java.net/jdk/commit/67762de6 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8262197: JDK-8242032 uses wrong contains_reference() in assertion code Reviewed-by: ayang, kbarrett ------------- PR: https://git.openjdk.java.net/jdk/pull/2690 From zgu at openjdk.java.net Tue Feb 23 18:46:43 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Tue, 23 Feb 2021 18:46:43 GMT Subject: RFR: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode [v6] In-Reply-To: References: Message-ID: <6sTm2s7uGccTmtNgAPHE0f4jo8xnLq2wO48h77eKvms=.a7e251f4-f02a-41c9-a4a5-18e9a4a1b6c1@github.com> On Tue, 23 Feb 2021 08:44:58 GMT, Roman Kennke wrote: >> Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. The test TestReferenceRefersTo.java needs to be adjusted to allow for that. >> >> Test: >> - [x] TestReferenceRefersTo.java + Shenandoah/IU >> - [x] TestReferenceRefersTo.java + Shenandoah/SATB >> - [x] TestReferenceRefersTo.java + G1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow Shenandoah for TestReferenceRefersToDuringConcMark test, except IU mode Good to me. ------------- Marked as reviewed by zgu (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2653 From rkennke at openjdk.java.net Tue Feb 23 21:46:42 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Tue, 23 Feb 2021 21:46:42 GMT Subject: Integrated: 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode In-Reply-To: References: Message-ID: On Fri, 19 Feb 2021 19:48:51 GMT, Roman Kennke wrote: > Shenandoah's IU mode allows referents to be cleared even when accessed during concurrent marking. 
The test TestReferenceRefersTo.java needs to be adjusted to allow for that. > > Test: > - [x] TestReferenceRefersTo.java + Shenandoah/IU > - [x] TestReferenceRefersTo.java + Shenandoah/SATB > - [x] TestReferenceRefersTo.java + G1 This pull request has now been integrated. Changeset: c6eae061 Author: Roman Kennke URL: https://git.openjdk.java.net/jdk/commit/c6eae061 Stats: 139 lines in 2 files changed: 127 ins; 7 del; 5 mod 8262049: [TESTBUG] Fix TestReferenceRefersTo.java for Shenandoah IU mode Reviewed-by: kbarrett, zgu ------------- PR: https://git.openjdk.java.net/jdk/pull/2653 From dholmes at openjdk.java.net Tue Feb 23 22:11:53 2021 From: dholmes at openjdk.java.net (David Holmes) Date: Tue, 23 Feb 2021 22:11:53 GMT Subject: Integrated: 8262266: JDK-8262049 fails validate-source In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 22:05:59 GMT, Daniel D. Daugherty wrote: > A trivial copyright header fix. > Now passes "make CONF=macosx-x86_64-normal-server-release validate-headers". LGTM. Thanks, David ------------- Marked as reviewed by dholmes (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2699 From dcubed at openjdk.java.net Tue Feb 23 22:11:52 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 23 Feb 2021 22:11:52 GMT Subject: Integrated: 8262266: JDK-8262049 fails validate-source Message-ID: A trivial copyright header fix. Now passes "make CONF=macosx-x86_64-normal-server-release validate-headers". ------------- Commit messages: - 8262266: JDK-8262049 fails validate-source Changes: https://git.openjdk.java.net/jdk/pull/2699/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2699&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262266 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2699.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2699/head:pull/2699 PR: https://git.openjdk.java.net/jdk/pull/2699 From dcubed at openjdk.java.net Tue Feb 23 22:11:53 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 23 Feb 2021 22:11:53 GMT Subject: Integrated: 8262266: JDK-8262049 fails validate-source In-Reply-To: References: Message-ID: <_YzfSSdluXPIo-ys8ftnOmsjbUcZ7cut9w4hv42GLno=.8044827c-d1e9-4306-9de4-e6f4486264e9@github.com> On Tue, 23 Feb 2021 22:05:59 GMT, Daniel D. Daugherty wrote: > A trivial copyright header fix. > Now passes "make CONF=macosx-x86_64-normal-server-release validate-headers". This pull request has now been integrated. Changeset: c769388d Author: Daniel D. Daugherty URL: https://git.openjdk.java.net/jdk/commit/c769388d Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8262266: JDK-8262049 fails validate-source Reviewed-by: dholmes ------------- PR: https://git.openjdk.java.net/jdk/pull/2699 From dcubed at openjdk.java.net Tue Feb 23 22:11:53 2021 From: dcubed at openjdk.java.net (Daniel D.Daugherty) Date: Tue, 23 Feb 2021 22:11:53 GMT Subject: Integrated: 8262266: JDK-8262049 fails validate-source In-Reply-To: References: Message-ID: On Tue, 23 Feb 2021 22:07:45 GMT, David Holmes wrote: >> A trivial copyright header fix. >> Now passes "make CONF=macosx-x86_64-normal-server-release validate-headers". > > LGTM. > > Thanks, > David @dholmes-ora - Thanks for the fast review! 
------------- PR: https://git.openjdk.java.net/jdk/pull/2699 From stuefe at openjdk.java.net Wed Feb 24 08:25:50 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 08:25:50 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 08:12:52 GMT, Thomas Stuefe wrote: >> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: >> >> - Merge branch 'master' into pull/1153 >> - kstefanj update >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Remove extraneous ' from warning >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Fix os::large_page_size() in last update >> >> Signed-off-by: Marcus G K Williams >> - Ivan W. Requested Changes >> >> Removed os::Linux::select_large_page_size and >> use os::page_size_for_region instead >> >> Removed Linux::find_large_page_size and use >> register_large_page_sizes. Streamlined >> Linux::setup_large_page_size >> >> Signed-off-by: Marcus G K Williams >> - ... and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 > > src/hotspot/os/linux/os_linux.cpp line 3670: > >> 3668: // If we can't open /sys/kernel/mm/hugepages >> 3669: // Add _default_large_page_size to _page_sizes >> 3670: _page_sizes.add(_default_large_page_size); > > missing return here. But see my general remarks. I would modify this function to not change outside state at all, just to return the found page sizes in a os::PageSizes object. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Wed Feb 24 08:25:48 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 08:25:48 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: On Tue, 16 Feb 2021 16:32:56 GMT, Marcus G K Williams wrote: >> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using >> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). >> >> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). >> >> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. > > Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 25 commits: > > - Merge branch 'master' into pull/1153 > - kstefanj update > > Signed-off-by: Marcus G K Williams > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Remove extraneous ' from warning > > Signed-off-by: Marcus G K Williams > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Merge branch 'master' into update_hlp > - Fix os::large_page_size() in last update > > Signed-off-by: Marcus G K Williams > - Ivan W. Requested Changes > > Removed os::Linux::select_large_page_size and > use os::page_size_for_region instead > > Removed Linux::find_large_page_size and use > register_large_page_sizes. Streamlined > Linux::setup_large_page_size > > Signed-off-by: Marcus G K Williams > - ... and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 Hi Markus, Many apologies for letting this cook too long, the last months have been hectic. I looked closer at the code today, at least the initialization parts, and have some suggestions and remarks. Will look at the runtime side later. A lot of my remarks will be referring to pre-existing code without me pointing it out each time, just know that I am aware that a lot of that stuff has nothing to do with your work. I propose some simplifications and streamlining with initialization. Main point would be to clearly separate getting information from the OS from post-processing (consistency checks and decisions), in addition to a bit clearer naming. --- We have `find_default_large_page_size()` and `register_large_page_sizes()`. The names could be a bit clearer, and I do not think they should be known outside of this file, so I would propose to redefine them to be local convenience functions which just scan the proc fs and do not change outside state, just return values, like this: - `static size_t scan_default_large_page_size();` - `static os::PageSizes scan_multiple_page_support();` (naming is lent from vm/hugetlbpage.txt) --- Today, in `find_default_large_page_size()`, if we have no default huge page configured, currently we return a hard coded default: https://github.com/openjdk/jdk/blob/f2e44ac726bad2e7db1ec9f5e77703a99ccfb683/src/hotspot/os/linux/os_linux.cpp#L3627-L3636 I am not sure this makes sense. The kernel documentation states that if this entry does not exist, we cannot use huge pages. I would consider removing this and just return 0 in that case. The point is that these low level convenience functions should read OS information and not make up stuff. Making up stuff should be done, if at all, in the caller. --- When consistency checking and post-processing what we got from the OS, note that there are slight inconsistencies (preexisting) how we handle things: - we gracefully handle the non-existence of /sys/kernel/mm/hugepages - or if it exists, the fact that the default page size may be missing from it - by transparently adding the default huge page size to os::_page_sizes. - But if the user specifies UseLargePageSize in bytes, overwriting the default large page size, we now require the multi page size to be present in os::_page_sizes. So in that case, /sys/kernel/mm/hugepages had to be present. I mean, either we trust /sys/kernel/mm/hugepages, or we don't. We happily make up page sizes in find_default_large_page_size(), but here we check rather strictly. It makes sense to check the user input for validity, but then, could we not always just require /sys/kernel/mm/hugepages to be present and consistent with /proc/meminfo? 
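To make the proposal above a little more concrete, a minimal sketch of the two scanners might look as follows. This is an assumption-laden illustration, not a patch: it presumes the usual Linux interfaces (/proc/meminfo reporting "Hugepagesize: <n> kB", and one "hugepages-<n>kB" directory per configured size under /sys/kernel/mm/hugepages), and that os::PageSizes is default-constructible and cheap to return by value.

```
#include <dirent.h>
#include <stdio.h>

// Returns the system default huge page size in bytes, or 0 if the kernel
// reports none (in which case huge pages cannot be used at all).
static size_t scan_default_large_page_size() {
  size_t result = 0;
  FILE* f = fopen("/proc/meminfo", "r");
  if (f != NULL) {
    char line[128];
    while (fgets(line, sizeof(line), f) != NULL) {
      size_t sz = 0;
      if (sscanf(line, "Hugepagesize: %zu kB", &sz) == 1) {
        result = sz * K;  // kernel reports kB, hotspot uses bytes
        break;
      }
    }
    fclose(f);
  }
  return result;
}

// Returns every huge page size configured on the system, without touching
// any global state; the caller decides what to do with the result.
static os::PageSizes scan_multiple_page_support() {
  os::PageSizes pagesizes;
  DIR* dir = opendir("/sys/kernel/mm/hugepages");
  if (dir != NULL) {
    struct dirent* entry;
    size_t page_size = 0;
    while ((entry = readdir(dir)) != NULL) {
      if (sscanf(entry->d_name, "hugepages-%zukB", &page_size) == 1) {
        pagesizes.add(page_size * K);
      }
    }
    closedir(dir);
  }
  return pagesizes;
}
```

os::large_page_init() would then do all post-processing (LargePageSizeInBytes handling, consistency checks, choosing the large page type) on the returned values, as sketched further below.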
--- I am not sure of the usefulness of `os::Linux::setup_large_page_size()`. Its just a thin wrapper. I would remove it and merge it directly into `os::large_page_init()`, which would be easier to understand. So, `os::large_page_init()` could look like this: void os::large_page_init() { // 1) Handle the case where we do not want to use huge pages and hence // there is no need to scan the OS for related info if (!UseLargePages && !UseTransparentHugePages && !UseHugeTLBFS && !UseSHM) { // Not using large pages. return; } if (!FLAG_IS_DEFAULT(UseLargePages) && !UseLargePages) { // The user explicitly turned off large pages. // Ignore the rest of the large pages flags. UseTransparentHugePages = false; UseHugeTLBFS = false; UseSHM = false; return; } // 2) Scan OS info size_t default_large_page_size = scan_default_large_page_size(); if (default_large_page_size == 0) { // We are done, no large pages configured. UseTransparentHugePages = false; UseHugeTLBFS = false; UseSHM = false; return; } os::PageSizes all_pages = scan_multiple_page_support(); // 3) Consistency check and post-processing // It is unclear if /sys/kernel/mm/hugepages/ and /proc/meminfo could disagree. Manually // re-add the default page size to the list of page sizes to be sure. all_pages.add(default_large_page_size); // Handle LargePageSizeInBytes if (!FLAG_IS_DEFAULT(LargePageSizeInBytes) && LargePageSizeInBytes != _default_large_page_size) { ... blabla default_large_page_size = LargePageSizeInBytes log_info(os)("Overriding default huge page size.."); ... } // Now determine the type of large pages to use: os::Linux::setup_large_page_type() set_coredump_filter(LARGEPAGES_BIT); // Any final logging: logloglog } What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it. Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. But thats just a small issue. src/hotspot/os/linux/os_linux.cpp line 3679: > 3677: sscanf(entry->d_name, "hugepages-%zukB", &page_size) == 1) { > 3678: // The kernel is using kB, hotspot uses bytes > 3679: if (page_size * K > (size_t)Linux::page_size()) { I do not think excluding the base page size is needed here. The directory only contains entries for huge pages. If for any weird reason this is the same as the base page size (which I have never seen) I would include it, since its a huge page too. But I do not think this can happen. src/hotspot/os/linux/os_linux.cpp line 3670: > 3668: // If we can't open /sys/kernel/mm/hugepages > 3669: // Add _default_large_page_size to _page_sizes > 3670: _page_sizes.add(_default_large_page_size); missing return here. src/hotspot/os/linux/os_linux.cpp line 3692: > 3690: ls.print("Available page sizes: "); > 3691: _page_sizes.print_on(&ls); > 3692: } Does this work and show something? I know UL is initialization time sensitive (which is annoying btw). ------------- Changes requested by stuefe (Reviewer). 
PR: https://git.openjdk.java.net/jdk/pull/1153 From tschatzl at openjdk.java.net Wed Feb 24 08:36:48 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 24 Feb 2021 08:36:48 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early Message-ID: Hello, can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? This can save lots of memory with negligible other impact. Long version: Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). In some cases doing this can save half of peak remembered set memory usage. There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. 
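To make the proposed pruning step concrete, candidate selection would end with something roughly like the following. This is an illustrative sketch only; prune()'s two parameters match the patch, while the caller-side names and the exact threshold expression are assumptions.

```
// Candidates are kept sorted, so pruning drops the least promising regions
// from the tail of the list, never giving up more reclaimable bytes than
// G1HeapWastePercent of the current heap capacity, and always keeping
// enough regions for at least one "minimal" mixed collection.
size_t allowed_waste = _g1h->capacity() * G1HeapWastePercent / 100;  // assumed expression
uint keep_min_regions = min_regions_for_one_mixed_gc();              // hypothetical helper
candidates->prune(keep_min_regions, allowed_waste);
```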
Testing: tier1-5, Oracle internal performance test suite Thanks, Thomas ------------- Commit messages: - Prune early initial commit Changes: https://git.openjdk.java.net/jdk/pull/2693/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262185 Stats: 59 lines in 5 files changed: 43 ins; 10 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From sjohanss at openjdk.java.net Wed Feb 24 08:40:45 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 08:40:45 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 08:14:48 GMT, Thomas Stuefe wrote: >> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: >> >> - Merge branch 'master' into pull/1153 >> - kstefanj update >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Remove extraneous ' from warning >> >> Signed-off-by: Marcus G K Williams >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Merge branch 'master' into update_hlp >> - Fix os::large_page_size() in last update >> >> Signed-off-by: Marcus G K Williams >> - Ivan W. Requested Changes >> >> Removed os::Linux::select_large_page_size and >> use os::page_size_for_region instead >> >> Removed Linux::find_large_page_size and use >> register_large_page_sizes. Streamlined >> Linux::setup_large_page_size >> >> Signed-off-by: Marcus G K Williams >> - ... and 15 more: https://git.openjdk.java.net/jdk/compare/f4cfd758...f2e44ac7 > > src/hotspot/os/linux/os_linux.cpp line 3692: > >> 3690: ls.print("Available page sizes: "); >> 3691: _page_sizes.print_on(&ls); >> 3692: } > > Does this work and show something? I know UL is initialization time sensitive (which is annoying btw). This comes from I comment I made and UL is initialized here. Not sure this is exactly where this should end up since it will only be printed when large pages are enabled. I think it might make sense to move somewhere else or make it a completely separate change. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From sjohanss at openjdk.java.net Wed Feb 24 09:00:43 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 09:00:43 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 08:23:13 GMT, Thomas Stuefe wrote: > What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it. > I think what you propose Thomas looks good. One additional thing to keep in mind and think about here is how we should do the "sanity checking" when allowing multiple large page sizes. I think the best thing would be to sanity check all and if none succeeds disable `UseLargePages`. > Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. 
But thats just a small issue. I think we need to rethink exactly what `LargePageSizeInBytes` means when allowing multiple large page sizes. I've poked around in this area quite a bit lately and I'm not sure this flag is needed when we scan for available page sizes. But to allow it to go away we would have to change the APIs a bit to start passing down the page size we want to use for a certain mapping rather than using `os::large_page_size()` to get the page size. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From lkorinth at openjdk.java.net Wed Feb 24 12:05:55 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 24 Feb 2021 12:05:55 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead Message-ID: In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt(). This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi threaded. ------------- Commit messages: - 8261804: Remove field _processing_is_mt, calculate it instead Changes: https://git.openjdk.java.net/jdk/pull/2704/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2704&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261804 Stats: 41 lines in 5 files changed: 2 ins; 16 del; 23 mod Patch: https://git.openjdk.java.net/jdk/pull/2704.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2704/head:pull/2704 PR: https://git.openjdk.java.net/jdk/pull/2704 From ayang at openjdk.java.net Wed Feb 24 12:44:38 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Wed, 24 Feb 2021 12:44:38 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 11:59:48 GMT, Leo Korinth wrote: > In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt(). > > This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi threaded. I am curious if `mt_discovery` could be removed as well following the same reasoning. ------------- Marked as reviewed by ayang (Author). PR: https://git.openjdk.java.net/jdk/pull/2704 From iwalulya at openjdk.java.net Wed Feb 24 12:59:40 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 24 Feb 2021 12:59:40 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early In-Reply-To: References: Message-ID: <8NBL2PJqPm-dFtKwt9Ys9-wR-QPanaMa51NdhEU8WgU=.bba2d13c-6b41-40b5-aaa5-1e4ee24358ee@github.com> On Tue, 23 Feb 2021 14:07:33 GMT, Thomas Schatzl wrote: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. > > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. 
In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). > > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. > > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Lgtm! ------------- Marked as reviewed by iwalulya (Committer). PR: https://git.openjdk.java.net/jdk/pull/2693 From lkorinth at openjdk.java.net Wed Feb 24 13:22:39 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Wed, 24 Feb 2021 13:22:39 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 12:42:07 GMT, Albert Mingkun Yang wrote: > I am curious if `mt_discovery` could be removed as well following the same reasoning. Possibly, I will look into it. The logic is a bit different though. ------------- PR: https://git.openjdk.java.net/jdk/pull/2704 From tschatzl at openjdk.java.net Wed Feb 24 13:39:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 24 Feb 2021 13:39:59 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v2] In-Reply-To: References: Message-ID: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. 
> > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). > > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. > > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2693/files - new: https://git.openjdk.java.net/jdk/pull/2693/files/bcdd94ed..66736efa Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=00-01 Stats: 8 lines in 2 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From tschatzl at openjdk.java.net Wed Feb 24 13:39:59 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 24 Feb 2021 13:39:59 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v2] In-Reply-To: <8NBL2PJqPm-dFtKwt9Ys9-wR-QPanaMa51NdhEU8WgU=.bba2d13c-6b41-40b5-aaa5-1e4ee24358ee@github.com> References: <8NBL2PJqPm-dFtKwt9Ys9-wR-QPanaMa51NdhEU8WgU=.bba2d13c-6b41-40b5-aaa5-1e4ee24358ee@github.com> Message-ID: On Wed, 24 Feb 2021 12:56:39 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review > > Lgtm! @walulyai and me privately discussed some minor naming changes that the most recent push adds since I think they are good. Also tried to improve the comment for `G1CollectionSetCandidates::prune()` a bit. 
------------- PR: https://git.openjdk.java.net/jdk/pull/2693 From iwalulya at openjdk.java.net Wed Feb 24 13:58:40 2021 From: iwalulya at openjdk.java.net (Ivan Walulya) Date: Wed, 24 Feb 2021 13:58:40 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v2] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 13:39:59 GMT, Thomas Schatzl wrote: >> Hello, >> >> can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? >> >> This can save lots of memory with negligible other impact. >> >> Long version: >> Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. >> >> Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. >> >> This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. >> >> In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. >> >> The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). >> >> In some cases doing this can save half of peak remembered set memory usage. >> >> There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. >> >> * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. >> (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). >> This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. >> >> * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). >> You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. >> >> Testing: tier1-5, Oracle internal performance test suite >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review still good! ------------- Marked as reviewed by iwalulya (Committer). 
PR: https://git.openjdk.java.net/jdk/pull/2693 From sjohanss at openjdk.java.net Wed Feb 24 14:10:44 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 14:10:44 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v2] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 13:39:59 GMT, Thomas Schatzl wrote: >> Hello, >> >> can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? >> >> This can save lots of memory with negligible other impact. >> >> Long version: >> Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. >> >> Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. >> >> This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. >> >> In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. >> >> The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). >> >> In some cases doing this can save half of peak remembered set memory usage. >> >> There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. >> >> * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. >> (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). >> This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. >> >> * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). >> You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. >> >> Testing: tier1-5, Oracle internal performance test suite >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review Looks good, just a few small comments. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 40: > 38: void G1CollectionSetCandidates::prune(uint keep_min_regions, size_t prune_total_bytes) { > 39: uint regions_left = _num_regions; > 40: size_t reclaimed_bytes = 0; Wouldn't `pruned_bytes` be a better name? 
src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 43: > 41: while (regions_left > keep_min_regions && > 42: (at(regions_left - 1)->reclaimable_bytes() + reclaimed_bytes) <= prune_total_bytes) { > 43: uint cur_idx = regions_left - 1; I would prefer to split the second condition into an if+break (to avoid two `regions_left - 1`). What do you think about something like this: Suggestion: while (regions_left > keep_min_regions) { uint cur_idx = regions_left - 1; // Never prune more than prune_total_bytes. if ((at(cur_idx)->reclaimable_bytes() + reclaimed_bytes) > prune_total_bytes) { break; } ------------- Changes requested by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2693 From sjohanss at openjdk.java.net Wed Feb 24 15:01:42 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 15:01:42 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* In-Reply-To: References: Message-ID: On Wed, 11 Nov 2020 01:48:46 GMT, Marcus G K Williams wrote: > When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using > Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k). > > This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size(). > > In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved. @mgkwill, I've been doing some measurements trying to see what kind of improvements to expect from backing the code-cache with 2m pages and the heap with 1g pages. Can you share what benchmarks you've used when analyzing the performance of this change and also what kind of setups you've used (heap-size, code-cache size, etc). ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From stuefe at openjdk.java.net Wed Feb 24 15:11:41 2021 From: stuefe at openjdk.java.net (Thomas Stuefe) Date: Wed, 24 Feb 2021 15:11:41 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: References: Message-ID: <5Hmhp7S8616Kfbdsu5ObzFNy2uUFgJPCp0kvHr-U310=.3cabbe74-fe65-436b-973d-d6f3e64cd743@github.com> On Wed, 24 Feb 2021 08:57:29 GMT, Stefan Johansson wrote: > > What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it. > > I think what you propose Thomas looks good. One additional thing to keep in mind and think about here is how we should do the "sanity checking" when allowing multiple large page sizes. I think the best thing would be to sanity check all and if none succeeds disable `UseLargePages`. Oh, sure. I made this not explicit but implied this under "post processing and deciding". Presumably in the context of setup_large_page_type(). > > > Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. But thats just a small issue. 
> > I think we need to rethink exactly what `LargePageSizeInBytes` means when allowing multiple large page sizes. I've poked around in this area quite a bit lately and I'm not sure this flag is needed when we scan for available page sizes. But to allow it to go away we would have to change the APIs a bit to start passing down the page size we want to use for a certain mapping rather than using `os::large_page_size()` to get the page size. If we could do without this flag this would be fine for me too. But how would you let the user specify that the VM is to use a different default page size than is set on system level? ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From tschatzl at openjdk.java.net Wed Feb 24 15:36:57 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 24 Feb 2021 15:36:57 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v3] In-Reply-To: References: Message-ID: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. > > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). > > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. 
> > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: sjohanss review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2693/files - new: https://git.openjdk.java.net/jdk/pull/2693/files/66736efa..5288539a Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=01-02 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From sjohanss at openjdk.java.net Wed Feb 24 16:02:43 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 16:02:43 GMT Subject: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v16] In-Reply-To: <5Hmhp7S8616Kfbdsu5ObzFNy2uUFgJPCp0kvHr-U310=.3cabbe74-fe65-436b-973d-d6f3e64cd743@github.com> References: <5Hmhp7S8616Kfbdsu5ObzFNy2uUFgJPCp0kvHr-U310=.3cabbe74-fe65-436b-973d-d6f3e64cd743@github.com> Message-ID: On Wed, 24 Feb 2021 15:09:15 GMT, Thomas Stuefe wrote: > > > What do you think? I think this would be a bit easier to read and understand, and we have that clear separation between scanning OS info and deciding what we do with it. > > > > > > I think what you propose Thomas looks good. One additional thing to keep in mind and think about here is how we should do the "sanity checking" when allowing multiple large page sizes. I think the best thing would be to sanity check all and if none succeeds disable `UseLargePages`. > > Oh, sure. I made this not explicit but implied this under "post processing and deciding". Presumably in the context of setup_large_page_type(). > Sure, got that, just wanted to highlight that we need to figure out how to handle the sanity check for multiple sizes. Should a size that fail the sanity check be removed from the `_page_sizes` member. Maybe `_page_sizes` should include all page sizes, and then we have an additional member for "useable large page sizes". As I said, not sure how to best handle this. > > > Still a small nit is that we let the user override the OS info with LargePageSizeInBytes. I rather would have a variable containing unmodified OS info, and a separate variable for whatever we make up. But thats just a small issue. > > > > > > I think we need to rethink exactly what `LargePageSizeInBytes` means when allowing multiple large page sizes. I've poked around in this area quite a bit lately and I'm not sure this flag is needed when we scan for available page sizes. But to allow it to go away we would have to change the APIs a bit to start passing down the page size we want to use for a certain mapping rather than using `os::large_page_size()` to get the page size. > > If we could do without this flag this would be fine for me too. But how would you let the user specify that the VM is to use a different default page size than is set on system level? I agree, it's not obvious how to make this work in a good way. But using the `os::page_size_for_region*` functions in the upper layers to request a page size could be one solution. But we probably need to have a way to change the "default" value for some cases. 
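As a rough illustration of the selection step described earlier in this thread (picking, per reservation, the largest large page size that still fits), here is a small self-contained sketch; the function and parameter names are made up for the example and are not the HotSpot ones.

```c++
#include <cstddef>
#include <set>

// Toy model of the page-size selection described for JDK-8256155: from the
// set of large page sizes discovered on the system, pick the largest one
// that is not larger than the requested reservation size; fall back to the
// base page size when none fits.
std::size_t select_page_size(const std::set<std::size_t>& large_page_sizes,
                             std::size_t bytes_to_reserve,
                             std::size_t base_page_size) {
  std::size_t chosen = base_page_size;
  for (std::size_t page_size : large_page_sizes) {  // std::set iterates ascending
    if (page_size <= bytes_to_reserve && page_size > chosen) {
      chosen = page_size;
    }
  }
  return chosen;
}
```

For example, with 2M and 1G huge pages available, a 4G heap reservation would pick 1G pages while a 48M code-cache segment would pick 2M pages, which is the kind of behaviour being discussed here.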
Another thing to think about/discuss is what should be done if a reservation-request within the VM for 4G with 1G pages fail, should we fall straight back to 4k page, should we try 2M page or possible fail hard to show something is probably wrong with the config. ------------- PR: https://git.openjdk.java.net/jdk/pull/1153 From tschatzl at openjdk.java.net Wed Feb 24 16:12:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Wed, 24 Feb 2021 16:12:00 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v4] In-Reply-To: References: Message-ID: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. > > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). > > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. 
> > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2693/files - new: https://git.openjdk.java.net/jdk/pull/2693/files/5288539a..5675f39c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From sjohanss at openjdk.java.net Wed Feb 24 16:12:00 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 16:12:00 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v4] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 16:09:32 GMT, Thomas Schatzl wrote: >> Hello, >> >> can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? >> >> This can save lots of memory with negligible other impact. >> >> Long version: >> Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. >> >> Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. >> >> This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. >> >> In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. >> >> The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). >> >> In some cases doing this can save half of peak remembered set memory usage. >> >> There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. >> >> * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. >> (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). >> This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. 
>> >> * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). >> You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. >> >> Testing: tier1-5, Oracle internal performance test suite >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp > > Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> Looks good =) ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2693 From sjohanss at openjdk.java.net Wed Feb 24 16:12:02 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Wed, 24 Feb 2021 16:12:02 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v3] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 15:36:57 GMT, Thomas Schatzl wrote: >> Hello, >> >> can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? >> >> This can save lots of memory with negligible other impact. >> >> Long version: >> Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. >> >> Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. >> >> This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. >> >> In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. >> >> The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). >> >> In some cases doing this can save half of peak remembered set memory usage. >> >> There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. >> >> * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. >> (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). >> This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. 
>> >> * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). >> You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. >> >> Testing: tier1-5, Oracle internal performance test suite >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > sjohanss review src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 45: > 43: uint cur_idx = regions_left - 1; > 44: // Do not prune more than prune_total_bytes. > 45: if ((at(cur_idx)->reclaimable_bytes() + pruned_bytes) <= prune_total_bytes) { This check should be the other way around, right? Suggestion: if ((at(cur_idx)->reclaimable_bytes() + pruned_bytes) > prune_total_bytes) { ------------- PR: https://git.openjdk.java.net/jdk/pull/2693 From kbarrett at openjdk.java.net Wed Feb 24 18:01:46 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Wed, 24 Feb 2021 18:01:46 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 11:59:48 GMT, Leo Korinth wrote: > In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt(). > > This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi threaded. Other than possibly moving the definition of `processing_is_mt()` to the .cpp file, this looks good. src/hotspot/share/gc/shared/referenceProcessor.hpp line 417: > 415: > 416: // Whether we are in a phase when _processing_ is MT. > 417: bool processing_is_mt() const { return ParallelRefProcEnabled && _num_queues > 1; } I don't think this needs to be inline, and I think moving it to the .cpp file would avoid needing to include gc_globals.hpp here. ------------- Marked as reviewed by kbarrett (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2704 From tschatzl at openjdk.java.net Thu Feb 25 12:00:45 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 25 Feb 2021 12:00:45 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 11:59:48 GMT, Leo Korinth wrote: > In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt(). > > This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi threaded. Please move the implementation of `processing_is_mt` to the .cpp files as @kimbarrett suggested. Lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2704 From lkorinth at openjdk.java.net Thu Feb 25 13:09:40 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 25 Feb 2021 13:09:40 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 17:58:29 GMT, Kim Barrett wrote: >> In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt(). 
>> >> This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi threaded. > > src/hotspot/share/gc/shared/referenceProcessor.hpp line 417: > >> 415: >> 416: // Whether we are in a phase when _processing_ is MT. >> 417: bool processing_is_mt() const { return ParallelRefProcEnabled && _num_queues > 1; } > > I don't think this needs to be inline, and I think moving it to the .cpp file would avoid needing to include gc_globals.hpp here. I agree and will fix it, thanks for spotting it! ------------- PR: https://git.openjdk.java.net/jdk/pull/2704 From tschatzl at openjdk.java.net Thu Feb 25 14:21:07 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 25 Feb 2021 14:21:07 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v5] In-Reply-To: References: Message-ID: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. > > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). > > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. 
> > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang-review ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2693/files - new: https://git.openjdk.java.net/jdk/pull/2693/files/5675f39c..62b498df Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=03-04 Stats: 28 lines in 5 files changed: 13 ins; 2 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From ayang at openjdk.java.net Thu Feb 25 14:26:41 2021 From: ayang at openjdk.java.net (Albert Mingkun Yang) Date: Thu, 25 Feb 2021 14:26:41 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v5] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 14:21:07 GMT, Thomas Schatzl wrote: >> Hello, >> >> can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? >> >> This can save lots of memory with negligible other impact. >> >> Long version: >> Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. >> >> Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. >> >> This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. >> >> In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. >> >> The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). >> >> In some cases doing this can save half of peak remembered set memory usage. >> >> There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. >> >> * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. >> (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). >> This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. >> >> * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). 
>> You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. >> >> Testing: tier1-5, Oracle internal performance test suite >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang-review Marked as reviewed by ayang (Author). ------------- PR: https://git.openjdk.java.net/jdk/pull/2693 From lkorinth at openjdk.java.net Thu Feb 25 14:35:08 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 25 Feb 2021 14:35:08 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead [v2] In-Reply-To: References: Message-ID: <3shY2Cc79MQM7T80q69rpHVT4mqvpNdQfZ0mQZO_KqY=.cee50bfa-a063-4e57-bed4-a20f41977e17@github.com> > In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt(). > > This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi threaded. Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: Fixup suggested by Kim ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2704/files - new: https://git.openjdk.java.net/jdk/pull/2704/files/5b80656f..eff32142 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2704&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2704&range=00-01 Stats: 8 lines in 2 files changed: 5 ins; 2 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2704.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2704/head:pull/2704 PR: https://git.openjdk.java.net/jdk/pull/2704 From tschatzl at openjdk.java.net Thu Feb 25 14:42:40 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Thu, 25 Feb 2021 14:42:40 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v5] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 14:23:28 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang-review > > Marked as reviewed by ayang (Author). Albert and me were discussing some further code improvements that this last change implements. ------------- PR: https://git.openjdk.java.net/jdk/pull/2693 From lkorinth at openjdk.java.net Thu Feb 25 14:42:42 2021 From: lkorinth at openjdk.java.net (Leo Korinth) Date: Thu, 25 Feb 2021 14:42:42 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead [v2] In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 11:58:01 GMT, Thomas Schatzl wrote: >> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixup suggested by Kim > > Please move the implementation of `processing_is_mt` to the .cpp files as @kimbarrett suggested. > > Lgtm. I moved the flag-reading into the .cpp as suggested by Kim. I will not change _discovery_is_mt in this fix; I created https://bugs.openjdk.java.net/browse/JDK-8262367 to possibly fix it in the future. 
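To illustrate the shape being discussed for 8261804 (declaration in the header, flag read in the .cpp definition so no cached field is needed), here is a small self-contained sketch; all names are stand-ins rather than the real HotSpot declarations, and the global below merely imitates the -XX:ParallelRefProcEnabled flag.

```c++
// Illustrative split of the accessor: the header keeps only the declaration,
// so it does not need the flag definitions (gc_globals.hpp in the real code),
// and callers always see the current flag value instead of a cached
// _processing_is_mt field.

bool ParallelRefProcEnabled = true;   // stand-in for the real -XX flag

class RefProcSketch {
  unsigned _num_queues;
 public:
  explicit RefProcSketch(unsigned num_queues) : _num_queues(num_queues) {}
  bool processing_is_mt() const;      // declaration only, as in the .hpp
};

// In the real change this definition lives in the .cpp file.
bool RefProcSketch::processing_is_mt() const {
  return ParallelRefProcEnabled && _num_queues > 1;
}
```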
------------- PR: https://git.openjdk.java.net/jdk/pull/2704 From zgu at openjdk.java.net Thu Feb 25 20:10:53 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 25 Feb 2021 20:10:53 GMT Subject: RFR: 8262398: Shenandoah: Disable nmethod barrier and stack watermark when running with passive mode Message-ID: nmethod barrier and stack watermark allow GC not to process nmethods at GC pauses, and aim to reduce GC latency, they do not benefit STW GCs, who process nmethods at pauses anyway. Test: - [x] hotspot_gc_shenandoah - [x] tier1 with -XX:+UseShenandoahGC - [x] tier1 with -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive - [x] tier1 with -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC ------------- Commit messages: - JDK-8262398 - init Changes: https://git.openjdk.java.net/jdk/pull/2727/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2727&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8262398 Stats: 39 lines in 11 files changed: 21 ins; 0 del; 18 mod Patch: https://git.openjdk.java.net/jdk/pull/2727.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2727/head:pull/2727 PR: https://git.openjdk.java.net/jdk/pull/2727 From rkennke at openjdk.java.net Thu Feb 25 20:23:38 2021 From: rkennke at openjdk.java.net (Roman Kennke) Date: Thu, 25 Feb 2021 20:23:38 GMT Subject: RFR: 8262398: Shenandoah: Disable nmethod barrier and stack watermark when running with passive mode In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 20:06:40 GMT, Zhengyu Gu wrote: > nmethod barrier and stack watermark allow GC not to process nmethods at GC pauses, and aim to reduce GC latency, they do not benefit STW GCs, who process nmethods at pauses anyway. > > Test: > > - [x] hotspot_gc_shenandoah > - [x] tier1 with -XX:+UseShenandoahGC > - [x] tier1 with -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive > - [x] tier1 with -XX:+UseShenandoahGC -XX:ShenandoahGCMode=passive -XX:-ShenandoahDegeneratedGC If it is possible at all, it would be most useful if we could have flags like ShenandoahNMethodBarrier and ShenandoahStackWatermark, and let them drive enabling and disabling of the corresponding barriers/features. It would be even better, if we could configure concurrent modes such that they do STW class-unloading and stack-processing (if possible with reasonable amount of work). That would help ports where those features are not yet present (current ports under development with the problem are PPC and Graal). ------------- PR: https://git.openjdk.java.net/jdk/pull/2727 From zgu at openjdk.java.net Thu Feb 25 21:15:39 2021 From: zgu at openjdk.java.net (Zhengyu Gu) Date: Thu, 25 Feb 2021 21:15:39 GMT Subject: RFR: 8262398: Shenandoah: Disable nmethod barrier and stack watermark when running with passive mode In-Reply-To: References: Message-ID: On Thu, 25 Feb 2021 20:20:55 GMT, Roman Kennke wrote: > If it is possible at all, it would be most useful if we could have flags like ShenandoahNMethodBarrier and ShenandoahStackWatermark, and let them drive enabling and disabling of the corresponding barriers/features. It would be even better, if we could configure concurrent modes such that they do STW class-unloading and stack-processing (if possible with reasonable amount of work). That would help ports where those features are not yet present (current ports under development with the problem are PPC and Graal). Disabling nmethod barrier and/or stack watermark for concurrent GC means crashes, we no longer have backup (e.g. 
processing thread roots and code roots at pauses) so I don't think it will help ports. ------------- PR: https://git.openjdk.java.net/jdk/pull/2727 From tschatzl at openjdk.java.net Fri Feb 26 09:03:02 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 26 Feb 2021 09:03:02 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v6] In-Reply-To: References: Message-ID: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. > > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). > > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. 
> > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: refactoring ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2693/files - new: https://git.openjdk.java.net/jdk/pull/2693/files/62b498df..60cbd5d4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=04-05 Stats: 123 lines in 6 files changed: 88 ins; 24 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From tschatzl at openjdk.java.net Fri Feb 26 09:03:02 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 26 Feb 2021 09:03:02 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v4] In-Reply-To: References: Message-ID: On Wed, 24 Feb 2021 16:08:59 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp >> >> Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> > > Looks good =) ... and then @kstefanj also chimed in and we thought of moving the logic completely away from `G1CollectionSetCandidates` which is in spirit of the current implementation. tier1 tested again, local testing that it works. ------------- PR: https://git.openjdk.java.net/jdk/pull/2693 From sjohanss at openjdk.java.net Fri Feb 26 10:30:41 2021 From: sjohanss at openjdk.java.net (Stefan Johansson) Date: Fri, 26 Feb 2021 10:30:41 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v6] In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 09:03:02 GMT, Thomas Schatzl wrote: >> Hello, >> >> can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? >> >> This can save lots of memory with negligible other impact. >> >> Long version: >> Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. >> >> Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. >> >> This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. >> >> In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. >> >> The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). >> >> In some cases doing this can save half of peak remembered set memory usage. >> >> There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. 
>> >> * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. >> (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). >> This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. >> >> * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). >> You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. >> >> Testing: tier1-5, Oracle internal performance test suite >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > refactoring Looks good, apart from the now unused code. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 73: > 71: _remaining_reclaimable_bytes -= pruned_bytes; > 72: _num_regions = regions_left; > 73: } Not used anymore, right? Suggestion: src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 81: > 79: // collection set candidate regions first. Applies cl on the pruned regions. > 80: void prune(uint keep_min_regions, size_t prune_total_bytes, HeapRegionClosure* cl); > 81: Same here I guess. Suggestion: ------------- Marked as reviewed by sjohanss (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2693 From tschatzl at openjdk.java.net Fri Feb 26 10:41:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 26 Feb 2021 10:41:00 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v7] In-Reply-To: References: Message-ID: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. > > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). 
> > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. > > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Remove now unused code ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2693/files - new: https://git.openjdk.java.net/jdk/pull/2693/files/60cbd5d4..ac6a2709 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=06 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=05-06 Stats: 27 lines in 2 files changed: 0 ins; 27 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From tschatzl at openjdk.java.net Fri Feb 26 10:45:43 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 26 Feb 2021 10:45:43 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v6] In-Reply-To: References: Message-ID: On Fri, 26 Feb 2021 10:27:52 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> refactoring > > Looks good, apart from the now unused code. Thanks, removed the obsolete code. ------------- PR: https://git.openjdk.java.net/jdk/pull/2693 From tschatzl at openjdk.java.net Fri Feb 26 15:11:00 2021 From: tschatzl at openjdk.java.net (Thomas Schatzl) Date: Fri, 26 Feb 2021 15:11:00 GMT Subject: RFR: 8262185: G1: Prune collection set candidates early [v8] In-Reply-To: References: Message-ID: > Hello, > > can I have reviews for this change to the collection candidate selection procedure, moving the G1HeapWastePercent exclusion criteria right after candidate selection instead of at the end of mixed gcs? > > This can save lots of memory with negligible other impact. > > Long version: > Currently G1 maintains collection set candidates from the marking phase (where it determines those) until the end of the mixed gc phase. > > Mixed gc phase ends when the amount of space that can be reclaimed in the java heap with the remaining collection set candidates is smaller than G1HeapWastePercent of the (current) heap capacity. 
> > This means that in some cases a significant amount of memory is used for regions that will never be evacuated. In addition to that, maintaining the remembered sets for these never evacuated regions uses execution time and more memory for the remembered set. > > In fact, in some cases, it can happen that in the first few mixed gcs of a mixed gc phase, remembered set memory consumption *increases* even though the remembered sets of recently evacuated old gen regions are freed. > > The proposed alternative is to prune the collection set candidates as early as possible, filtering out regions that are never going to be evacuated (or have a very low probability). > > In some cases doing this can save half of peak remembered set memory usage. > > There are a few drawbacks here that should be considered during evaluation, also comparing against the old heuristic. > > * In the old heuristic, G1 checks *at the end* of gc whether the remaining collection set candidates are worth collecting (remaining collectible space < G1HeapWastePercent means: stop). Which helps with ensuring at least some forward progress in evacuating the heap because (assuming there are candidates) at least some space will be reclaimed. > (I do not know whether this behavior as is is intentional for that reason, but it has been there since initial implementation; then it did not really matter because G1 has been maintaining the old gen remembered sets for all regions all the time anyway). > This is approximated by not removing all of the candidates to have at least one "minimal" mixed collection in this change. > > * In some cases in the old heuristic G1 would just evacuate all candidate regions (in the extreme in a single gc) if the pause time permitted, reclaiming a bit more space (i.e. that amount < G1HeapWastePercent of the total heap). > You would expect (given the default value of 5), there will be more mixed cycles because of that with the new heuristic. > > Testing: tier1-5, Oracle internal performance test suite > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Optimizations ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2693/files - new: https://git.openjdk.java.net/jdk/pull/2693/files/ac6a2709..d3de8f66 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=07 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2693&range=06-07 Stats: 11 lines in 3 files changed: 5 ins; 0 del; 6 mod Patch: https://git.openjdk.java.net/jdk/pull/2693.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2693/head:pull/2693 PR: https://git.openjdk.java.net/jdk/pull/2693 From kbarrett at openjdk.java.net Fri Feb 26 20:05:49 2021 From: kbarrett at openjdk.java.net (Kim Barrett) Date: Fri, 26 Feb 2021 20:05:49 GMT Subject: RFR: 8261804: Remove field _processing_is_mt, calculate it instead [v2] In-Reply-To: <3shY2Cc79MQM7T80q69rpHVT4mqvpNdQfZ0mQZO_KqY=.cee50bfa-a063-4e57-bed4-a20f41977e17@github.com> References: <3shY2Cc79MQM7T80q69rpHVT4mqvpNdQfZ0mQZO_KqY=.cee50bfa-a063-4e57-bed4-a20f41977e17@github.com> Message-ID: On Thu, 25 Feb 2021 14:35:08 GMT, Leo Korinth wrote: >> In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt(). >> >> This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi threaded. 
>
> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixup suggested by Kim

Marked as reviewed by kbarrett (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2704

From lkorinth at openjdk.java.net Fri Feb 26 20:11:44 2021
From: lkorinth at openjdk.java.net (Leo Korinth)
Date: Fri, 26 Feb 2021 20:11:44 GMT
Subject: Integrated: 8261804: Remove field _processing_is_mt, calculate it instead
In-Reply-To: 
References: 
Message-ID: 

On Wed, 24 Feb 2021 11:59:48 GMT, Leo Korinth wrote:

> In the reference processor, remove _processing_is_mt. Instead calculate its value in its accessor, processing_is_mt().
>
> This change will remove some state, make RefProcMTDegreeAdjuster a bit simpler and make it much easier to derive when processing is indeed multi-threaded.

This pull request has now been integrated.

Changeset: 03d888f4
Author: Leo Korinth
URL: https://git.openjdk.java.net/jdk/commit/03d888f4
Stats: 45 lines in 5 files changed: 6 ins; 17 del; 22 mod

8261804: Remove field _processing_is_mt, calculate it instead

Reviewed-by: ayang, kbarrett, tschatzl

-------------

PR: https://git.openjdk.java.net/jdk/pull/2704

From lihuaming3 at huawei.com Sat Feb 27 06:32:29 2021
From: lihuaming3 at huawei.com (Hamlin)
Date: Sat, 27 Feb 2021 14:32:29 +0800
Subject: RFR of JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio
Message-ID: <5dd6445b-402c-8ad0-267e-19becb477044@huawei.com>

Hi,

Would you please help to review an improvement in G1 full GC?

bug: https://bugs.openjdk.java.net/browse/JDK-8262068

webrev: https://github.com/openjdk/jdk/pull/2760/commits/4be6c18e2201fc8d22ee0f31d11ca7892be21a43

Summary
-----------

Improve G1 Full GC by skipping compaction for regions with a high survival ratio.

Background
-----------

There are 4 steps in a G1 full gc:
- mark live objects
- prepare forwardee
- adjust pointers
- compact

When a full gc occurs, there may be a very high percentage of live bytes in some regions. For these regions it is not efficient to compact them, and better to skip them, as there is little space to save but many objects to copy.

Description
-----------

We enhance the full gc implementation for the above situation through the following steps:
- accumulate the live bytes of every heap region in the mark phase;
- add heap regions with a high percentage of live bytes into a "no moving" set rather than the normal compaction set in the prepare phase, and fill dummy objects into the places of dead objects in these regions;
- nothing special is done in the adjust phase;
- just compact the regions in the compaction set.

VM options added
-----------

- G1FullGCNoMoving: enable "no moving region" mode in G1 Full GC.
- G1NoMovingRegionLiveBytesLowerThreshold: the lower threshold (percent) of heap region live bytes to skip compaction

Test
-----------

- specjbb2015: no regression
- dacapo: 3%-11% improvement of full gc pause. Attachment is the dacapo h2 full gc pause.

$ java -Xmx1g -Xms1g -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2

Thanks,

Hamlin

From mli at openjdk.java.net Sat Feb 27 06:36:01 2021
From: mli at openjdk.java.net (Hamlin Li)
Date: Sat, 27 Feb 2021 06:36:01 GMT
Subject: RFR: JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio
Message-ID: 

Summary
-----------

Improve G1 Full GC by skipping compaction for regions with a high survival ratio.

Background
-----------

There are 4 steps in a G1 full gc:
- mark live objects
- prepare forwardee
- adjust pointers
- compact

When a full gc occurs, there may be a very high percentage of live bytes in some regions. For these regions it is not efficient to compact them, and better to skip them, as there is little space to save but many objects to copy.

Description
-----------

We enhance the full gc implementation for the above situation through the following steps:
- accumulate the live bytes of every heap region in the mark phase;
- add heap regions with a high percentage of live bytes into a "no moving" set rather than the normal compaction set in the prepare phase, and fill dummy objects into the places of dead objects in these regions;
- nothing special is done in the adjust phase;
- just compact the regions in the compaction set.

VM options added
-----------

- G1FullGCNoMoving: enable "no moving region" mode in G1 Full GC.
- G1NoMovingRegionLiveBytesLowerThreshold: the lower threshold (percent) of heap region live bytes to skip compaction

Test
-----------

- specjbb2015: no regression
- dacapo: 3%-11% improvement of full gc pause. Attachment is the dacapo h2 full gc pause.

$ java -Xmx1g -Xms1g -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2

-------------

Commit messages:
 - JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio

Changes: https://git.openjdk.java.net/jdk/pull/2760/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2760&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8262068
Stats: 287 lines in 16 files changed: 274 ins; 1 del; 12 mod
Patch: https://git.openjdk.java.net/jdk/pull/2760.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2760/head:pull/2760

PR: https://git.openjdk.java.net/jdk/pull/2760

From lihuaming3 at huawei.com Sat Feb 27 06:59:03 2021
From: lihuaming3 at huawei.com (Hamlin)
Date: Sat, 27 Feb 2021 14:59:03 +0800
Subject: RFR of JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio
In-Reply-To: <5dd6445b-402c-8ad0-267e-19becb477044@huawei.com>
References: <5dd6445b-402c-8ad0-267e-19becb477044@huawei.com>
Message-ID: <95ff2979-75d5-ddd2-c066-714752c01986@huawei.com>

Just modified some typos in the original email, please check the content below to review again.

Thanks,

Hamlin

On 2021/2/27 14:32, Hamlin wrote:
>
> Hi,
>
> Would you please help to review an improvement in G1 full GC?
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8262068
>
> webrev: https://github.com/openjdk/jdk/pull/2760/commits/4be6c18e2201fc8d22ee0f31d11ca7892be21a43
>
> Summary
> -----------
>
> Improve G1 Full GC by skipping compaction for regions with a high survival ratio.
>
> Background
> -----------
>
> There are 4 steps in a G1 full gc:
> - mark live objects
> - prepare forwardee
> - adjust pointers
> - compact
>
> When a full gc occurs, there may be a very high percentage of live bytes in some regions. For these regions it is not efficient to compact them, and better to skip them, as there is little space to save but many objects to copy.
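To make the proposal concrete before the quoted description continues, the selection step it describes boils down to a check like the following standalone sketch. The region layout, the threshold handling and all names are simplified assumptions, not the actual patch.

```
// Sketch of "skip compaction for regions with a high survival ratio":
// a region whose live percentage is at or above a threshold is left in
// place, and its dead gaps are plugged with dummy (filler) objects so the
// region stays walkable.
#include <cstddef>
#include <cstdio>
#include <utility>
#include <vector>

struct RegionSketch {
  size_t capacity_bytes;
  size_t live_bytes;  // accumulated during the mark phase
  std::vector<std::pair<size_t, size_t> > dead_ranges;  // offset, length
};

// Roughly what the proposed threshold flag would control (default assumed 98).
static const unsigned kLivePercentThreshold = 98;

static bool should_skip_compaction(const RegionSketch& r) {
  return r.live_bytes * 100 >= r.capacity_bytes * kLivePercentThreshold;
}

// Stand-in for filling dead ranges with dummy objects in the prepare phase.
static void fill_dead_ranges_with_dummies(const RegionSketch& r) {
  for (const std::pair<size_t, size_t>& range : r.dead_ranges) {
    printf("fill dummy object at offset %zu, %zu bytes\n",
           range.first, range.second);
  }
}

int main() {
  RegionSketch mostly_live = {1u << 20, 1040000, {{1000, 8000}}};
  RegionSketch half_dead   = {1u << 20, 500000, {{0, 500000}}};
  if (should_skip_compaction(mostly_live)) {
    fill_dead_ranges_with_dummies(mostly_live);
  }
  printf("skip mostly_live: %d, skip half_dead: %d\n",
         should_skip_compaction(mostly_live), should_skip_compaction(half_dead));
  return 0;
}
```

The trade-off the sketch captures is the one argued above: copying a nearly full region buys almost no space, so marking it "no moving" avoids the copy at the cost of leaving its small dead gaps unreclaimed until a later cycle.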
>
> Description
> -----------
>
> We enhance the full gc implementation for the above situation through the following steps:
> - accumulate the live bytes of every heap region in the mark phase;
> - add heap regions with a high percentage of live bytes into a "no moving" set rather than the normal compaction set in the prepare phase, and fill dummy objects into the places of dead objects in these regions;
> - nothing special is done in the adjust phase;
> - just compact the regions in the compaction set.
>
> VM options added
> -----------
>
> - G1SkipCompactionLiveBytesLowerThreshold: the lower threshold of heap region live bytes percent in a G1 full GC
>
> Test
> -----------
>
> - specjbb2015: no regression
> - dacapo: 3%-11% improvement of full gc pause. Attachment is the dacapo h2 full gc pause.
>
> $ java -Xmx1g -Xms1g -XX:G1SkipCompactionLiveBytesLowerThreshold=98 -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2
>
> Thanks,
>
> Hamlin
>

From lihuaming3 at huawei.com Sat Feb 27 07:01:41 2021
From: lihuaming3 at huawei.com (Hamlin)
Date: Sat, 27 Feb 2021 15:01:41 +0800
Subject: RFR of JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio
In-Reply-To: <95ff2979-75d5-ddd2-c066-714752c01986@huawei.com>
References: <5dd6445b-402c-8ad0-267e-19becb477044@huawei.com> <95ff2979-75d5-ddd2-c066-714752c01986@huawei.com>
Message-ID: <1c90f849-0df8-dbe5-8b3f-fb50976494bd@huawei.com>

webrev of original style: https://openjdk.github.io/cr/?repo=jdk&pr=2760&range=00

On 2021/2/27 14:59, Hamlin wrote:
>
> Just modified some typos in the original email, please check the content below to review again.
>
> Thanks,
>
> Hamlin
>
> On 2021/2/27 14:32, Hamlin wrote:
>>
>> Hi,
>>
>> Would you please help to review an improvement in G1 full GC?
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8262068
>>
>> webrev: https://github.com/openjdk/jdk/pull/2760/commits/4be6c18e2201fc8d22ee0f31d11ca7892be21a43
>>
>> Summary
>> -----------
>>
>> Improve G1 Full GC by skipping compaction for regions with a high survival ratio.
>>
>> Background
>> -----------
>>
>> There are 4 steps in a G1 full gc:
>> - mark live objects
>> - prepare forwardee
>> - adjust pointers
>> - compact
>>
>> When a full gc occurs, there may be a very high percentage of live bytes in some regions. For these regions it is not efficient to compact them, and better to skip them, as there is little space to save but many objects to copy.
>>
>> Description
>> -----------
>>
>> We enhance the full gc implementation for the above situation through the following steps:
>> - accumulate the live bytes of every heap region in the mark phase;
>> - add heap regions with a high percentage of live bytes into a "no moving" set rather than the normal compaction set in the prepare phase, and fill dummy objects into the places of dead objects in these regions;
>> - nothing special is done in the adjust phase;
>> - just compact the regions in the compaction set.
>>
>> VM options added
>> -----------
>>
>> - G1SkipCompactionLiveBytesLowerThreshold: the lower threshold of heap region live bytes percent in a G1 full GC
>>
>> Test
>> -----------
>>
>> - specjbb2015: no regression
>> - dacapo: 3%-11% improvement of full gc pause. Attachment is the dacapo h2 full gc pause.
>>
>> $ java -Xmx1g -Xms1g -XX:G1SkipCompactionLiveBytesLowerThreshold=98 -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2
>>
>> Thanks,
>>
>> Hamlin
>>

From ofirg6 at gmail.com Sat Feb 27 16:04:30 2021
From: ofirg6 at gmail.com (Ofir Gordon)
Date: Sat, 27 Feb 2021 18:04:30 +0200
Subject: G1 Full GC Write Barrier Mechanism
Message-ID: 

Hi all,

I'm currently working on the JDK 14 version source code.
I'm trying to understand the process of a full collection using the G1 gc. Specifically, I can't follow the write barrier mechanism--where is it being activated? What exactly happens in the write barrier? (I see that there are both remembered sets and card tables, are they both being used for tracking changes in the heap during the collection?)

Can anyone help me understand the basic flow of this gc in the code?
I can see that the marking phase begins in G1FullCollector::collect()->phase1_mark_live_objects(), is this part running concurrently with the program? Is the barrier being activated beforehand? (Where?)

In addition, is there a way to verify that the barrier is being executed during the marking? Which code is supposed to run for this part?

I know these are a lot of questions, I'm just trying to figure out the basic flow of the process, so if someone can give me an explanation that will help me start I'll be really thankful.

Thanks a lot,
Ofir

From kim.barrett at oracle.com Sun Feb 28 19:34:27 2021
From: kim.barrett at oracle.com (Kim Barrett)
Date: Sun, 28 Feb 2021 19:34:27 +0000
Subject: G1 Full GC Write Barrier Mechanism
In-Reply-To: 
References: 
Message-ID: <7CE7AF30-8437-4FB8-BCF1-E120D3B87D84@oracle.com>

> On Feb 27, 2021, at 11:04 AM, Ofir Gordon wrote:
>
> Hi all,
>
> I'm currently working on the JDK 14 version source code.
> I'm trying to understand the process of a full collection using the G1 gc. Specifically, I can't follow the write barrier mechanism--where is it being activated? What exactly happens in the write barrier? (I see that there are both remembered sets and card tables, are they both being used for tracking changes in the heap during the collection?)
>
> Can anyone help me understand the basic flow of this gc in the code?
> I can see that the marking phase begins in G1FullCollector::collect()->phase1_mark_live_objects(), is this part running concurrently with the program? Is the barrier being activated beforehand? (Where?)

G1FullCollector and related classes (in files g1Full*) are for the G1 STW full GC, which doesn't use the write barriers.

The write post-barrier is used by G1 young/mixed STW collections to track locations that might contain inter-region references that need to be updated when objects are moved by those collections. This is what uses the card table and remembered sets.

The write pre-barrier is used by the G1 concurrent oldgen collector to track values that were reachable at the start of the concurrent collection but might no longer be so because references have been overwritten before the concurrent collector gets around to examining the location. This is needed to maintain the SATB invariant. This doesn't involve the card table and remembered sets.

> In addition, is there a way to verify that the barrier is being executed during the marking? Which code is supposed to run for this part?

The barrier implementations are part of the interpreter/compilers.
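Conceptually, the two barriers described above amount to something like the following plain-C++ sketch. Queue handling, card values and all fast-path details are simplified, and every name here is invented rather than taken from HotSpot.

```
// Conceptual model of a SATB pre-barrier and a card-marking post-barrier,
// not the generated HotSpot barrier code.
#include <cstddef>
#include <cstdint>
#include <deque>

static const size_t kRegionShift = 20;         // pretend 1 MB heap regions
static const size_t kCardShift = 9;            // pretend 512-byte cards
static uint8_t g_card_table[1u << 20];         // pretend card table
static std::deque<void*> g_satb_queue;         // stand-in for a SATB mark queue
static std::deque<size_t> g_dirty_card_queue;  // stand-in for a dirty card queue

// Pre-barrier: while concurrent marking is active, record the value about to
// be overwritten so the marker can still trace it (the SATB invariant).
static void pre_write_barrier(void** field, bool marking_active) {
  if (marking_active) {
    void* old_value = *field;
    if (old_value != nullptr) {
      g_satb_queue.push_back(old_value);
    }
  }
}

// Post-barrier: if the store creates a reference from one region into
// another, dirty the card covering the field so the referenced region's
// remembered set can later be updated from it.
static void post_write_barrier(void** field, void* new_value) {
  if (new_value == nullptr) {
    return;
  }
  uintptr_t from = reinterpret_cast<uintptr_t>(field);
  uintptr_t to = reinterpret_cast<uintptr_t>(new_value);
  if (((from ^ to) >> kRegionShift) == 0) {
    return;  // same region: no remembered set entry needed
  }
  size_t card = (from >> kCardShift) % sizeof(g_card_table);
  if (g_card_table[card] == 0) {
    g_card_table[card] = 1;  // mark the card dirty exactly once
    g_dirty_card_queue.push_back(card);
  }
}

// A reference store `holder.field = value` conceptually expands to:
static void oop_store(void** field, void* new_value, bool marking_active) {
  pre_write_barrier(field, marking_active);
  *field = new_value;
  post_write_barrier(field, new_value);
}

int main() {
  void* slot = nullptr;
  int referent = 0;
  oop_store(&slot, &referent, /*marking_active=*/true);
  return 0;
}
```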
The interpreter invokes relevant functions, and the compilers insert barriers in the generated code. This is all managed through the Access API, which provides an abstraction over the various GCs for use by non-GC code that needs to read or write object locations while maintaining whatever information the selected GC requires.

> I know these are a lot of questions, I'm just trying to figure out the basic flow of the process, so if someone can give me an explanation that will help me start I'll be really thankful.
>
> Thanks a lot,
> Ofir

From kbarrett at openjdk.java.net Sun Feb 28 19:36:49 2021
From: kbarrett at openjdk.java.net (Kim Barrett)
Date: Sun, 28 Feb 2021 19:36:49 GMT
Subject: RFR: 8261859: gc/g1/TestStringDeduplicationTableRehash.java failed with "RuntimeException: 'Rehash Count: 0' found in stdout"
Message-ID: 

Please review this fix of an intermittently failing string deduplication test.

There are several problems.

(1) Because of environmental or other unrelated changes, the test might simply fail. The test was considering it a failure if any GC reported a zero rehash count. But if the first GC triggered a resize, that would suppress the requested "rehash a lot", and could report a zero rehash count, failing the test. So the test criterion is changed to require at least one non-zero rehash count rather than no zero rehash counts. Since rehashes are normally unlikely, and the primary point is to exercise the rehash code, having some reported non-zero rehash count is sufficient.

(2) Reporting only occurs if the string dedup thread was triggered and had work to do. If the initial collections all need resizes, and none of the subsequent ones have any work for the thread to do, then again we won't have any reported rehashes. The test is changed to always generate some new strings for later GCs to discover and queue for deduplication processing, causing the dedup thread to run and report at the end.

(3) The table resizing mechanism was only doing one step (doubling or halving) of size change per collection. If the number of table entries is large (small), several GCs might be required for the table to grow (shrink) to the desired size. Once again, this could suppress table rehashes, causing the test to fail. It also wastes effort because the table needs to be resized multiple times when one right-sized resize would be sufficient. Resizing now computes the "final" size based on the number of entries and load factors, and may increase or decrease the table size by multiple powers of 2 in one resizing operation.

Testing:
Manual execution of the string dedup tests and examining their logs.

Manual execution of the resize and rehash string dedup tests with a small initial table size, to simulate an environment with a larger initial set of strings that triggers early resize.

mach5 tier1

-------------

Commit messages:
 - fix rehash test
 - jump resize

Changes: https://git.openjdk.java.net/jdk/pull/2769/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2769&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8261859
Stats: 45 lines in 2 files changed: 20 ins; 6 del; 19 mod
Patch: https://git.openjdk.java.net/jdk/pull/2769.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/2769/head:pull/2769

PR: https://git.openjdk.java.net/jdk/pull/2769

From ayang at openjdk.java.net Sun Feb 28 21:56:40 2021
From: ayang at openjdk.java.net (Albert Mingkun Yang)
Date: Sun, 28 Feb 2021 21:56:40 GMT
Subject: RFR: 8261859: gc/g1/TestStringDeduplicationTableRehash.java failed with "RuntimeException: 'Rehash Count: 0' found in stdout"
In-Reply-To: 
References: 
Message-ID: 

On Sun, 28 Feb 2021 19:31:48 GMT, Kim Barrett wrote:

> Please review this fix of an intermittently failing string deduplication test.
>
> There are several problems.
>
> (1) Because of environmental or other unrelated changes, the test might simply fail. The test was considering it a failure if any GC reported a zero rehash count. But if the first GC triggered a resize, that would suppress the requested "rehash a lot", and could report a zero rehash count, failing the test. So the test criterion is changed to require at least one non-zero rehash count rather than no zero rehash counts. Since rehashes are normally unlikely, and the primary point is to exercise the rehash code, having some reported non-zero rehash count is sufficient.
>
> (2) Reporting only occurs if the string dedup thread was triggered and had work to do. If the initial collections all need resizes, and none of the subsequent ones have any work for the thread to do, then again we won't have any reported rehashes. The test is changed to always generate some new strings for later GCs to discover and queue for deduplication processing, causing the dedup thread to run and report at the end.
>
> (3) The table resizing mechanism was only doing one step (doubling or halving) of size change per collection. If the number of table entries is large (small), several GCs might be required for the table to grow (shrink) to the desired size. Once again, this could suppress table rehashes, causing the test to fail. It also wastes effort because the table needs to be resized multiple times when one right-sized resize would be sufficient. Resizing now computes the "final" size based on the number of entries and load factors, and may increase or decrease the table size by multiple powers of 2 in one resizing operation.
>
> Testing:
> Manual execution of the string dedup tests and examining their logs.
>
> Manual execution of the resize and rehash string dedup tests with a small initial table size, to simulate an environment with a larger initial set of strings that triggers early resize.
>
> mach5 tier1

Changes requested by ayang (Author).

src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp line 405:

> 403: size = round_up_power_of_2(needed);
> 404: } else {
> 405: size = _max_size;

It's not obvious to me which one is larger, between `round_up_power_of_2(needed)` and `_max_size`.

src/hotspot/share/gc/shared/stringdedup/stringDedupTable.cpp line 426:

> 424: }
> 425: }
> 426: 

How about adding an assertion, something like `assert(size >= _min_size && size <= _max_size)`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2769
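For readers following the review, here is a standalone sketch of the kind of single-step "compute the final size and clamp it" logic being discussed, including the assertion suggested above. The constants, the load factor and the structure are assumptions for illustration, not the actual stringDedupTable.cpp code.

```
// Sketch of resizing a hash table to its "final" size in one step: derive
// the needed bucket count from the entry count and a target load factor,
// round up to a power of two, and clamp to configured bounds.
#include <cassert>
#include <cstddef>

struct DedupTableSketch {
  static constexpr size_t _min_size = 1u << 10;  // smallest allowed bucket count
  static constexpr size_t _max_size = 1u << 24;  // largest allowed bucket count (a power of two)
  static constexpr double _grow_load_factor = 2.0;  // entries per bucket before growing

  static size_t round_up_power_of_2(size_t v) {
    size_t p = 1;
    while (p < v) p <<= 1;
    return p;
  }

  static size_t desired_size(size_t num_entries) {
    size_t needed = (size_t)((double)num_entries / _grow_load_factor) + 1;
    size_t size;
    if (needed <= _max_size) {
      // Because _max_size is a power of two here, rounding a value that is
      // at most _max_size up to a power of two cannot exceed _max_size.
      size = round_up_power_of_2(needed);
      if (size < _min_size) size = _min_size;
    } else {
      // When needed is already above the cap, rounding it up could only get
      // larger (or overflow), so clamp without rounding.
      size = _max_size;
    }
    assert(size >= _min_size && size <= _max_size);
    return size;
  }
};

int main() {
  assert(DedupTableSketch::desired_size(10) == DedupTableSketch::_min_size);        // shrink to the floor in one step
  assert(DedupTableSketch::desired_size(5u << 20) == (size_t(1) << 22));             // grow by several doublings at once
  assert(DedupTableSketch::desired_size(size_t(1) << 30) == DedupTableSketch::_max_size);  // clamped at the cap
  return 0;
}
```

Under these assumed bounds, the answer to the ordering question is that `round_up_power_of_2(needed)` can only exceed `_max_size` in the branch where `needed` itself is above the cap, which is exactly why clamping happens there; the closing assertion then documents the invariant either way.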