From maaartinus at gmail.com  Mon Jan  1 20:29:22 2018
From: maaartinus at gmail.com (Martin Grajcar)
Date: Mon, 1 Jan 2018 21:29:22 +0100
Subject: Using the Klass gap (Was: Master Thesis on Shenandoah)
Message-ID:

> On 08.11.2017 at 19:07, Dominik Inführ wrote:
> I was pondering the idea to squeeze the fwd ptr into the so-called
> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's
> only available for non-arrays, because for arrays, the array-length is
> squeezed into those 32 bits.

A possibly stupid question, but shouldn't it be the other way round?

Currently, the array length gets packed into a gap and you're thinking about
using the gap -- when available -- for the fwd ptr. This sounds slow and
complicated.

Can't you instead *always* use this gap for non-arrays and use a new slot
for the array length? This saves memory for non-arrays in exactly the same
way and needs no new conditional logic (I guess arrays can already deal
with the case where they need a new slot for their length).

From rkennke at redhat.com  Tue Jan  2 11:23:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 2 Jan 2018 12:23:23 +0100
Subject: Using the Klass gap (Was: Master Thesis on Shenandoah)
In-Reply-To:
References:
Message-ID:

On 01.01.2018 at 21:29, Martin Grajcar wrote:
>> On 08.11.2017 at 19:07, Dominik Inführ wrote:
>> I was pondering the idea to squeeze the fwd ptr into the so-called
>> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's
>> only available for non-arrays, because for arrays, the array-length is
>> squeezed into those 32 bits.
>
> A possibly stupid question, but shouldn't it be the other way round?
>
> Currently, the array length gets packed into a gap and you're thinking about
> using the gap -- when available -- for the fwd ptr. This sounds slow and
> complicated.
>
> Can't you instead *always* use this gap for non-arrays and use a new slot
> for the array length?
This saves memory for non-arrays in exactly the same > way and needs no new conditional logic (I guess, arrays can already deal > with the case they need a new slot for their length). Yes, this sounds like an attractive possibility. :-) Thanks, Roman From zgu at redhat.com Tue Jan 2 13:52:50 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 2 Jan 2018 08:52:50 -0500 Subject: Using the Klass gap (Was: Master Thesis on Shenandoah) In-Reply-To: References: Message-ID: On 01/02/2018 06:23 AM, Roman Kennke wrote: > Am 01.01.2018 um 21:29 schrieb Martin Grajcar: >>> * Am 08.11.2017 um 19:07 schrieb Dominik Inf?hr:* >> >> I was pondering the idea to squeeze the fwd ptr into the so-called >>> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's >>> only available for non-arrays, because for arrays, the array-length is >>> squeezed into those 32bits. >> >> >> A possibly stupid question, but shouldn't it be the other way round? >> >> Currently, array length gets packed in a gap and you're thinking about >> using the gap -- when available -- for the fwd ptr. This sounds slow and >> complicated/ >> >> Can't you instead *always* use this gap for non-arrays and use a new slot >> for the array length? This saves memory for non-arrays in exactly the >> same >> way and needs no new conditional logic (I guess, arrays can already deal >> with the case they need a new slot for their length). > > Yes, this sounds like an attractive possibility. :-) Agree. I think Java will have to switch to 64-bit array index at some point. 
-Zhengyu > > Thanks, > Roman > From aph at redhat.com Tue Jan 2 14:07:11 2018 From: aph at redhat.com (Andrew Haley) Date: Tue, 2 Jan 2018 14:07:11 +0000 Subject: Using the Klass gap (Was: Master Thesis on Shenandoah) In-Reply-To: References: Message-ID: <5f0c8175-b04c-2b13-d54b-c16d4dc45804@redhat.com> On 02/01/18 11:23, Roman Kennke wrote: > Am 01.01.2018 um 21:29 schrieb Martin Grajcar: >>> * Am 08.11.2017 um 19:07 schrieb Dominik Inf?hr:* >> >> I was pondering the idea to squeeze the fwd ptr into the so-called >>> Klass-gap. This is 32 unused bits when the Klass* is compressed. It's >>> only available for non-arrays, because for arrays, the array-length is >>> squeezed into those 32bits. >> >> >> A possibly stupid question, but shouldn't it be the other way round? >> >> Currently, array length gets packed in a gap and you're thinking about >> using the gap -- when available -- for the fwd ptr. This sounds slow and >> complicated/ >> >> Can't you instead *always* use this gap for non-arrays and use a new slot >> for the array length? This saves memory for non-arrays in exactly the same >> way and needs no new conditional logic (I guess, arrays can already deal >> with the case they need a new slot for their length). > > Yes, this sounds like an attractive possibility. :-) Certainly. That negative index thing on the read barrier side is rather icky. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Tue Jan 2 14:21:30 2018 From: aph at redhat.com (Andrew Haley) Date: Tue, 2 Jan 2018 14:21:30 +0000 Subject: RFR: Check BS type in immByteMapBase predicate In-Reply-To: <07ece366-e8b1-96f5-2539-cbe07edd8a6d@redhat.com> References: <66f43903-b1ed-a8e5-0283-91afc81a5222@redhat.com> <5d41f45a-aca8-c74c-a4dd-37e327b586d3@kennke.org> <07ece366-e8b1-96f5-2539-cbe07edd8a6d@redhat.com> Message-ID: <29ec0b2c-e74d-3421-6051-f5675efe98fb@redhat.com> On 05/12/17 12:19, Aleksey Shipilev wrote: > On 12/05/2017 01:11 PM, Roman Kennke wrote: >> Am 05.12.2017 um 11:55 schrieb Aleksey Shipilev: >>> On 12/05/2017 11:50 AM, Roman Kennke wrote: >>> ?What would happen if code uses that operand, but new predicate mismatches it (e.g. in Shenandoah)? >> It cannot be used in Shenandoah because we don't? use the CardTableModRefBS. Checking for the BS >> type seems the safest way to prevent the bug. > > Oh, okay. > >>>> I intend to push backports of this to 9 and 8 too. Do I need extra reviews for those? >>> Since this is not 9- or 8u-specific, I think you just push to sh/jdk10, and then regular backports >>> process handles the propagation to sh/jdk9 and sh/jdk10. >> >> Ok. > > This is okay to go to sh/jdk10. Can you give aarch64 maintainers a heads-up about this fix? It > probably warrants the fix in upstream for other collector's benefit, like Epsilon. It looks OK. If we have a bug report we can push it to all live repos. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Tue Jan 2 17:08:46 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 2 Jan 2018 18:08:46 +0100 Subject: RFR: Check BS type in immByteMapBase predicate In-Reply-To: <29ec0b2c-e74d-3421-6051-f5675efe98fb@redhat.com> References: <66f43903-b1ed-a8e5-0283-91afc81a5222@redhat.com> <5d41f45a-aca8-c74c-a4dd-37e327b586d3@kennke.org> <07ece366-e8b1-96f5-2539-cbe07edd8a6d@redhat.com> <29ec0b2c-e74d-3421-6051-f5675efe98fb@redhat.com> Message-ID: <80ae30ef-b73c-af85-e193-f9a43c6b6764@redhat.com> Am 02.01.2018 um 15:21 schrieb Andrew Haley: > On 05/12/17 12:19, Aleksey Shipilev wrote: >> On 12/05/2017 01:11 PM, Roman Kennke wrote: >>> Am 05.12.2017 um 11:55 schrieb Aleksey Shipilev: >>>> On 12/05/2017 11:50 AM, Roman Kennke wrote: >>>> ?What would happen if code uses that operand, but new predicate mismatches it (e.g. in Shenandoah)? >>> It cannot be used in Shenandoah because we don't? use the CardTableModRefBS. Checking for the BS >>> type seems the safest way to prevent the bug. >> >> Oh, okay. >> >>>>> I intend to push backports of this to 9 and 8 too. Do I need extra reviews for those? >>>> Since this is not 9- or 8u-specific, I think you just push to sh/jdk10, and then regular backports >>>> process handles the propagation to sh/jdk9 and sh/jdk10. >>> >>> Ok. >> >> This is okay to go to sh/jdk10. Can you give aarch64 maintainers a heads-up about this fix? It >> probably warrants the fix in upstream for other collector's benefit, like Epsilon. > > It looks OK. If we have a bug report we can push it to all live > repos. > Thanks for sending a heads-up. 
The bug is here: https://bugs.openjdk.java.net/browse/JDK-8193193 The review thread here: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2017-December/027858.html And the commit is here now: http://hg.openjdk.java.net/jdk/hs/rev/9ca19ebea22d Thanks, Roman From aph at redhat.com Wed Jan 3 10:45:16 2018 From: aph at redhat.com (Andrew Haley) Date: Wed, 3 Jan 2018 10:45:16 +0000 Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64 In-Reply-To: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com> References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com> Message-ID: <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com> On 07/12/17 13:18, Roman Kennke wrote: > I've been missing enter/leave calls around the SATB pre barrier call in > MacroAssembler::keep_alive_barrier() for Shenandoah. This has been > sending EvilSyncBug (and possible some other tests) into endless loops. > > The cleanest place to have them is in the (only) user of it in > generate_Reference_get(): > > http://cr.openjdk.java.net/~rkennke/aarch64-enter-leave/webrev.00/ > > Test: EvilSyncBug terminates now (aarch64). Running other tests right now > > Ok? All this saving and restoring of registers looks fantastically inefficient. Is it the this does not matter because it is very rare? -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. 
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Wed Jan 3 12:08:55 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 3 Jan 2018 13:08:55 +0100 Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64 In-Reply-To: <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com> References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com> <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com> Message-ID: Am 03.01.2018 um 11:45 schrieb Andrew Haley: > On 07/12/17 13:18, Roman Kennke wrote: >> I've been missing enter/leave calls around the SATB pre barrier call in >> MacroAssembler::keep_alive_barrier() for Shenandoah. This has been >> sending EvilSyncBug (and possible some other tests) into endless loops. >> >> The cleanest place to have them is in the (only) user of it in >> generate_Reference_get(): >> >> http://cr.openjdk.java.net/~rkennke/aarch64-enter-leave/webrev.00/ >> >> Test: EvilSyncBug terminates now (aarch64). Running other tests right now >> >> Ok? > > All this saving and restoring of registers looks fantastically inefficient. > Is it the this does not matter because it is very rare? > Are you referring to enter()/leave() around calling the keep-alive-barriers? I think this is ok: it only pushes/pops a stack frame, and it is only needed and done in the Reference_get() interpreter 'intrinsic', because it doesn't have a stack frame on its own. Roman From aph at redhat.com Wed Jan 3 12:16:45 2018 From: aph at redhat.com (Andrew Haley) Date: Wed, 3 Jan 2018 12:16:45 +0000 Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64 In-Reply-To: References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com> <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com> Message-ID: <7938b616-d257-eb96-2288-8e17ff83e4fe@redhat.com> On 03/01/18 12:08, Roman Kennke wrote: > Are you referring to enter()/leave() around calling the > keep-alive-barriers? 
I think this is ok: it only pushes/pops a stack > frame, and it is only needed and done in the Reference_get() interpreter > 'intrinsic', because it doesn't have a stack frame on its own. I'm thinking about the SATB barrier. Calling into the runtime clobbers all call-clobbered registers, and that's a lot, just to push one pointer onto a list. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rkennke at redhat.com Wed Jan 3 12:34:10 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 3 Jan 2018 13:34:10 +0100 Subject: RFR: Missing enter/leave around keep_alive_barrier in AArch64 In-Reply-To: <7938b616-d257-eb96-2288-8e17ff83e4fe@redhat.com> References: <19b41b84-c0bc-7ca8-ba95-2553fe5f0aad@redhat.com> <346df7f3-a4b4-eb7e-8743-8fd6c3b90d5d@redhat.com> <7938b616-d257-eb96-2288-8e17ff83e4fe@redhat.com> Message-ID: <1aa12595-2582-e730-3614-e94851534284@redhat.com> Am 03.01.2018 um 13:16 schrieb Andrew Haley: > On 03/01/18 12:08, Roman Kennke wrote: >> Are you referring to enter()/leave() around calling the >> keep-alive-barriers? I think this is ok: it only pushes/pops a stack >> frame, and it is only needed and done in the Reference_get() interpreter >> 'intrinsic', because it doesn't have a stack frame on its own. > > I'm thinking about the SATB barrier. Calling into the runtime > clobbers all call-clobbered registers, and that's a lot, just to > push one pointer onto a list. > Ah. This is ok. There is an assembly fast-path that checks for SATB-active, and if it is, pushes the pointer to the list. Only when the buffer is full, it calls into the slowpath/runtime, and only then it needs to push/pop the registers. And only in interpreted code. Roman From zgu at redhat.com Wed Jan 3 18:29:26 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 3 Jan 2018 13:29:26 -0500 Subject: RFR: Minor cleanup, uses latest Atomic API Message-ID: <5bc3c6bf-c222-b391-dda3-39f1d6a8a2a3@redhat.com> Minor cleanup. 
Uses Atomic::sub() and Atomic::replace_if_null() APIs.

Webrev:
  http://cr.openjdk.java.net/~zgu/shenandoah/atomic_cleanup/webrev.00/

Test:
  hotspot_gc_shenandoah (fastdebug + release)

Thanks,

-Zhengyu

From rkennke at redhat.com  Wed Jan  3 18:40:37 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 03 Jan 2018 19:40:37 +0100
Subject: RFR: Minor cleanup, uses latest Atomic API
In-Reply-To: <5bc3c6bf-c222-b391-dda3-39f1d6a8a2a3@redhat.com>
References: <5bc3c6bf-c222-b391-dda3-39f1d6a8a2a3@redhat.com>
Message-ID: <27EE6BAC-3C0C-4DF9-8663-EB540D4F604E@redhat.com>

Looks good! Thanks!

On 3 January 2018 19:29:26 CET, Zhengyu Gu wrote:
>Minor cleanup. Uses Atomic::sub() and Atomic::replace_if_null() APIs.
>
>Webrev:
>http://cr.openjdk.java.net/~zgu/shenandoah/atomic_cleanup/webrev.00/
>
>Test:
> hotspot_gc_shenandoah (fastdebug + release)
>
>Thanks,
>
>-Zhengyu

--
This message was sent from my Android device with K-9 Mail.

From zgu at redhat.com  Wed Jan  3 18:48:58 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Wed, 03 Jan 2018 18:48:58 +0000
Subject: hg: shenandoah/jdk10: Minor cleanup, uses latest Atomic API
Message-ID: <201801031848.w03ImxWc027341@aojmv0008.oracle.com>

Changeset: 1819ee64325f
Author:    zgu
Date:      2018-01-03 13:44 -0500
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/1819ee64325f

Minor cleanup, uses latest Atomic API

! src/hotspot/share/gc/shenandoah/shenandoahCodeRoots.hpp
! src/hotspot/share/gc/shenandoah/shenandoahStrDedupTable.cpp

From shade at redhat.com  Tue Jan  9 15:28:52 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 9 Jan 2018 16:28:52 +0100
Subject: RFR: Match barrier fastpath checks better
Message-ID:

http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/
(Roland made the draft revision of this patch last year)

The current barrier fastpath checks the flags like this:

   0x0:   movzbl 0x3d8(%r15),%r10d   ; check evac-in-progress
  +0x8:   test   %r10d,%r10d
  +0xB:   jne    SLOW-PATH
  +0x11:  ...

This wastes a register (%r10 above), which is bad when barriers are back-to-back and register
pressure is high. The fix trivially folds the memory checks against byte-sized immediates into
cmpb, so the resulting code is register-less and shorter:

   0x0:   cmpb   $0x0,0x3d8(%r15)
  +0x8:   jne    SLOW-PATH
  +0xE:   ...

This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed
separately. We would like to have this in Shenandoah repos for more thorough testing. The "unsigned"
shape covers Shenandoah WB checks, and the "signed" shape covers SATB checks. (Amusingly, this
affects C2, but not C1, which generates cmpb for cases like these.) We actually only need tests
against zero, but nothing prevents us from matching the entire range of byte values.

Regular benchmarks are affected very little, with some tiny improvements -- because barriers there
are already well-optimized. But in cases where barriers are not optimized(-able), the improvement is
substantial. For example, in recent SPSCQueue benchmarks [1], the score improved by around 50%.

Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm

Thanks,
-Aleksey

[1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt

From rkennke at redhat.com  Tue Jan  9 15:57:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 9 Jan 2018 16:57:05 +0100
Subject: RFR: Match barrier fastpath checks better
In-Reply-To:
References:
Message-ID: <47f6a86d-b7f8-8a87-d359-93af58cd69de@redhat.com>

On 09.01.2018 at 16:28, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/
> (Roland made the draft revision of this patch last year)
>
> The current barrier fastpath checks the flags like this:
>
>    0x0:   movzbl 0x3d8(%r15),%r10d   ; check evac-in-progress
>   +0x8:   test   %r10d,%r10d
>   +0xB:   jne    SLOW-PATH
>   +0x11:  ...
>
> This wastes a register (%r10 above), which is bad when barriers are back-to-back and register
> pressure is high.
The fix trivially folds the checks against memory with byte-sized immediates with cmpb, so the > resulting code is register-less and shorter: > > 0x0: cmpb $0x0,0x3d8(%r15) > +0x8: jne SLOW-PATH > +0xE: ... > > This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed > separately. We would like to have this in Shenandoah repos for more thorough testing. "Unsigned" > shape covers Shenandoah WB checks, and "signed" covers SATB checks. (Amusingly, this affects C2, but > not C1, which generates cmpb for cases like these.) We actually need only tests against zero-es, but > there is nothing that prevents us to check for the entire range of bytes. > > Regular benchmarks are affected very little, with some tiny improvements -- because barriers there > are already well-optimized. But in cases where barriers are not optimized(-able), the improvement is > substantial. For example, in recent SPSCQueue benchmarks [1], the score improved around +50%. > > Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm > > Thanks, > -Aleksey > > [1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt > Looks good to me. Will test it later with traversal heuristics. From rwestrel at redhat.com Wed Jan 10 07:51:48 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 10 Jan 2018 08:51:48 +0100 Subject: RFR: Match barrier fastpath checks better In-Reply-To: References: Message-ID: > http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/ Good. Thanks for taking care of that. Roland. 
From ashipile at redhat.com  Wed Jan 10 09:25:51 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 10 Jan 2018 09:25:51 +0000
Subject: hg: shenandoah/jdk10: Match barrier fastpath checks better
Message-ID: <201801100925.w0A9PpjI015263@aojmv0008.oracle.com>

Changeset: 5eee46621175
Author:    shade
Date:      2018-01-09 16:05 +0100
URL:       http://hg.openjdk.java.net/shenandoah/jdk10/rev/5eee46621175

Match barrier fastpath checks better

! src/hotspot/cpu/x86/x86_64.ad

From shade at redhat.com  Wed Jan 10 09:45:26 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 10:45:26 +0100
Subject: Perf: SATB and WB coalescing
Message-ID: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>

If you do a few back-to-back reference stores, like this:

http://icedtea.classpath.org/hg/gc-bench/file/6ec38e1bea7a/src/main/java/org/openjdk/gcbench/wip/BarriersMultiple.java

Then you will find that WB coalescing breaks because of the SATB barriers in-between. See:

*) No WB, no SATB -> back-to-back stores:
   http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/noWB-noSATB.perfasm

*) WB, but no SATB -> initial evac-in-progress check, then back-to-back stores with RBs:
   http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-noSATB.perfasm

*) WB with SATB -> interleaved evac-in-progress and conc-mark-in-progress checks:
   http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB.perfasm

It seems the impact of the non-coalesced SATB barriers alone is the culprit, and WB coalescing is
the second-order effect:

Benchmark                                Mode  Cnt   Score    Error  Units

# Base
BarriersMultiple.test                    avgt   15   2.739 ±  0.003  ns/op
BarriersMultiple.test:L1-dcache-loads    avgt    3  13.128 ±  0.475   #/op
BarriersMultiple.test:L1-dcache-stores   avgt    3   8.103 ±  0.133   #/op
BarriersMultiple.test:branches           avgt    3   4.039 ±  0.213   #/op
BarriersMultiple.test:cycles             avgt    3  10.344 ±  0.413   #/op
BarriersMultiple.test:instructions       avgt    3  30.273 ±  1.280   #/op

# +WB
BarriersMultiple.test                    avgt   15   3.459 ±  0.011  ns/op
BarriersMultiple.test:L1-dcache-loads    avgt    3  19.195 ±  0.638   #/op  // +6
BarriersMultiple.test:L1-dcache-stores   avgt    3   8.080 ±  0.539   #/op
BarriersMultiple.test:branches           avgt    3   4.045 ±  0.118   #/op
BarriersMultiple.test:cycles             avgt    3  13.031 ±  0.324   #/op  // +3
BarriersMultiple.test:instructions       avgt    3  40.426 ±  1.133   #/op

# +SATB
BarriersMultiple.test                    avgt   15   3.620 ±  0.005  ns/op
BarriersMultiple.test:L1-dcache-loads    avgt    3  18.148 ±  0.519   #/op  // +5
BarriersMultiple.test:L1-dcache-stores   avgt    3   8.065 ±  0.409   #/op
BarriersMultiple.test:branches           avgt    3  13.115 ±  0.423   #/op
BarriersMultiple.test:cycles             avgt    3  13.628 ±  0.471   #/op  // +3.5
BarriersMultiple.test:instructions       avgt    3  49.421 ±  1.880   #/op

# +SATB +WB
BarriersMultiple.test                    avgt   15   4.923 ±  0.040  ns/op
BarriersMultiple.test:L1-dcache-loads    avgt    3  28.269 ±  1.519   #/op  // +15 (should be +11)
BarriersMultiple.test:L1-dcache-stores   avgt    3   8.112 ±  1.161   #/op
BarriersMultiple.test:branches           avgt    3  13.134 ±  1.134   #/op
BarriersMultiple.test:cycles             avgt    3  18.561 ±  1.198   #/op  // +8 (should be +6.5)
BarriersMultiple.test:instructions       avgt    3  56.577 ±  4.024   #/op

I wonder if that means we need to go forward with tracking the GC state in one single flag, and
polling it with different masks, then coalescing the paths when masks are similar?

Thanks,
-Aleksey

From rkennke at redhat.com  Wed Jan 10 11:12:41 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 12:12:41 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com>
Message-ID: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>

On 10.01.2018 at 10:45, Aleksey Shipilev wrote:
> If you do a few back-to-back reference stores, like this:
>
> http://icedtea.classpath.org/hg/gc-bench/file/6ec38e1bea7a/src/main/java/org/openjdk/gcbench/wip/BarriersMultiple.java
>
> Then you will find that WB coalescing breaks because of the SATB barriers in-between.
> See:
>
> *) No WB, no SATB -> back-to-back stores:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/noWB-noSATB.perfasm
>
> *) WB, but no SATB -> initial evac-in-progress check, then back-to-back stores with RBs:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-noSATB.perfasm
>
> *) WB with SATB -> interleaved evac-in-progress and conc-mark-in-progress checks:
>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB.perfasm
>
> It seems the impact of the non-coalesced SATB barriers alone is the culprit, and WB coalescing is
> the second-order effect:
>
> Benchmark                                Mode  Cnt   Score    Error  Units
>
> # Base
> BarriersMultiple.test                    avgt   15   2.739 ±  0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  13.128 ±  0.475   #/op
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.103 ±  0.133   #/op
> BarriersMultiple.test:branches           avgt    3   4.039 ±  0.213   #/op
> BarriersMultiple.test:cycles             avgt    3  10.344 ±  0.413   #/op
> BarriersMultiple.test:instructions       avgt    3  30.273 ±  1.280   #/op
>
> # +WB
> BarriersMultiple.test                    avgt   15   3.459 ±  0.011  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  19.195 ±  0.638   #/op  // +6
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.080 ±  0.539   #/op
> BarriersMultiple.test:branches           avgt    3   4.045 ±  0.118   #/op
> BarriersMultiple.test:cycles             avgt    3  13.031 ±  0.324   #/op  // +3
> BarriersMultiple.test:instructions       avgt    3  40.426 ±  1.133   #/op
>
> # +SATB
> BarriersMultiple.test                    avgt   15   3.620 ±  0.005  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  18.148 ±  0.519   #/op  // +5
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.065 ±  0.409   #/op
> BarriersMultiple.test:branches           avgt    3  13.115 ±  0.423   #/op
> BarriersMultiple.test:cycles             avgt    3  13.628 ±  0.471   #/op  // +3.5
> BarriersMultiple.test:instructions       avgt    3  49.421 ±  1.880   #/op
>
> # +SATB +WB
> BarriersMultiple.test                    avgt   15   4.923 ±  0.040  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  28.269 ±  1.519   #/op  // +15 (should be +11)
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.112 ±  1.161   #/op
> BarriersMultiple.test:branches           avgt    3  13.134 ±  1.134   #/op
> BarriersMultiple.test:cycles             avgt    3  18.561 ±  1.198   #/op  // +8 (should be +6.5)
> BarriersMultiple.test:instructions       avgt    3  56.577 ±  4.024   #/op
>
> I wonder if that means we need to go forward with tracking the GC state in one single flag, and
> polling it with different masks, then coalescing the paths when masks are similar?
>
> Thanks,
> -Aleksey
>

That confirms what I have suspected for a while. And I also sorta hope that the traversal GC will
solve it, because it only ever polls a single flag. We might even want to wrap RBs into
evac-flag-checks initially, so that the optimizer can coalesce them too, and remove lone
evac-checks-around-RBs after optimization.

Another related issue may be that both the GC barriers and a bunch of other stuff pollute the raw
memory slice. This means that an interleaving allocation (among other stuff) in between barriers
may prevent coalescing and optimization. I wonder if it makes sense to put all GC barriers on a
separate memory slice instead? We basically need a memory slice that says 'stuff on this slice only
ever changes at safepoints'.

Roman

From shade at redhat.com  Wed Jan 10 11:16:37 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 10 Jan 2018 12:16:37 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com>
Message-ID: <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com>

On 01/10/2018 12:12 PM, Roman Kennke wrote:
> That confirms what I have suspected for a while. And I also sorta hope that the traversal GC will solve
> it, because it only ever polls a single flag.
We might even want to wrap RBs into evac-flag-checks > initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after > optimization. Let's not conflate this with traversal GC: flag handling and coalescing barriers in important even for our regular cycle. So I'd rather improve that part of the story, and then build traversal GC on top. Do you have a separate patch that introduces a single flag instead of the assortment of {mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further compiler optimizations, I think. Thanks, -Aleksey From rkennke at redhat.com Wed Jan 10 11:20:25 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 10 Jan 2018 12:20:25 +0100 Subject: Perf: SATB and WB coalescing In-Reply-To: <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> Message-ID: <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> Am 10.01.2018 um 12:16 schrieb Aleksey Shipilev: > On 01/10/2018 12:12 PM, Roman Kennke wrote: >> That confirms what I suspected since a while. And I also sorta hope that the traversal GC will solve >> it, because it only ever polls a single flag. We might even want to wrap RBs into evac-flag-checks >> initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after >> optimization. > > Let's not conflate this with traversal GC: flag handling and coalescing barriers in important even > for our regular cycle. So I'd rather improve that part of the story, and then build traversal GC on top. > Sure, that makes sense. > Do you have a separate patch that introduces a single flag instead of the assortment of > {mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further > compiler optimizations, I think. 
No, for traversal GC I simply picked one flag (evac) and barriers use only that. How would you use a single flag, if we need to check 2 or 3 different phases? Roman From aph at redhat.com Wed Jan 10 11:22:05 2018 From: aph at redhat.com (Andrew Haley) Date: Wed, 10 Jan 2018 11:22:05 +0000 Subject: Perf: SATB and WB coalescing In-Reply-To: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> Message-ID: <9c5043ae-c437-af9e-649b-9ad46a19c4b8@redhat.com> On 10/01/18 11:12, Roman Kennke wrote: > We basically need a > memory slice that says 'stuff on this slice only ever changes at > safepoints'. I need something very similar for unmappable ByteBuffers. -- Andrew Haley Java Platform Lead Engineer Red Hat UK Ltd. EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Wed Jan 10 11:30:22 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jan 2018 12:30:22 +0100 Subject: Perf: SATB and WB coalescing In-Reply-To: <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> Message-ID: On 01/10/2018 12:20 PM, Roman Kennke wrote: > Am 10.01.2018 um 12:16 schrieb Aleksey Shipilev: >> On 01/10/2018 12:12 PM, Roman Kennke wrote: >>> That confirms what I suspected since a while. And I also sorta hope that the traversal GC will solve >>> it, because it only ever polls a single flag. We might even want to wrap RBs into evac-flag-checks >>> initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after >>> optimization. >> >> Let's not conflate this with traversal GC: flag handling and coalescing barriers in important even >> for our regular cycle. 
>> So I'd rather improve that part of the story, and then build traversal GC
>> on top.
>>
>
> Sure, that makes sense.
>
>> Do you have a separate patch that introduces a single flag instead of the assortment of
>> {mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further
>> compiler optimizations, I think.
>
> No, for traversal GC I simply picked one flag (evac) and barriers use only that.
>
> How would you use a single flag, if we need to check 2 or 3 different phases?

Make the flag an int, and then bitmask it?

Something like:

  // Describes the current global GC state
  enum ShenandoahCollectorState {
    // Heap is not stable: there are forwarded objects.
    _heap_unstable,

    // Heap is under stabilization: do not introduce new forwarded objects.
    _heap_updating,

    // Heap is under evacuation: new forwarded objects are introduced.
    _heap_evacuating,

    // Heap is under marking.
    _heap_marking,
  };

  enum ShenandoahCollectorStateMask {
    _mask_heap_marking    = 1 << _heap_marking,
    _mask_heap_evacuating = 1 << _heap_evacuating,
    ...
  };

Later:

  movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
  movb(tmp, Address(tmp, 0));
  testb(tmp, ShenandoahHeap::_mask_heap_evacuating);

I think phases really have different bit patterns then:

  mark:                           _heap_marking
  mark + UR:                      _heap_marking + _heap_updating + _heap_unstable
  evac:                           _heap_evacuating + _heap_unstable
  update-refs:                    _heap_updating + _heap_unstable
  idle:                           0
  idle + waiting for mark to UR:  _heap_unstable
  partial:                        _heap_evacuating + _heap_updating + _heap_unstable

Barriers mapping:

  RB, CAS, ACMP  if _heap_unstable
  WB             if _heap_evacuating
  SVRB           if _heap_updating, but not _heap_evacuating
  SWRB           if _heap_updating, and _heap_evacuating
  SATB           if _heap_marking

Something like that...

Then a happy path in compiler-specialized code basically checks the GC state for 0, which means no
barriers are required whatsoever until the safepoint hits, or for _heap_unstable, which means only
RBs are required on that path until the safepoint.
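
The single-flag scheme above can be sketched in plain C++. This is only an illustration under assumptions, not code from the Shenandoah sources: the names GCStateBit, MASK_*, PHASE_*, needs_* and barriers_for are hypothetical, the bit order simply follows the enum in the message, and the barrier conditions are copied from the "Barriers mapping" list.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical bit positions, in the order of the enum sketched in the message.
enum GCStateBit : uint8_t {
  HEAP_UNSTABLE   = 0,  // forwarded objects exist
  HEAP_UPDATING   = 1,  // references are being updated
  HEAP_EVACUATING = 2,  // new forwarded objects are introduced
  HEAP_MARKING    = 3,  // marking is in progress
};

// One mask per bit (hypothetical names).
constexpr uint8_t MASK_UNSTABLE   = 1u << HEAP_UNSTABLE;
constexpr uint8_t MASK_UPDATING   = 1u << HEAP_UPDATING;
constexpr uint8_t MASK_EVACUATING = 1u << HEAP_EVACUATING;
constexpr uint8_t MASK_MARKING    = 1u << HEAP_MARKING;
constexpr uint8_t ALL_BITS =
    MASK_UNSTABLE | MASK_UPDATING | MASK_EVACUATING | MASK_MARKING;

// Phase bit patterns, as listed in the message.
constexpr uint8_t PHASE_IDLE        = 0;
constexpr uint8_t PHASE_MARK        = MASK_MARKING;
constexpr uint8_t PHASE_EVAC        = MASK_EVACUATING | MASK_UNSTABLE;
constexpr uint8_t PHASE_UPDATE_REFS = MASK_UPDATING | MASK_UNSTABLE;
constexpr uint8_t PHASE_PARTIAL     = MASK_EVACUATING | MASK_UPDATING | MASK_UNSTABLE;

// Barrier predicates, one per row of the "Barriers mapping" list.
constexpr bool needs_rb(uint8_t s)   { return (s & MASK_UNSTABLE) != 0; }  // also CAS, ACMP
constexpr bool needs_wb(uint8_t s)   { return (s & MASK_EVACUATING) != 0; }
constexpr bool needs_svrb(uint8_t s) {
  return (s & MASK_UPDATING) != 0 && (s & MASK_EVACUATING) == 0;
}
constexpr bool needs_swrb(uint8_t s) {
  return (s & MASK_UPDATING) != 0 && (s & MASK_EVACUATING) != 0;
}
constexpr bool needs_satb(uint8_t s) { return (s & MASK_MARKING) != 0; }

// Lists the barriers that a given state byte requires, space-separated.
std::string barriers_for(uint8_t state) {
  assert((state & ~ALL_BITS) == 0 && "unknown state bits");
  if (state == 0) return "none";  // happy path: a single compare against zero
  std::string out;
  auto add = [&out](const char* name) {
    if (!out.empty()) out += ' ';
    out += name;
  };
  if (needs_rb(state))   add("RB");
  if (needs_wb(state))   add("WB");
  if (needs_svrb(state)) add("SVRB");
  if (needs_swrb(state)) add("SWRB");
  if (needs_satb(state)) add("SATB");
  return out;
}
```

With these definitions, `barriers_for(PHASE_EVAC)` yields "RB WB" and `barriers_for(PHASE_IDLE)` yields "none". The point of the design is that every barrier test reads the same byte and differs only in the mask, which is what would let adjacent checks be merged.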
Thanks,
-Aleksey

From rkennke at redhat.com Wed Jan 10 11:35:46 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 12:35:46 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To:
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com>
Message-ID: <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com>

On 10.01.2018 at 12:30, Aleksey Shipilev wrote:
> On 01/10/2018 12:20 PM, Roman Kennke wrote:
>> On 10.01.2018 at 12:16, Aleksey Shipilev wrote:
>>> On 01/10/2018 12:12 PM, Roman Kennke wrote:
>>>> That confirms what I suspected for a while. And I also sorta hope that the traversal GC will solve
>>>> it, because it only ever polls a single flag. We might even want to wrap RBs into evac-flag-checks
>>>> initially, so that the optimizer can coalesce them too, and remove lone evac-checks-around-RBs after
>>>> optimization.
>>>
>>> Let's not conflate this with traversal GC: flag handling and coalescing barriers is important even
>>> for our regular cycle. So I'd rather improve that part of the story, and then build traversal GC
>>> on top.
>>>
>>
>> Sure, that makes sense.
>>
>>> Do you have a separate patch that introduces a single flag instead of the assortment of
>>> {mark,evac,updaterefs}-in-progress and fixes all the uses around? That would be a base for further
>>> compiler optimizations, I think.
>>
>> No, for traversal GC I simply picked one flag (evac) and barriers use only that.
>>
>> How would you use a single flag, if we need to check 2 or 3 different phases?
>
> Flag is int, and then bitmask it?
>
> Something like:
>
>   // Describes the current global GC state
>   enum ShenandoahCollectorState {
>     // Heap is not stable: there are forwarded objects.
>     _heap_unstable,
>
>     // Heap is under stabilization: do not introduce new forwarded objects.
>     _heap_updating,
>
>     // Heap is under evacuation: new forwarded objects are introduced.
>     _heap_evacuating,
>
>     // Heap is under marking.
>     _heap_marking,
>   };
>
>   enum ShenandoahCollectorStateMask {
>     _mask_heap_marking    = 1 << _heap_marking,
>     _mask_heap_evacuating = 1 << _heap_evacuating,
>     ...
>   };
>
> Later:
>
>   movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
>   movb(tmp, Address(tmp, 0));
>   testb(tmp, ShenandoahHeap::_mask_heap_evacuating);
>
> I think phases really have different bit patterns then:
>
>   mark:         _heap_marking
>   mark + UR:    _heap_marking + _heap_updating + _heap_unstable
>   evac:         _heap_evacuating + _heap_unstable
>   update-refs:  _heap_updating + _heap_unstable
>   idle:         0
>   idle + waiting for mark to UR: _heap_unstable
>   partial:      _heap_evacuating + _heap_updating + _heap_unstable
>
> Barriers mapping:
>   RB, CAS, ACMP if _heap_unstable
>   WB if _heap_evacuating
>   SVRB if _heap_updating, but not _heap_evacuating
>   SWRB if _heap_updating, and _heap_evacuating
>   SATB if _heap_marking
>
> Something like that...
>
> Then a happy path in compiler-specialized code basically checks the GC state for 0, which means no
> barriers are required whatsoever until the safepoint hits, or _heap_unstable, which means only RBs
> are required on that path until the safepoint.
>
> Thanks,
> -Aleksey
>

Ah!
I made something like this a while ago and it hasn't gone in back then: http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/ Roman From shade at redhat.com Wed Jan 10 11:43:10 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jan 2018 12:43:10 +0100 Subject: Perf: SATB and WB coalescing In-Reply-To: <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> Message-ID: <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> On 01/10/2018 12:35 PM, Roman Kennke wrote: > Ah! > I made something like this a while ago and it hasn't gone in back then: > http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/ Okay! That looks like a good start. Now we "only" need to cover all other phases, and fix up the codegen to make use of "test 0xOFF(TLS), mask". :) I still think the phases themselves are inconvenient to encode, because they don't say everything about the heap. For example, you would want to disambiguate the idle phase that has forwarded objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. Maybe just introducing separate "idle" and "idle-need-fixup" phases would be enough? Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly. 
Thanks, -Aleksey From rkennke at redhat.com Wed Jan 10 11:45:37 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 10 Jan 2018 12:45:37 +0100 Subject: RFR: Match barrier fastpath checks better In-Reply-To: References: Message-ID: <36120d11-e479-e174-2529-d5d66f0b40d1@redhat.com> Am 09.01.2018 um 16:28 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/match-barrier-checks/webrev.01/ > (Roland made the draft revision of this patch last year) > > Current barrier fastpath checks the flags like this: > > 0x0: movzbl 0x3d8(%r15),%r10d ; check evac-in-progress > +0x8: test %r10d,%r10d > +0xB: jne SLOW-PATH > +0x11: ... > > This wastes the register %reg, which is bad when barriers are back-to-back and register pressure is > high. The fix trivially folds the checks against memory with byte-sized immediates with cmpb, so the > resulting code is register-less and shorter: > > 0x0: cmpb $0x0,0x3d8(%r15) > +0x8: jne SLOW-PATH > +0xE: ... > > This follows similar .ad patterns that fold particular cmp shapes, and the fix would be upstreamed > separately. We would like to have this in Shenandoah repos for more thorough testing. "Unsigned" > shape covers Shenandoah WB checks, and "signed" covers SATB checks. (Amusingly, this affects C2, but > not C1, which generates cmpb for cases like these.) We actually need only tests against zero-es, but > there is nothing that prevents us to check for the entire range of bytes. > > Regular benchmarks are affected very little, with some tiny improvements -- because barriers there > are already well-optimized. But in cases where barriers are not optimized(-able), the improvement is > substantial. For example, in recent SPSCQueue benchmarks [1], the score improved around +50%. > > Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm > > Thanks, > -Aleksey > > [1] http://cr.openjdk.java.net/~shade/shenandoah/jctools-QueueThroughputBackoffNone.txt > I tested it with traversal GC. It works and doesn't crash. 
It doesn't seem faster. But traversal GC is handicapped anyway until we get some proper optimizations. Roman From shade at redhat.com Wed Jan 10 12:08:40 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jan 2018 13:08:40 +0100 Subject: RFR: ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/barrier-disable-wb-fastpath-rb/webrev.01/ I keep reimplementing this patch during performance investigations. It is sometimes useful to dissect the WB cost by measuring if the evac-in-progress check itself, or the RB on the fastpath is the reason for performance penalty. New flag allows to do that. Testing: hotspot_gc_shenandoah, eyeballing assembly Thanks, -Aleksey From rkennke at redhat.com Wed Jan 10 12:22:43 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 10 Jan 2018 13:22:43 +0100 Subject: Perf: SATB and WB coalescing In-Reply-To: <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> Message-ID: <91984108-4922-3756-93d5-e57bbd28fce4@redhat.com> > That confirms what I suspected since a while. And I also sorta hope that > the traversal GC will solve it, because it only ever polls a single > flag. We might even want to wrap RBs into evac-flag-checks initially, so > that the optimizer can coalesce them too, and remove lone > evac-checks-around-RBs after optimization. > > Another related issue may be that both the GC barriers and a bunch of > other stuff pollutes the raw memory slice. Which means that an > interleaving allocation (among other stuff) in between barriers may > prevent coalescing and optimization. I wonder if it makes sense to put > all GC barriers on a separate memory slice instead? We basically need a > memory slice that says 'stuff on this slice only ever changes at > safepoints'. 
Allocations are probably a bad example, because allocations *can* trigger safepoints (on the slow path). Not sure if we could possibly generate barrier-free paths on paths with allocations but without alloc slow paths?

A better example is indeed SATB barriers: they currently consume and produce the raw memory slice, which means that they disturb optimizations of other barriers. I.e. they cause re-load and re-check of the -in-progress flags (and thus prevent coalescing them). As you noted, SATB barriers are particularly bad because they tend to interleave with RBs and WBs. There are other things that produce raw memory but cannot cause a safepoint, and those would disturb us similarly (e.g. monitorexit).

Ideally, when the new GC interface arrives, we'll get to generate the whole blob for 'store-oop-to-heap', in which case we can generate one gc-phase-check to begin with, and put all relevant barriers inside that check (...and still be subject to further coalescing, path-splitting and loop hoisting in later optimization phases).

Roman

From rkennke at redhat.com Wed Jan 10 12:23:14 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 10 Jan 2018 13:23:14 +0100
Subject: RFR: ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath
In-Reply-To:
References:
Message-ID:

On 10.01.2018 at 13:08, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/barrier-disable-wb-fastpath-rb/webrev.01/
>
> I keep reimplementing this patch during performance investigations. It is sometimes useful to
> dissect the WB cost by measuring if the evac-in-progress check itself, or the RB on the fastpath is
> the reason for performance penalty. New flag allows to do that.
> > Testing: hotspot_gc_shenandoah, eyeballing assembly > > Thanks, > -Aleksey > Ok From ashipile at redhat.com Wed Jan 10 20:12:06 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 10 Jan 2018 20:12:06 +0000 Subject: hg: shenandoah/jdk10: ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath Message-ID: <201801102012.w0AKC6EG020627@aojmv0008.oracle.com> Changeset: 92710862e1a5 Author: shade Date: 2018-01-10 13:05 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/92710862e1a5 ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath ! src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp ! src/hotspot/cpu/x86/macroAssembler_x86.cpp ! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp ! src/hotspot/share/opto/shenandoahSupport.cpp From shade at redhat.com Wed Jan 10 20:29:16 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jan 2018 21:29:16 +0100 Subject: Perf: SATB and WB coalescing In-Reply-To: <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> Message-ID: <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com> On 01/10/2018 12:43 PM, Aleksey Shipilev wrote: > On 01/10/2018 12:35 PM, Roman Kennke wrote: >> Ah! >> I made something like this a while ago and it hasn't gone in back then: >> http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/ > > I still think the phases themselves are inconvenient to encode, because they don't say everything > about the heap. For example, you would want to disambiguate the idle phase that has forwarded > objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. 
Maybe > just introducing separate "idle" and "idle-need-fixup" phases would be enough? Ah, that is probably solved by treating need_update_refs specially. > Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly. Okay, so the dirty patch for the idea: http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/ perfasm for the offending test: http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm Both SATB and WB are checking off the same TLS flag. Now, two ideas: *) The way the patch is structured now, successful testb $0x0, 0x3d8(%r15) means no barriers are required until the next safepoint poll (e.g. no marking, no evac, no update-refs, no partial, and *no need to update refs*) -- which means the heap is as stable as it gets; *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register -- which might be the lesser evil; -Aleksey From rkennke at redhat.com Wed Jan 10 20:42:26 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 10 Jan 2018 21:42:26 +0100 Subject: Perf: SATB and WB coalescing In-Reply-To: <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com> References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com> Message-ID: <55f7bf32-7bfd-4247-21ad-6a49ee87d728@redhat.com> Am 10.01.2018 um 21:29 schrieb Aleksey Shipilev: > On 01/10/2018 12:43 PM, Aleksey Shipilev wrote: >> On 01/10/2018 12:35 PM, Roman Kennke wrote: >>> Ah! 
>>> I made something like this a while ago and it hasn't gone in back then: >>> http://cr.openjdk.java.net/~rkennke/gc-phase-flag/webrev.01/ >> >> I still think the phases themselves are inconvenient to encode, because they don't say everything >> about the heap. For example, you would want to disambiguate the idle phase that has forwarded >> objects waiting for CM-with-UR to fix stuff up, and idle phase where everything is fixed up. Maybe >> just introducing separate "idle" and "idle-need-fixup" phases would be enough? > > Ah, that is probably solved by treating need_update_refs specially. > >> Then we can approach compiler checking for "idle" state, and optimize the happy path accordingly. > > Okay, so the dirty patch for the idea: > http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/ > > perfasm for the offending test: > http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm > > Both SATB and WB are checking off the same TLS flag. > > Now, two ideas: > > *) The way the patch is structured now, successful testb $0x0, 0x3d8(%r15) means no barriers are > required until the next safepoint poll (e.g. no marking, no evac, no update-refs, no partial, and > *no need to update refs*) -- which means the heap is as stable as it gets; > > *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I > think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register > -- which might be the lesser evil; > > -Aleksey > I was discussing this with Roland before Xmas until now. There seem to be ways to do that and all are rather complex. This could lead to split-ifs and versioned-loops that generate code paths completely without barriers. E.g.: code shaped like this: while (..) { // Assuming no SP inside loop if (evac-in-progress) { barrier() } store(); } Could be: if (evac-in-progress) { while (..) { barrier(); store(); } } else { while (..) 
{ store(); } } Currently we also suffer other problems: since all evac- and satb-checks are consuming raw memory slice, and things like SATB barriers produce raw memory slice (for no really good reason, except that we store some non-Java-memory), we constantly pollute raw memory, leading to the compiler to not trust the evac-flags across multiple barriers or other code that produces raw memory! Roland proposed to implement compiler optimization passes that specifically optimize gc-phase-checks with respect to safepoints. I was thinking in a different direction: we could introduce a new special memory slice, e.g. Compile::SafepointIdx, with the meaning 'stuff on this slice only ever changes at safepoints'. I.e. any node that is a safepoint or could trigger a safepoint (e.g. calls, allocs, etc), would produce a new state on that slice. GC-phase-checks would consume it. This way, I think we could automatically get what we want by exploiting C2's memory aliasing model. According to Roland, this is not very trivial either though: currently SafepointNode (and sub-classes) don't produce any memory state. This might need lots of work to get right. Roman From zgu at redhat.com Wed Jan 10 21:18:36 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 10 Jan 2018 16:18:36 -0500 Subject: RFR: [9 Backport] Shenandoah string deduplication support Message-ID: This is jdk9 backport of latest string deduplication implementation. 
Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/ Test: hotspot_gc_shenandoah (release + fastdebug) Thanks, -Zhengyu From shade at redhat.com Wed Jan 10 21:24:55 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jan 2018 22:24:55 +0100 Subject: RFR: [9 Backport] Shenandoah string deduplication support In-Reply-To: References: Message-ID: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com> On 01/10/2018 10:18 PM, Zhengyu Gu wrote: > Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/ Thank you, this looks good. How much was changed compared to sh/jdk10, and are there places we should take a special look at? I assume G1 changes are the retraction to the upstream jdk9 state? Thanks, -Aleksey From zgu at redhat.com Wed Jan 10 21:31:10 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 10 Jan 2018 16:31:10 -0500 Subject: RFR: [9 Backport] Shenandoah string deduplication support In-Reply-To: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com> References: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com> Message-ID: <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote: > On 01/10/2018 10:18 PM, Zhengyu Gu wrote: >> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/ > > Thank you, this looks good. > > How much was changed compared to sh/jdk10, and are there places we should take a special look at? > I assume G1 changes are the retraction to the upstream jdk9 state? > jdk10 patch applied pretty clean. The only file got mismerged was shenandoahRootProcessor.cpp. Yes, G1 changes reverted early changes back to upstream state. 
Thanks, -Zhengyu > Thanks, > -Aleksey > From rkennke at redhat.com Wed Jan 10 22:04:20 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 10 Jan 2018 23:04:20 +0100 Subject: RFR: Simplify and optimize ShenandoahHeap::requires_marking() Message-ID: When I implemented partial-GC, I complicated ShenandoahHeap::requires_marking(). This method basically decides which oops on SATB queues to keep and which to discard when filtering SATB queues (before processing them). The idea behind this was that we need all oops on the queues for partial-GC, but only not-yet-marked oops during concurrent marking. Work on traversal GC lead to a similar problem, and then it occurred to me that we don't need to complicate that code at all: during partial GC, we never use the bitmap. Simply returning !is_marked_next(obj) does the same as return true, and is probably. This should restore a little bit of performance of the regular Shenandoah mode, at the potential cost of a little bit of performance for partial GC. However, I am not even sure that this code is performance critical at all. I couldn't see any performance changes. http://cr.openjdk.java.net/~rkennke/req-marking/webrev.00/ Testing: hotspot_gc_shenandoah Ok to push? Roman From shade at redhat.com Wed Jan 10 22:23:25 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jan 2018 23:23:25 +0100 Subject: RFR: [9 Backport] Shenandoah string deduplication support In-Reply-To: <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com> References: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com> <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com> Message-ID: <91279636-d93e-7232-9384-467d93f04c4a@redhat.com> On 01/10/2018 10:31 PM, Zhengyu Gu wrote: > > On 01/10/2018 04:24 PM, Aleksey Shipilev wrote: >> On 01/10/2018 10:18 PM, Zhengyu Gu wrote: >>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/ >> >> Thank you, this looks good. 
>> >> How much was changed compared to sh/jdk10, and are there places we should take a special look at? >> I assume G1 changes are the retraction to the upstream jdk9 state? >> > > jdk10 patch applied pretty clean. The only file got mismerged was shenandoahRootProcessor.cpp. > > Yes, G1 changes reverted early changes back to upstream state. Good for me then! -Aleksey From shade at redhat.com Wed Jan 10 22:29:07 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 10 Jan 2018 23:29:07 +0100 Subject: RFR: Simplify and optimize ShenandoahHeap::requires_marking() In-Reply-To: References: Message-ID: <14f0d81d-db04-6af4-3630-c5ddb631d9e0@redhat.com> On 01/10/2018 11:04 PM, Roman Kennke wrote: > When I implemented partial-GC, I complicated ShenandoahHeap::requires_marking(). This method > basically decides which oops on SATB queues to keep and which to discard when filtering SATB queues > (before processing them). The idea behind this was that we need all oops on the queues for > partial-GC, but only not-yet-marked oops during concurrent marking. > > Work on traversal GC lead to a similar problem, and then it occurred to me that we don't need to > complicate that code at all: during partial GC, we never use the bitmap. Simply returning > !is_marked_next(obj) does the same as return true, and is probably. This should restore a little bit > of performance of the regular Shenandoah mode, at the potential cost of a little bit of performance > for partial GC. However, I am not even sure that this code is performance critical at all. I > couldn't see any performance changes. > > http://cr.openjdk.java.net/~rkennke/req-marking/webrev.00/ > > Testing: hotspot_gc_shenandoah > > Ok to push? That makes sense. Looks good. 
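The SATB queue filtering described in the quoted message can be sketched roughly as follows. This is a toy model and not the HotSpot code: ToyHeap, its std::vector<bool> bitmap and filter_satb_buffer() are hypothetical stand-ins, and objects are plain indices. Only the names requires_marking() and is_marked_next() are taken from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap: objects are indices, the "next" mark bitmap is a bit vector.
struct ToyHeap {
  std::vector<bool> next_bitmap;

  explicit ToyHeap(std::size_t num_objects) : next_bitmap(num_objects, false) {}

  void mark_next(std::size_t obj) {
    assert(obj < next_bitmap.size());
    next_bitmap[obj] = true;
  }

  bool is_marked_next(std::size_t obj) const {
    assert(obj < next_bitmap.size());
    return next_bitmap[obj];
  }

  // Simplified form: no special case for partial GC. If a GC mode never
  // touches the bitmap, nothing is marked and this returns true for every
  // object, which has the same effect as "return true".
  bool requires_marking(std::size_t obj) const {
    return !is_marked_next(obj);
  }
};

// Compacts the buffer in place, keeping only entries that still need
// marking. Returns the number of entries kept.
inline std::size_t filter_satb_buffer(const ToyHeap& heap,
                                      std::vector<std::size_t>& buffer) {
  std::size_t kept = 0;
  for (std::size_t i = 0; i < buffer.size(); i++) {
    if (heap.requires_marking(buffer[i])) {
      buffer[kept++] = buffer[i];
    }
  }
  buffer.resize(kept);
  return kept;
}
```

During concurrent marking, already-marked entries are dropped from the buffer; with an untouched bitmap, every entry survives the filter.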
Thanks,
-Aleksey

From shade at redhat.com Thu Jan 11 10:51:24 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 11 Jan 2018 11:51:24 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
Message-ID:

On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
> Okay, so the dirty patch for the idea:
>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>
> perfasm for the offending test:
>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>
>  *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
> -- which might be the lesser evil;

Hey, this one works with the dirty hack like this:
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch

It now drags commons GC state loads (and puts in the register):
  http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm

...and this eliminates around 8 L1 reads, that recovers 50% of the overhead:

Benchmark                                Mode  Cnt   Score   Error  Units

# -WB -SATB
BarriersMultiple.test                    avgt   15   2.760 ± 0.081  ns/op
BarriersMultiple.test:L1-dcache-loads    avgt    3  13.121 ± 0.444   #/op
BarriersMultiple.test:L1-dcache-stores   avgt    3   8.089 ± 0.141   #/op
BarriersMultiple.test:branches           avgt    3   4.039 ± 0.220   #/op
BarriersMultiple.test:cycles             avgt    3  10.429 ± 2.041   #/op
BarriersMultiple.test:instructions       avgt    3  30.306 ± 2.414   #/op

# +WB +SATB
BarriersMultiple.test                    avgt   15   4.897 ± 0.003  ns/op
BarriersMultiple.test:L1-dcache-loads    avgt    3  28.195 ± 0.838   #/op
BarriersMultiple.test:L1-dcache-stores   avgt    3   8.102 ± 0.274   #/op
BarriersMultiple.test:branches           avgt    3  13.074 ± 0.344   #/op
BarriersMultiple.test:cycles             avgt    3  18.492 ± 2.365   #/op
BarriersMultiple.test:instructions       avgt    3  56.423 ± 1.681   #/op

# +WB +SATB +TLS commoning
BarriersMultiple.test                    avgt   15   3.884 ± 0.003  ns/op
BarriersMultiple.test:L1-dcache-loads    avgt    3  20.221 ± 0.602   #/op // -8!
BarriersMultiple.test:L1-dcache-stores   avgt    3   8.093 ± 0.264   #/op
BarriersMultiple.test:branches           avgt    3  13.133 ± 0.395   #/op
BarriersMultiple.test:cycles             avgt    3  14.668 ± 0.771   #/op // -4!
BarriersMultiple.test:instructions       avgt    3  58.636 ± 2.368   #/op

Thanks,
-Aleksey

From roman at kennke.org Thu Jan 11 11:17:03 2018
From: roman at kennke.org (roman at kennke.org)
Date: Thu, 11 Jan 2018 11:17:03 +0000
Subject: hg: shenandoah/jdk10: Simplify and optimize ShenandoahHeap::requires_marking()
Message-ID: <201801111117.w0BBH3fb027627@aojmv0008.oracle.com>

Changeset: 2795496dbaf3
Author: rkennke
Date: 2018-01-11 12:12 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/2795496dbaf3

Simplify and optimize ShenandoahHeap::requires_marking()

! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
!
src/hotspot/share/gc/shenandoah/shenandoahPartialGC.cpp

From rkennke at redhat.com Thu Jan 11 11:19:50 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 11 Jan 2018 12:19:50 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To:
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
Message-ID:

On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>> Okay, so the dirty patch for the idea:
>>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>
>> perfasm for the offending test:
>>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>
>>  *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
>> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
>> -- which might be the lesser evil;
>
> Hey, this one works with the dirty hack like this:
>   http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
>
> It now drags commons GC state loads (and puts in the register):
>   http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
>
> ...and this eliminates around 8 L1 reads, that recovers 50% of the overhead:
>
> Benchmark                                Mode  Cnt   Score   Error  Units
>
> # -WB -SATB
> BarriersMultiple.test                    avgt   15   2.760 ± 0.081  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  13.121 ± 0.444   #/op
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.089 ± 0.141   #/op
> BarriersMultiple.test:branches           avgt    3   4.039 ± 0.220   #/op
> BarriersMultiple.test:cycles             avgt    3  10.429 ± 2.041   #/op
> BarriersMultiple.test:instructions       avgt    3  30.306 ± 2.414   #/op
>
> # +WB +SATB
> BarriersMultiple.test                    avgt   15   4.897 ± 0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  28.195 ± 0.838   #/op
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.102 ± 0.274   #/op
> BarriersMultiple.test:branches           avgt    3  13.074 ± 0.344   #/op
> BarriersMultiple.test:cycles             avgt    3  18.492 ± 2.365   #/op
> BarriersMultiple.test:instructions       avgt    3  56.423 ± 1.681   #/op
>
> # +WB +SATB +TLS commoning
> BarriersMultiple.test                    avgt   15   3.884 ± 0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads    avgt    3  20.221 ± 0.602   #/op // -8!
> BarriersMultiple.test:L1-dcache-stores   avgt    3   8.093 ± 0.264   #/op
> BarriersMultiple.test:branches           avgt    3  13.133 ± 0.395   #/op
> BarriersMultiple.test:cycles             avgt    3  14.668 ± 0.771   #/op // -4!
> BarriersMultiple.test:instructions       avgt    3  58.636 ± 2.368   #/op
>
>
> Thanks,
> -Aleksey
>

Ok, this basically makes the load of the flag appear to access immutable memory. It can now basically freely float above or below safepoints. We need to ensure that this cannot happen, otherwise we'll see the wrong flag state. But it seems to be step #1. Maybe restoring the control into the LoadUBNode is enough to keep it on the right side of safepoints?

Roman

From shade at redhat.com Thu Jan 11 11:35:04 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 11 Jan 2018 12:35:04 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To:
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com>
Message-ID: <12e9d1a4-c039-9ab8-214f-205306759ad4@redhat.com>

On 01/11/2018 12:19 PM, Roman Kennke wrote:
> On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
>> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>>> Okay, so the dirty patch for the idea:
>>>
http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>>
>>> perfasm for the offending test:
>>>   http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>>
>>>  *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
>>> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
>>> -- which might be the lesser evil;
>>
>> Hey, this one works with the dirty hack like this:
>>   http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
>>
>> It now drags commons GC state loads (and puts in the register):
>>   http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
>>
>
> Ok, this basically makes the load of the flag appear to access immutable memory. It can now
> basically freely float above or below safepoints. We need to ensure that this cannot happen,
> otherwise we'll see the wrong flag state. But it seems to be step #1. Maybe restore the control into
> the LoadUBNode is enough to keep it at the right side of safepoints?

That was basically a hack to see if the idea is profitable. It appears profitable. In addition to
that safepoint caveat, I had to disable WB coalescing, because the hack produces broken graph
otherwise, and C2 asserts. Roland said he can sketch the real patch some time later. Meanwhile, I'd
go and prepare the base patch for single-flag that TLS coalescing thing implicitly relies on. We can
try other hacks if Roland has no cycles to look at it, after the base patch is done.
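For readers following along, the effect of commoning the GC state load can be modelled in plain C++. This is a sketch under assumptions, not what the patch does: the real change is in C2's IR, while ToyThread, MARKING_BIT, EVAC_BIT and both functions here are hypothetical. The load counter stands in for the L1 reads of the TLS byte, and the commoned variant is only valid if no safepoint can occur inside the loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments for this sketch only.
const uint8_t MARKING_BIT = 1;
const uint8_t EVAC_BIT    = 2;

// Stand-in for a thread with a thread-local GC state byte.
struct ToyThread {
  uint8_t gc_state;
  mutable int state_loads;  // counts reads of gc_state

  ToyThread() : gc_state(0), state_loads(0) {}

  uint8_t load_gc_state() const {
    ++state_loads;
    return gc_state;
  }
};

struct SlowPaths {
  int satb;  // number of SATB slow paths taken
  int wb;    // number of WB slow paths taken
};

// Every barrier re-reads the state: two loads per store.
inline SlowPaths stores_reloading(const ToyThread& t, int stores) {
  assert(stores >= 0);
  SlowPaths s = {0, 0};
  for (int i = 0; i < stores; i++) {
    if (t.load_gc_state() & MARKING_BIT) s.satb++;
    if (t.load_gc_state() & EVAC_BIT)    s.wb++;
  }
  return s;
}

// The state is read once and reused. This is only correct when the value
// cannot change in between, i.e. when there is no safepoint in the loop.
inline SlowPaths stores_commoned(const ToyThread& t, int stores) {
  assert(stores >= 0);
  SlowPaths s = {0, 0};
  const uint8_t state = t.load_gc_state();
  for (int i = 0; i < stores; i++) {
    if (state & MARKING_BIT) s.satb++;
    if (state & EVAC_BIT)    s.wb++;
  }
  return s;
}
```

For four stores, the first variant performs eight state loads and the second performs one, while both take the same slow paths.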
-Aleksey

From rkennke at redhat.com Thu Jan 11 11:51:58 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Thu, 11 Jan 2018 12:51:58 +0100
Subject: Perf: SATB and WB coalescing
In-Reply-To: <12e9d1a4-c039-9ab8-214f-205306759ad4@redhat.com>
References: <5d967fd8-1cf3-d42d-6ad0-a3401fa23fbb@redhat.com> <4bde5ea8-4d85-ca74-dc76-6214f23d5823@redhat.com> <219f8962-fc13-765a-4239-c8f4b92bdf22@redhat.com> <59e36329-c6d2-0ce7-ba76-29319274072e@redhat.com> <93b69a79-a645-1bd0-bd0f-1fffbd32aaec@redhat.com> <0acac6bd-fddc-3325-fe0d-8e60e466dbc3@redhat.com> <259d13bd-3543-2de2-028b-42d855797ffc@redhat.com> <12e9d1a4-c039-9ab8-214f-205306759ad4@redhat.com>
Message-ID: <2697760b-840b-5290-496e-084b589a9459@redhat.com>

On 11.01.2018 at 12:35, Aleksey Shipilev wrote:
> On 01/11/2018 12:19 PM, Roman Kennke wrote:
>> On 11.01.2018 at 11:51, Aleksey Shipilev wrote:
>>> On 01/10/2018 09:29 PM, Aleksey Shipilev wrote:
>>>> Okay, so the dirty patch for the idea:
>>>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.00/
>>>> >>
>>>> perfasm for the offending test:
>>>>    http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>>>
>>>>  *) Can we instruct compiler to trust the value of 0x3d8(%r15) until the next safepoint poll? I
>>>> think that would eliminate excessive L1 accesses for that TLS field at expense of wasting a register
>>>> -- which might be the lesser evil;
>>>
>>> Hey, this one works with the dirty hack like this:
>>>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/common-single-flag.patch
>>>
>>> It now drags commons GC state loads (and puts in the register):
>>>    http://cr.openjdk.java.net/~shade/shenandoah/perf-wb-satb/WB-SATB-commonTLS.perfasm
>>>
>>
>> Ok, this basically makes the load of the flag appear to access immutable memory. It can now
>> basically freely float above or below safepoints. We need to ensure that this cannot happen,
>> otherwise we'll see the wrong flag state. But it seems to be step #1. Maybe restore the control into
>> the LoadUBNode is enough to keep it at the right side of safepoints?
>
> That was basically a hack to see if the idea is profitable. It appears profitable. In addition to
> that safepoint caveat, I had to disable WB coalescing, because the hack produces broken graph
> otherwise, and C2 asserts. Roland said he can sketch the real patch some time later. Meanwhile, I'd
> go and prepare the base patch for single-flag that TLS coalescing thing implicitly relies on. We can
> try other hacks if Roland has no cycles to look at it, after the base patch is done.
>
> -Aleksey
>

Yeah ok. I tried your hack with traversal GC. It does work, and I think I see some little improvement, but I guess the disabled optimization offsets it a little.

I'll clean up the traversal GC and propose it soon-ish. It's not useful to have it wait in limbo until all possible optimizations are in place. Performance is already quite good (it exceeds default Shenandoah for some workloads, and loses on some others).

Thanks and cheers,
Roman

From zgu at redhat.com Thu Jan 11 15:09:31 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 11 Jan 2018 10:09:31 -0500
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <91279636-d93e-7232-9384-467d93f04c4a@redhat.com>
References: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com> <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com> <91279636-d93e-7232-9384-467d93f04c4a@redhat.com>
Message-ID: <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>

Hi Roman,

Could you review this backport?

Thanks,

-Zhengyu

On 01/10/2018 05:23 PM, Aleksey Shipilev wrote:
> On 01/10/2018 10:31 PM, Zhengyu Gu wrote:
>>
>> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote:
>>> On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
>>>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/
>>>
>>> Thank you, this looks good.
>>>
>>> How much was changed compared to sh/jdk10, and are there places we should take a special look at?
>>> I assume G1 changes are the retraction to the upstream jdk9 state?
>>>
>>
>> jdk10 patch applied pretty clean. The only file got mismerged was shenandoahRootProcessor.cpp.
>>
>> Yes, G1 changes reverted early changes back to upstream state.
>
> Good for me then!
>
> -Aleksey
>
>

From roman at kennke.org Thu Jan 11 15:43:16 2018
From: roman at kennke.org (Roman Kennke)
Date: Thu, 11 Jan 2018 16:43:16 +0100
Subject: RFR: [9 Backport] Shenandoah string deduplication support
In-Reply-To: <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>
References: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com> <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com> <91279636-d93e-7232-9384-467d93f04c4a@redhat.com> <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com>
Message-ID: <37D74C17-294C-4096-8C7C-B160417922A9@kennke.org>

Yes, in a few hours (I hope...)

On 11 January 2018 16:09:31 CET, Zhengyu Gu wrote:
>Hi Roman,
>
>Could you review this backport?
>
>Thanks,
>
>-Zhengyu
>
>On 01/10/2018 05:23 PM, Aleksey Shipilev wrote:
>> On 01/10/2018 10:31 PM, Zhengyu Gu wrote:
>>>
>>> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote:
>>>> On 01/10/2018 10:18 PM, Zhengyu Gu wrote:
>>>>> Webrev:
>http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/
>>>>
>>>> Thank you, this looks good.
>>>>
>>>> How much was changed compared to sh/jdk10, and are there places we
>should take a special look at?
>>>> I assume G1 changes are the retraction to the upstream jdk9 state?
>>>>
>>>
>>> jdk10 patch applied pretty clean. The only file got mismerged was
>shenandoahRootProcessor.cpp.
>>>
>>> Yes, G1 changes reverted early changes back to upstream state.
>>
>> Good for me then!
>>
>> -Aleksey
>>
>>

--
This message was sent from my Android device with K-9 Mail.
From rkennke at redhat.com Thu Jan 11 20:59:29 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 11 Jan 2018 21:59:29 +0100 Subject: RFR: [9 Backport] Shenandoah string deduplication support In-Reply-To: <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com> References: <734df127-06b4-d3c9-10ab-2d075bf5a41b@redhat.com> <211e8fe2-cffd-a713-a42c-c95452dce4ba@redhat.com> <91279636-d93e-7232-9384-467d93f04c4a@redhat.com> <47c6d051-8440-5af8-4961-e735e922f6c1@redhat.com> Message-ID: <46d2f90a-29ff-a729-42f9-672f33e93568@redhat.com> Looks good to me. Thank you for doing this! Roman > Hi Roman, > > Could you review this backport? > > Thanks, > > -Zhengyu > > On 01/10/2018 05:23 PM, Aleksey Shipilev wrote: >> On 01/10/2018 10:31 PM, Zhengyu Gu wrote: >>> >>> On 01/10/2018 04:24 PM, Aleksey Shipilev wrote: >>>> On 01/10/2018 10:18 PM, Zhengyu Gu wrote: >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~zgu/shenandoah/sh_strdedup/backport_jdk9/webrev.00/ >>>>> >>>> >>>> Thank you, this looks good. >>>> >>>> How much was changed compared to sh/jdk10, and are there places we >>>> should take a special look at? >>>> I assume G1 changes are the retraction to the upstream jdk9 state? >>>> >>> >>> jdk10 patch applied pretty clean. The only file got mismerged was >>> shenandoahRootProcessor.cpp. >>> >>> Yes, G1 changes reverted early changes back to upstream state. >> >> Good for me then! >> >> -Aleksey >> >> From zgu at redhat.com Thu Jan 11 21:14:38 2018 From: zgu at redhat.com (zgu at redhat.com) Date: Thu, 11 Jan 2018 21:14:38 +0000 Subject: hg: shenandoah/jdk9/hotspot: [Backport] Shenandoah string deduplication support Message-ID: <201801112114.w0BLEc9f022904@aojmv0008.oracle.com> Changeset: 917523f492d2 Author: zgu Date: 2018-01-11 16:10 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/917523f492d2 [Backport] Shenandoah string deduplication support ! src/cpu/aarch64/vm/stubGenerator_aarch64.cpp ! src/cpu/x86/vm/stubGenerator_x86_64.cpp ! 
src/share/vm/classfile/stringTable.cpp ! src/share/vm/gc/g1/g1StringDedup.hpp ! src/share/vm/gc/g1/g1StringDedupQueue.cpp ! src/share/vm/gc/g1/g1StringDedupQueue.hpp ! src/share/vm/gc/g1/g1StringDedupTable.cpp ! src/share/vm/gc/g1/g1StringDedupThread.cpp ! src/share/vm/gc/g1/g1StringDedupThread.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/shenandoahOopClosures.hpp ! src/share/vm/gc/shenandoah/shenandoahOopClosures.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.cpp ! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.hpp ! src/share/vm/gc/shenandoah/shenandoahRootProcessor.cpp + src/share/vm/gc/shenandoah/shenandoahStrDedupQueue.cpp + src/share/vm/gc/shenandoah/shenandoahStrDedupQueue.hpp + src/share/vm/gc/shenandoah/shenandoahStrDedupQueue.inline.hpp + src/share/vm/gc/shenandoah/shenandoahStrDedupTable.cpp + src/share/vm/gc/shenandoah/shenandoahStrDedupTable.hpp + src/share/vm/gc/shenandoah/shenandoahStrDedupThread.cpp + src/share/vm/gc/shenandoah/shenandoahStrDedupThread.hpp ! src/share/vm/gc/shenandoah/shenandoahStringDedup.cpp ! src/share/vm/gc/shenandoah/shenandoahStringDedup.hpp ! src/share/vm/runtime/arguments.cpp ! src/share/vm/runtime/mutexLocker.cpp ! test/gc/shenandoah/ShenandoahStrDedupStress.java ! test/gc/shenandoah/TestShenandoahStrDedup.java From shade at redhat.com Fri Jan 12 11:11:49 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 12 Jan 2018 12:11:49 +0100 Subject: RFR: Single thread-local GC state flag for all barriers Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.01/ Please review this carefully. I did AArch64 change blindly, symmetric with x86. 
Taking care of SATB check requires some specialization in g1_wb_pre, and the relevant compiler matching changes. This would get convenient as we common the gc-state load between the safepoints, and that would not touch G1 SATB barriers then.

Example disassembly, (0x3d8(%r15) is our flag):
  http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm

Testing: hotspot_gc_shenandoah, eyeballing generated code

Thanks,
-Aleksey

From rwestrel at redhat.com Fri Jan 12 14:27:13 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 12 Jan 2018 15:27:13 +0100
Subject: RFR: leverage profiling for tableswitch/lookup switch
Message-ID:

http://cr.openjdk.java.net/~roland/shenandoah/switch-profiling/webrev.00/

This change is independent of shenandoah but the plan is to have it bake for a bit here before it's proposed upstream.

This is a follow up to:

http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-December/004535.html

1) profile collection is fixed with c1
2) C2 uses profiling to set frequencies of the branches of the switch
3) the tree of choices is trimmed down (if some branches are never taken)
4) the backend uses frequencies from profiling so scheduling is not messed up

We saw that not having 4) messes up loop strip mining. 3) totally flies with the microbenchmarks:

before with -XX:+UseShenandoahGC:
WriteBarrierTableSwitch.common    1000  avgt  15  1109.139 ±   9.030  ns/op
WriteBarrierTableSwitch.separate  1000  avgt  15  2383.219 ± 229.815  ns/op

after with -XX:+UseShenandoahGC:
WriteBarrierTableSwitch.common    1000  avgt  15   514.100 ±  20.067  ns/op
WriteBarrierTableSwitch.separate  1000  avgt  15   505.883 ±  14.498  ns/op

I have another patch coming that should help this microbenchmark when more than one branch of the switch is taken.

Roland.
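
[Editor's sketch] Points 2) and 3) of the message above can be illustrated with a small standalone example. This is not the C2 code from the webrev: SwitchCase, branch_frequencies and trim_untaken are hypothetical names, and the only assumption is that profiling yields one execution count per switch branch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one branch of a tableswitch/lookupswitch:
// the matched key and how often profiling saw the branch taken.
struct SwitchCase {
  int32_t  key;
  uint64_t count;
};

// Turn raw profile counts into relative branch frequencies that sum to 1.
// Without any profile data (total == 0), fall back to an even split.
std::vector<double> branch_frequencies(const std::vector<SwitchCase>& cases) {
  uint64_t total = 0;
  for (const SwitchCase& c : cases) total += c.count;

  std::vector<double> freqs;
  freqs.reserve(cases.size());
  for (const SwitchCase& c : cases) {
    freqs.push_back(total == 0 ? 1.0 / cases.size()
                               : static_cast<double>(c.count) / total);
  }
  return freqs;
}

// Drop the branches that profiling never saw taken, so the decision tree
// built over the remaining cases gets smaller.
std::vector<SwitchCase> trim_untaken(const std::vector<SwitchCase>& cases) {
  std::vector<SwitchCase> taken;
  for (const SwitchCase& c : cases) {
    if (c.count > 0) taken.push_back(c);
  }
  return taken;
}
```

In a real compiler the trimmed branches would presumably still need a fallback (for example an uncommon trap), which this sketch leaves out.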
From shade at redhat.com Fri Jan 12 15:20:22 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 12 Jan 2018 16:20:22 +0100 Subject: RFR: leverage profiling for tableswitch/lookup switch In-Reply-To: References: Message-ID: On 01/12/2018 03:27 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/shenandoah/switch-profiling/webrev.00/ Not qualified of judging on intricate details for C2, so cursory review: *) Maybe we should guard the feature with "chicken" diagnostic flag, like ShenandoahTableSwitchProfiling? This would also mark the paths we would need to remove/refresh once the change trickles down from upstream. *) gcm.cpp, comment is outdated: 1870 // Divide the frequency between all successors evenly *) parse2.cpp, this one is just table[3*j + 0], etc: 498 table[j+j+j+0] = iter().get_int_table(2+j+j); 499 table[j+j+j+1] = iter().get_dest_table(2+j+j+1); 500 table[j+j+j+2] = profile == NULL ? 1 : profile->count_at(j); 527 jint match_int = table[j+j+j+0]; 528 int dest = table[j+j+j+1]; 529 int cnt = table[j+j+j+2]; Thanks, -Aleksey From shade at redhat.com Fri Jan 12 15:57:59 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 12 Jan 2018 16:57:59 +0100 Subject: RFR: Single thread-local GC state flag for all barriers In-Reply-To: References: Message-ID: On 01/12/2018 12:11 PM, Aleksey Shipilev wrote: > http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.01/ > > Please review this carefully. I did AArch64 change blindly, symmetric with x86. Taking care of SATB > check requires some specialization in g1_wb_pre, and the relevant compiler matching changes. This > would get convenient as we common the gc-state load between the safepoints, and that would not touch > G1 SATB barriers then. 
> > Example disassembly, (0x3d8(%r15) is our flag): > http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm > > Testing: hotspot_gc_shenandoah, eyeballing generated code Renamed need_update_refs to is_unstable -- this captures the intent better, and updated comments a little: http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.02/ Thanks, -Aleksey From rwestrel at redhat.com Fri Jan 12 16:05:17 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 12 Jan 2018 17:05:17 +0100 Subject: RFR: leverage profiling for tableswitch/lookup switch In-Reply-To: References: Message-ID: > *) Maybe we should guard the feature with "chicken" diagnostic flag, like > ShenandoahTableSwitchProfiling? This would also mark the paths we would need to remove/refresh once > the change trickles down from upstream. Sure but would you want all code paths that were changed to be guarded by a new flag? That sounds a bit overkill to me. > *) gcm.cpp, comment is outdated: > 1870 // Divide the frequency between all successors evenly Right. > *) parse2.cpp, this one is just table[3*j + 0], etc: > > 498 table[j+j+j+0] = iter().get_int_table(2+j+j); > 499 table[j+j+j+1] = iter().get_dest_table(2+j+j+1); > 500 table[j+j+j+2] = profile == NULL ? 1 : profile->count_at(j); > > 527 jint match_int = table[j+j+j+0]; > 528 int dest = table[j+j+j+1]; > 529 int cnt = table[j+j+j+2]; The original code insists on using j+j instead of 2*j, I suppose because it emphasizes that each element is really a different field or something like that. I followed along. Anyway I can change it if you like. Roland. 
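
[Editor's sketch] The indexing question in the message above comes down to how a flat array of (match, destination, count) triples is addressed. A minimal standalone example, assuming the three-field layout quoted from parse2.cpp; kFields, Field, slot and build_table are hypothetical names, not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each switch case occupies kFields consecutive ints in one flat table.
constexpr std::size_t kFields = 3;
enum Field : std::size_t { kMatch = 0, kDest = 1, kCount = 2 };

// Usual array-of-structs idiom: stride * index + field offset.
// For kFields == 3 this addresses the same slot as j+j+j+f.
constexpr std::size_t slot(std::size_t j, Field f) {
  return kFields * j + f;
}

// Builds the flat table; counts may be null when no profile is available.
std::vector<int> build_table(const std::vector<int>& matches,
                             const std::vector<int>& dests,
                             const std::vector<int>* counts) {
  assert(matches.size() == dests.size());
  assert(counts == nullptr || counts->size() == matches.size());
  std::vector<int> table(kFields * matches.size());
  for (std::size_t j = 0; j < matches.size(); ++j) {
    table[slot(j, kMatch)] = matches[j];
    table[slot(j, kDest)]  = dests[j];
    // Mirrors "profile == NULL ? 1 : profile->count_at(j)" in the quoted code.
    table[slot(j, kCount)] = counts == nullptr ? 1 : (*counts)[j];
  }
  return table;
}
```

Naming the stride and the field offsets keeps the layout in one place, so adding a fourth field would only touch kFields and the enum.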
From shade at redhat.com Fri Jan 12 16:10:40 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 12 Jan 2018 17:10:40 +0100 Subject: RFR: leverage profiling for tableswitch/lookup switch In-Reply-To: References: Message-ID: <56642b43-0540-7f35-65b8-37e51f5b6a80@redhat.com> On 01/12/2018 05:05 PM, Roland Westrelin wrote: > >> *) Maybe we should guard the feature with "chicken" diagnostic flag, like >> ShenandoahTableSwitchProfiling? This would also mark the paths we would need to remove/refresh once >> the change trickles down from upstream. > > Sure but would you want all code paths that were changed to be guarded > by a new flag? That sounds a bit overkill to me. I guess the entry in LIRGenerator::do_TableSwitch, LIRGenerator::do_LookupSwitch and Parse::create_jump_tables would be enough? >> *) parse2.cpp, this one is just table[3*j + 0], etc: >> >> 498 table[j+j+j+0] = iter().get_int_table(2+j+j); >> 499 table[j+j+j+1] = iter().get_dest_table(2+j+j+1); >> 500 table[j+j+j+2] = profile == NULL ? 1 : profile->count_at(j); >> >> 527 jint match_int = table[j+j+j+0]; >> 528 int dest = table[j+j+j+1]; >> 529 int cnt = table[j+j+j+2]; > > The original code insists on using j+j instead of 2*j, I suppose because > it emphasizes that each element is really a different field or something > like that. I followed along. Anyway I can change it if you like. This looks like a plain Array-of-Structs, and the usual idiom is len*sIndex + off, so yeah, "j+j+j" looks very odd. I suspect "j+j" was the attempt at microoptimization? I would guess upstream would suggest the same change. 
Thanks,
-Aleksey

From rwestrel at redhat.com Fri Jan 12 16:54:42 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 12 Jan 2018 17:54:42 +0100
Subject: RFR: improve profiled predicates
Message-ID:

http://cr.openjdk.java.net/~roland/shenandoah/improved-profiled-predicates/webrev.00/

This change should make profiled predicates more robust:

- they now handle more control flow constructs

- instead of bailing out when profiling is missing, they assume it's an untaken path, which allows frequencies to be computed on all paths

- they now support profiling data from lookupswitch/tableswitch (on Jump nodes)

- if a profiled predicate traps, the trap is recorded separately from regular predicate traps. The fallback will then be to recompile without profiled predicates but with regular predicates (instead of disabling predicates entirely). That change requires changing how traps are recorded and that's why there are small changes spread over so many files.

Roland.

From rkennke at redhat.com Fri Jan 12 22:06:51 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 12 Jan 2018 23:06:51 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To:
References:
Message-ID:

On 12.01.2018 at 16:57, Aleksey Shipilev wrote:
> On 01/12/2018 12:11 PM, Aleksey Shipilev wrote:
>> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.01/
>>
>> Please review this carefully. I did AArch64 change blindly, symmetric with x86. Taking care of SATB
>> check requires some specialization in g1_wb_pre, and the relevant compiler matching changes. This
>> would get convenient as we common the gc-state load between the safepoints, and that would not touch
>> G1 SATB barriers then.
>>
>> Example disassembly, (0x3d8(%r15) is our flag):
>> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/single-flag.perfasm
>>
>> Testing: hotspot_gc_shenandoah, eyeballing generated code
>
> Renamed need_update_refs to is_unstable -- this captures the intent better, and updated comments a
> little:
> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.02/
>
> Thanks,
> -Aleksey
>
>

This is great stuff!

One little note: traversal GC will only have one state, and thus doesn't need the masking. I need to see how that fits with this new code :-)

I only have fairly minor comments:

src/hotspot/cpu/x86/macroAssembler_x86.cpp:

   if (ShenandoahConditionalSATBBarrier) {
     Label done;
-    movptr(tmp, (intptr_t) ShenandoahHeap::concurrent_mark_in_progress_addr());
-    testb(Address(tmp, 0), 1);
+    movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
+    testb(Address(tmp, 0), ShenandoahHeap::MARKING);

Can't we use the thread-local flag here? There are several occurrences of the pattern in that file.

src/hotspot/share/c1/c1_LIRGenerator.cpp:

same as above?

src/hotspot/share/opto/graphKit.cpp

same as above?

This stuff probably qualifies for a separate patch, as it diverges from previous code.

-------------

need_update_refs() -> is_unstable():

I don't think this captures the intent better. The former, I know right away what it means, the latter means I need to look up what 'unstable' means in our context.

---------------

C2 changes look ok from afar, but Roland should look at them too.

-------------

src/hotspot/share/runtime/thread.hpp:

+  // Support for Shenandoah barriers
+  static char _gc_state_global;
+  char _gc_state;

Little side-note: upstream got rid of the static stuff for SATB state and moved it into the collector. It's not merged into Shenandoah yet (we really need to update our codebase!!) Maybe we should do the same? Keep global state in ShenandoahHeap and ThreadLocal state in Thread, and nowhere else.
-------------

AArch64 looks reasonable, but should probably be built and tested once? I don't have resources right now to do that though.

Everything else looks great!

From shade at redhat.com Sat Jan 13 09:51:00 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Sat, 13 Jan 2018 10:51:00 +0100
Subject: Perf: WB without RB on fastpath
Message-ID: <821a291c-73d2-36d2-3f43-ee70b43ab0ae@redhat.com>

The single flag change opens up an interesting opportunity for us: we can check for the GC state to be zero, and that means no barriers are required whatsoever. So, instead of doing:

  testb $0x4, 0x3d8(TLS)
  jnz EVAC-IN-PROGRESS
  mov %r, -0x8(%r)
DONE:
  ...
(later)
EVAC-IN-PROGRESS:

...we can do:

  cmpb $0x0, 0x3d8(TLS)
  jne NON-STABLE-HEAP
DONE:
  ...
(later)
NON-STABLE-HEAP:
  test $0x4, 0x3d8(TLS)
  jz DONE

So the fastpath is the same, we just test against a different value. Slowpath gets a bit slower.

The performance improvement can be estimated with passive, -XX:+ShWB and -XX:(+|-)ShWriteBarrierRB. Overnight runs translate to:

  Compiler.compiler: +1.0%
  Compiler.sunflow:  +1.2%
  Compress:          +2.6%
  CryptoSignVerify:  +0.3%
  MpegAudio:         +1.9%
  ScimarkLU.large:   +4.8%
  ScimarkLU.small:   +9.5%
  XmlTransform:      +1.6%
  XmlValidation:     +2.5%

...and no regressions!

Roman mentions separately that Traversal GC does not require RB at all on fastpath, which seems to be the special case of this generic optimization.

Thanks,
-Aleksey

From shade at redhat.com Mon Jan 15 11:05:02 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 12:05:02 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To:
References:
Message-ID: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>

Updated patch:
  http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.03/

On 01/12/2018 11:06 PM, Roman Kennke wrote:
> One little note: traversal GC will only have one state, and thus doesn't need the masking. I need to
> see how that fits with this new code :-)

So does partial now.
You can add another constant to ShenandoahHeap::GCState, and act accordingly.

> I only have fairly minor comments:
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
>
>    if (ShenandoahConditionalSATBBarrier) {
>      Label done;
> -    movptr(tmp, (intptr_t) ShenandoahHeap::concurrent_mark_in_progress_addr());
> -    testb(Address(tmp, 0), 1);
> +    movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
> +    testb(Address(tmp, 0), ShenandoahHeap::MARKING);
>
> Can't we use the thread-local flag here? There are several occurrences of the pattern in that file.

We can use it here, fixed.

> src/hotspot/share/c1/c1_LIRGenerator.cpp:
>
> same as above?

Not really: the TLS access is complicated there. WB does it differently: it is accessing TLS flags after the actual lowering. Kept intact.

> src/hotspot/share/opto/graphKit.cpp
>
> same as above?

Tried to, but the change gets uncomfortably large.

> need_update_refs() -> is_unstable():
>
> I don't think this captures the intent better. The former, I know right away what it means, the
> latter means I need to look up what 'unstable' means in our context.

I still think need_update_refs is a bad name: it describes "what corrective action we should do", not "what the heap state is". This gets awkward when we choose not to do RB when need_update_refs is false. But I guess "is_unstable" is too generic, how about "has_forwarded_objects"? This clearly states what the heap state is, and makes it trivial to comprehend.

> ---------------
>
> C2 changes look ok from afar, but Roland should look at them too.
>
> -------------

Roland, can you take a look?

> src/hotspot/share/runtime/thread.hpp:
>
> +  // Support for Shenandoah barriers
> +  static char _gc_state_global;
> +  char _gc_state;
>
> Little side-note: upstream got rid of the static stuff for SATB state and moved it into the
> collector. It's not merged into Shenandoah yet (we really need to update our codebase!!) Maybe we
> should do the same? Keep global state in ShenandoahHeap and ThreadLocal state in Thread, and nowhere
> else.

This patch follows what we have with evac_in_progress -- we have static field in Thread, let's keep it that way for a time being.

> AArch64 looks reasonable, but should probably be built and tested once? I don't have resources
> right now to do that though.

I have cross-compiled it to AArch64 and ran basic tests on RPi 3. It failed, and I discovered a few AArch64-specific bugs in the patch. Fixed them, and the basic tests run fine now.

Thanks,
-Aleksey

From rkennke at redhat.com Mon Jan 15 11:33:34 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 12:33:34 +0100
Subject: RFR: Single thread-local GC state flag for all barriers
In-Reply-To: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>
References: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com>
Message-ID: <017badf7-f155-7642-25eb-d673879deb65@redhat.com>

On 15.01.2018 at 12:05, Aleksey Shipilev wrote:
> Updated patch:
> http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.03/
>
>
> On 01/12/2018 11:06 PM, Roman Kennke wrote:
>> One little note: traversal GC will only have one state, and thus doesn't need the masking. I need to
>> see how that fits with this new code :-)
>
> So does partial now. You can add another constant to ShenandoahHeap::GCState, and act accordingly.

Yes. I believe traversal is still different: with partial we can fall back to regular Shenandoah (intermediate GC), and thus need to check different states, and thus need the masking check. With traversal we'd never do that, and thus can use a simple zero/not-zero check. I suspect this would be a tiny little bit faster. But I'll figure this out once your patch is in.

>> I only have fairly minor comments:
>>
>> src/hotspot/cpu/x86/macroAssembler_x86.cpp:
>>
>>    if (ShenandoahConditionalSATBBarrier) {
>>      Label done;
>> -    movptr(tmp, (intptr_t) ShenandoahHeap::concurrent_mark_in_progress_addr());
>> -    testb(Address(tmp, 0), 1);
>> +    movptr(tmp, (intptr_t) ShenandoahHeap::gc_state_addr());
>> +    testb(Address(tmp, 0), ShenandoahHeap::MARKING);
>>
>> Can't we use the thread-local flag here? There are several occurrences of the pattern in that file.
>
> We can use it here, fixed.

Thanks!

>> src/hotspot/share/c1/c1_LIRGenerator.cpp:
>>
>> same as above?
>
> Not really: the TLS access is complicated there. WB does it differently: it is accessing TLS flags
> after the actual lowering. Kept intact.

Ok.

>> src/hotspot/share/opto/graphKit.cpp
>>
>> same as above?
>
> Tried to, but the change gets uncomfortably large.

Ok, that is fine. Should do that later. (But I suspect it only affects partial GC stuff, and is thus not high priority.)

>> need_update_refs() -> is_unstable():
>>
>> I don't think this captures the intent better. The former, I know right away what it means, the
>> latter means I need to look up what 'unstable' means in our context.
>
> I still think need_update_refs is a bad name: it describes "what corrective action we should do",
> not "what the heap state is". This gets awkward when we choose not to do RB when need_update_refs is
> false. But I guess "is_unstable" is too generic, how about "has_forwarded_objects"? This clearly
> states what the heap state is, and makes it trivial to comprehend.

Very good!

>
>> ---------------
>>
>> C2 changes look ok from afar, but Roland should look at them too.
>>
>> -------------
>
> Roland, can you take a look?
>
>> src/hotspot/share/runtime/thread.hpp:
>>
>> +  // Support for Shenandoah barriers
>> +  static char _gc_state_global;
>> +  char _gc_state;
>>
>> Little side-note: upstream got rid of the static stuff for SATB state and moved it into the
>> collector. It's not merged into Shenandoah yet (we really need to update our codebase!!) Maybe we
>> should do the same? Keep global state in ShenandoahHeap and ThreadLocal state in Thread, and nowhere
>> else.
>
> This patch follows what we have with evac_in_progress -- we have static field in Thread, let's keep
> it that way for a time being.

Yes. Let's change this once the upstream stuff arrives, and make it consistent then. I actually see a chance to upstream our stuff, and make G1 use the generic flag, or maybe even generify it even more and make room for generic thread local GC data structures.

>> AArch64 looks reasonable, but should probably be built and tested once? I don't have resources
>> right now to do that though.
> I have cross-compiled it to AArch64 and ran basic tests on RPi 3. It failed, and I discovered a few
> AArch64-specific bugs in the patch. Fixed them, and the basic tests run fine now.

Very good.

Patch looks ok for me now.

Roman

From shade at redhat.com Mon Jan 15 12:23:19 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 15 Jan 2018 13:23:19 +0100
Subject: RFR: Common TLS access to GC state, where possible
Message-ID: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
(The initial version of this patch was drafted by Roland)

This patch is based on the single GC state flag patch. This enables us to match that load at once, and common all the loads of GC state between the safepoints, thus avoiding excess L1 cache accesses. This covers the cases where we cannot move the barriers themselves, and thus improves the worst-case scenario.

It sure helps targeted back-to-back store benchmarks:

Benchmark                               Mode  Cnt   Score   Error  Units

# default
BarriersMultiple.test                   avgt   15   5.935 ± 0.003  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  35.420 ± 2.116   #/op
BarriersMultiple.test:L1-dcache-stores  avgt    3   9.082 ± 0.603   #/op
BarriersMultiple.test:branches          avgt    3  18.187 ± 1.005   #/op
BarriersMultiple.test:cycles            avgt    3  22.401 ± 1.249   #/op
BarriersMultiple.test:instructions      avgt    3  83.810 ± 4.297   #/op

# -XX:+ShenandoahCommonGCStateLoads
BarriersMultiple.test                   avgt   15   5.392 ± 0.116  ns/op
BarriersMultiple.test:L1-dcache-loads   avgt    3  26.302 ± 0.456   #/op // -9!
BarriersMultiple.test:L1-dcache-stores  avgt    3   9.078 ± 1.174   #/op
BarriersMultiple.test:branches          avgt    3  18.218 ± 0.092   #/op
BarriersMultiple.test:cycles            avgt    3  20.368 ± 3.023   #/op // -2
BarriersMultiple.test:instructions      avgt    3  86.984 ± 1.127   #/op

...but comes with the caveat: the increased register pressure (?) seems to penalize some of the bigger workloads. To avoid bitrot, and get the matchers for GC state loads into our codebase, I propose pushing this under a disabled experimental flag. New test validates the feature is not completely broken.

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey

From rkennke at redhat.com Mon Jan 15 12:27:56 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 13:27:56 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
Message-ID: <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com>

On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
> (The initial version of this patch was drafted by Roland)
>
> This patch bases on single GC state flag patch. This enables us to match that load at once, and
> common all the loads of GC state between the safepoints, thus avoiding excess L1 cache accesses.
> This covers for the cases where we cannot move the barriers themselves, and thus improves the
> worst-case scenario.
>
> It sure helps targeted back-to-back store benchmarks:
>
> Benchmark                               Mode  Cnt   Score   Error  Units
>
> # default
> BarriersMultiple.test                   avgt   15   5.935 ± 0.003  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  35.420 ± 2.116   #/op
> BarriersMultiple.test:L1-dcache-stores  avgt    3   9.082 ± 0.603   #/op
> BarriersMultiple.test:branches          avgt    3  18.187 ± 1.005   #/op
> BarriersMultiple.test:cycles            avgt    3  22.401 ± 1.249   #/op
> BarriersMultiple.test:instructions      avgt    3  83.810 ± 4.297   #/op
>
> # -XX:+ShenandoahCommonGCStateLoads
> BarriersMultiple.test                   avgt   15   5.392 ± 0.116  ns/op
> BarriersMultiple.test:L1-dcache-loads   avgt    3  26.302 ± 0.456   #/op // -9!
> BarriersMultiple.test:L1-dcache-stores  avgt    3   9.078 ± 1.174   #/op
> BarriersMultiple.test:branches          avgt    3  18.218 ± 0.092   #/op
> BarriersMultiple.test:cycles            avgt    3  20.368 ± 3.023   #/op // -2
> BarriersMultiple.test:instructions      avgt    3  86.984 ± 1.127   #/op
>
> ...but comes with the caveat: the increased register pressure (?) seems to penalize some of the
> bigger workloads. To avoid bitrot, and get the matchers for GC state loads into our codebase, I
> propose pushing this under disabled experimental flag. New test validates the feature is not
> completely broken.
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey
>

I tried the initial Roland patch with traversal GC (against the then evac-in-progress flag), and have seen occurrences of back-to-back evac-loads-checks that have not been common-ed. Roland is looking at it. I suggest to at least hold it back until this is resolved or confirmed to be a separate issue.

Roman

From rkennke at redhat.com Mon Jan 15 13:25:07 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 15 Jan 2018 14:25:07 +0100
Subject: RFR: Common TLS access to GC state, where possible
In-Reply-To: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com>
Message-ID: <4ac30fe4-c9f2-14a6-0e90-3365e272bfd5@redhat.com>

On 15.01.2018 at 13:23, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/
> (The initial version of this patch was drafted by Roland)
>
> This patch bases on single GC state flag patch. This enables us to match that load at once, and
> common all the loads of GC state between the safepoints, thus avoiding excess L1 cache accesses.
> This covers for the cases where we cannot move the barriers themselves, and thus improves the > worst-case scenario. > > It sure helps targeted back-to-back store benchmarks: > > Benchmark Mode Cnt Score Error Units > > # default > BarriersMultiple.test avgt 15 5.935 ± 0.003 ns/op > BarriersMultiple.test:L1-dcache-loads avgt 3 35.420 ± 2.116 #/op > BarriersMultiple.test:L1-dcache-stores avgt 3 9.082 ± 0.603 #/op > BarriersMultiple.test:branches avgt 3 18.187 ± 1.005 #/op > BarriersMultiple.test:cycles avgt 3 22.401 ± 1.249 #/op > BarriersMultiple.test:instructions avgt 3 83.810 ± 4.297 #/op > > # -XX:+ShenandoahCommonGCStateLoads > BarriersMultiple.test avgt 15 5.392 ± 0.116 ns/op > BarriersMultiple.test:L1-dcache-loads avgt 3 26.302 ± 0.456 #/op // -9! > BarriersMultiple.test:L1-dcache-stores avgt 3 9.078 ± 1.174 #/op > BarriersMultiple.test:branches avgt 3 18.218 ± 0.092 #/op > BarriersMultiple.test:cycles avgt 3 20.368 ± 3.023 #/op // -2 > BarriersMultiple.test:instructions avgt 3 86.984 ± 1.127 #/op > > ...but comes with the caveat: the increased register pressure (?) seems to penalize some of the > bigger workloads. To avoid bitrot, and get the matchers for GC state loads into our codebase, I > propose pushing this under disabled experimental flag. New test validates the feature is not > completely broken. > > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > Also, I am not sure if the patch already does it: what about also moving up the actual tests? And thus creating longer paths with/without barriers? I suspect it would be slightly trickier now because of the different masks that it needs to check? It might not be very useful with default heuristics because we tend to interleave different barriers (SATB vs. evac), but may be tremendously useful for traversal GC, where we only have one phase and can thus group all the barriers into one path (enqueue, WBs, *hopefully* even RBs and acmp barriers), and remain barrier-free in another?
Roman From shade at redhat.com Mon Jan 15 13:38:51 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jan 2018 14:38:51 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com> References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com> Message-ID: On 01/15/2018 01:27 PM, Roman Kennke wrote: > Am 15.01.2018 um 13:23 schrieb Aleksey Shipilev: > I tried the initial Roland patch with traversal GC (against the then evac-in-progress flag), and > have seen occurances of back-to-back evac-loads-checks that have not been common-ed. Roland is > looking at it. I suggest to at least hold it back until this is resolved or confirmed to be a > separate issue. This is a separate issue, having nothing to do with barrier moves. This is about commoning the TLS access, so that this: testb $0x2, 0x3d8(TLS) jne SLOW ... testb $0x2, 0x3d8(TLS) jne SLOW ... becomes: mov %r11, 0x3d8(TLS) and $0x2, %r11 test %r11, %r11 jne SLOW ... test %r11, %r11 jne SLOW ... ...saving the TLS access on back-to-back barriers, which are dormant anyhow. > Also, I am not sure if the patch already does it: what about also moving up the actual tests? And > thus creating longer paths with/without barriers? I suspect it would be slightly trickier now > because of the different masks that it needs to check? It might not be very useful with default > heuristics because we tend to interleave different barriers (SATB vs. evac), but may be > tremendously useful for traversal GC, where we only have one phase and can thus group all the > barriers into one path (enqueue, WBs, *hopefully* even RBs and acmp barriers), and remain > barrier-free in another? Let's have some perspective, and not put all our eggs in one basket, okay? This patch helps the cases where (multiple) barriers cannot be optimized. 
It does not move the barriers around -- instead, it makes their fastpaths faster by not accessing the TLS every time. The whole machinery actually helps both SATB and WB checks, because after recent GC state both SATB and WB are checking against the same flag. It also aids future work, because it brings forward the matchers for generic GC state loads, not only evac-in-progress loads. If you want to have the barrier-free paths, you have to care about the generic GC state, not just evac-in-progress. Please note the optimization is disabled by default, but we want the C2 scaffolding anyway. -Aleksey From rkennke at redhat.com Mon Jan 15 14:07:51 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 15 Jan 2018 15:07:51 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com> Message-ID: <456d81bb-c16c-9812-a6f5-3396e39daaec@redhat.com> Am 15.01.2018 um 14:38 schrieb Aleksey Shipilev: > On 01/15/2018 01:27 PM, Roman Kennke wrote: >> Am 15.01.2018 um 13:23 schrieb Aleksey Shipilev: >> I tried the initial Roland patch with traversal GC (against the then evac-in-progress flag), and >> have seen occurances of back-to-back evac-loads-checks that have not been common-ed. Roland is >> looking at it. I suggest to at least hold it back until this is resolved or confirmed to be a >> separate issue. > > This is a separate issue, having nothing to do with barrier moves. This is about commoning the TLS > access, so that this: > > testb $0x2, 0x3d8(TLS) > jne SLOW > ... > testb $0x2, 0x3d8(TLS) > jne SLOW > ... > > becomes: > > mov %r11, 0x3d8(TLS) > and $0x2, %r11 > test %r11, %r11 > jne SLOW > ... > test %r11, %r11 > jne SLOW > ... > > ...saving the TLS access on back-to-back barriers, which are dormant anyhow. Yes, this is what I was talking about, and I have still seen exactly those patterns after Roland's patch (at least for some cases). 
>> Also, I am not sure if the patch already does it: what about also moving up the actual tests? And >> thus creating longer paths with/without barriers? I suspect it would be slightly trickier now >> because of the different masks that it needs to check? It might not be very useful with default >> heuristics because we tend to interleave different barriers (SATB vs. evac), but may be >> tremendously useful for traversal GC, where we only have one phase and can thus group all the >> barriers into one path (enqueue, WBs, *hopefully* even RBs and acmp barriers), and remain >> barrier-free in another? > > Let's have some perspective, and not put all our eggs in one basket, okay? This patch helps the > cases where (multiple) barriers cannot be optimized. It does not move the barriers around -- > instead, it makes their fastpaths faster by not accessing the TLS every time. > > The whole machinery actually helps both SATB and WB checks, because after recent GC state both SATB > and WB are checking against the same flag. It also aids future work, because it brings forward the > matchers for generic GC state loads, not only evac-in-progress loads. If you want to have the > barrier-free paths, you have to care about the generic GC state, not just evac-in-progress. > > Please note the optimization is disabled by default, but we want the C2 scaffolding anyway. Ok. This is not so separate though. What I was suggesting in this last comment was to also common the actual checks, so your above example could become (assuming same flags): mov %r11, 0x3d8(TLS) and $0x2, %r11 test %r11, %r11 jne SLOW ... I am not (yet) suggesting to move any barriers around. All I care about for now is commoning the loads, and when that works, also commoning the tests. This alone should lead to nice groups of barriers under one flag-load-test, and a fast path without barriers. Or not? 
Roman From shade at redhat.com Mon Jan 15 14:53:51 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jan 2018 15:53:51 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: <456d81bb-c16c-9812-a6f5-3396e39daaec@redhat.com> References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> <45ce3760-b6c1-5d31-bff6-52b41db0af99@redhat.com> <456d81bb-c16c-9812-a6f5-3396e39daaec@redhat.com> Message-ID: <3abdbb2b-f6c0-1836-4c68-d68493b9bd6c@redhat.com> On 01/15/2018 03:07 PM, Roman Kennke wrote: > Ok. This is not so separate though. What I was suggesting in this last comment was to also common > the actual checks, so your above example could become (assuming same flags): > > mov %r11, 0x3d8(TLS) > and $0x2, %r11 > test %r11, %r11 > jne SLOW > ... > > I am not (yet) suggesting to move any barriers around. All I care about for now is commoning the > loads, and when that works, also commoning the tests. This alone should lead to nice groups of > barriers under one flag-load-test, and a fast path without barriers. Or not? It would, but it requires rewiring the control flow (that is what I meant by "moving the barriers", probably confusingly), while this particular change just commons the accesses to the flag itself. In my mind, this is orthogonal to rewiring the control flow, and it caters for cases where rewiring is not possible due to structural reasons. In other words, you want three things: a) Detect the GC state load; b) Common the GC state loads over multiple branches; c) Try to rewire branches so that huge happy paths are present under single branch; The patch in this RFR does (a) [provides scaffolding] and (b) [experimentally, disabled by default, as proof-of-concept such commoning is possible and available for performance testing if needed]. (b) would work even if (c) is not possible in a particular case. It seems odd to wait for (c) before pushing (a)+(b) out, right?
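To make the difference between re-reading and commoning the flag concrete, here is a minimal C++ sketch. It is only a model of the idea, not the C2 implementation: `FakeThread`, `load_gc_state`, `barriers_uncommoned` and `barriers_commoned` are hypothetical names, and the 0x2 mask is borrowed from the `testb $0x2` example in this thread. It assumes, as the RFR states, that the GC state does not change between safepoints, so one cached read can serve several barrier checks.

```cpp
#include <cstdint>

// Hypothetical stand-in for the thread-local GC state byte. In the VM the
// flag is read through the TLS register; here we only count the reads.
struct FakeThread {
  uint8_t gc_state;
  mutable int loads;  // number of simulated TLS reads
  uint8_t load_gc_state() const { ++loads; return gc_state; }
};

// Mask borrowed from the testb $0x2 example (assumed, for illustration).
const uint8_t STATE_MASK = 0x2;

// Uncommoned shape: every barrier check re-reads the flag from the thread.
int barriers_uncommoned(const FakeThread& t, int n) {
  int slow_paths = 0;
  for (int i = 0; i < n; i++) {
    if (t.load_gc_state() & STATE_MASK) slow_paths++;
  }
  return slow_paths;
}

// Commoned shape: read the flag once, then test the cached value at each
// barrier. The branches stay where they are; only the loads are shared.
int barriers_commoned(const FakeThread& t, int n) {
  const uint8_t state = t.load_gc_state();
  int slow_paths = 0;
  for (int i = 0; i < n; i++) {
    if (state & STATE_MASK) slow_paths++;
  }
  return slow_paths;
}
```

Both functions take the same branches the same number of times; only the count of flag reads differs, which matches the drop in L1-dcache-loads with branches unchanged in the benchmark numbers quoted earlier in the thread.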
Thanks, -Aleksey From rwestrel at redhat.com Mon Jan 15 15:44:36 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 15 Jan 2018 16:44:36 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> Message-ID: > http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/ C2 code looks ok to me. Roland. From rwestrel at redhat.com Mon Jan 15 16:32:41 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 15 Jan 2018 17:32:41 +0100 Subject: RFR: Single thread-local GC state flag for all barriers In-Reply-To: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com> References: <4c0cc38f-5a4c-a604-b574-20c3af9078ab@redhat.com> Message-ID: > http://cr.openjdk.java.net/~shade/shenandoah/single-flag/webrev.03/ C2 code looks ok. Roland. From shade at redhat.com Mon Jan 15 16:40:39 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jan 2018 17:40:39 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> Message-ID: <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com> On 01/15/2018 04:44 PM, Roland Westrelin wrote: >> http://cr.openjdk.java.net/~shade/shenandoah/c2-common-gc-state/webrev.01/ > > C2 code looks ok to me. Does it interfere/help your pending work? I think that is Roman's concern. Thanks, -Aleksey From rwestrel at redhat.com Mon Jan 15 16:45:47 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 15 Jan 2018 17:45:47 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com> References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com> Message-ID: > Does it interfere/help your pending work? I think that is Roman's concern. It's fine AFAICT. Roland. 
From shade at redhat.com Mon Jan 15 16:58:15 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jan 2018 17:58:15 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com> Message-ID: <1379a4f5-8333-3cea-d316-3cd414e2037b@redhat.com> On 01/15/2018 05:45 PM, Roland Westrelin wrote: > >> Does it interfere/help your pending work? I think that is Roman's concern. > > It's fine AFAICT. Roman, does this resolve your concern? Or you still want to hold this patch off? Thanks, -Aleksey From shade at redhat.com Mon Jan 15 17:18:27 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jan 2018 18:18:27 +0100 Subject: RFR: [9] Bulk backports to sh/jdk9 Message-ID: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180115/webrev.01/ This backports all outstanding work to sh/jdk9. This passes a few nightlies. Changes include: [backport] Increase test timeouts [backport] Report fwdptr size in JNI GetObjectSize [backport] Disable verification from non-Shenandoah VMOps. [backport] Cleanup reset_{next|complete}_mark_bitmap [backport] Verifier should check klass pointers before attempting to reach for object size [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode [backport] Shenandoah SA implementation [backport] Allow use of fp spills around write barrier [backport] Rehash VMOperations and cycle driver mechanics for consistency [backport] Minor cleanup, uses latest Atomic API [backport] Match barrier fastpath checks better [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath String deduplication, NIO checkIndex fix, and assorted Windows compilation fixes were already backported by Zhengyu and Roman. 
Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks Thanks, -Aleksey From zgu at redhat.com Mon Jan 15 17:21:36 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Mon, 15 Jan 2018 12:21:36 -0500 Subject: RFR: Hint unused regions instead of uncommit them Message-ID: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> This patch adds new experimental flag ShenandoahIdleRegions (default to false) to hint kernel that the regions are not needed (vs. madvise(MADV_DONTNEED), instead of proactively uncommitting. It appears that does have advantage over uncommitting regions, although, not by as much as I was expected. SPECjbb2015: Baseline: RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, max-jOPS = 47925, critical-jOPS = 19108 -XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, max-jOPS = 30839, critical-jOPS = 8841 -XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, max-jOPS = 35019, critical-jOPS = 9283 Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug + release) Thanks, -Zhengyu From rkennke at redhat.com Mon Jan 15 17:27:35 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 15 Jan 2018 18:27:35 +0100 Subject: RFR: Common TLS access to GC state, where possible In-Reply-To: <1379a4f5-8333-3cea-d316-3cd414e2037b@redhat.com> References: <05a517dd-9293-b351-0406-8a6e7aa2ca3a@redhat.com> <3454039e-2e2c-32c5-4a25-4848e58d3b86@redhat.com> <1379a4f5-8333-3cea-d316-3cd414e2037b@redhat.com> Message-ID: <96db164d-1508-e6b5-0cb2-7c07183d80d1@redhat.com> Am 15.01.2018 um 17:58 schrieb Aleksey Shipilev: > On 01/15/2018 05:45 PM, Roland Westrelin wrote: >> >>> Does it interfere/help your pending work? I think that is Roman's concern. >> >> It's fine AFAICT. > > Roman, does this resolve your concern? Or you still want to hold this patch off? 
> > Thanks, > -Aleksey > It's fine for me. Roman From ashipile at redhat.com Mon Jan 15 17:35:07 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 15 Jan 2018 17:35:07 +0000 Subject: hg: shenandoah/jdk10: 2 new changesets Message-ID: <201801151735.w0FHZ82l018554@aojmv0008.oracle.com> Changeset: 8735773ec619 Author: shade Date: 2018-01-15 12:19 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/8735773ec619 Single thread-local GC state flag for all barriers ! src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp ! src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp ! src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp ! src/hotspot/cpu/x86/c1_Runtime1_x86.cpp ! src/hotspot/cpu/x86/macroAssembler_x86.cpp ! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp ! src/hotspot/cpu/x86/x86_64.ad ! src/hotspot/share/c1/c1_LIRGenerator.cpp ! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp ! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp ! src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp ! src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp ! src/hotspot/share/opto/cfgnode.hpp ! src/hotspot/share/opto/compile.cpp ! src/hotspot/share/opto/graphKit.cpp ! src/hotspot/share/opto/ifnode.cpp ! src/hotspot/share/opto/memnode.hpp ! src/hotspot/share/opto/node.hpp ! src/hotspot/share/opto/shenandoahSupport.cpp ! src/hotspot/share/runtime/thread.cpp ! src/hotspot/share/runtime/thread.hpp ! 
src/hotspot/share/runtime/thread.inline.hpp Changeset: d55c6d5216d1 Author: shade Date: 2018-01-15 12:32 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/d55c6d5216d1 Common TLS access to GC state, where possible ! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp ! src/hotspot/share/opto/graphKit.cpp ! src/hotspot/share/opto/loopnode.cpp ! src/hotspot/share/opto/loopnode.hpp ! src/hotspot/share/opto/shenandoahSupport.cpp ! src/hotspot/share/opto/shenandoahSupport.hpp + test/hotspot/jtreg/gc/shenandoah/compiler/TestCommonGCLoads.java From shade at redhat.com Mon Jan 15 17:43:03 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 15 Jan 2018 18:43:03 +0100 Subject: RFR: improve profiled predicates In-Reply-To: References: Message-ID: <5637e356-3c95-6768-05d4-83e4b1cd4fe6@redhat.com> On 01/12/2018 05:54 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/shenandoah/improved-profiled-predicates/webrev.00/ Help me understand why we are pushing this to sh/jdk10? Is this for pre-stabilization until we upstream this separately? We don't backport this at all to sh/jdk9 and sh/jdk8? Nits: loopPredicate.cpp *) indenting is off starting line 331, also see lines 334 and 339 *) fenv.h/math.h includes in the middle of the file? *) indenting at line 1208 deoptimization.hpp: *) Comment for the reason here? 65 Reason_profile_predicate, DataLayout.java: *) Comment is outdated: 96 // 4 bits of trap history (none/one reason/many reasons), *) Indenting, and also the condition looks reversed. Cell size is the size of ptr, right? And we have the union with u8 inside, which takes 2 slots on 32-bit VM? 120 static int headerSizeInCells() { 121 return VM.getVM().isLP64() ? 
2 : 1; 122 } Thanks, -Aleksey From shade at redhat.com Mon Jan 15 23:10:01 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 00:10:01 +0100 Subject: RFR: [8u] Bulk backports to sh/jdk8u Message-ID: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/ This backports all outstanding work to sh/jdk8u. This passes a few nightlies in sh/jdk10. Some changes, notably moving the VM operations around required some fiddling to match the code in sh/jdk8u. Changes include: [backport] Increase test timeouts [backport] Report fwdptr size in JNI GetObjectSize [backport] Disable verification from non-Shenandoah VMOps. [backport] Cleanup reset_{next|complete}_mark_bitmap [backport] Verifier should check klass pointers before attempting to reach for object size [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode [backport] Shenandoah SA implementation [backport] Allow use of fp spills around write barrier [backport] Rehash VMOperations and cycle driver mechanics for consistency [backport] Minor cleanup, uses latest Atomic API [backport] Match barrier fastpath checks better [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath NIO checkIndex fix, and assorted Windows compilation fixes were already backported by Zhengyu and Roman. 
Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks Thanks, -Aleksey From shade at redhat.com Tue Jan 16 11:26:54 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 12:26:54 +0100 Subject: RFR: Make degenerated update-refs use region-set cursor to hand over work In-Reply-To: <73df3b1a-5926-1b7a-2194-cb3649bdf456@redhat.com> References: <1a915cfc-4f78-242d-e528-3ce6b0729a1c@redhat.com> <73df3b1a-5926-1b7a-2194-cb3649bdf456@redhat.com> Message-ID: <773dce1c-ef79-dbec-944b-3210ec72cda4@redhat.com> On 12/14/2017 10:49 PM, Roman Kennke wrote: > Am 14.12.2017 um 19:06 schrieb Aleksey Shipilev: >> http://cr.openjdk.java.net/~shade/shenandoah/ur-degen-cursor/webrev.01/ >> >> This is based on previous RFR that cleans up operations. For Degenerate GC to work, we want to drop >> cancellation flag right away, and do init-update-refs, followed by final-update-refs to finish the >> update refs work. But, final-update-refs would not finish work when cancellation is cleared. >> >> Since work handover is tracked by regions cursor anyway, why don't we use that to signal available >> work? This also handles the case where cancellation is called when all threads have processed all >> regions during conc-update-refs, and reacted on cancellation at the end of the phase. Current code >> would make a futile attempt to whip up workers during final-update-refs, when we know there is no >> work left. > > Ok Forgot to push this one! Re-testing and pushing today... 
Thanks, -Aleksey From shade at redhat.com Tue Jan 16 11:49:53 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 12:49:53 +0100 Subject: RFR: Hint unused regions instead of uncommit them In-Reply-To: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> Message-ID: <9bed336a-34df-5192-24da-db675b22cc45@redhat.com> On 01/15/2018 06:21 PM, Zhengyu Gu wrote: > This patch adds new experimental flag ShenandoahIdleRegions (default to false) to hint kernel that > the regions are not needed (vs. madvise(MADV_DONTNEED), instead of proactively uncommitting. > > It appears that does have advantage over uncommitting regions, although, not by as much as I was > expected. > > SPECjbb2015: > > Baseline: > RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, max-jOPS = 47925, critical-jOPS = > 19108 > > -XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions > RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, max-jOPS = 30839, critical-jOPS = > 8841 > > -XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions > RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, max-jOPS = 35019, critical-jOPS = > 9283 > > > Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/ As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting uneasy using this. madvise call that basically corrupts memory, say what? And it also does not support large pages... It _maybe_ makes sense to optionally support this, but only if we make the code changes minimal. It looks like the fair bit of complexity comes from the attempt to fallback to commit/uncommit when idling fails. Could we just test that idle/activate_memory works, and select one of the options without fallback? E.g. 
when ShenandoahIdleRegions is true, LargePages is false, and idling works, make do_commit/do_uncommit only do idle_memory/activate_memory, and fail hard when idle_memory returns false. You would not need the _idle_region flag too then. Thanks, -ALeksey From zgu at redhat.com Tue Jan 16 13:13:59 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 16 Jan 2018 08:13:59 -0500 Subject: RFR: Hint unused regions instead of uncommit them In-Reply-To: <9bed336a-34df-5192-24da-db675b22cc45@redhat.com> References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> <9bed336a-34df-5192-24da-db675b22cc45@redhat.com> Message-ID: <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com> On 01/16/2018 06:49 AM, Aleksey Shipilev wrote: > On 01/15/2018 06:21 PM, Zhengyu Gu wrote: >> This patch adds new experimental flag ShenandoahIdleRegions (default to false) to hint kernel that >> the regions are not needed (vs. madvise(MADV_DONTNEED), instead of proactively uncommitting. >> >> It appears that does have advantage over uncommitting regions, although, not by as much as I was >> expected. >> >> SPECjbb2015: >> >> Baseline: >> RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, max-jOPS = 47925, critical-jOPS = >> 19108 >> >> -XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions >> RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, max-jOPS = 30839, critical-jOPS = >> 8841 >> >> -XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions >> RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, max-jOPS = 35019, critical-jOPS = >> 9283 >> >> >> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/ > > As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting > uneasy using this. madvise call that basically corrupts memory, say what? And it also does not > support large pages... Hummm ... can you point me how it can corrupt memory? since it is the way how thread stack is released. 
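One possible reading of the "corrupts memory" remark is that MADV_DONTNEED discards the current contents of the range. The sketch below is an illustration only, assuming Linux and a private anonymous mapping; `first_byte_after_dontneed` is a hypothetical helper and not code from the webrev. On Linux, the madvise man page documents that later accesses to such a range see zero-filled pages.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: maps `len` bytes of private anonymous memory, fills
// them with 0x5A, advises MADV_DONTNEED, and returns the first byte as it
// reads afterwards. Returns -1 if mmap or madvise fails.
int first_byte_after_dontneed(std::size_t len) {
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;

  std::memset(p, 0x5A, len);  // the region now holds real data

  int result = -1;
  if (madvise(p, len, MADV_DONTNEED) == 0) {
    // The mapping is still valid, but the old contents are gone.
    result = static_cast<unsigned char*>(p)[0];
  }
  munmap(p, len);
  return result;
}
```

Under those assumptions the range stays mapped and usable while the old data is dropped, so this is only safe for regions whose contents are known to be dead.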
> > It _maybe_ makes sense to optionally support this, but only if we make the code changes minimal. It > looks like the fair bit of complexity comes from the attempt to fallback to commit/uncommit when > idling fails. Could we just test that idle/activate_memory works, and select one of the options > without fallback? E.g. when ShenandoahIdleRegions is true, LargePages is false, and idling works, > make do_commit/do_uncommit only do idle_memory/activate_memory, and fail hard when idle_memory > returns false. You would not need the _idle_region flag too then. Sure. -Zhengyu > > Thanks, > -ALeksey > From rkennke at redhat.com Tue Jan 16 17:33:55 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 18:33:55 +0100 Subject: RFR: [9] Bulk backports to sh/jdk9 In-Reply-To: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com> References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com> Message-ID: Am 15.01.2018 um 18:18 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180115/webrev.01/ > > This backports all outstanding work to sh/jdk9. This passes a few nightlies. > > Changes include: > > [backport] Increase test timeouts > [backport] Report fwdptr size in JNI GetObjectSize > [backport] Disable verification from non-Shenandoah VMOps. 
> [backport] Cleanup reset_{next|complete}_mark_bitmap > [backport] Verifier should check klass pointers before attempting to reach for object size > [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode > [backport] Shenandoah SA implementation > [backport] Allow use of fp spills around write barrier > [backport] Rehash VMOperations and cycle driver mechanics for consistency > [backport] Minor cleanup, uses latest Atomic API > [backport] Match barrier fastpath checks better > [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath > > String deduplication, NIO checkIndex fix, and assorted Windows compilation fixes were already > backported by Zhengyu and Roman. > > Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks > > Thanks, > -Aleksey > One thing that struck me that must have slipped my previous jdk10 review (but doesn't stop this backport): - void start_concurrent_marking(); void stop_concurrent_marking(); Why is start_concurrent_marking() gone, but not stop_concurrent_marking() ? Can't say about C2 changes. Other than that, it's good for me. Roman From shade at redhat.com Tue Jan 16 17:35:47 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 18:35:47 +0100 Subject: RFR: [9] Bulk backports to sh/jdk9 In-Reply-To: References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com> Message-ID: <0b9acb5d-8b80-e17f-8aa2-234dad37fe9b@redhat.com> On 01/16/2018 06:33 PM, Roman Kennke wrote: > One thing that struck me that must have slipped my previous jdk10 review (but doesn't stop this > backport): > > - void start_concurrent_marking(); > void stop_concurrent_marking(); > > Why is start_concurrent_marking() gone, but not stop_concurrent_marking() ? Because start* and stop* were not really symmetric. start_concurrent_marking() was the alias for init-mark, while stop_concurrent_marking() is the method that cleans up mark mess, either in concurrent or Full GC cycle.
The naming choice was misleading. Thanks, -Aleksey From rkennke at redhat.com Tue Jan 16 17:36:58 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 18:36:58 +0100 Subject: RFR: [9] Bulk backports to sh/jdk9 In-Reply-To: <0b9acb5d-8b80-e17f-8aa2-234dad37fe9b@redhat.com> References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com> <0b9acb5d-8b80-e17f-8aa2-234dad37fe9b@redhat.com> Message-ID: <6d669c8a-0c8d-f15c-2706-c0ce346a7f48@redhat.com> Am 16.01.2018 um 18:35 schrieb Aleksey Shipilev: > On 01/16/2018 06:33 PM, Roman Kennke wrote: >> On thing that struck me that must have slipped my previous jdk10 review (but doesn't stop this >> backport): >> >> -? void start_concurrent_marking(); >> ?? void stop_concurrent_marking(); >> >> Why is start_concurrent_marking() gone, but not stop_concurrent_marking() ? > > Because start* and stop* were not really symmetric. start_concurrent_marking() was the alias for > init-mark, while stop_concurrent_marking() is the method that cleans up mark mess, either in > concurrent or Full GC cycle. The naming choice was misleading. Ok From rkennke at redhat.com Tue Jan 16 17:38:59 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 18:38:59 +0100 Subject: RFR: [8u] Bulk backports to sh/jdk8u In-Reply-To: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com> References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com> Message-ID: <1193e25c-addc-c158-d6ff-63148e2e3ddd@redhat.com> Am 16.01.2018 um 00:10 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/ > > This backports all outstanding work to sh/jdk8u. This passes a few nightlies in sh/jdk10. Some > changes, notably moving the VM operations around required some fiddling to match the code in sh/jdk8u. > > Changes include: > > [backport] Increase test timeouts > [backport] Report fwdptr size in JNI GetObjectSize > [backport] Disable verification from non-Shenandoah VMOps. 
> [backport] Cleanup reset_{next|complete}_mark_bitmap > [backport] Verifier should check klass pointers before attempting to reach for object size > [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode > [backport] Shenandoah SA implementation > [backport] Allow use of fp spills around write barrier > [backport] Rehash VMOperations and cycle driver mechanics for consistency > [backport] Minor cleanup, uses latest Atomic API > [backport] Match barrier fastpath checks better > [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath > > NIO checkIndex fix, and assorted Windows compilation fixes were already > backported by Zhengyu and Roman. > > Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks > > Thanks, > -Aleksey > Looks good to me. Can't say for sure about C2 changes. Roman From rkennke at redhat.com Tue Jan 16 17:43:23 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 18:43:23 +0100 Subject: RFR: Guard interpreter keep alive barrier with ShenandoahKeepAliveBarrier Message-ID: One thing that I found in traversal GC work: with -ShenandoahKeepAliveBarrier we still generate some code in the interpreter that is only used for the keep-alive-barrier. This patch avoids this. I realize that this should require some better refactoring (to move more of that code into keep_alive_barrier() to begin with), but I suspect that can wait until upstream codegens arrive, then we need to refactor it (big time) anyway. http://cr.openjdk.java.net/~rkennke/interpreter_keep_alive_barrier/webrev.00/ Tests: hotspot_gc_shenandoah Ok? 
From shade at redhat.com Tue Jan 16 17:47:32 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 18:47:32 +0100 Subject: RFR: Guard interpreter keep alive barrier with ShenandoahKeepAliveBarrier In-Reply-To: References: Message-ID: <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com> On 01/16/2018 06:43 PM, Roman Kennke wrote: > One thing that I found in traversal GC work: with -ShenandoahKeepAliveBarrier we still generate some > code in the interpreter that is only used for the keep-alive-barrier. This patch avoids this. I > realize that this should require some better refactoring (to move more of that code into > keep_alive_barrier() to begin with), but I suspect that can wait until upstream codegens arrive, > then we need to refactor it (big time) anyway. > > http://cr.openjdk.java.net/~rkennke/interpreter_keep_alive_barrier/webrev.00/ I think this is already handled inside MacroAssembler::keep_alive_barrier, that this block eventually calls into: void MacroAssembler::keep_alive_barrier(Register val, Register thread, Register tmp) { if (UseG1GC) { // Generate the G1 pre-barrier code to log the value of // the referent field in an SATB buffer. g1_write_barrier_pre(noreg, rax /* pre_val */, thread /* thread */, tmp, true /* tosca_live */, true /* expand_call */); } else if (UseShenandoahGC && ShenandoahKeepAliveBarrier) { shenandoah_write_barrier_pre(noreg, rax /* pre_val */, thread /* thread */, tmp, true /* tosca_live */, true /* expand_call */); } } So the better fix would probably revisit all uses of keep_alive_barrier, and protect their relevant blocks, then putting the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier? 
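A minimal sketch of the arrangement suggested here, not code from the webrev: call sites check the flags before emitting any keep-alive-only code, and the barrier routine merely asserts that it was reached legitimately. `Flags`, `MockAssembler`, `needs_keep_alive_barrier()` and the emitted strings are hypothetical stand-ins for the VM globals and for `MacroAssembler`. Because the real method also serves G1, the assert below accepts either collector, which is an assumption about how such a check would be phrased.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the VM flags (real HotSpot uses global flags).
struct Flags {
  bool UseG1GC = false;
  bool UseShenandoahGC = true;
  bool ShenandoahKeepAliveBarrier = true;
};

// Hypothetical stand-in for MacroAssembler: records what would be emitted.
class MockAssembler {
 public:
  explicit MockAssembler(const Flags& flags) : _flags(flags) {}

  // Single place that decides whether any keep-alive code is needed.
  bool needs_keep_alive_barrier() const {
    return _flags.UseG1GC ||
           (_flags.UseShenandoahGC && _flags.ShenandoahKeepAliveBarrier);
  }

  // The barrier no longer filters silently; it asserts the caller checked.
  void keep_alive_barrier() {
    assert(needs_keep_alive_barrier());
    _code.push_back(_flags.UseG1GC ? "g1_write_barrier_pre"
                                   : "shenandoah_write_barrier_pre");
  }

  // Interpreter-style call site: the register save/restore around the
  // barrier is only emitted when the barrier itself is wanted.
  void generate_reference_get() {
    _code.push_back("load_referent");
    if (needs_keep_alive_barrier()) {
      _code.push_back("save_registers");
      keep_alive_barrier();
      _code.push_back("restore_registers");
    }
  }

  const std::vector<std::string>& code() const { return _code; }

 private:
  Flags _flags;
  std::vector<std::string> _code;
};
```

With the barrier flag off, `generate_reference_get()` in this sketch records only the referent load; with it on, the save, barrier and restore steps appear as well.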
-Aleksey From rkennke at redhat.com Tue Jan 16 17:49:41 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 18:49:41 +0100 Subject: RFR: Guard interpreter keep alive barrier with ShenandoahKeepAliveBarrier In-Reply-To: <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com> References: <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com> Message-ID: <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com> Am 16.01.2018 um 18:47 schrieb Aleksey Shipilev: > On 01/16/2018 06:43 PM, Roman Kennke wrote: >> One thing that I found in traversal GC work: with -ShenandoahKeepAliveBarrier we still generate some >> code in the interpreter that is only used for the keep-alive-barrier. This patch avoids this. I >> realize that this should require some better refactoring (to move more of that code into >> keep_alive_barrier() to begin with), but I suspect that can wait until upstream codegens arrive, >> then we need to refactor it (big time) anyway. >> >> http://cr.openjdk.java.net/~rkennke/interpreter_keep_alive_barrier/webrev.00/ > > I think this is already handled inside MacroAssembler::keep_alive_barrier, that this block > eventually calls into: > > void MacroAssembler::keep_alive_barrier(Register val, > Register thread, > Register tmp) { > > if (UseG1GC) { > // Generate the G1 pre-barrier code to log the value of > // the referent field in an SATB buffer. > g1_write_barrier_pre(noreg, > rax /* pre_val */, > thread /* thread */, > tmp, > true /* tosca_live */, > true /* expand_call */); > } else if (UseShenandoahGC && ShenandoahKeepAliveBarrier) { > shenandoah_write_barrier_pre(noreg, > rax /* pre_val */, > thread /* thread */, > tmp, > true /* tosca_live */, > true /* expand_call */); > } > } > > So the better fix would probably revisit all uses of keep_alive_barrier, and protect their relevant > blocks, then putting the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier? 
> > -Aleksey > Yes, that is what I meant by 'this should need more refactoring' ;-) Only the code that I touched uses it, so we should in fact move all that code under keep_alive_barrier() instead. Want me to do that now? Or wait until codegen for interpreter arrives and do it really properly? Roman From shade at redhat.com Tue Jan 16 17:49:50 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 18:49:50 +0100 Subject: RFR: Hint unused regions instead of uncommit them In-Reply-To: <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com> References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> <9bed336a-34df-5192-24da-db675b22cc45@redhat.com> <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com> Message-ID: <2168843c-7581-b8ba-2e5a-ea7577579e3e@redhat.com> On 01/16/2018 02:13 PM, Zhengyu Gu wrote: >> As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting >> uneasy using this. madvise call that basically corrupts memory, say what? And it also does not >> support large pages... > Hummm ... can you point me how it can corrupt memory? since it is the way how thread stack is released. Ah, I meant that it is very surprising to have madvise do anything that affects correctness. MADV_DONTNEED basically destroys the page contents, as far as the application is concerned. Awkward API...
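To make the "destroys the page contents" point concrete, here is a small Linux-only sketch (an illustration, not part of the patch under review). It assumes a private anonymous mapping, for which Linux documents that pages touched after `MADV_DONTNEED` are zero-filled on demand; other operating systems may treat the hint differently. The helper name `byte_after_dontneed` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Fills one anonymous private page with `fill`, hints it away with
// MADV_DONTNEED, and returns the first byte observed afterwards.
// Returns -1 if any system call fails.
int byte_after_dontneed(unsigned char fill) {
  const long page_size = sysconf(_SC_PAGESIZE);
  if (page_size <= 0) return -1;
  const size_t len = static_cast<size_t>(page_size);

  void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return -1;

  // volatile keeps the compiler from caching the stores across madvise().
  volatile unsigned char* bytes = static_cast<volatile unsigned char*>(mem);
  for (size_t i = 0; i < len; i++) {
    bytes[i] = fill;
  }

  int observed = -1;
  if (madvise(mem, len, MADV_DONTNEED) == 0) {
    // The mapping is still valid, but the old contents are gone.
    observed = bytes[0];
  }

  munmap(mem, len);
  return observed;
}
```

On Linux, calling `byte_after_dontneed(0xAB)` is expected to yield 0 rather than 0xAB: the address range stays mapped, yet the data written before the hint is not preserved.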
-Aleksey From shade at redhat.com Tue Jan 16 17:52:44 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 18:52:44 +0100 Subject: RFR: Guard interpreter keep alive barrier with ShenandoahKeepAliveBarrier In-Reply-To: <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com> References: <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com> <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com> Message-ID: <8fa031cc-ca9e-ce4c-d5b8-f4f3110f2b93@redhat.com> On 01/16/2018 06:49 PM, Roman Kennke wrote: > Am 16.01.2018 um 18:47 schrieb Aleksey Shipilev: >> So the better fix would probably revisit all uses of keep_alive_barrier, and protect their relevant >> blocks, then putting the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier? >> >> -Aleksey >> > > Yes, that is what I mean with 'this should need more refactoring' ;-) Only the code that I touched > uses it, so we should infact move all that code under keep_alive_barrier() instead. Want me to do > that now? Or wait until codegen for interpreter arrives and do it really properly? I think it does not matter at this point. We usually use Shenandoah*Barrier as the performance investigation tool, which means we do care about what compilers do. We are not really interested in what interpreters do perf-wise. So, a better move resource-wise would be to make it right once, after codegen interfaces arrive. Or, does it affect Traversal GC perf? 
Thanks, -Aleksey From rkennke at redhat.com Tue Jan 16 17:54:13 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 18:54:13 +0100 Subject: RFR: Guard interpreter keep alive barrier with ShenandoahKeepAliveBarrier In-Reply-To: <8fa031cc-ca9e-ce4c-d5b8-f4f3110f2b93@redhat.com> References: <06a9dbdb-ecc3-9af2-5494-dd81afc06af9@redhat.com> <4745f18f-4952-74c5-81e9-dc5b33740d9d@redhat.com> <8fa031cc-ca9e-ce4c-d5b8-f4f3110f2b93@redhat.com> Message-ID: <0d0c50ac-5e39-d2c1-a9d1-891ccf545cf7@redhat.com> Am 16.01.2018 um 18:52 schrieb Aleksey Shipilev: > On 01/16/2018 06:49 PM, Roman Kennke wrote: >> Am 16.01.2018 um 18:47 schrieb Aleksey Shipilev: >>> So the better fix would probably revisit all uses of keep_alive_barrier, and protect their relevant >>> blocks, then putting the assert(ShenandoahKeepAliveBarrier) in MacroAssembler::keep_alive_barrier? >>> >>> -Aleksey >>> >> >> Yes, that is what I mean with 'this should need more refactoring' ;-) Only the code that I touched >> uses it, so we should infact move all that code under keep_alive_barrier() instead. Want me to do >> that now? Or wait until codegen for interpreter arrives and do it really properly? > > I think it does not matter at this point. We usually use Shenandoah*Barrier as the performance > investigation tool, which means we do care about what compilers do. We are not really interested in > what interpreters do perf-wise. So, a better move resource-wise would be to make it right once, > after codegen interfaces arrive. > > Or, does it affect Traversal GC perf? > No, not really. It just means we keep alive more weakrefs than we need to. Ok, let's drop it for now. 
Roman From rkennke at redhat.com Tue Jan 16 17:59:47 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 18:59:47 +0100 Subject: RFR: Defer cleaning of system dictionary and friends to parallel cleaning phase Message-ID: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com> Found this during traversal GC work: when cleaning the system dictionary and friends, we do clean it in the first pass, *single threaded* and then do the cleaning stuff again, but multi-threaded. We shall defer cleaning to the parallel phase to begin with. That's what G1 does too. http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/ Ok? Roman From shade at redhat.com Tue Jan 16 18:09:50 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 19:09:50 +0100 Subject: RFR: Defer cleaning of system dictionary and friends to parallel cleaning phase In-Reply-To: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com> References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com> Message-ID: <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com> On 01/16/2018 06:59 PM, Roman Kennke wrote: > Found this during traversal GC work: when cleaning the system dictionary and friends, we do clean it > in the first pass, *single threaded* and then do the cleaning stuff again, but multi-threaded. We > shall defer cleaning to the parallel phase to begin with. That's what G1 does too. > > http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/ > Awwwwwwww. Note that in G1, there are two calls to do_unloading: one from weak_refs_work with "false", and another from mark_sweep_phase1 with default "true". Are you saying that doing this once with "false" is enough? It looks that ParallelCleaning stuff purges ResolvedMethodTable, but does it do ClassLoaderDataGraph::do_unloading with clean_previous_versions? Maybe we should cautiously say "full_gc", not "false" in the patch, so last-ditch can still do it? 
Thanks, -Aleksey From rkennke at redhat.com Tue Jan 16 18:15:27 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 19:15:27 +0100 Subject: RFR: Defer cleaning of system dictionary and friends to parallel cleaning phase In-Reply-To: <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com> References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com> <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com> Message-ID: <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com> Am 16.01.2018 um 19:09 schrieb Aleksey Shipilev: > On 01/16/2018 06:59 PM, Roman Kennke wrote: >> Found this during traversal GC work: when cleaning the system dictionary and friends, we do clean it >> in the first pass, *single threaded* and then do the cleaning stuff again, but multi-threaded. We >> shall defer cleaning to the parallel phase to begin with. That's what G1 does too. >> >> http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/ >> > > Awwwwwwww. > > Note that in G1, there are two calls to do_unloading: one from weak_refs_work with "false", and > another from mark_sweep_phase1 with default "true". For ordinary concurrent GCs, it cleans everything in parallel phase, and thus passes 'false' to do_unloading(). For full-GC, I guess they don't care and do everything single-threaded. > Are you saying that doing this once with "false" is enough? It looks that ParallelCleaning stuff > purges ResolvedMethodTable, but does it do ClassLoaderDataGraph::do_unloading with > clean_previous_versions? Maybe we should cautiously say "full_gc", not "false" in the patch, so > last-ditch can still do it? I believe the ParallelCleaning handles everything. Zhengyu? 
Roman From rkennke at redhat.com Tue Jan 16 18:38:42 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 19:38:42 +0100 Subject: RFR: Traversal GC heuristics Message-ID: This started out as a smallish partial-GC experiment, then turned into a clone of partial GC, and ended up as a standalone GC mode for Shenandoah, which is a frankensteinization of partial+concurrent-marking, with some goodies :-) The idea is to do everything, marking+evacuation+update-refs, in one single phase. This is not very difficult to do: while traversing, evacuate objects that are in the Cset, and update references as we go. I chose to traverse the heap using an incremental-update approach, mostly because this is what partial GC does, and as said above, this started out as a clone of partial :-) The tricky part is to choose the Cset: I made it such that each GC cycle collects liveness information, and bases the decision about the Cset in the next cycle on that liveness information. Yes, this means the first cycle does not collect anything (except immediate garbage). Advantages: - Obviously, touching all live objects only once means less time spent in GC. Measurements show that traversing the heap and doing everything takes only slightly longer than Shenandoah's marking phase, and this might actually be because we also need to mark through newly allocated objects. - Traversal-order evacuation gives us a 10x increase in an ordering-sensitive microbenchmark: https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/ - Simpler barriers: i-u style barriers don't need to load the pre-value, and can be optimized much better (hoisted out of hot paths, etc). Some of it is already done in this patch, but there are plenty of opportunities to make it even better. - Possibly less floating garbage because we trace through newly allocated objects too, and don't treat them as implicitly live.
- we don't need a keep-alive-barrier for Reference.get() which means we keep fewer referents alive just because they happen to be accessed during GC. - MWF is only a switch away (if I understand MWF correctly): -XX:+ShenandoahMWF - It does not need RBs in the WB fast-path, because outside of the single phase, nothing is ever forwarded. - It does not need the membar stuff in the WBs because we turn on/off the phase during safepoint Disadvantages: - Store-value barrier needs to be a WB, RB is not sufficient. The storeval barrier is there to ensure only to-space values ever get written to fields during update-refs. 3-phase Shenandoah doesn't evacuate during update-refs, and therefore RB is enough. We need WB here. (I believe this is off-set by optimization opportunities, see above) - Known I-U problem: mutators can outrun the GC with allocations and let us not terminate. - It needs barriers for constants (need to check this). Stuff left to do: - Implement sane degeneration: if we hit OOM, we simply restart and go into full-GC. - Depending on degen: make heuristics adaptive. Currently it requires manual tweaking of thresholds. Relevant knobs: - ShenandoahGarbageThreshold: regions with more garbage than this go into the Cset. Notice that this is based on the *previous* cycle, so we may actually have much more garbage (but not less). - ShenandoahFreeThreshold: start GC when we have less than that much free heap. 
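The two knobs above can be illustrated with a small sketch. This is only a guess at the shape of the heuristic, not the code in the webrev: `Region`, `choose_cset` and `should_start_gc` are hypothetical names, and treating both thresholds as percentages is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-region summary: capacity and the live bytes measured by
// the *previous* traversal cycle. Before any cycle has run, a region is
// assumed fully live (live_prev_cycle == capacity).
struct Region {
  size_t capacity;
  size_t live_prev_cycle;
};

// Picks regions whose garbage, according to the previous cycle, exceeds
// garbage_threshold_percent of the region capacity. Returns region indices.
std::vector<size_t> choose_cset(const std::vector<Region>& regions,
                                size_t garbage_threshold_percent) {
  std::vector<size_t> cset;
  for (size_t i = 0; i < regions.size(); i++) {
    const Region& r = regions[i];
    const size_t garbage = r.capacity - r.live_prev_cycle;
    if (garbage * 100 > r.capacity * garbage_threshold_percent) {
      cset.push_back(i);
    }
  }
  return cset;
}

// Starts a cycle once free space drops below free_threshold_percent of heap.
bool should_start_gc(size_t free_bytes, size_t heap_capacity,
                     size_t free_threshold_percent) {
  return free_bytes * 100 < heap_capacity * free_threshold_percent;
}
```

In this sketch a heap with no liveness data yet yields an empty Cset, which mirrors the remark that the first cycle collects nothing beyond immediate garbage. Because the liveness data is one cycle old, a selected region can only hold more garbage than recorded, never less.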
I'll not go into all the details for now and give you the code: http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/ Roman From zgu at redhat.com Tue Jan 16 19:09:12 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 16 Jan 2018 14:09:12 -0500 Subject: RFR: Hint unused regions instead of uncommit them In-Reply-To: <2168843c-7581-b8ba-2e5a-ea7577579e3e@redhat.com> References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> <9bed336a-34df-5192-24da-db675b22cc45@redhat.com> <6c57108d-5a93-c33f-e102-4bd7ec571e17@redhat.com> <2168843c-7581-b8ba-2e5a-ea7577579e3e@redhat.com> Message-ID: <183f0c80-eac7-bb0a-caf9-c0db0d25cd5e@redhat.com> On 01/16/2018 12:49 PM, Aleksey Shipilev wrote: > On 01/16/2018 02:13 PM, Zhengyu Gu wrote: >>> As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting >>> uneasy using this. madvise call that basically corrupts memory, say what? And it also does not >>> support large pages... >> Hummm ... can you point me how it can corrupt memory? since it is the way how thread stack is released. > > Ah, I meant that it is very surprising to have madvise to do anything that affects correctness > MADV_DONTNEED basically destructs the page contents, as far as application is concerned. Awkward API... Well, unmapping also destructs page contents. In fact, it will reconstruct content from the underlying mapped file, if it has backing file, or zero-fill-on-demand (which does not do any good to us) pages for mappings without an underlying file. -Zhengyu > > -Aleksey > From zgu at redhat.com Tue Jan 16 19:20:57 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 16 Jan 2018 14:20:57 -0500 Subject: RFR: [8u] Bulk backports to sh/jdk8u In-Reply-To: <1193e25c-addc-c158-d6ff-63148e2e3ddd@redhat.com> References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com> <1193e25c-addc-c158-d6ff-63148e2e3ddd@redhat.com> Message-ID: Good to me. Can not say about barrier stuffs. 
-Zhengyu On 01/16/2018 12:38 PM, Roman Kennke wrote: > Am 16.01.2018 um 00:10 schrieb Aleksey Shipilev: >> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/ >> >> >> This backports all outstanding work to sh/jdk8u. This passes a few >> nightlies in sh/jdk10. Some >> changes, notably moving the VM operations around required some >> fiddling to match the code in sh/jdk8u. >> >> Changes include: >> >> [backport] Increase test timeouts >> [backport] Report fwdptr size in JNI GetObjectSize >> [backport] Disable verification from non-Shenandoah VMOps. >> [backport] Cleanup reset_{next|complete}_mark_bitmap >> [backport] Verifier should check klass pointers before attempting to >> reach for object size >> [backport] TestSelectiveBarrierFlags times out due to too aggressive >> compilation mode >> [backport] Shenandoah SA implementation >> [backport] Allow use of fp spills around write barrier >> [backport] Rehash VMOperations and cycle driver mechanics for >> consistency >> [backport] Minor cleanup, uses latest Atomic API >> [backport] Match barrier fastpath checks better >> [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB >> on WB fastpath >> >> NIO checkIndex fix, and assorted Windows compilation fixes were already >> backported by Zhengyu and Roman. >> >> Testing: hotspot_gc_shenandoah {fastdebug|release}, some benchmarks >> >> Thanks, >> -Aleksey >> > > Looks good to me. Can't say for sure about C2 changes. > > Roman > From shade at redhat.com Tue Jan 16 19:24:06 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 20:24:06 +0100 Subject: RFR: ShConcurrentThread races with set_gc_state_bit Message-ID: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com> http://cr.openjdk.java.net/~shade/shenandoah/single-flag-races/webrev.01/ Zhengyu found this peculiar race: When ShConcurrentThread sets {evac,update_refs}_in_progress, the set_gc_state_bit checks for the safepoint. 
It turns out, after we checked for the safepoint and entered the Threads_lock-free branch, the safepoint may be over. The way out is to restore the *_concurrent family of methods, and acquire Threads_lock there unconditionally. Testing: hotspot_gc_shenandoah Thanks, -Aleksey From zgu at redhat.com Tue Jan 16 19:33:23 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 16 Jan 2018 14:33:23 -0500 Subject: RFR: ShConcurrentThread races with set_gc_state_bit In-Reply-To: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com> References: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com> Message-ID: <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com> void ShenandoahHeap::set_gc_state_bit_concurrently(uint bit, bool value) { _gc_state.set_cond(bit, value); MutexLocker mu(Threads_lock); JavaThread::set_gc_state_all_threads(_gc_state.raw_value()); I wonder if you want to move _gc_state.set_cond(bit, value) into the locked section? In case that global state is set, then we hit a safepoint ... not sure if it matters. Otherwise, it looks good. Thanks, -Zhengyu On 01/16/2018 02:24 PM, Aleksey Shipilev wrote: > http://cr.openjdk.java.net/~shade/shenandoah/single-flag-races/webrev.01/ > > Zhengyu found this peculiar race: > > When ShConcurrentThread sets {evac,update_refs}_in_progress, the set_gc_state_bit checks for the > safepoint. It turns out, after we checked for the safepoint and entered the Threads_lock-free branch, > the safepoint may be over. The way out is to restore the *_concurrent family of methods, and acquire > Threads_lock there unconditionally.
> > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > From shade at redhat.com Tue Jan 16 19:34:37 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 20:34:37 +0100 Subject: RFR: ShConcurrentThread races with set_gc_state_bit In-Reply-To: <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com> References: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com> <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com> Message-ID: On 01/16/2018 08:33 PM, Zhengyu Gu wrote: > void ShenandoahHeap::set_gc_state_bit_concurrently(uint bit, bool value) { > _gc_state.set_cond(bit, value); > MutexLocker mu(Threads_lock); > JavaThread::set_gc_state_all_threads(_gc_state.raw_value()); > > > I wonder if you want to move _gc_state.set_cond(bit, value) into the locked section? In case that global > state is set, then we hit a safepoint ... not sure if it matters. Does not really matter: the Threads_lock is here to capture all threads. The GC state manipulation is MT-safe in itself. Thanks, -Aleksey From zgu at redhat.com Tue Jan 16 19:38:49 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 16 Jan 2018 14:38:49 -0500 Subject: RFR: ShConcurrentThread races with set_gc_state_bit In-Reply-To: References: <4aaa7fbf-27b4-7531-02e8-3a11e8c501d8@redhat.com> <0567b57b-d152-be85-8189-4f4d8f7f31a2@redhat.com> Message-ID: <97d969a1-5930-6805-5a67-932e7ef35d2c@redhat.com> On 01/16/2018 02:34 PM, Aleksey Shipilev wrote: > On 01/16/2018 08:33 PM, Zhengyu Gu wrote: >> void ShenandoahHeap::set_gc_state_bit_concurrently(uint bit, bool value) { >> _gc_state.set_cond(bit, value); >> MutexLocker mu(Threads_lock); >> JavaThread::set_gc_state_all_threads(_gc_state.raw_value()); >> >> >> I wonder if you want to move _gc_state.set_cond(bit, value) into the locked section? In case that global >> state is set, then we hit a safepoint ... not sure if it matters. > > Does not really matter: the Threads_lock is here to capture all threads.
The GC state manipulation > is MT-safe in itself. OK. Thanks, -Zhengyu > > Thanks, > -Aleksey > From ashipile at redhat.com Tue Jan 16 19:43:55 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Tue, 16 Jan 2018 19:43:55 +0000 Subject: hg: shenandoah/jdk10: ShConcurrentThread races with set_gc_state_bit Message-ID: <201801161943.w0GJhtEk016595@aojmv0008.oracle.com> Changeset: 544322604347 Author: shade Date: 2018-01-16 20:23 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/544322604347 ShConcurrentThread races with set_gc_state_bit ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp From zgu at redhat.com Tue Jan 16 19:48:25 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 16 Jan 2018 14:48:25 -0500 Subject: RFR: Defer cleaning of system dictionary and friends to parallel cleaning phase In-Reply-To: <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com> References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com> <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com> <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com> Message-ID: On 01/16/2018 01:15 PM, Roman Kennke wrote: > Am 16.01.2018 um 19:09 schrieb Aleksey Shipilev: >> On 01/16/2018 06:59 PM, Roman Kennke wrote: >>> Found this during traversal GC work: when cleaning the system >>> dictionary and friends, we do clean it >>> in the first pass, *single threaded* and then do the cleaning stuff >>> again, but multi-threaded. We >>> shall defer cleaning to the parallel phase to begin with. That's what >>> G1 does too. >>> >>> http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/ >>> >> >> Awwwwwwww. >> >> Note that in G1, there are two calls to do_unloading: one from >> weak_refs_work with "false", and >> another from mark_sweep_phase1 with default "true". > > For ordinary concurrent GCs, it cleans everything in parallel phase, and > thus passes 'false' to do_unloading(). 
For full-GC, I guess they don't > care and do everything single-threaded. > >> Are you saying that doing this once with "false" is enough? It looks >> that ParallelCleaning stuff >> purges ResolvedMethodTable, but does it do >> ClassLoaderDataGraph::do_unloading with >> clean_previous_versions? Maybe we should cautiously say "full_gc", not >> "false" in the patch, so >> last-ditch can still do it? > > I believe the ParallelCleaning handles everything. Zhengyu? ParallelCleaning does handle ResolvedMethodTable ... -Zhengyu > > Roman From shade at redhat.com Tue Jan 16 20:28:51 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 21:28:51 +0100 Subject: RFR: Defer cleaning of system dictionary and friends to parallel cleaning phase In-Reply-To: References: <85219d9c-2c55-edfd-76f2-282b01c1e2fd@redhat.com> <551fb0d8-da01-cfe2-ceaf-eb502bf4460f@redhat.com> <5f6bb6b8-edfc-61e5-121e-d30a0370f372@redhat.com> Message-ID: <174d5523-ed9b-8f00-e570-102cbbf186c7@redhat.com> On 01/16/2018 08:48 PM, Zhengyu Gu wrote: > > > On 01/16/2018 01:15 PM, Roman Kennke wrote: >> Am 16.01.2018 um 19:09 schrieb Aleksey Shipilev: >>> On 01/16/2018 06:59 PM, Roman Kennke wrote: >>>> Found this during traversal GC work: when cleaning the system dictionary and friends, we do >>>> clean it >>>> in the first pass, *single threaded* and then do the cleaning stuff again, but multi-threaded. We >>>> shall defer cleaning to the parallel phase to begin with. That's what G1 does too. >>>> >>>> http://cr.openjdk.java.net/~rkennke/defer_cleaning/webrev.00/ >>>> >>> >>> Awwwwwwww. >>> >>> Note that in G1, there are two calls to do_unloading: one from weak_refs_work with "false", and >>> another from mark_sweep_phase1 with default "true". >> >> For ordinary concurrent GCs, it cleans everything in parallel phase, and thus passes 'false' to >> do_unloading(). For full-GC, I guess they don't care and do everything single-threaded. 
>> >>> Are you saying that doing this once with "false" is enough? It looks that ParallelCleaning stuff >>> purges ResolvedMethodTable, but does it do ClassLoaderDataGraph::do_unloading with >>> clean_previous_versions? Maybe we should cautiously say "full_gc", not "false" in the patch, so >>> last-ditch can still do it? >> >> I believe the ParallelCleaning handles everything. Zhengyu? > ParallelCleaning does handle ResolvedMethodTable ... Ok then! -Aleksey From shade at redhat.com Tue Jan 16 21:36:58 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 16 Jan 2018 22:36:58 +0100 Subject: RFR: Refactor allocation failure and explicit GC handling Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/refactor-af-explicit-gc/webrev.01/ This refactors the allocation failure and explicit GC handling, and prepares the code for the arrival of STW Degenerate GC. Tour of changes: 1. For historical reasons, we used to have the full_gc_* members in ShConcThread to handle the allocation failure, because that was the only option available for us. With the advent of degenerate CM and UR it started to mean just the "allocation failure". With Degenerate GC, it would further depart from its original meaning. So, renaming full_gc_* to alloc_failure_* to capture the real intent and rewiring accordingly is one part of the refactoring. Behavioral change: Alloc-failed threads are not immediately kicked after degenerated CM and degenerate UR, and instead they wait for the end of the cycle. This avoids a bad race against the alloc-failed threads that are coming with cancellation at the same time, and it keeps us away from OOM-during-evac when after-CM cleanup cannot regain enough space. This would be the behavior of the upcoming Degenerated GC anyway. 2. There is also the path that invokes explicit GCs. Again, for historical reasons, that originally meant only Full GC. With the advent of ExplicitGCInvokesConcurrent support, it means both concurrent and Full GC cycles! 
So, renaming conc_gc_* to explicit_gc_* and rewiring accordingly is the second part of the refactoring. Behavioral change: Explicit GC no longer cancels the concurrent cycle, instead it waits for another control loop iteration to start explicit GC. This is for the best, because it both simplifies our handling logic, and allows requesters to wait for their own cycle. This is interesting when a concurrent cycle is running, ExplicitGCInvokesConcurrent is enabled and System.gc() is called: the requesting thread would wait for one complete GC cycle to start and finish. 3. The logic in the main control loop used to handle weird paths from cancellations back to Full GC. Having proper designations for alloc failure and explicit GCs helps to write out the proper priorities for these events. This also allows us to potentially plug Degenerate GC for the out-of-cycle Allocation Failures, instead of unconditionally doing the Full GC. 4. Pulling the code out of ShenandoahHeap back to ShenandoahConcurrentThread allows us to reduce coupling. Also, ShenandoahGCCause is eliminated in favor of proper GCCause, which simplifies logic further. 5. Additionally, gc+stats now reports things like this: ----- 8< ---------------------------------------------------------------------------------------- Under allocation pressure, concurrent cycles will cancel, and either continue phase under stop-the-world pause or result in stop-the-world Full GC. Increase heap size, tune GC heuristics, or lower allocation rate to avoid degenerated and Full GC cycles.
85 successful concurrent GC cycles 27 cancelled concurrent GC cycles (5 degenerated marks, 10 degenerated update refs, 12 Full GCs) 11 out-of-cycle allocation failures (11 Full GCs) 0 explicitly requested GC cycles (0 Full GCs) ----- 8< ---------------------------------------------------------------------------------------- Testing: hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Tue Jan 16 22:16:04 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 16 Jan 2018 23:16:04 +0100 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: References: Message-ID: <12c87930-1d42-56e8-a502-90b545a8b7a8@redhat.com> Hi Aleksey, I like it. This mess was long overdue for some refactoring ;-) I am unsure about: > Behavioral change: Explicit GC no longer cancels the concurrent cycle, instead it waits for another > control loop iteration to start explicit GC. This is for the best, because it both simplifies our > handling logic, and allows requesters to wait for their own cycle. This is interesting when > concurrent cycle is running, ExplicitGCInvokesConcurrent is enabled and System.gc() is called: the > requesting thread would wait for one complete GC cycle to start and finish. That does mean that System.gc() at the beginning of marking would wait until marking+evac(+updaterefs?) finishes, then does the full-gc, and only then is the Java thread allowed to progress? I guess it does not really matter very much, but what is the point to wait for current cycle completion if it goes into full-gc anyway? I guess it is more relevant with ExplicitGCInvokesConcurrent (as you point out). Other than that, it is good for me. 
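For readers following the quoted "Behavioral change", the queuing idea can be sketched roughly as below. This is a hypothetical illustration, not the code in the webrev: `ControlLoopSketch`, `Cycle` and the method names are invented, and the priority order (allocation failure first, then explicit GC, then the regular heuristics) is an assumption, since the mail only says that proper priorities are written out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical outcome of one control loop iteration.
enum class Cycle { None, AllocFailureGC, ExplicitGC, ConcurrentGC };

class ControlLoopSketch {
 public:
  // Called by a mutator whose allocation failed.
  void handle_alloc_failure() { _alloc_failure_gc.store(true); }

  // Called for System.gc(): only records the request. It does not cancel
  // whatever cycle is running; the request is served on a later iteration.
  void handle_explicit_gc() { _explicit_gc.store(true); }

  // One iteration of the loop: consumes at most one pending request.
  // The priority order here is an assumption made for the illustration.
  Cycle next_cycle(bool heuristics_want_gc) {
    if (_alloc_failure_gc.exchange(false)) return Cycle::AllocFailureGC;
    if (_explicit_gc.exchange(false)) return Cycle::ExplicitGC;
    return heuristics_want_gc ? Cycle::ConcurrentGC : Cycle::None;
  }

 private:
  std::atomic<bool> _alloc_failure_gc{false};
  std::atomic<bool> _explicit_gc{false};
};
```

In this sketch an explicit request that arrives together with an allocation failure stays pending and is picked up by the following iteration, which is one way to read "wait for another control loop iteration".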
Thanks, Roman From shade at redhat.com Tue Jan 16 23:25:28 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 17 Jan 2018 00:25:28 +0100 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: <12c87930-1d42-56e8-a502-90b545a8b7a8@redhat.com> References: <12c87930-1d42-56e8-a502-90b545a8b7a8@redhat.com> Message-ID: <48267858-d451-946b-24a4-048d868942cd@redhat.com> On 01/16/2018 11:16 PM, Roman Kennke wrote: > I am unsure about: > >> Behavioral change: Explicit GC no longer cancels the concurrent cycle, instead it waits for another >> control loop iteration to start explicit GC. This is for the best, because it both simplifies our >> handling logic, and allows requesters to wait for their own cycle. This is interesting when >> concurrent cycle is running, ExplicitGCInvokesConcurrent is enabled and System.gc() is called: the >> requesting thread would wait for one complete GC cycle to start and finish. > > That does mean that System.gc() at the beginning of marking would wait until > marking+evac(+updaterefs?) finishes, then does the full-gc, and only then is the Java thread allowed > to progress? Yes. Think about it like the event loop, where Full GC request gets queued while Conc GC is being processed at the moment, and the Full GC requester waits its place in line. Basically shifts System.gc() from being "OMG, drop everything" to being "Noted, take a number, we shall do this at our convenience". > I guess it does not really matter very much, but what is the point to wait for current > cycle completion if it goes into full-gc anyway? 
There is little performance point, I guess, and there are no performance guarantees for System.gc either :) There are two off-the-bat considerations: a) the abrupt explicit GC in the middle of regular cycle can wreck up inflight compaction decisions of smarter relocation heuristics; b) as concurrent GC cycle runs, we have more chances to coalesce explicit GC-s from multiple threads, and do one Full GC at once, not many quick back-to-back cancellations. But ultimately, this thing is really the implementation convenience: it makes cancellations *only* happen during allocation failures, which simplifies reasoning about the whole thing. This helps a lot with Degenerated GC, because cancellation is then the sole route to Degenerate GC (which can then be upgraded to Full GC), without the need to figure out if that cancellation was due to explicit GC. -Aleksey From zgu at redhat.com Wed Jan 17 02:45:24 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 16 Jan 2018 21:45:24 -0500 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: References: Message-ID: ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be the scenario we talked before, that we *can not* do. Maybe, I missed something? Thanks, -Zhengyu On 01/16/2018 04:36 PM, Aleksey Shipilev wrote: > http://cr.openjdk.java.net/~shade/shenandoah/refactor-af-explicit-gc/webrev.01/ > > This refactors the allocation failure and explicit GC handling, and prepares the code for the > arrival of STW Degenerate GC. > > Tour of changes: > > 1. For historical reasons, we used to have the full_gc_* members in ShConcThread to handle the > allocation failure, because that was the only option available for us. With the advent of degenerate > CM and UR it started to mean just the "allocation failure". 
With Degenerate GC, it would further > depart from its original meaning. So, renaming full_gc_* to alloc_failure_* to capture the real > intent and rewiring accordingly is one part of the refactoring. > > Behavioral change: Alloc-failed threads are not immediately kicked after degenerated CM and > degenerate UR, and instead they wait for the end of the cycle. This avoids a bad race against the > alloc-failed threads that are coming with cancellation at the same time, and it keeps us away from > OOM-during-evac when after-CM cleanup cannot regain enough space. This would be the behavior of the > upcoming Degenerated GC anyway. > > > 2. There is also the path that invokes explicit GCs. Again, for historical reasons, that originally > meant only Full GC. With the advent of ExplicitGCInvokesConcurrent support, it means both concurrent > and Full GC cycles! So, renaming conc_gc_* to explicit_gc_* and rewiring accordingly is the second > part of refactoring. > > Behavioral change: Explicit GC no longer cancels the concurrent cycle, instead it waits for another > control loop iteration to start explicit GC. This is for the best, because it both simplifies our > handling logic, and allows requesters to wait for their own cycle. This is interesting when > concurrent cycle is running, ExplicitGCInvokesConcurrent is enabled and System.gc() is called: the > requesting thread would wait for one complete GC cycle to start and finish. > > > 3. The logic in main control loop used to handle weird paths from cancellations back to Full GC. > Having proper designations for alloc failure and explicit GCs help to write out the proper > priorities for these events. This also allows us to potentially plug Degenerate GC for the > out-of-cycle Allocation Failures, instead of unconditionally doing the Full GC. > > > 4. Pulling the code out of ShenandoahHeap back to ShenandoahConcurrentThread allows to reduce > coupling. 
Also, ShenandoahGCCause is eliminated in favor of proper GCCause, which simplifies logic > further. > > > 5. Additionally, gc+stats now tells things like these: > > ----- 8< ---------------------------------------------------------------------------------------- > > Under allocation pressure, concurrent cycles will cancel, and either continue phase under > stop-the-world pause or result in stop-the-world Full GC. Increase heap size, tune GC heuristics, > or lower allocation rate to avoid degenerated and Full GC cycles. > > 85 successful concurrent GC cycles > 27 cancelled concurrent GC cycles (5 degenerated marks, 10 degenerated update refs, 12 Full GCs) > 11 out-of-cycle allocation failures (11 Full GCs) > 0 explicitly requested GC cycles (0 Full GCs) > > ----- 8< ---------------------------------------------------------------------------------------- > > > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > From rwestrel at redhat.com Wed Jan 17 08:23:09 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 17 Jan 2018 09:23:09 +0100 Subject: RFR: [9] Bulk backports to sh/jdk9 In-Reply-To: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com> References: <2eacce66-6153-b8c8-1352-906990d19080@redhat.com> Message-ID: > http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180115/webrev.01/ C2 changes look ok. Roland. From rwestrel at redhat.com Wed Jan 17 08:23:46 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Wed, 17 Jan 2018 09:23:46 +0100 Subject: RFR: [8u] Bulk backports to sh/jdk8u In-Reply-To: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com> References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com> Message-ID: > http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/ C2 changes look ok. Roland.
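The control-loop priorities discussed in the "Refactor allocation failure and explicit GC handling" thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the HotSpot code: every name here (GCDriver, Cycle, request_alloc_failure, request_explicit_gc, next_cycle) is hypothetical. The only behavior taken from the thread is that allocation failures are the sole source of cancellation, that explicit GC requests are merely recorded and picked up on a later control loop iteration, and that requests from several threads can coalesce into one cycle.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical illustration only; none of these names exist in HotSpot.
enum class Cycle { None, Concurrent, ExplicitGC, AllocFailureGC };

class GCDriver {
public:
  // Mutator side: a TLAB/shared allocation failed. This is the only
  // request that cancels whatever concurrent cycle is running.
  void request_alloc_failure() {
    std::lock_guard<std::mutex> guard(lock_);
    alloc_failure_pending_ = true;
    cancelled_ = true;
  }

  // Mutator side: System.gc(). The request is only recorded; several
  // callers collapse into the same pending flag.
  void request_explicit_gc() {
    std::lock_guard<std::mutex> guard(lock_);
    explicit_pending_ = true;
  }

  bool is_cancelled() {
    std::lock_guard<std::mutex> guard(lock_);
    return cancelled_;
  }

  // Driver side: one control loop iteration decides what to run next.
  // Allocation failure outranks explicit GC, which outranks heuristics.
  Cycle next_cycle(bool heuristics_want_gc) {
    std::lock_guard<std::mutex> guard(lock_);
    if (alloc_failure_pending_) {
      alloc_failure_pending_ = false;
      cancelled_ = false;
      return Cycle::AllocFailureGC;
    }
    // In this sketch, cancellation can only come from an allocation failure.
    assert(!cancelled_ && "only allocation failures may cancel");
    if (explicit_pending_) {
      explicit_pending_ = false;
      return Cycle::ExplicitGC;
    }
    return heuristics_want_gc ? Cycle::Concurrent : Cycle::None;
  }

private:
  std::mutex lock_;
  bool alloc_failure_pending_ = false;
  bool explicit_pending_ = false;
  bool cancelled_ = false;
};
```

The sketch leaves out how requesting threads block until their cycle finishes, and it does not model the non-blocking GCLAB/shared_gc failure path at all. It only shows why a single pending flag per request kind keeps the priority logic in one place.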
From shade at redhat.com Wed Jan 17 08:32:11 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 17 Jan 2018 09:32:11 +0100 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: References: Message-ID: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> On 01/17/2018 03:45 AM, Zhengyu Gu wrote: > ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in > ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be > the scenario we talked before, that we *can not* do. Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) -> ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and this is why we have separate ::handle_alloc_failure_evac(). Thanks, -Aleksey From rkennke at redhat.com Wed Jan 17 09:01:32 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 17 Jan 2018 10:01:32 +0100 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> References: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> Message-ID: <040313a8-7c7d-ce4f-3d75-78b050dc58aa@redhat.com> Am 17.01.2018 um 09:32 schrieb Aleksey Shipilev: > On 01/17/2018 03:45 AM, Zhengyu Gu wrote: >> ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in >> ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be >> the scenario we talked before, that we *can not* do. > > Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code > too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) -> > ShConcThread::do_full_gc. 
The trick here is not to lock when shared_gc/GCLAB allocation fails, and > this is why we have separate ::handle_alloc_failure_evac(). > Yes. We must not take locks under the write-barrier, because that is a leaf-call and must never take a safepoint. It's ok to take locks in the allocation(-failure) path, because that is a non-leaf call, and may take safepoints. In fact, this is what happens with other GCs: allocation failure goes straight to VMThread::execute() ... I wonder if we could also do this and avoid the locking? But then, how to communicate with the ShConcThread? Roman From shade at redhat.com Wed Jan 17 09:03:50 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 17 Jan 2018 10:03:50 +0100 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: <040313a8-7c7d-ce4f-3d75-78b050dc58aa@redhat.com> References: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> <040313a8-7c7d-ce4f-3d75-78b050dc58aa@redhat.com> Message-ID: On 01/17/2018 10:01 AM, Roman Kennke wrote: > Am 17.01.2018 um 09:32 schrieb Aleksey Shipilev: > In fact, this is what happens with other GCs: allocation failure goes straight to > VMThread::execute() ... I wonder if we could also do this and avoid the locking? But then, how to > communicate with the ShConcThread? No, we should not do the VMOp right away. Our ShConcThread is really a Driver, and we need to tell the Driver we have experienced allocation failure. Then it could decide what to do: Full GC, Degenerated GC, continue with Conc GC, fail hard... Thanks, -Aleksey From ashipile at redhat.com Wed Jan 17 09:52:46 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 17 Jan 2018 09:52:46 +0000 Subject: hg: shenandoah/jdk9/hotspot: 12 new changesets Message-ID: <201801170952.w0H9qk5B000404@aojmv0008.oracle.com> Changeset: d0ad502cc3a0 Author: rkennke Date: 2018-01-15 16:29 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/d0ad502cc3a0 [backport] Increase test timeouts !
test/gc/shenandoah/EvilSyncBug.java ! test/gc/shenandoah/jvmti/TestHeapDump.java Changeset: e18143c303e9 Author: shade Date: 2018-01-15 16:32 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e18143c303e9 [backport] Report fwdptr size in JNI GetObjectSize ! src/share/vm/prims/jvmtiEnv.cpp ! src/share/vm/prims/whitebox.cpp Changeset: a0be695501fe Author: rkennke Date: 2018-01-15 16:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a0be695501fe [backport] Disable verification from non-Shenandoah VMOps. ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: 9da7354496dd Author: shade Date: 2018-01-15 16:37 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9da7354496dd [backport] Cleanup reset_{next|complete}_mark_bitmap ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp Changeset: 3b3dbadb82eb Author: shade Date: 2018-01-15 16:39 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/3b3dbadb82eb [backport] Verifier should check klass pointers before attempting to reach for object size ! src/share/vm/gc/shenandoah/shenandoahVerifier.cpp Changeset: 8a3aef24b983 Author: shade Date: 2018-01-15 16:39 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/8a3aef24b983 [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode ! test/gc/shenandoah/TestSelectiveBarrierFlags.java Changeset: 2bca755bd2e5 Author: zgu Date: 2018-01-15 16:52 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/2bca755bd2e5 [backport] Shenandoah SA implementation ! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/CollectedHeap.java ! 
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/CollectedHeapName.java + src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shenandoah/ShenandoahHeap.java + src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shenandoah/ShenandoahHeapRegion.java + src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shenandoah/ShenandoahHeapRegionSet.java ! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/Universe.java ! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java ! src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/tools/HeapSummary.java ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegion.hpp ! src/share/vm/gc/shenandoah/shenandoahHeapRegionSet.hpp + src/share/vm/gc/shenandoah/vmStructs_shenandoah.hpp ! src/share/vm/runtime/vmStructs.cpp Changeset: 2f34f1efc3e1 Author: roland Date: 2018-01-15 17:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/2f34f1efc3e1 [backport] Allow use of fp spills around write barrier ! src/share/vm/opto/lcm.cpp Changeset: e1bdfc09b91a Author: shade Date: 2018-01-15 17:24 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/e1bdfc09b91a [backport] Rehash VMOperations and cycle driver mechanics for consistency ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.cpp ! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.hpp ! src/share/vm/gc/shenandoah/shenandoahUtils.cpp ! src/share/vm/gc/shenandoah/shenandoahUtils.hpp ! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.hpp ! 
src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp Changeset: a335541ed527 Author: zgu Date: 2018-01-15 17:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a335541ed527 [backport] Minor cleanup, uses latest Atomic API ! src/share/vm/gc/shenandoah/shenandoahCodeRoots.hpp Changeset: fcf4e5e7b36f Author: shade Date: 2018-01-15 17:29 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fcf4e5e7b36f [backport] Match barrier fastpath checks better ! src/cpu/x86/vm/x86_64.ad Changeset: b2bc1c1c6fd7 Author: shade Date: 2018-01-15 17:32 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/b2bc1c1c6fd7 [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/shenandoahSupport.cpp From ashipile at redhat.com Wed Jan 17 10:33:54 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 17 Jan 2018 10:33:54 +0000 Subject: hg: shenandoah/jdk8u/hotspot: 12 new changesets Message-ID: <201801171033.w0HAXtRT016624@aojmv0008.oracle.com> Changeset: c580b405b19c Author: rkennke Date: 2018-01-15 18:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/c580b405b19c [backport] Increase test timeouts ! test/gc/shenandoah/EvilSyncBug.java ! test/gc/shenandoah/jvmti/TestHeapDump.sh Changeset: 889331b172e1 Author: shade Date: 2018-01-15 18:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/889331b172e1 [backport] Report fwdptr size in JNI GetObjectSize ! src/share/vm/prims/jvmtiEnv.cpp ! src/share/vm/prims/whitebox.cpp Changeset: 229a50c88055 Author: rkennke Date: 2018-01-15 18:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/229a50c88055 [backport] Disable verification from non-Shenandoah VMOps. ! 
src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp Changeset: 8459d5e19134 Author: shade Date: 2018-01-15 18:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/8459d5e19134 [backport] Cleanup reset_{next|complete}_mark_bitmap ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp Changeset: 1a1daa04a9ca Author: shade Date: 2018-01-15 18:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/1a1daa04a9ca [backport] Verifier should check klass pointers before attempting to reach for object size ! src/share/vm/gc_implementation/shenandoah/shenandoahVerifier.cpp Changeset: a53bcb78b95d Author: shade Date: 2018-01-15 18:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/a53bcb78b95d [backport] TestSelectiveBarrierFlags times out due to too aggressive compilation mode ! test/gc/shenandoah/TestSelectiveBarrierFlags.java Changeset: b9559ebe9575 Author: zgu Date: 2018-01-15 19:21 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/b9559ebe9575 [backport] Shenandoah SA implementation + agent/src/share/classes/sun/jvm/hotspot/gc_implementation/shenandoah/ShenandoahHeap.java + agent/src/share/classes/sun/jvm/hotspot/gc_implementation/shenandoah/ShenandoahHeapRegion.java + agent/src/share/classes/sun/jvm/hotspot/gc_implementation/shenandoah/ShenandoahHeapRegionSet.java ! agent/src/share/classes/sun/jvm/hotspot/gc_interface/CollectedHeap.java ! agent/src/share/classes/sun/jvm/hotspot/gc_interface/CollectedHeapName.java ! agent/src/share/classes/sun/jvm/hotspot/memory/Universe.java ! agent/src/share/classes/sun/jvm/hotspot/oops/ObjectHeap.java ! agent/src/share/classes/sun/jvm/hotspot/tools/HeapSummary.java ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! 
src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegion.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeapRegionSet.hpp + src/share/vm/gc_implementation/shenandoah/vmStructs_shenandoah.hpp ! src/share/vm/runtime/vmStructs.cpp Changeset: 2310d6a52d04 Author: roland Date: 2018-01-17 10:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/2310d6a52d04 [backport] Allow use of fp spills around write barrier ! src/share/vm/opto/lcm.cpp Changeset: 6d265ee073d5 Author: shade Date: 2018-01-17 10:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/6d265ee073d5 [backport] Rehash VMOperations and cycle driver mechanics for consistency ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimings.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahPhaseTimings.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahUtils.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahUtils.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoahWorkerPolicy.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahWorkerPolicy.hpp ! src/share/vm/gc_implementation/shenandoah/vm_operations_shenandoah.cpp Changeset: 65ff5f8ac60f Author: zgu Date: 2018-01-17 10:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/65ff5f8ac60f [backport] Minor cleanup, uses latest Atomic API ! src/share/vm/gc_implementation/shenandoah/shenandoahCodeRoots.hpp Changeset: 755e302d100e Author: shade Date: 2018-01-17 10:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/755e302d100e [backport] Match barrier fastpath checks better ! 
src/cpu/x86/vm/x86_64.ad Changeset: 32480cdd3a60 Author: shade Date: 2018-01-17 10:28 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/32480cdd3a60 [backport] ShenandoahWriteBarrierRB flag to conditionally disable RB on WB fastpath ! src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/shenandoahSupport.cpp From zgu at redhat.com Wed Jan 17 13:35:47 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 17 Jan 2018 08:35:47 -0500 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> References: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> Message-ID: <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com> On 01/17/2018 03:32 AM, Aleksey Shipilev wrote: > On 01/17/2018 03:45 AM, Zhengyu Gu wrote: >> ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in >> ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be >> the scenario we talked before, that we *can not* do. > > Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code > too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) -> > ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and > this is why we have separate ::handle_alloc_failure_evac(). Ah, it is a bit hard to read, could you add some comments like: ShenandoahHeap: 726 if (type == _alloc_tlab || type == _alloc_shared) { .... } else { assert(type == _alloc_gclab || type == _alloc_shared_gc, ..."); // OOM handled by .... 
} Thanks, -Zhengyu > > Thanks, > -Aleksey > From shade at redhat.com Wed Jan 17 14:12:21 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 17 Jan 2018 15:12:21 +0100 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com> References: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com> Message-ID: <74baaf19-1715-5f8a-f700-7d4f0cc08441@redhat.com> On 01/17/2018 02:35 PM, Zhengyu Gu wrote: > > > On 01/17/2018 03:32 AM, Aleksey Shipilev wrote: >> On 01/17/2018 03:45 AM, Zhengyu Gu wrote: >>> ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in >>> ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be >>> the scenario we talked before, that we *can not* do. >> >> Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code >> too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) -> >> ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and >> this is why we have separate ::handle_alloc_failure_evac(). > > Ah, it is a bit hard to read, could you add some comments like: > > ShenandoahHeap: > 726 if (type == _alloc_tlab || type == _alloc_shared) { > .... > } else { > assert(type == _alloc_gclab || type == _alloc_shared_gc, ..."); > // OOM handled by ....
> } > That makes sense, added:

diff -r 898a5ca31274 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
--- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Tue Jan 16 22:15:34 2018 +0100
+++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Wed Jan 17 15:11:55 2018 +0100
@@ -741,6 +741,10 @@
       concurrent_thread()->handle_alloc_failure();
       result = allocate_memory_under_lock(word_size, type, in_new_region);
     }
+  } else {
+    assert(type == _alloc_gclab || type == _alloc_shared_gc, "Can only accept these types here");
+    // Do not call handle_alloc_failure() here, because we cannot block.
+    // The allocation failure would be handled by the WB slowpath with handle_alloc_failure_evac().
   }

   if (in_new_region) {

Thanks, -Aleksey From zgu at redhat.com Wed Jan 17 14:21:33 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 17 Jan 2018 09:21:33 -0500 Subject: RFR: Refactor allocation failure and explicit GC handling In-Reply-To: <74baaf19-1715-5f8a-f700-7d4f0cc08441@redhat.com> References: <561d2a3a-c580-4911-7ba5-dbe38bee7415@redhat.com> <5d1c1ddf-cac2-1e16-859c-6b30b733ca46@redhat.com> <74baaf19-1715-5f8a-f700-7d4f0cc08441@redhat.com> Message-ID: On 01/17/2018 09:12 AM, Aleksey Shipilev wrote: > On 01/17/2018 02:35 PM, Zhengyu Gu wrote: >> >> >> On 01/17/2018 03:32 AM, Aleksey Shipilev wrote: >>> On 01/17/2018 03:45 AM, Zhengyu Gu wrote: >>>> ShenandoahConcurrentThread::handle_alloc_failure() now takes monitor lock, and this method is in >>>> ShenandoahHeap::allocate_memory() path, in turn, can be called inside write barrier ... seems to be >>>> the scenario we talked before, that we *can not* do. >>> >>> Note that allocate_memory on the *shared/TLAB* allocation path was taking a lock in the old code >>> too: see the path in ShHeap::allocate_memory -> ShHeap::collect(_allocation_failure) -> >>> ShConcThread::do_full_gc. The trick here is not to lock when shared_gc/GCLAB allocation fails, and >>> this is why we have separate ::handle_alloc_failure_evac().
>> >> Ah, it is a bit hard to read, could you add some comments like: >> >> ShenandoahHeap: >> 726 if (type == _alloc_tlab || type == _alloc_shared) { >> .... >> } else { >> assert(type == _alloc_gclab || type == _alloc_shared_gc, ..."); >> // OOM handled by .... >> } >> > > That makes sense, added: > > diff -r 898a5ca31274 src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Tue Jan 16 22:15:34 2018 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Wed Jan 17 15:11:55 2018 +0100 > @@ -741,6 +741,10 @@ > concurrent_thread()->handle_alloc_failure(); > result = allocate_memory_under_lock(word_size, type, in_new_region); > } > + } else { > + assert(type == _alloc_gclab || type == _alloc_shared_gc, "Can only accept these types here"); > + // Do not call handle_alloc_failure() here, because we cannot block. > + // The allocation failure would be handled by the WB slowpath with handle_alloc_failure_evac(). > } > > if (in_new_region) { > Great! Looks good to me. Thanks, -Zhengyu > Thanks, > -Aleksey > From zgu at redhat.com Wed Jan 17 14:28:55 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 17 Jan 2018 09:28:55 -0500 Subject: RFR: [8u] Bulk backports to sh/jdk8u In-Reply-To: References: <6b294bb5-bb6c-2bff-b879-6515fd5970b7@redhat.com> Message-ID: <4b28c064-9fab-4386-7c3c-d7d92245b28e@redhat.com> SA and cleanup look good. -Zhengyu On 01/17/2018 03:23 AM, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180116/webrev.01/ > > C2 changes look ok. > > Roland. 
> From roman at kennke.org Wed Jan 17 14:37:57 2018 From: roman at kennke.org (roman at kennke.org) Date: Wed, 17 Jan 2018 14:37:57 +0000 Subject: hg: shenandoah/jdk10: Defer cleaning of system dictionary and friends to parallel cleaning phase Message-ID: <201801171437.w0HEbvcs028801@aojmv0008.oracle.com> Changeset: 1d1238a0603b Author: rkennke Date: 2018-01-17 15:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/1d1238a0603b Defer cleaning of system dictionary and friends to parallel cleaning phase ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp From rkennke at redhat.com Wed Jan 17 14:37:55 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 17 Jan 2018 15:37:55 +0100 Subject: RFR: Traveral GC heuristics In-Reply-To: References: Message-ID: Testing showed up some regressions in non-traversal code and two issues that I introduced (or haven't fixed) when single-flag patch arrived. The following now passes hotspot_gc_shenandoah tests and runs of specjvm with fastdebug with -XX:+ShenandoahVerify -XX:+ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4 Differential: http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/ Full: http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/ Please review, test, comment, etc. :-) Cheers, Roman > This started out as a smallish partial-GC experiment, then into a clone > of partial GC, and ended up as a standalone GC mode for Shenandoah, > which is a frankensteinization of partial+concurrent-marking, with some > goodies :-) > > The idea is to do everything, marking+evacuation+update-refs, in one > single phase. This is not very difficult to do: while traversing, > evacuate objects that are in the Cset, and update references as we go. 
I > chose to traverse the heap using an incremental-update approach, mostly > because this is what partial GC does, and as said above, this started > out as a clone of partial :-) > > The tricky part is to choose the Cset: I made it such that each GC cycle > collects liveness information, and bases the decision about Cset in the > next cycle on that liveness information. Yes, this means the first cycle > does not collect anything (except immediate garbage). > > Advantages: > - obviously, touching all live objects only once means less time spent > in GC. Measurements show that traversing the heap and doing everything > is only slightly longer than Shenandoah's marking phase, and this might > actually be because we also need to mark through newly allocated objects. > - Traversal-order evacuation gives us a 10x increase in an ordering-sensitive > microbenchmark: > https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/ > > - Simpler barriers: i-u style barriers don't need to load the pre-value, > and can be optimized much better (hoisted out of hot paths, etc). Some > of it is already done in this patch, but there are plenty of > opportunities to make it even better. > - Possibly less floating garbage because we trace through newly > allocated objects too, and don't treat them as implicitly live. > - we don't need a keep-alive-barrier for Reference.get() which means we > keep fewer referents alive just because they happen to be accessed > during GC. > - MWF is only a switch away (if I understand MWF correctly): > -XX:+ShenandoahMWF > - It does not need RBs in the WB fast-path, because outside of the > single phase, nothing is ever forwarded. > - It does not need the membar stuff in the WBs because we turn on/off > the phase during a safepoint > > Disadvantages: > - Store-value barrier needs to be a WB, RB is not sufficient. The > storeval barrier is there to ensure only to-space values ever get > written to fields during update-refs.
3-phase Shenandoah doesn't
> evacuate during update-refs, and therefore RB is enough. We need WB
> here. (I believe this is off-set by optimization opportunities, see above)
> - Known I-U problem: mutators can outrun the GC with allocations and let
> us not terminate.
> - It needs barriers for constants (need to check this).
>
> Stuff left to do:
> - Implement sane degeneration: if we hit OOM, we simply restart and go
> into full-GC.
> - Depending on degen: make heuristics adaptive. Currently it requires
> manual tweaking of thresholds.
>
> Relevant knobs:
> - ShenandoahGarbageThreshold: regions with more garbage than this go
> into the Cset. Notice that this is based on the *previous* cycle, so we
> may actually have much more garbage (but not less).
> - ShenandoahFreeThreshold: start GC when we have less than that much
> free heap.
>
> I'll not go into all the details for now and give you the code:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
>
>
> Roman

From shade at redhat.com Wed Jan 17 14:44:21 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 15:44:21 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References:
Message-ID:

On 01/17/2018 03:37 PM, Roman Kennke wrote:
> Testing showed up some regressions in non-traversal code and two issues that I introduced (or
> haven't fixed) when single-flag patch arrived.
>
> The following now passes hotspot_gc_shenandoah tests and runs of specjvm with fastdebug with
> -XX:+ShenandoahVerify -XX:+ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
>
> Differential:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/

Small-ish questions:

*) This solves some Partial GC bug, not Traversal GC bug? If so, can you RFR and push it separately?

--- old/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp 2018-01-17 15:32:54.756247073 +0100
+++ new/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp 2018-01-17 15:32:54.391251897 +0100
@@ -169,7 +169,7 @@
 }

 bool ShenandoahBarrierSet::need_update_refs_barrier() {
-  if (_heap->is_concurrent_partial_in_progress() || _heap->is_concurrent_traversal_in_progress()) {
+  if (UseShenandoahMatrix || _heap->is_concurrent_traversal_in_progress()) {
     return true;
   }
   if (_heap->shenandoahPolicy()->update_refs()) {

*) I think we have discussed the RFR for this -- does it turn out to be needed after all?

--- old/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp 2018-01-17 15:32:54.135255280 +0100
+++ new/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp 2018-01-17 15:32:53.869258796 +0100
@@ -727,7 +727,7 @@
   const int referent_offset = java_lang_ref_Reference::referent_offset;
   guarantee(referent_offset > 0, "referent offset not initialized");

-  if (UseG1GC || UseShenandoahGC) {
+  if (UseG1GC || (UseShenandoahGC && ShenandoahKeepAliveBarrier)) {
     Label slow_path;
     // rbx: method

Thanks,
-Aleksey

From rkennke at redhat.com Wed Jan 17 15:08:19 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 16:08:19 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References:
Message-ID: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>

On 17.01.2018 at 15:44, Aleksey Shipilev wrote:
> On 01/17/2018 03:37 PM, Roman Kennke wrote:
>> Testing showed up some regressions in non-traversal code and two issues that I introduced (or
>> haven't fixed) when single-flag patch arrived.
>>
>> The following now passes hotspot_gc_shenandoah tests and runs of specjvm with fastdebug with
>> -XX:+ShenandoahVerify -XX:+ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4
>>
>> Differential:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/
>
> Small-ish questions:
>
> *) This solves some Partial GC bug, not Traversal GC bug? If so, can you RFR and push it separately?
>
> --- old/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp 2018-01-17 15:32:54.756247073 +0100
> +++ new/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp 2018-01-17 15:32:54.391251897 +0100
> @@ -169,7 +169,7 @@
>  }
>
>  bool ShenandoahBarrierSet::need_update_refs_barrier() {
> -  if (_heap->is_concurrent_partial_in_progress() || _heap->is_concurrent_traversal_in_progress()) {
> +  if (UseShenandoahMatrix || _heap->is_concurrent_traversal_in_progress()) {
>      return true;
>    }
>    if (_heap->shenandoahPolicy()->update_refs()) {
>

No, this is a bug that I introduced with webrev.00 and reverted back with webrev.01. When using matrix, we always need to do the update-matrix-stuff, not only when partial GC is in progress. With traversal, we only need to go into the barrier when the traversal GC is in progress.

> *) I think we have discussed the RFR for this -- does it turn out to be needed after all?
>
> --- old/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp 2018-01-17 15:32:54.135255280 +0100
> +++ new/src/hotspot/cpu/x86/templateInterpreterGenerator_x86.cpp 2018-01-17 15:32:53.869258796 +0100
> @@ -727,7 +727,7 @@
>    const int referent_offset = java_lang_ref_Reference::referent_offset;
>    guarantee(referent_offset > 0, "referent offset not initialized");
>
> -  if (UseG1GC || UseShenandoahGC) {
> +  if (UseG1GC || (UseShenandoahGC && ShenandoahKeepAliveBarrier)) {
>      Label slow_path;
>      // rbx: method

Oops. Reverted here:

Diff:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.02.diff/
Full:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.02/

(give it some seconds to upload)

Better?
Roman From zgu at redhat.com Wed Jan 17 17:10:41 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 17 Jan 2018 12:10:41 -0500 Subject: RFR: Traveral GC heuristics In-Reply-To: References: Message-ID: <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com> shenandoahOopClosures.hpp: Missing string dedup version shenandoahSupport.cpp L#615 - 656 L#3537 - 3556 L#3981 - 4056 indent sharedRuntime.cpp 213 assert(oopDesc::is_oop(orig, true /* ignore mark word */), "Error"); 214 // store the original value that was in the field reference 215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); } 216 return; 217 thread->satb_mark_queue().enqueue(orig); 218 JRT_END L#216: does not look right. Should it be inside UseShenandoahGC block? Thanks, -Zhengyu On 01/17/2018 09:37 AM, Roman Kennke wrote: > Testing showed up some regressions in non-traversal code and two issues > that I introduced (or haven't fixed) when single-flag patch arrived. > > The following now passes hotspot_gc_shenandoah tests and runs of specjvm > with fastdebug with -XX:+ShenandoahVerify > -XX:+ShenandoahGCHeuristics=traversal, with -XX:TieredStopAtLevel=0|1|4 > > Differential: > http://cr.openjdk.java.net/~rkennke/traversal/webrev.01.diff/ > Full: > http://cr.openjdk.java.net/~rkennke/traversal/webrev.01/ > > Please review, test, comment, etc. :-) > > Cheers, Roman > >> This started out as a smallish partial-GC experiment, then into a >> clone of partial GC, and ended up as a standalone GC mode for >> Shenandoah, which is a frankensteinization of >> partial+concurrent-marking, with some goodies :-) >> >> The idea is to do everything, marking+evacuation+update-refs, in one >> single phase. This is not very difficult to do: while traversing, >> evacuate objects that are in the Cset, and update references as we go. 
>> I chose to traverse the heap using an incremental-update approach, >> mostly because this is what partial GC does, and as said above, this >> started out as a clone of partial :-) >> >> The tricky part is to choose the Cset: I made it such that each GC >> cycle collects liveness information, and bases the decision about Cset >> in the next cycle on that liveness information. Yes, this means the >> first cycle does not collect anything (except immediate garbage). >> >> Advantages: >> - obviously, touching all live objects only once means less time spent >> in GC. Measurements show that traversing the heap and doing everything >> is only slightly longer than Shenandoah's marking phase, and this >> might actually be because we also need to mark through newly allocated >> objects. >> - Traversal-order evacuation gives us 10x increase in >> ordering-sensitive microbenchmark: >> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/ >> >> - Simpler barriers: i-u style barriers don't need to load the >> pre-value, and can be optimized much better (hoisted out of hot paths, >> etc). Some of it is already done in this patch, but there are plenty >> of opportunities to make it even better. >> - Possibly less floating garbage because we trace through newly >> allocated objects too, and don't treat it implicitely live. >> - we don't need a keep-alive-barrier for Reference.get() which means >> we keep fewer referents alive just because they happen to be accessed >> during GC. >> - MWF is only a switch away (if I understand MWF correctly): >> -XX:+ShenandoahMWF >> - It does not need RBs in the WB fast-path, because outside of the >> single phase, nothing is ever forwarded. >> - It does not need the membar stuff in the WBs because we turn on/off >> the phase during safepoint >> >> Disadvantages: >> - Store-value barrier needs to be a WB, RB is not sufficient. The >> storeval barrier is there to ensure only to-space values ever get >> written to fields during update-refs. 
3-phase Shenandoah doesn't
>> evacuate during update-refs, and therefore RB is enough. We need WB
>> here. (I believe this is off-set by optimization opportunities, see
>> above)
>> - Known I-U problem: mutators can outrun the GC with allocations and
>> let us not terminate.
>> - It needs barriers for constants (need to check this).
>>
>> Stuff left to do:
>> - Implement sane degeneration: if we hit OOM, we simply restart and go
>> into full-GC.
>> - Depending on degen: make heuristics adaptive. Currently it requires
>> manual tweaking of thresholds.
>>
>> Relevant knobs:
>> - ShenandoahGarbageThreshold: regions with more garbage than this go
>> into the Cset. Notice that this is based on the *previous* cycle, so
>> we may actually have much more garbage (but not less).
>> - ShenandoahFreeThreshold: start GC when we have less than that much
>> free heap.
>>
>> I'll not go into all the details for now and give you the code:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.00/
>>
>>
>> Roman
>

From shade at redhat.com Wed Jan 17 17:54:26 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 17 Jan 2018 18:54:26 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
Message-ID:

On 01/17/2018 04:08 PM, Roman Kennke wrote:
> Full:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.02/

Exciting!

c1_Runtime1_x86.cpp:

*) Let's rewrite this:

  if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !ShenandoahSATBBarrier &&
      !ShenandoahConditionalSATBBarrier && !ShenandoahStoreValEnqueueBarrier) {

into:

  if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !(ShenandoahSATBBarrier ||
      ShenandoahConditionalSATBBarrier || ShenandoahStoreValEnqueueBarrier)) {

*) Re:

  1644   __ testb(gc_state, ShenandoahHeap::MARKING | ShenandoahHeap::TRAVERSAL);

So, set_concurrent_traversal_in_progress activates the SATB queues, and this is good. Why don't we set the ShenandoahHeap::MARKING bit to gc_state there, and avoid "| TRAVERSAL" all around the arch-specific code?

*) shenandoahBarrierSet_x86.cpp: Pushes/pops around the call to g1_write_barrier_pre seem suspicious. How do we know we need to caller-save rbx, rcx, rdx, c_rarg1? Deserves a comment, maybe?

*) ShenandoahStoreValReadBarrier, ShenandoahStoreValWriteBarrier, ShenandoahStoreValReadBarrier exclusion tests

*) shenandoahBarrierSet.cpp: branches are the same, which looks like a typo. Should be compound boolean predicate?

  42   if (ALWAYS_ENQUEUE && !oopDesc::is_null(o)) {
  43     ShenandoahBarrierSet::enqueue(o);
  44   } else if (evac) {
  45     ShenandoahBarrierSet::enqueue(o);
  46   }

shenandoahCollectorPolicy.cpp

*) Stray debugging lines:

  1367   // tty->print_cr("CSET regions:");
  1376   // r->print_on(tty);

*) Heuristics need work: I think it runs into the problem that adaptive cset selection solves: it chooses either too big or too small cset. I wonder if you can actually reuse that in traversal heuristics

shenandoahConcurrentThread.cpp:

*) I think you want to introduce ShenandoahHeap::{vmop_entry,entry,op}_traversal family of methods, and call them, as we do with the rest of VM ops.

*) This is not needed anymore:

  207   // TODO: Call this properly with Shenandoah*CycleMark
  208   heap->set_used_at_last_gc();

shenandoahHeap.cpp:

*) As mentioned above, this:

  1683 void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
  1684   set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
  1685   JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
  1686   set_evacuation_in_progress_at_safepoint(in_progress);
  1687   set_has_forwarded_objects(in_progress);
  1688 }

is probably just:

  void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
    set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
    set_gc_state_bit(MARKING_BITPOS, in_progress);
    set_gc_state_bit(HAS_FORWARDED_OBJECTS_BITPOS, in_progress);
    set_gc_state_bit_at_safepoint(_gc_state.raw_value());
    JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
  }

shenandoahVerifier.cpp:

*) Why are we testing for "next" bitmap here? _verify_liveness_complete and the comment seem to disagree? Comments still mention "partial"?

  void ShenandoahVerifier::verify_after_traversal() {
    verify_at_safepoint(
            "After Traversal",
            _verify_forwarded_none,      // cannot have forwarded objects
            _verify_marked_next,         // bitmaps might be stale, but alloc-after-mark should be well
            _verify_matrix_disable,      // matrix is conservatively consistent
            _verify_cset_none,           // no cset references left after partial
            _verify_liveness_complete,   // no reliable liveness data anymore
            _verify_regions_nocset       // no cset regions, trash regions allowed
    );
  }

shenandoah_globals.hpp:

*) Comment is duplicated:

  316                                                                        \
  317   diagnostic(bool, ShenandoahStoreValEnqueueBarrier, false,           \
  318           "Turn on/off enqueuing of oops after write barriers (MWF)") \
  319                                                                        \
  320   diagnostic(bool, ShenandoahMWF, false,                              \
  321           "Turn on/off enqueuing of oops after write barriers (MWF)") \

graphKit.cpp:

*) So we predicate shenandoah_enqueue_barrier with !ShenandoahMWF here:

  4887   if (ShenandoahStoreValEnqueueBarrier && !ShenandoahMWF) {
  4888     shenandoah_enqueue_barrier(obj);
  4889   }

...but not around other uses of ShenandoahStoreValEnqueueBarrier, e.g. in c1_LIRGenerator?

Other C2:

*) Roland should take a look, but I find it uncomfortable to change do_unswitching, find_unswitching_candidate with new arguments...

sharedRuntime.cpp:

*) Bug due to bad indentation and braces?

  215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
  216 return;

shenandoahTraversalGC*:

*) Really, really unfortunate to duplicate a lot from shenandoahConcurrentMark. Maybe we should massage the codebase so that we could reuse significant chunks of the code?
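
To illustrate the gc_state suggestion above (set the MARKING bit whenever traversal is switched on, so the arch-specific barrier code only needs to test one bit), here is a minimal standalone sketch. It is not the ShenandoahHeap code: the `GCState` class, the bit values, and `enqueue_barrier_active()` are hypothetical names made up for the example, and it assumes traversal and plain concurrent marking never overlap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the gc_state byte. The bit names mirror the ones
// quoted in the review; the concrete values are assumptions for this sketch.
enum GCStateBit : uint8_t {
  HAS_FORWARDED = 1 << 0,
  MARKING       = 1 << 1,
  EVACUATION    = 1 << 2,
  TRAVERSAL     = 1 << 3,
};

class GCState {
  uint8_t _bits = 0;

  void set_bit(uint8_t mask, bool value) {
    // Each call is expected to flip exactly one bit.
    assert(mask != 0 && (mask & (mask - 1)) == 0 && "expect exactly one bit");
    if (value) {
      _bits = static_cast<uint8_t>(_bits | mask);
    } else {
      _bits = static_cast<uint8_t>(_bits & ~mask);
    }
  }

public:
  // Traversal piggybacks on MARKING (queues are active) and on HAS_FORWARDED
  // (objects may move), so barrier code does not need a "| TRAVERSAL" test.
  // Assumption: plain marking is not running at the same time, otherwise
  // clearing MARKING here would switch it off prematurely.
  void set_concurrent_traversal_in_progress(bool in_progress) {
    set_bit(TRAVERSAL, in_progress);
    set_bit(MARKING, in_progress);
    set_bit(HAS_FORWARDED, in_progress);
  }

  void set_concurrent_mark_in_progress(bool in_progress) {
    set_bit(MARKING, in_progress);
  }

  // What a barrier fast path would check: a single bit, for both modes.
  bool enqueue_barrier_active() const { return (_bits & MARKING) != 0; }

  bool is_traversal_in_progress() const { return (_bits & TRAVERSAL) != 0; }

  uint8_t raw_value() const { return _bits; }
};
```

With this shape, code that needs to tell the two modes apart can still test TRAVERSAL, while the hot barrier check stays the same for marking and traversal.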

Thanks,
-Aleksey

From ashipile at redhat.com Wed Jan 17 18:08:32 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Wed, 17 Jan 2018 18:08:32 +0000
Subject: hg: shenandoah/jdk10: 2 new changesets
Message-ID: <201801171808.w0HI8WD2018594@aojmv0008.oracle.com>

Changeset: fd9724b26fdd
Author: shade
Date: 2018-01-17 15:37 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/fd9724b26fdd

Refactor allocation failure and explicit GC handling

! src/hotspot/share/gc/shared/gcCause.cpp
! src/hotspot/share/gc/shared/gcCause.hpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp
! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp
! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp

Changeset: 26b9048c042a
Author: shade
Date: 2018-01-17 16:08 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/26b9048c042a

Make degenerated update-refs use region-set cursor to hand over work

! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp

From rkennke at redhat.com Wed Jan 17 20:54:19 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 21:54:19 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com>
References: <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com>
Message-ID:

On 17.01.2018 at 18:10, Zhengyu Gu wrote:
> shenandoahOopClosures.hpp:
>   Missing string dedup version

I am not sure what needs to be done for strdedup. Add support for it in a followup patch?

> shenandoahSupport.cpp
>   L#615 - 656
>   L#3537 - 3556
>   L#3981 - 4056
>   indent

Fixed.

> sharedRuntime.cpp
>
>  213   assert(oopDesc::is_oop(orig, true /* ignore mark word */), "Error");
>  214   // store the original value that was in the field reference
>  215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
>  216 return;
>  217   thread->satb_mark_queue().enqueue(orig);
>  218 JRT_END
>
> L#216: does not look right. Should it be inside UseShenandoahGC block?

It's not needed and can go away.

You'll find the updated patch in reply to Aleksey's review that I'll post shortly (after testing).

Thanks, Roman

From zgu at redhat.com Wed Jan 17 20:56:45 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 15:56:45 -0500
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <694e454c-dd0f-ceb5-5258-a89fbd9690a4@redhat.com>
Message-ID:

On 01/17/2018 03:54 PM, Roman Kennke wrote:
> On 17.01.2018 at 18:10, Zhengyu Gu wrote:
>> shenandoahOopClosures.hpp:
>>   Missing string dedup version
>
> I am not sure what needs to be done for strdedup. Add support for it in
> a followup patch?

Sure. I can add the support afterward.

Thanks,

-Zhengyu

From rkennke at redhat.com Wed Jan 17 21:58:52 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 17 Jan 2018 22:58:52 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com>
Message-ID: <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com>

>> Full:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.02/
>
> Exciting!
>
> c1_Runtime1_x86.cpp:
>
> *) Let's rewrite this:
>
>   if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !ShenandoahSATBBarrier &&
>       !ShenandoahConditionalSATBBarrier && !ShenandoahStoreValEnqueueBarrier) {
>
> into:
>
>   if (bs->kind() == BarrierSet::ShenandoahBarrierSet && !(ShenandoahSATBBarrier ||
>       ShenandoahConditionalSATBBarrier || ShenandoahStoreValEnqueueBarrier)) {

Done.

> *) Re:
>
>   1644   __ testb(gc_state, ShenandoahHeap::MARKING | ShenandoahHeap::TRAVERSAL);
>
> So, set_concurrent_traversal_in_progress activates the SATB queues, and this is good. Why don't we
> set the ShenandoahHeap::MARKING bit to gc_state there, and avoid "| TRAVERSAL" all around the
> arch-specific code?

Done.

> *) shenandoahBarrierSet_x86.cpp: Pushes/pops around the call to g1_write_barrier_pre seem
> suspicious. How do we know we need to caller-save rbx, rcx, rdx, c_rarg1? Deserves a comment, maybe?

It's the same set of regs that need to be saved+restored in the write barrier, a few lines above.

> *) ShenandoahStoreValReadBarrier, ShenandoahStoreValWriteBarrier, ShenandoahStoreValReadBarrier
> exclusion tests

ShStoreValWB and ShStoreValEnq are not exclusive. I need them both in tandem.
I added exclusion test for ShStoreValEnq against ShStoreValRB in shenandoahCollectorPolicy.cpp, but don't know how to 'encode' that in TestSelectiveBarrierFlags.java

> *) shenandoahBarrierSet.cpp: branches are the same, which looks like a typo. Should be compound
> boolean predicate?
>
>   42   if (ALWAYS_ENQUEUE && !oopDesc::is_null(o)) {
>   43     ShenandoahBarrierSet::enqueue(o);
>   44   } else if (evac) {
>   45     ShenandoahBarrierSet::enqueue(o);
>   46   }

Done.

> shenandoahCollectorPolicy.cpp
>
> *) Stray debugging lines:
>
>   1367   // tty->print_cr("CSET regions:");
>   1376   // r->print_on(tty);

Removed.

> *) Heuristics need work: I think it runs into the problem that adaptive cset selection solves: it
> chooses either too big or too small cset. I wonder if you can actually reuse that in traversal
> heuristics

Yes, but we need degenerate GC for traversal first :-)

> shenandoahConcurrentThread.cpp:
>
> *) I think you want to introduce ShenandoahHeap::{vmop_entry,entry,op}_traversal family of methods,
> and call them, as we do with the rest of VM ops.

Done.

> *) This is not needed anymore:
>
>   207   // TODO: Call this properly with Shenandoah*CycleMark
>   208   heap->set_used_at_last_gc();

Removed.

> shenandoahHeap.cpp:
>
> *) As mentioned above, this:
>
>   1683 void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
>   1684   set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
>   1685   JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
>   1686   set_evacuation_in_progress_at_safepoint(in_progress);
>   1687   set_has_forwarded_objects(in_progress);
>   1688 }
>
> is probably just:
>
>   void ShenandoahHeap::set_concurrent_traversal_in_progress(bool in_progress) {
>     set_gc_state_bit(TRAVERSAL_BITPOS, in_progress);
>     set_gc_state_bit(MARKING_BITPOS, in_progress);
>     set_gc_state_bit(HAS_FORWARDED_OBJECTS_BITPOS, in_progress);
>     set_gc_state_bit_at_safepoint(_gc_state.raw_value());
>     JavaThread::satb_mark_queue_set().set_active_all_threads(in_progress, !in_progress);
>   }

Done.

> shenandoahVerifier.cpp:
>
> *) Why are we testing for "next" bitmap here?

Because traversal uses the next bitmap, and only this, and I don't care to swap with complete, but I want to verify it. Good?

> _verify_liveness_complete and the comment seem to
> disagree? Comments still mention "partial"?

Fixed.

> shenandoah_globals.hpp:
>
> *) Comment is duplicated:
>
>   316                                                                        \
>   317   diagnostic(bool, ShenandoahStoreValEnqueueBarrier, false,           \
>   318           "Turn on/off enqueuing of oops after write barriers (MWF)") \
>   319                                                                        \
>   320   diagnostic(bool, ShenandoahMWF, false,                              \
>   321           "Turn on/off enqueuing of oops after write barriers (MWF)") \
>
>

Fixed.

> graphKit.cpp:
>
> *) So we predicate shenandoah_enqueue_barrier with !ShenandoahMWF here:
>
>   4887   if (ShenandoahStoreValEnqueueBarrier && !ShenandoahMWF) {
>   4888     shenandoah_enqueue_barrier(obj);
>   4889   }
>
> ...but not around other uses of ShenandoahStoreValEnqueueBarrier, e.g. in c1_LIRGenerator?
>

I only implemented this sketchy MWF thing in C2 for now. This definitely needs more work, and I don't even know if it is correct.

> Other C2:
>
> *) Roland should take a look, but I find it uncomfortable to change do_unswitching,
> find_unswitching_candidate with new arguments...

This was actually done by Roland to get the new barriers to work and optimize well enough.

> sharedRuntime.cpp:
>
> *) Bug due to bad indentation and braces?
>
>   215 if (UseShenandoahGC) { ShenandoahBarrierSet::enqueue(orig); }
>   216 return;

Yeah, this is not needed. I removed it.

> shenandoahTraversalGC*:
>
> *) Really, really unfortunate to duplicate a lot from shenandoahConcurrentMark. Maybe we should
> massage the codebase so that we could reuse significant chunks of the code?

Yes, maybe. But for the start, I did not want it to interfere with existing code if I can avoid it. For this reason, this looks like a copy+paste job from conc-mark and partial for some parts.

Thanks for reviewing and spotting all the issues.

I could not really make a diff webrev, because I first had to pull -u your latest work, and this messed up my differential webrev... sorry. Only full webrev now:

http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/

From zgu at redhat.com Wed Jan 17 21:59:17 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 16:59:17 -0500
Subject: RFR: Hint unused regions instead of uncommit them
In-Reply-To: <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> <9bed336a-34df-5192-24da-db675b22cc45@redhat.com>
Message-ID:

On 01/16/2018 06:49 AM, Aleksey Shipilev wrote:
> On 01/15/2018 06:21 PM, Zhengyu Gu wrote:
>> This patch adds new experimental flag ShenandoahIdleRegions (default to false) to hint kernel that
>> the regions are not needed (vs. madvise(MADV_DONTNEED), instead of proactively uncommitting.
>>
>> It appears that does have advantage over uncommitting regions, although, not by as much as I was
>> expected.
>>
>> SPECjbb2015:
>>
>> Baseline:
>> RUN RESULT: hbIR (max attempted) = 59167, hbIR (settled) = 51984, max-jOPS = 47925, critical-jOPS = 19108
>>
>> -XX:ShenandoahUncommitDelay=0 -XX:-ShenandoahIdleRegions
>> RUN RESULT: hbIR (max attempted) = 41119, hbIR (settled) = 36501, max-jOPS = 30839, critical-jOPS = 8841
>>
>> -XX:ShenandoahUncommitDelay=0 -XX:+ShenandoahIdleRegions
>> RUN RESULT: hbIR (max attempted) = 49322, hbIR (settled) = 42968, max-jOPS = 35019, critical-jOPS = 9283
>>
>>
>> Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.00/
>
> As I read MADV_DONTNEED man page and the explanations of different kernel people, I am getting
> uneasy using this. madvise call that basically corrupts memory, say what? And it also does not
> support large pages...
>
> It _maybe_ makes sense to optionally support this, but only if we make the code changes minimal. It
> looks like the fair bit of complexity comes from the attempt to fallback to commit/uncommit when
> idling fails. Could we just test that idle/activate_memory works, and select one of the options
> without fallback? E.g. when ShenandoahIdleRegions is true, LargePages is false, and idling works,
> make do_commit/do_uncommit only do idle_memory/activate_memory, and fail hard when idle_memory
> returns false. You would not need the _idle_region flag too then.

Okay, made it fatal if can not idle the region.

Updated webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.01/

Test:

  hotspot_gc_shenandoah (fastdebug + release)
  Manual test to verify large pages are actually used.

Thanks,

-Zhengyu

>
> Thanks,
> -ALeksey
>

From zgu at redhat.com Thu Jan 18 00:39:34 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 17 Jan 2018 19:39:34 -0500
Subject: RFR: Bitmap size might not be page aligned when large page is used
Message-ID: <68242bad-5451-e508-cd83-9d9b71ef2bde@redhat.com>

I discovered this when running tests for idling regions.

# Out of Memory Error (/home/zgu/workspace/shenandoah-jdk10/src/hotspot/os/linux/os_linux.cpp:2598), pid=23493, tid=23494
#
# JRE version: (10.0) (fastdebug build )
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 10-internal+0-adhoc.zgu.shenandoah-jdk10, mixed mode, aot, tiered, compressed oops, Shenandoah gc, linux-amd64)
# Core dump will be written. Default location: /home/zgu/workspace/shenandoah-jdk10/test/hotspot/jtreg/gc/arguments/core.%p
#

This bug is only reproducible when large pages are actually used (when system has enough large pages)

http://cr.openjdk.java.net/~zgu/shenandoah/bitmap_size_large_page/webrev.00/

Test:

  hotspot_gc_shenandoah (fastdebug + release)

Thanks,

-Zhengyu

From shade at redhat.com Thu Jan 18 08:58:32 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Thu, 18 Jan 2018 09:58:32 +0100
Subject: RFR: Bitmap size might not be page aligned when large page is used
In-Reply-To: <68242bad-5451-e508-cd83-9d9b71ef2bde@redhat.com>
References: <68242bad-5451-e508-cd83-9d9b71ef2bde@redhat.com>
Message-ID: <15adbdff-f98e-3bc4-a01c-ad42e77446e4@redhat.com>

On 01/18/2018 01:39 AM, Zhengyu Gu wrote:
> I discovered this when running tests for idling regions.
>
> # Out of Memory Error
> (/home/zgu/workspace/shenandoah-jdk10/src/hotspot/os/linux/os_linux.cpp:2598), pid=23493, tid=23494
> #
> # JRE version: (10.0) (fastdebug build )
> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 10-internal+0-adhoc.zgu.shenandoah-jdk10, mixed mode,
> aot, tiered, compressed oops, Shenandoah gc, linux-amd64)
> # Core dump will be written. Default location:
> /home/zgu/workspace/shenandoah-jdk10/test/hotspot/jtreg/gc/arguments/core.%p
> #
>
> This bug is only reproducible when large pages are actually used (when system has enough large pages)
>
> http://cr.openjdk.java.net/~zgu/shenandoah/bitmap_size_large_page/webrev.00/

D'uh. Looks good!
-Aleksey From rwestrel at redhat.com Thu Jan 18 13:20:24 2018 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 18 Jan 2018 14:20:24 +0100 Subject: RFR: Traveral GC heuristics In-Reply-To: <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> Message-ID: >> Other C2: >> >> *) Roland should take a look, but I find it uncomfortable to change do_unswitching, >> find_unswitching_candidate with new arguements... > > This was actually done by Roland to get the new barriers to work and > optimize well enough. C2 stuff is ok. Roland. From zgu at redhat.com Thu Jan 18 13:48:46 2018 From: zgu at redhat.com (zgu at redhat.com) Date: Thu, 18 Jan 2018 13:48:46 +0000 Subject: hg: shenandoah/jdk10: Bitmap size might not be page aligned when large page is used Message-ID: <201801181348.w0IDmkTh010708@aojmv0008.oracle.com> Changeset: 1a6a9f288dd2 Author: zgu Date: 2018-01-18 08:23 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/1a6a9f288dd2 Bitmap size might not be page aligned when large page is used ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp From shade at redhat.com Thu Jan 18 15:18:15 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 18 Jan 2018 16:18:15 +0100 Subject: Degenerated GC Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/ This patch implements Degenerate GC: a better solution for handling allocation failures. We have pushed bits and pieces of the infrastructure needed for it over the past few weeks. Our current scheme roughly approximates the same thing: if an allocation failure is raised during the concurrent mark or concurrent update-refs, we immediately STW and complete the phase under the pause.
There are major caveats in that scheme though: it only works reliably for the phases that have final-STWs, it complicates the control code significantly, and it tries to continue the concurrent cycle afterwards, even though we know something is fishy. Degenerate GC is basically the STW continuation of the concurrent cycle. When the concurrent cycle degenerates, we invoke a single VM operation ("dive into STW"), and complete the same cycle there. In most cases, we degenerate at the end of the concurrent cycle when the majority of work is already done. If Degenerate GC experiences a second allocation failure during that STW cycle (e.g. during evac), it upgrades to Full GC. It stands to reason that Degenerate GC is cheaper than Full GC, but here is how they compare most of the time: # Degenerated at evacuation, upgraded to Full GC: [46.755s][info][gc] GC(109) Cancelling concurrent GC: Allocation Failure [46.755s][info][gc] GC(109) Cannot finish degeneration, upgrading to Full GC [46.994s][info][gc] GC(109) Pause Degenerated GC (Evacuation) 4054M->527M(4096M) 239.331ms # Degenerated at update-refs [52.145s][info][gc] Cancelling concurrent GC: Allocation Failure [52.147s][info][gc] GC(123) Concurrent update references 3360M->3946M(4096M) 218.713ms [52.177s][info][gc] GC(124) Pause Degenerated GC (Update Refs) 3946M->1725M(4096M) 20.201ms So, degeneration can be seen as the softer graceful degradation step before full-stop full-heap full-moving Full GC. Degenerate GC brings several major improvements over our usual degenerate scheme: a) When an allocation failure is raised, we stop *all* threads, not just that allocator thread. This makes sense because it is very likely that other threads would experience the allocation failure shortly. This is our failure mode, and the GC log would register the GC pause that would correlate with the actual stalls experienced by application threads.
b) When degenerate STW is running, it uses the ParallelGCThreads count, completing the cycle as fast as it possibly can. Otherwise, if we degenerated the concurrent cycle, most mutator threads would probably be stuck waiting for allocation to succeed, but the concurrent cycle would still run with ConcGCThreads (which is realistically lower than ParallelGCThreads), wasting precious wall time. c) It handles out-of-cycle allocation failure. When ShConcurrentThread cannot catch up with issuing the GC cycles fast enough, or when the heuristics miss the allocation spike, our current code just Full GCs. Current change runs the Degenerate GC, in hope that mark would identify enough immediate garbage to proceed with the cycle. (This would get better once we give the GC a stash of "reserved" regions for evacuation!) d) It allows easier future handling of partial, traversal, and evac degeneration: we are already at STW, and we can do whatever is needed at that point. Degenerate GC seems to improve the survivability on densely populated heaps. This could be modeled roughly by having a normal heavily-allocating and heavily-threaded workload with a very tight heap.
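A minimal sketch of the kind of workload described above, purely illustrative and not the benchmark behind any numbers in this thread: several threads allocate small arrays at full speed while each retains a bounded set of survivors, so a small -Xmx keeps the heap densely populated. The class name TightHeapWorkload, the chunk size, and the parameter names are all assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class TightHeapWorkload {
    // Hypothetical chunk size; tune it against the chosen -Xmx.
    static final int CHUNK_BYTES = 1024;

    /**
     * Starts `threads` allocator threads. Each allocates `iterations` chunks
     * and keeps at most `retained` of them reachable at any time.
     * Returns the total number of allocations performed.
     */
    public static long run(int threads, int iterations, int retained)
            throws InterruptedException {
        AtomicLong allocated = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                byte[][] live = new byte[retained][];
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < iterations; i++) {
                    // Overwriting a random slot turns the previous chunk into garbage.
                    live[rnd.nextInt(retained)] = new byte[CHUNK_BYTES];
                    allocated.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return allocated.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        long n = run(threads, 1_000_000, 10_000);
        System.out.println("allocated chunks: " + n);
    }
}
```

Raising `retained` (or lowering -Xmx) makes the heap tighter; with enough threads the allocation rate is likely to outpace concurrent cycles, which is the situation the patch targets.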
Current gc+stats would tell that most allocation failures are handled by Degenerated GCs then: -Xmx16g [140.227s][info][gc,stats] 48 successful concurrent GCs [140.227s][info][gc,stats] 0 invoked explicitly [140.227s][info][gc,stats] [140.227s][info][gc,stats] 2 Degenerated GCs [140.227s][info][gc,stats] 2 caused by allocation failure [140.227s][info][gc,stats] 0 upgraded to Full GC [140.227s][info][gc,stats] [140.227s][info][gc,stats] 0 Full GCs [140.227s][info][gc,stats] 0 invoked explicitly [140.227s][info][gc,stats] 0 caused by allocation failure [140.227s][info][gc,stats] 0 upgraded from Degenerated GC -Xmx2g [197.491s][info][gc,stats] 379 successful concurrent GCs [197.491s][info][gc,stats] 0 invoked explicitly [197.491s][info][gc,stats] [197.491s][info][gc,stats] 120 Degenerated GCs [197.491s][info][gc,stats] 120 caused by allocation failure [197.491s][info][gc,stats] 47 upgraded to Full GC [197.491s][info][gc,stats] [197.491s][info][gc,stats] 49 Full GCs [197.491s][info][gc,stats] 0 invoked explicitly [197.491s][info][gc,stats] 2 caused by allocation failure [197.491s][info][gc,stats] 47 upgraded from Degenerated GC (Full GC upgrades are from evac OOME-s, and alloc-failure Full GCs are the heuristics chickening out from multiple back-to-back Degenerated GCs into Full GC). Still fully testing it, but early reviews are welcome. Testing: hotspot_gc_shenandoah, benchmarks Thanks, -Aleksey From shade at redhat.com Thu Jan 18 19:51:34 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 18 Jan 2018 20:51:34 +0100 Subject: Degenerated GC In-Reply-To: References: Message-ID: On 01/18/2018 04:18 PM, Aleksey Shipilev wrote: > http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/ Amped up alloc-failure injection, and that exposed a few bugs. Fixed them: http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/ GCBasher runs for half an hour now without problems. Running further... 
-Aleksey From cflood at redhat.com Thu Jan 18 23:21:00 2018 From: cflood at redhat.com (Christine Flood) Date: Thu, 18 Jan 2018 18:21:00 -0500 Subject: RFR: Traveral GC heuristics In-Reply-To: References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> Message-ID: Can we at least include a number of comments that we are using SATB queues for convenience but this isn't using an SATB algorithm. Otherwise future developers will curse us for misleading them. Is there some way to come up with a common abstraction for partial gc and traversal gc so we don't have to have all those duplicate timings? You have the MWF flag, but I don't see the implementation. You need something in ShenandoahBarrierSet to see if the object being written to was allocated after TAMS and if so, both the object and the field need to be marked. Christine On Thu, Jan 18, 2018 at 8:20 AM, Roland Westrelin wrote: > >>> Other C2: >>> >>> *) Roland should take a look, but I find it uncomfortable to change do_unswitching, >>> find_unswitching_candidate with new arguements... >> >> This was actually done by Roland to get the new barriers to work and >> optimize well enough. > > C2 stuff is ok. > > Roland. From shade at redhat.com Fri Jan 19 07:55:46 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 08:55:46 +0100 Subject: Degenerated GC In-Reply-To: References: Message-ID: On 01/18/2018 08:51 PM, Aleksey Shipilev wrote: > On 01/18/2018 04:18 PM, Aleksey Shipilev wrote: >> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/ > > Amped up alloc-failure injection, and that exposed a few bugs. Fixed them: > http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/ > > GCBasher runs for half an hour now without problems. Running further... 
8-hour GCBasher passes with: $ -Xmx1g -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahDegenerateALot TestGCBasherWithShenandoah 28800000 [27665.812s][info][gc,stats ] 85556 successful concurrent GCs [27665.812s][info][gc,stats ] 0 invoked explicitly [27665.812s][info][gc,stats ] [27665.812s][info][gc,stats ] 44995 Degenerated GCs [27665.812s][info][gc,stats ] 44995 caused by allocation failure [27665.812s][info][gc,stats ] 8628 upgraded to Full GC [27665.812s][info][gc,stats ] [27665.812s][info][gc,stats ] 8758 Full GCs [27665.812s][info][gc,stats ] 0 invoked explicitly [27665.812s][info][gc,stats ] 130 caused by allocation failure [27665.812s][info][gc,stats ] 8628 upgraded from Degenerated GC So, I am pretty sure it works :) -Aleksey From shade at redhat.com Fri Jan 19 09:10:04 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 10:10:04 +0100 Subject: RFC: Pick up 9.0.4 to sh/jdk9 Message-ID: <575a7f9c-6ee3-cae9-1b17-a5ae9c871ad3@redhat.com> Upstream jdk-updates/jdk9u has pushed the changesets for 9.0.4: http://hg.openjdk.java.net/jdk-updates/jdk9u Let's pick them up! A few trivial merges were needed. Testing: hotspot_gc_shenandoah {fastdebug|release} Thanks, -Aleksey From shade at redhat.com Fri Jan 19 10:05:59 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 11:05:59 +0100 Subject: RFR: Hint unused regions instead of uncommit them In-Reply-To: References: <537527af-1e46-c834-4f1b-36cd8f148666@redhat.com> <9bed336a-34df-5192-24da-db675b22cc45@redhat.com> Message-ID: <8998368d-e6c1-2218-7ef5-5d9fe152c9c9@redhat.com> On 01/17/2018 10:59 PM, Zhengyu Gu wrote: > Updated webrev: http://cr.openjdk.java.net/~zgu/shenandoah/idle_region/webrev.01/ All right, good!
-Aleksey From shade at redhat.com Fri Jan 19 10:24:40 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 11:24:40 +0100 Subject: RFR: Traveral GC heuristics In-Reply-To: <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> Message-ID: <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> On 01/17/2018 10:58 PM, Roman Kennke wrote: > Yes, maybe. But for the start, I did not want it to interfere with existing code if I can avoid > it. For this reason, this looks like a copy+paste job from conc-mark and partial for some parts. Okay, but please plan to common these things right away. We cannot have two copy-pasted 1000+ LOC blocks and hope for the best ;) > Thanks for reviewing and spotting all the issues. I could not really make a diff webrev, because I > first had to pull -u your latest work, and this messed up my differential webrev... sorry. Only full > webrev now: > > http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/ Sorry to be a PITA about this, but the change is quite large, and I think we want to be more forward-looking to backports and stability. Another sweep through the code: *) GCCause::to_string misses the to_string case for _shenandoah_traversal_gc? *) So, wait. SBS::interpreter_write_barrier_impl caller-saves registers when they are not equal to dst. New code in SBS::interpreter_storeval_barrier just does it unconditionally. Is WB too cautious, or is SVB too lax about this? *) I think with minimal changes, we can make ShenandoahStoreValEnqueueBarrier exclusive, which will make testing much easier (encoding this in TestSelectiveBarriers would be trivial). E.g.
say: if (UseShenandoahGC) { if (ShenandoahStoreValWriteBarrier || ShenandoahStoreValEnqueueBarrier) { // perform WB } if (ShenandoahStoreValEnqueueBarrier) { // enqueue } if (ShenandoahStoreValReadBarrier) { // RB } } *) Minor nit: please indent second arguments like this: FLAG_SET_DEFAULT(UseShenandoahMatrix, false); FLAG_SET_DEFAULT(ShenandoahSATBBarrier, false); FLAG_SET_DEFAULT(ShenandoahConditionalSATBBarrier, false); FLAG_SET_DEFAULT(ShenandoahStoreValReadBarrier, false); FLAG_SET_DEFAULT(ShenandoahStoreValWriteBarrier, true); FLAG_SET_DEFAULT(ShenandoahStoreValEnqueueBarrier, true); FLAG_SET_DEFAULT(ShenandoahKeepAliveBarrier, false); FLAG_SET_DEFAULT(ShenandoahAsmWB, true); FLAG_SET_DEFAULT(ShenandoahBarriersForConst, true); FLAG_SET_DEFAULT(ShenandoahWBWithMemBar, false); FLAG_SET_DEFAULT(ShenandoahWriteBarrierRB, false); *) shenandoahOopClosures.hpp, indenting is a bit off here: 240 _thread(Thread::current()), _queue(q) {} ... 273 virtual bool do_metadata() { return true; } *) I wonder if we want to pull out ShenandoahWBWithMemBar changes into a separate changeset? This looks potentially backportable, and usable outside of Traversal GC. Thanks, -Aleksey From rkennke at redhat.com Fri Jan 19 10:43:23 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 19 Jan 2018 11:43:23 +0100 Subject: Degenerated GC In-Reply-To: References: Message-ID: Am 18.01.2018 um 20:51 schrieb Aleksey Shipilev: > On 01/18/2018 04:18 PM, Aleksey Shipilev wrote: >> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/ > > Amped up alloc-failure injection, and that exposed a few bugs. Fixed them: > http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/ > > GCBasher runs for half an hour now without problems. Running further... > > -Aleksey > I have no complaints about it. I like it! 
Roman From shade at redhat.com Fri Jan 19 10:52:11 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 11:52:11 +0100 Subject: RFR: Demote warning message about OOM-during-evac to informational Message-ID: Let's finally do this: diff -r 8e52377a090e src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp --- a/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp Fri Jan 19 11:38:51 2018 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp Fri Jan 19 11:50:16 2018 +0100 @@ -396,7 +396,9 @@ if ((! Thread::current()->is_GC_task_thread()) && (! Thread::current()->is_ConcurrentGC_thread())) { assert(! Threads_lock->owned_by_self() || SafepointSynchronize::is_at_safepoint(), "must not hold Threads_lock here"); - log_info(gc)("%s. Let Java thread wait until evacuation finishes.", GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac)); + log_info(gc)("%s. Thread \"%s\" waits until evacuation finishes.", + GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac), + Thread::current()->name()); while (heap->is_evacuation_in_progress()) { // wait. Thread::current()->_ParkEvent->park(1); } User has nothing to do with that warning, and it is non-user-actionable. So, no point in putting scary messages in the GC log. It now prints: [info][gc] GC(63) Concurrent cleanup 611M->611M(1024M) 0.202ms [info][gc] GC(63) Cancelling concurrent GC: Allocation Failure During Evac [info][gc] Allocation Failure During Evac. Thread "MyShinyThread" waits until evacuation finishes. 
[info][gc] GC(63) Concurrent evacuation 612M->994M(1024M) 315.488ms [info][gc] GC(64) Pause Full (Allocation Failure) 994M->541M(1024M) 312.493ms Testing: hotspot_fast_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Fri Jan 19 10:52:53 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 19 Jan 2018 11:52:53 +0100 Subject: RFR: Traveral GC heuristics In-Reply-To: References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> Message-ID: <0f37e367-1c6d-c694-a6f0-91bcb380012a@redhat.com> Am 19.01.2018 um 00:21 schrieb Christine Flood: > Can we at least include a number of comments that we are using SATB > queues for convenience but this isn't using an SATB algorithm. > Otherwise future developers will curse us for misleading them. I've added this note on top of shenandoahTraversalGC.hpp: /** * NOTE: We are using the SATB buffer in thread.hpp and satbMarkQueue.hpp, however, it is not an SATB algorithm. * We're using the buffer as generic oop buffer to enqueue new values in concurrent oop stores, IOW, the algorithm * is incremental-update-based. */ > Is there some way to come up with a common abstraction for partial gc > and traversal gc so we don't have to have all those duplicate timings? Aleksey also noted this with regard to conc-mark. I wanted it to not impact existing code for the start. I'll look into refactoring and commoning the code after the initial change is in and has got some testing and play time? > You have the MWF flag, but I don't see the implementation. You need > something in ShenandoahBarrierSet to see if the object being written > to was allocated after TAMS and if so, both the object and the field > need to be marked. I've only implemented it in C2. It's not checking TAMS (because I don't really maintain a usable TAMS) but instead enqueues the target object unconditionally. I've probably not understood MWF correctly? Should I rip it out and put it back in later, and hopefully correct?
I will post a revised changeset with the above comment added later in this thread. Roman From rkennke at redhat.com Fri Jan 19 10:54:40 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 19 Jan 2018 11:54:40 +0100 Subject: RFR: Demote warning message about OOM-during-evac to informational In-Reply-To: References: Message-ID: Am 19.01.2018 um 11:52 schrieb Aleksey Shipilev: > Let's finally do this: > > diff -r 8e52377a090e src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp Fri Jan 19 11:38:51 2018 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp Fri Jan 19 11:50:16 2018 +0100 > @@ -396,7 +396,9 @@ > if ((! Thread::current()->is_GC_task_thread()) && (! Thread::current()->is_ConcurrentGC_thread())) { > assert(! Threads_lock->owned_by_self() > || SafepointSynchronize::is_at_safepoint(), "must not hold Threads_lock here"); > - log_info(gc)("%s. Let Java thread wait until evacuation finishes.", > GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac)); > + log_info(gc)("%s. Thread \"%s\" waits until evacuation finishes.", > + GCCause::to_string(GCCause::_shenandoah_allocation_failure_evac), > + Thread::current()->name()); > while (heap->is_evacuation_in_progress()) { // wait. > Thread::current()->_ParkEvent->park(1); > } > > User has nothing to do with that warning, and it is non-user-actionable. So, no point in putting > scary messages in the GC log. It now prints: > > [info][gc] GC(63) Concurrent cleanup 611M->611M(1024M) 0.202ms > [info][gc] GC(63) Cancelling concurrent GC: Allocation Failure During Evac > [info][gc] Allocation Failure During Evac. Thread "MyShinyThread" waits until evacuation finishes. > [info][gc] GC(63) Concurrent evacuation 612M->994M(1024M) 315.488ms > [info][gc] GC(64) Pause Full (Allocation Failure) 994M->541M(1024M) 312.493ms > > Testing: hotspot_fast_gc_shenandoah > > Thanks, > -Aleksey > Yeah ok. 
Do we still have any desire to fix this for real? I've pursued a couple of possible implementations, and all of them seemed overly complex or performance-impacting... Roman From ashipile at redhat.com Fri Jan 19 11:00:52 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 19 Jan 2018 11:00:52 +0000 Subject: hg: shenandoah/jdk10: Demote warning message about OOM-during-evac to informational Message-ID: <201801191100.w0JB0qmZ003346@aojmv0008.oracle.com> Changeset: 12654193e434 Author: shade Date: 2018-01-19 11:52 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/12654193e434 Demote warning message about OOM-during-evac to informational ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp From shade at redhat.com Fri Jan 19 14:15:56 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 15:15:56 +0100 Subject: RFR: TestSelectiveBarrierFlags should accept multi-element flag selections Message-ID: <2658f693-856c-418e-f6a9-4ed521abe200@redhat.com> Roman's test changes need this: Fixed the bug that breaks when more than 2 flags per group are present, and also rewritten for clarity: diff -r 12654193e434 test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java --- a/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java Fri Jan 19 11:52:40 2018 +0100 +++ b/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java Fri Jan 19 15:14:50 2018 +0100 @@ -69,10 +69,11 @@ StringBuilder sb = new StringBuilder(); for (String[] l : opts) { - int f = t % (l.length + 1); - conf.add("-XX:" + ((f & 1) == 1 ? "+" : "-") + l[0]); - if (l.length > 1) { - conf.add("-XX:" + ((f & 2) == 2 ? "+" : "-") + l[1]); + // Make a choice which flag to select from the group. + // Zero means no flag is selected from the group. + int choice = t % (l.length + 1); + for (int e = 0; e < l.length; e++) { + conf.add("-XX:" + ((choice == (e + 1)) ? 
"+" : "-") + l[e]); } t = t / (l.length + 1); } Testing: TestSelectiveBarrierFlags {fastdebug,release} Thanks, -Aleksey From zgu at redhat.com Fri Jan 19 14:41:05 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 19 Jan 2018 09:41:05 -0500 Subject: RFR(XXS) Missing resource mark Message-ID: <04976088-7470-8852-5629-50a7d4fe5b5e@redhat.com> Crashed inside handle_alloc_failure_evac() due to missing ResourceMark when calling thread->name(). # # Internal Error (/home/zgu/workspace/shenandoah-jdk10/src/hotspot/share/memory/resourceArea.hpp:63), pid=1230, tid=1232 # fatal error: memory leak: allocating without ResourceMark # Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/handle_alloc_evac_rm/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug) Thanks, -Zhengyu From shade at redhat.com Fri Jan 19 14:42:18 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 15:42:18 +0100 Subject: RFR(XXS) Missing resource mark In-Reply-To: <04976088-7470-8852-5629-50a7d4fe5b5e@redhat.com> References: <04976088-7470-8852-5629-50a7d4fe5b5e@redhat.com> Message-ID: On 01/19/2018 03:41 PM, Zhengyu Gu wrote: > Crashed inside handle_alloc_failure_evac() due to missing ResourceMark when calling thread->name(). > > # > #? Internal Error > (/home/zgu/workspace/shenandoah-jdk10/src/hotspot/share/memory/resourceArea.hpp:63), pid=1230, tid=1232 > #? fatal error: memory leak: allocating without ResourceMark > # > > Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/handle_alloc_evac_rm/webrev.00/ Looks good! 
-Aleksey From zgu at redhat.com Fri Jan 19 14:58:24 2018 From: zgu at redhat.com (zgu at redhat.com) Date: Fri, 19 Jan 2018 14:58:24 +0000 Subject: hg: shenandoah/jdk10: Missing resource mark in SH::handle_alloc_failure_evac() Message-ID: <201801191458.w0JEwOqi002428@aojmv0008.oracle.com> Changeset: d791ef88cdff Author: zgu Date: 2018-01-19 09:54 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/d791ef88cdff Missing resource mark in SH::handle_alloc_failure_evac() ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp From rkennke at redhat.com Fri Jan 19 15:26:02 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 19 Jan 2018 16:26:02 +0100 Subject: RFR: TestSelectiveBarrierFlags should accept multi-element flag selections In-Reply-To: <2658f693-856c-418e-f6a9-4ed521abe200@redhat.com> References: <2658f693-856c-418e-f6a9-4ed521abe200@redhat.com> Message-ID: Good. This makes my test work :-) Push it! Roman > Roman's test changes need this: Fixed the bug that breaks when more than 2 flags per group are > present, and also rewritten for clarity: > > diff -r 12654193e434 test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java > --- a/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java Fri Jan 19 11:52:40 2018 +0100 > +++ b/test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java Fri Jan 19 15:14:50 2018 +0100 > @@ -69,10 +69,11 @@ > > StringBuilder sb = new StringBuilder(); > for (String[] l : opts) { > - int f = t % (l.length + 1); > - conf.add("-XX:" + ((f & 1) == 1 ? "+" : "-") + l[0]); > - if (l.length > 1) { > - conf.add("-XX:" + ((f & 2) == 2 ? "+" : "-") + l[1]); > + // Make a choice which flag to select from the group. > + // Zero means no flag is selected from the group. > + int choice = t % (l.length + 1); > + for (int e = 0; e < l.length; e++) { > + conf.add("-XX:" + ((choice == (e + 1)) ? 
"+" : "-") + l[e]); > } > t = t / (l.length + 1); > } > > Testing: TestSelectiveBarrierFlags {fastdebug,release} > > Thanks, > -Aleksey > From shade at redhat.com Fri Jan 19 15:27:59 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 16:27:59 +0100 Subject: RFC: Pick up 9.0.4 to sh/jdk9 In-Reply-To: <575a7f9c-6ee3-cae9-1b17-a5ae9c871ad3@redhat.com> References: <575a7f9c-6ee3-cae9-1b17-a5ae9c871ad3@redhat.com> Message-ID: <6b4285eb-92db-a479-d132-c7c40928d3f9@redhat.com> On 01/19/2018 10:10 AM, Aleksey Shipilev wrote: > Upstream jdk-updates/jdk9u had pushed the changesets for 9.0.4: > http://hg.openjdk.java.net/jdk-updates/jdk9u > > Let's pick them up! A few trivial merges were needed. Ah, upstream seems to have borked AArch64! https://bugs.openjdk.java.net/browse/JDK-8195685 Let's wait a little then... -Aleksey From ashipile at redhat.com Fri Jan 19 15:36:37 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 19 Jan 2018 15:36:37 +0000 Subject: hg: shenandoah/jdk10: TestSelectiveBarrierFlags should accept multi-element flag selections Message-ID: <201801191536.w0JFab77015699@aojmv0008.oracle.com> Changeset: 67294a38c0c7 Author: shade Date: 2018-01-19 16:27 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/67294a38c0c7 TestSelectiveBarrierFlags should accept multi-element flag selections ! test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java From zgu at redhat.com Fri Jan 19 16:46:28 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 19 Jan 2018 11:46:28 -0500 Subject: Degenerated GC In-Reply-To: References: Message-ID: shenandoahHeap.cpp: 1600 1601 // Allocations happen during concurrent preclean, record peak after the phase: 1602 shenandoahPolicy()->record_peak_occupancy(); 1603 } 1604 1605 // Allocations happen during bitmap cleanup, record peak after the phase: 1606 shenandoahPolicy()->record_peak_occupancy(); May call twice. Otherwise, looks good. 
-Zhengyu On 01/19/2018 02:55 AM, Aleksey Shipilev wrote: > On 01/18/2018 08:51 PM, Aleksey Shipilev wrote: >> On 01/18/2018 04:18 PM, Aleksey Shipilev wrote: >>> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.01/ >> >> Amped up alloc-failure injection, and that exposed a few bugs. Fixed them: >> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.02/ >> >> GCBasher runs for half an hour now without problems. Running further... > > 8-hour GCBasher passes with: > > $ -Xmx1g -XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions > -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahDegenerateALot TestGCBasherWithShenandoah 28800000 > > [27665.812s][info][gc,stats ] 85556 successful concurrent GCs > [27665.812s][info][gc,stats ] 0 invoked explicitly > [27665.812s][info][gc,stats ] > [27665.812s][info][gc,stats ] 44995 Degenerated GCs > [27665.812s][info][gc,stats ] 44995 caused by allocation failure > [27665.812s][info][gc,stats ] 8628 upgraded to Full GC > [27665.812s][info][gc,stats ] > [27665.812s][info][gc,stats ] 8758 Full GCs > [27665.812s][info][gc,stats ] 0 invoked explicitly > [27665.812s][info][gc,stats ] 130 caused by allocation failure > [27665.812s][info][gc,stats ] 8628 upgraded from Degenerated GC > > So, I am pretty sure it works :) > > -Aleksey > From rkennke at redhat.com Fri Jan 19 16:53:54 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 19 Jan 2018 17:53:54 +0100 Subject: RFR: Implement flag to generate write-barriers without membars Message-ID: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com> I extracted this from Traversal GC because it may be useful for other situations too. It introduces a flag ShenandoahWBWithMembar which enables to avoid generation of the load-load-membar in the write-barrier. This membar is not needed when evacuation is always turned off at safepoints (e.g. partial). 
http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.00/ Test: hotspot_gc_shenandoah passed Roman From shade at redhat.com Fri Jan 19 16:59:59 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 17:59:59 +0100 Subject: RFR: Implement flag to generate write-barriers without membars In-Reply-To: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com> References: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com> Message-ID: <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com> On 01/19/2018 05:53 PM, Roman Kennke wrote: > I extracted this from Traversal GC because it may be useful for other situations too. It introduces > a flag ShenandoahWBWithMembar which enables to avoid generation of the load-load-membar in the > write-barrier. This membar is not needed when evacuation is always turned off at safepoints (e.g. > partial). > > http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.00/ Looks good to me, assuming nothing changed since Traversal GC code that Roland reviewed :) Minor nit: can we make the option name consistent with other selective options. E.g. we have ShenandoahWriteBarrierRB. So this one seems to be ShenandoahWriteBarrierMembar? Thanks, -Aleksey From shade at redhat.com Fri Jan 19 17:05:39 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 18:05:39 +0100 Subject: RFR: Allocation failure injection machinery Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/inject-alloc-failure/webrev.01/ Found many bugs in Degenerated GC with this machinery. But it is separate from the rest of the code, and is useful to have for general testing: for example, to test if baseline without Degenerated GC fails the same way. Therefore, this patch splits the machinery out. 
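As a generic illustration of failure injection (not the code in the webrev, whose details are not shown in this thread), here is a small Java sketch: a wrapper makes a configurable fraction of allocation requests fail artificially, so the caller's failure-handling path is exercised far more often than it would be naturally. The names InjectingAllocator and failPerMille are hypothetical.

```java
import java.util.Random;

public class InjectingAllocator {
    private final Random random;
    // Hypothetical knob: how many of every 1000 requests fail artificially.
    private final int failPerMille;
    private int injected;

    public InjectingAllocator(long seed, int failPerMille) {
        this.random = new Random(seed);
        this.failPerMille = failPerMille;
    }

    /** Returns a buffer, or null when a failure is injected. */
    public byte[] allocate(int size) {
        if (random.nextInt(1000) < failPerMille) {
            injected++;
            return null; // the caller must take its allocation-failure path
        }
        return new byte[size];
    }

    public int injectedFailures() {
        return injected;
    }

    public static void main(String[] args) {
        InjectingAllocator allocator = new InjectingAllocator(42L, 50);
        int handled = 0;
        for (int i = 0; i < 10_000; i++) {
            if (allocator.allocate(64) == null) {
                handled++; // stand-in for the real recovery path
            }
        }
        System.out.println("injected=" + allocator.injectedFailures()
                + " handled=" + handled);
    }
}
```

Keeping the injector separate from the recovery logic mirrors the reasoning in the message: the same injection can then be pointed at a baseline to check whether it fails the same way.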
Testing: hotspot_gc_shenandoah Thanks, -Aleksey From zgu at redhat.com Fri Jan 19 17:35:11 2018 From: zgu at redhat.com (zgu at redhat.com) Date: Fri, 19 Jan 2018 17:35:11 +0000 Subject: hg: shenandoah/jdk10: Hint unused regions instead of uncommit them Message-ID: <201801191735.w0JHZB9d026722@aojmv0008.oracle.com> Changeset: 46c3360b6623 Author: zgu Date: 2018-01-19 11:37 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/46c3360b6623 Hint unused regions instead of uncommit them ! src/hotspot/os/linux/os_linux.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp ! src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.hpp ! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp ! src/hotspot/share/runtime/os.cpp ! src/hotspot/share/runtime/os.hpp ! test/hotspot/jtreg/gc/shenandoah/acceptance/HeapUncommit.java From rkennke at redhat.com Fri Jan 19 17:43:44 2018 From: rkennke at redhat.com (Roman Kennke) Date: Fri, 19 Jan 2018 18:43:44 +0100 Subject: RFR: Implement flag to generate write-barriers without membars In-Reply-To: <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com> References: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com> <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com> Message-ID: Am 19.01.2018 um 17:59 schrieb Aleksey Shipilev: > On 01/19/2018 05:53 PM, Roman Kennke wrote: >> I extracted this from Traversal GC because it may be useful for other situations too. It introduces >> a flag ShenandoahWBWithMembar which enables to avoid generation of the load-load-membar in the >> write-barrier. This membar is not needed when evacuation is always turned off at safepoints (e.g. >> partial). >> >> http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.00/ > > Looks good to me, assuming nothing changed since Traversal GC code that Roland reviewed :) > > Minor nit: can we make the option name consistent with other selective options. E.g. 
we have > ShenandoahWriteBarrierRB. So this one seems to be ShenandoahWriteBarrierMembar? > > Thanks, > -Aleksey > Nothing should have changed since Roland's review. http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.01/ Ok to go? Roman From shade at redhat.com Fri Jan 19 17:44:36 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 19 Jan 2018 18:44:36 +0100 Subject: RFR: Implement flag to generate write-barriers without membars In-Reply-To: References: <72dc574c-54c5-c5fc-81ff-5f3028817f20@redhat.com> <1afef25c-8d6e-e0a1-8e37-ea010f03a666@redhat.com> Message-ID: <486374dc-7bde-7a42-911f-a41f374ed729@redhat.com> On 01/19/2018 06:43 PM, Roman Kennke wrote: > Nothing should have changed since Roland's review. > > http://cr.openjdk.java.net/~rkennke/wbwithmembar/webrev.01/ > > Ok to go? OK for me. -Aleksey From roman at kennke.org Fri Jan 19 17:52:47 2018 From: roman at kennke.org (roman at kennke.org) Date: Fri, 19 Jan 2018 17:52:47 +0000 Subject: hg: shenandoah/jdk10: Implement flag to generate write-barriers without membars. Message-ID: <201801191752.w0JHqliZ003447@aojmv0008.oracle.com> Changeset: ecb87af5e0d8 Author: rkennke Date: 2018-01-19 18:40 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/ecb87af5e0d8 Implement flag to generate write-barriers without membars. ! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp ! src/hotspot/share/opto/compile.cpp ! src/hotspot/share/opto/shenandoahSupport.cpp From rkennke at redhat.com Sat Jan 20 13:46:21 2018 From: rkennke at redhat.com (Roman Kennke) Date: Sat, 20 Jan 2018 14:46:21 +0100 Subject: Race and double-counting objects in task balancing code? Message-ID: Hi there, I'm currently chasing a failure of TestGCThreadGroups.java with Traversal GC. I'm getting objects double counted and liveness going off the rails. It only seems to happen with ConcGCThreads > ParallelGCThreads. 
I am wondering what prevents GC workers from stealing oops off of queues that are currently transferred to 'regular' queues. ? Might we have a race there? Is this transferral thread-safe wrt to stealing? Or am I missing something? Please have a look at the last patch: http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/ around: shenandoahTraversalGC.cpp mark_loop_work() The code is almost 100% identical to what we do in shenandoahConcurrentMark.cpp I wonder if simply letting fewer GC threads steal from extra queues might be the safer way to transfer work from extra queues? Roman From shade at redhat.com Mon Jan 22 09:16:22 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 10:16:22 +0100 Subject: RFR: Make concurrent precleaning log message optional again Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/preclean-optional/webrev.01/ This is the fix for UX regression after recent refactoring: even when precleaning is not enabled and/or process references is not enabled, we still print "Concurrent precleaning" message in the log. Testing: hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Mon Jan 22 09:48:43 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 22 Jan 2018 10:48:43 +0100 Subject: Race and double-counting objects in task balancing code? In-Reply-To: References: Message-ID: I think I know what it is. This is the first GC mode in Shenandoah that also traces newly allocated objects. I believe what I am seeing is that the GC thread doesn't see the updated region top yet, and thus fails the assertion live <= used. Roman > Hi there, > > I'm currently chasing a failure of TestGCThreadGroups.java with > Traversal GC. I'm getting objects double counted and liveness going off > the rails. It only seems to happen with ConcGCThreads > ParallelGCThreads. > > I am wondering what prevents GC workers from stealing oops off of queues > that are currently transferred to 'regular' queues. ? 
Might we have a > race there? Is this transferral thread-safe wrt to stealing? Or am I > missing something? Please have a look at the last patch: > > http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/ > > around: > > shenandoahTraversalGC.cpp mark_loop_work() > > The code is almost 100% identical to what we do in > shenandoahConcurrentMark.cpp > > I wonder if simply letting fewer GC threads steal from extra queues > might be the safer way to transfer work from extra queues? > > Roman From shade at redhat.com Mon Jan 22 09:51:32 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 10:51:32 +0100 Subject: RFR: Log message on ref processing and class unload for mark events Message-ID: <55fbd69b-cb79-6920-979c-1693a5dc5800@redhat.com> http://cr.openjdk.java.net/~shade/shenandoah/mark-message/webrev.01/ Another UX improvement: print marking cycle flavor in the log. Helps to diagnose if failing/slower marking cycle was somehow special. [14.017s][info][gc] GC(65) Pause Init Mark 0.266ms [14.047s][info][gc] GC(65) Concurrent marking (class unload) 900M->961M(1024M) 30.572ms [14.053s][info][gc] GC(65) Pause Final Mark (class unload) 5.529ms ... [14.135s][info][gc] GC(66) Pause Init Mark 0.697ms [14.195s][info][gc] GC(66) Concurrent marking (ref process) 869M->927M(1024M) 60.646ms [14.196s][info][gc] GC(66) Concurrent precleaning 927M->928M(1024M) 0.431ms [14.200s][info][gc] GC(66) Pause Final Mark (ref process) 4.355ms ... 
[14.378s][info][gc] GC(67) Pause Init Mark 0.633ms [14.453s][info][gc] GC(67) Concurrent marking 911M->988M(1024M) 75.755ms [14.456s][info][gc] GC(67) Pause Final Mark 2.735ms Testing: hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Mon Jan 22 09:59:00 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 22 Jan 2018 10:59:00 +0100 Subject: RFR: Make concurrent precleaning log message optional again In-Reply-To: References: Message-ID: <7ad35698-0f1c-7ad9-8689-6fd61f6e9597@redhat.com> Am 22.01.2018 um 10:16 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/preclean-optional/webrev.01/ > > This is the fix for UX regression after recent refactoring: even when precleaning is not enabled > and/or process references is not enabled, we still print "Concurrent precleaning" message in the log. > > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > Ok From rkennke at redhat.com Mon Jan 22 09:59:38 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 22 Jan 2018 10:59:38 +0100 Subject: RFR: Log message on ref processing and class unload for mark events In-Reply-To: <55fbd69b-cb79-6920-979c-1693a5dc5800@redhat.com> References: <55fbd69b-cb79-6920-979c-1693a5dc5800@redhat.com> Message-ID: <39519b76-9268-bc75-d996-9fe39d97d468@redhat.com> Am 22.01.2018 um 10:51 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/mark-message/webrev.01/ > > Another UX improvement: print marking cycle flavor in the log. Helps to diagnose if failing/slower > marking cycle was somehow special. > > [14.017s][info][gc] GC(65) Pause Init Mark 0.266ms > [14.047s][info][gc] GC(65) Concurrent marking (class unload) 900M->961M(1024M) 30.572ms > [14.053s][info][gc] GC(65) Pause Final Mark (class unload) 5.529ms > ... 
> [14.135s][info][gc] GC(66) Pause Init Mark 0.697ms > [14.195s][info][gc] GC(66) Concurrent marking (ref process) 869M->927M(1024M) 60.646ms > [14.196s][info][gc] GC(66) Concurrent precleaning 927M->928M(1024M) 0.431ms > [14.200s][info][gc] GC(66) Pause Final Mark (ref process) 4.355ms > ... > [14.378s][info][gc] GC(67) Pause Init Mark 0.633ms > [14.453s][info][gc] GC(67) Concurrent marking 911M->988M(1024M) 75.755ms > [14.456s][info][gc] GC(67) Pause Final Mark 2.735ms > > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > Very good. Go! Roman From rkennke at redhat.com Mon Jan 22 10:05:08 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 22 Jan 2018 11:05:08 +0100 Subject: RFR: Allocation failure injection machinery In-Reply-To: References: Message-ID: <9bcce6f4-3b35-1486-ed27-8f08b6485c00@redhat.com> Am 19.01.2018 um 18:05 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/inject-alloc-failure/webrev.01/ > > Found many bugs in Degenerated GC with this machinery. But it is separate from the rest of the code, > and is useful to have for general testing: for example, to test if baseline without Degenerated GC > fails the same way. Therefore, this patch splits the machinery out. > > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > Looks good. From ashipile at redhat.com Mon Jan 22 10:15:59 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 10:15:59 +0000 Subject: hg: shenandoah/jdk10: 3 new changesets Message-ID: <201801221015.w0MAFxVp016646@aojmv0008.oracle.com> Changeset: 820129a799b1 Author: shade Date: 2018-01-19 18:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/820129a799b1 Allocation failure injection machinery ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp ! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp ! test/hotspot/jtreg/gc/shenandoah/LotsOfCycles.java ! 
test/hotspot/jtreg/gc/shenandoah/acceptance/AllocIntArrays.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjectArrays.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjects.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/RetainObjects.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/SieveObjects.java ! test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithShenandoah.java ! test/hotspot/jtreg/gc/stress/gclocker/TestGCLockerWithShenandoah.java ! test/hotspot/jtreg/gc/stress/gcold/TestGCOldWithShenandoah.java Changeset: e5398dce6e7b Author: shade Date: 2018-01-22 10:10 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/e5398dce6e7b Make concurrent precleaning log message optional again ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp Changeset: b8c39bdc0dac Author: shade Date: 2018-01-22 10:47 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/b8c39bdc0dac Log message on ref processing and class unload for mark events ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp From shade at redhat.com Mon Jan 22 11:23:58 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 12:23:58 +0100 Subject: RFR: Do not put down update-refs-in-progress flag concurrently Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/no-concurrent-ur-flag/webrev.01/ There is a race with update-refs-in-progress flag handling that is reliably reproducible with Degenerated GC patch and AllocFailureALot. On cancellation path, ShConcThread puts u-r-in-p to false (this was added to handle partial GC failure, IIRC). But, this is enough race window for *native* thread to skip StoreValBarrier that is sensed by ShBarrierSet::need_update_refs_barrier and then ShBarrierSet::write_ref_array silently corrupts the heap by not fixing up from-space ptrs. The way out is to handle it properly, at safepoint. 
No thread is waiting for that flag to get down, so there is no reason at all to do this concurrently. Full GC code has to clean up the flag instead. There is a similar but significantly more complicated patch for evac-in-progress, which is better be separate from this. Testing: hotspot_gc_shenandoah, Degenerate GC tests Thanks, -Aleksey From shade at redhat.com Mon Jan 22 11:51:45 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 22 Jan 2018 12:51:45 +0100 Subject: Bug: -XX:-ShenandoahWriteBarrierMemBar crashes XmlTransform Message-ID: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com> Run XmlTransform with: -XX:ShenandoahGCHeuristics=passive -XX:+ShenandoahWriteBarrier -XX:-ShenandoahWriteBarrierMemBar Fails with: # Internal Error (/home/shade/trunks/shenandoah-jdk10/src/hotspot/share/opto/shenandoahSupport.cpp:4062), pid=5060, tid=5085 # assert(load->Opcode() == Op_LoadUB) failed: inconsistent Stack: [0x00007f25c462d000,0x00007f25c472e000], sp=0x00007f25c4725ad0, free space=994k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x1969e5e] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x4ce V [libjvm.so+0x196a9cf] VMError::report_and_die(Thread*, char const*, int, char const*, char const*, __va_list_tag*)+0x2f V [libjvm.so+0xaf7d82] report_vm_error(char const*, int, char const*, char const*, ...)+0x112 V [libjvm.so+0x17b10ea] ShenandoahWriteBarrierNode::move_evacuation_test_out_of_loop(IfNode*, PhaseIdealLoop*)+0xc2a V [libjvm.so+0x1149679] PhaseIdealLoop::do_unswitching(IdealLoopTree*, Node_List&)+0x2ac9 V [libjvm.so+0x17b0397] ShenandoahWriteBarrierNode::optimize_after_expansion(Node_List const&, Node_List const&, Node_List&, PhaseIdealLoop*)+0x3c7 V [libjvm.so+0x115e95d] PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x11bd V [libjvm.so+0xa4bc3b] Compile::optimize_loops(int&, 
PhaseIterGVN&, LoopOptsMode)+0x58b
V [libjvm.so+0x17a3b78] ShenandoahWriteBarrierNode::expand(Compile*, PhaseIterGVN&, int&)+0x648

I put the additional printing in the assert, and it is now:

assert(load->Opcode() == Op_LoadUB) failed: inconsistent: AndI

I believe that AndI is the mask from GC state load. Not sure if that entire branch matters for correctness, or we just assert wrong things. Removing the assert makes the compiler fail with "Bad graph detected in build_loop_late".

Can you guys understand what is going on there, and fix it? I think Traversal GC is broken because of that.

Thanks,
-Aleksey

From shade at redhat.com Mon Jan 22 11:56:31 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Mon, 22 Jan 2018 12:56:31 +0100
Subject: Degenerated GC
In-Reply-To:
References:
Message-ID:

On 01/19/2018 05:46 PM, Zhengyu Gu wrote:
> shenandoahHeap.cpp:
> 1600
> 1601     // Allocations happen during concurrent preclean, record peak after the phase:
> 1602     shenandoahPolicy()->record_peak_occupancy();
> 1603   }
> 1604
> 1605   // Allocations happen during bitmap cleanup, record peak after the phase:
> 1606   shenandoahPolicy()->record_peak_occupancy();
>
> May call twice.

Yup, that one is fixed, thanks!

I have been chasing a weird bug in Degenerated GC, which turns out to be a separate issue, see update-refs-in-progress race RFR on this list. That bugfix should be pushed before Degenerated GC, otherwise tests start to reliably fail.
Updated patch for Degenerated GC: http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc/webrev.03/ Thanks, -Aleksey From rkennke at redhat.com Mon Jan 22 12:20:46 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 22 Jan 2018 13:20:46 +0100 Subject: RFR: Do not put down update-refs-in-progress flag concurrently In-Reply-To: References: Message-ID: <4e7d7fd1-8fc6-efc9-223b-8fc6d437da33@redhat.com> Am 22.01.2018 um 12:23 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/no-concurrent-ur-flag/webrev.01/ > > There is a race with update-refs-in-progress flag handling that is reliably reproducible with > Degenerated GC patch and AllocFailureALot. On cancellation path, ShConcThread puts u-r-in-p to false > (this was added to handle partial GC failure, IIRC). But, this is enough race window for *native* > thread to skip StoreValBarrier that is sensed by ShBarrierSet::need_update_refs_barrier and then > ShBarrierSet::write_ref_array silently corrupts the heap by not fixing up from-space ptrs. > > The way out is to handle it properly, at safepoint. No thread is waiting for that flag to get down, > so there is no reason at all to do this concurrently. Full GC code has to clean up the flag instead. > > There is a similar but significantly more complicated patch for evac-in-progress, which is better be > separate from this. > > Testing: hotspot_gc_shenandoah, Degenerate GC tests > > Thanks, > -Aleksey > Sounds good. We've had enough issues with concurrently putting down -in-progress flags. Let's just do it at a proper safepoint. 
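The race discussed in this thread can be sketched with a toy, single-threaded model. This is only an illustration under stated assumptions, not the HotSpot code: ToyHeap, ToyObject and all their members are hypothetical names, and the "safepoint" is just a boolean. It shows why dropping the flag outside a safepoint lets a later write_ref_array-style fixup be skipped, and how refusing flag changes outside a safepoint closes that window.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: names echo the discussion but are hypothetical.
struct ToyObject {
  ToyObject* forwardee = nullptr;  // non-null once the object has been evacuated
};

struct ToyHeap {
  bool update_refs_in_progress = false;
  bool at_safepoint = false;  // stand-in for "all mutators are stopped"

  // Accept flag changes only while at a safepoint; refuse concurrent flips.
  bool set_update_refs_in_progress(bool value) {
    if (!at_safepoint) {
      return false;
    }
    update_refs_in_progress = value;
    return true;
  }

  // Analogue of the array post-barrier: fix up from-space pointers, but only
  // while the flag says update-refs is running. Returns the number of slots fixed.
  std::size_t write_ref_array(std::vector<ToyObject*>& slots) const {
    if (!update_refs_in_progress) {
      return 0;  // barrier skipped: stale from-space pointers would survive
    }
    std::size_t fixed = 0;
    for (ToyObject*& slot : slots) {
      if (slot != nullptr && slot->forwardee != nullptr) {
        slot = slot->forwardee;
        ++fixed;
      }
    }
    return fixed;
  }
};
```

With this shape, a cancellation path that tries to clear the flag while mutators are running is simply rejected, so a thread that checks the flag afterwards still performs the fixup.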
Roman From ashipile at redhat.com Mon Jan 22 15:48:01 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 15:48:01 +0000 Subject: hg: shenandoah/jdk10: Do not put down update-refs-in-progress flag concurrently Message-ID: <201801221548.w0MFm1PE003003@aojmv0008.oracle.com> Changeset: dc779781dd5e Author: shade Date: 2018-01-22 12:04 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/dc779781dd5e Do not put down update-refs-in-progress flag concurrently ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp ! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp From ashipile at redhat.com Mon Jan 22 15:48:16 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Mon, 22 Jan 2018 15:48:16 +0000 Subject: hg: shenandoah/jdk10: Degenerated GC Message-ID: <201801221548.w0MFmGMX003143@aojmv0008.oracle.com> Changeset: 45d471869b73 Author: shade Date: 2018-01-22 12:52 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/45d471869b73 Degenerated GC ! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp ! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp ! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp ! src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp ! src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp ! src/hotspot/share/gc/shenandoah/shenandoahVerifier.hpp ! src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.cpp ! src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.hpp ! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp ! 
src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.cpp ! src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.hpp ! src/hotspot/share/runtime/vm_operations.hpp From rkennke at redhat.com Mon Jan 22 22:16:05 2018 From: rkennke at redhat.com (Roman Kennke) Date: Mon, 22 Jan 2018 23:16:05 +0100 Subject: Bug: -XX:-ShenandoahWriteBarrierMemBar crashes XmlTransform In-Reply-To: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com> References: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com> Message-ID: Am 22.01.2018 um 12:51 schrieb Aleksey Shipilev: > Run XmlTransform with: > -XX:ShenandoahGCHeuristics=passive -XX:+ShenandoahWriteBarrier -XX:-ShenandoahWriteBarrierMemBar > > Fails with: > # Internal Error > (/home/shade/trunks/shenandoah-jdk10/src/hotspot/share/opto/shenandoahSupport.cpp:4062), pid=5060, > tid=5085 > # assert(load->Opcode() == Op_LoadUB) failed: inconsistent > > Stack: [0x00007f25c462d000,0x00007f25c472e000], sp=0x00007f25c4725ad0, free space=994k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native > code) > V [libjvm.so+0x1969e5e] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, > Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x4ce > V [libjvm.so+0x196a9cf] VMError::report_and_die(Thread*, char const*, int, char const*, char > const*, __va_list_tag*)+0x2f > V [libjvm.so+0xaf7d82] report_vm_error(char const*, int, char const*, char const*, ...)+0x112 > V [libjvm.so+0x17b10ea] ShenandoahWriteBarrierNode::move_evacuation_test_out_of_loop(IfNode*, > PhaseIdealLoop*)+0xc2a > V [libjvm.so+0x1149679] PhaseIdealLoop::do_unswitching(IdealLoopTree*, Node_List&)+0x2ac9 > V [libjvm.so+0x17b0397] ShenandoahWriteBarrierNode::optimize_after_expansion(Node_List const&, > Node_List const&, Node_List&, PhaseIdealLoop*)+0x3c7 > V [libjvm.so+0x115e95d] PhaseIdealLoop::build_and_optimize(LoopOptsMode)+0x11bd > V [libjvm.so+0xa4bc3b] Compile::optimize_loops(int&, 
PhaseIterGVN&, LoopOptsMode)+0x58b
> V [libjvm.so+0x17a3b78] ShenandoahWriteBarrierNode::expand(Compile*, PhaseIterGVN&, int&)+0x648
>
> I put the additional printing in the assert, and it is now:
>
> assert(load->Opcode() == Op_LoadUB) failed: inconsistent: AndI
>
> I believe that AndI is the mask from GC state load. Not sure if that entire branch matters for
> correctness, or we just assert wrong things. Removing the assert makes the compiler fail with "Bad
> graph detected in build_loop_late".
>
> Can you guys understand what is going on there, and fix it? I think Traversal GC is broken because
> of that.
>
> Thanks,
> -Aleksey
>

Interestingly, I don't see it with the traversal patch. So maybe something in it fixes it, or the different graph shapes generated by traversal don't trigger it. Maybe try with the latest patch from the 'Traversal GC' thread?

Roman

From rkennke at redhat.com Mon Jan 22 22:17:23 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Mon, 22 Jan 2018 23:17:23 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com>
Message-ID: <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>

On 19.01.2018 at 11:24, Aleksey Shipilev wrote:
> On 01/17/2018 10:58 PM, Roman Kennke wrote:
>> Yes, maybe. But for the start, I did not want it to interfere with existing code if I can avoid
>> it. For this reason, this looks like a copy+paste job from conc-mark and partial for some parts.
>
> Okay, but please plan to common these things right away. We cannot have two copy-pasted 1000+ LOC
> blocks and hope for the best ;)

Well the plan is to get rid of all the other stuff and make traversal the GC to rule them all ;-)

>> Thanks for reviewing and spotting all the issues.
I could not really make a diff webrev, because I
>> first had to pull -u your latest work, and this messed up my differential webrev... sorry. Only full
>> webrev now:
>>
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.03/
>
> Sorry to be a PITA about this, but the change is quite large, and I think we want to be more
> forward-looking to backports and stability.
>
> Another sweep through the code:
>
> *) GCCause::to_string misses the to_string case for _shenandoah_traversal_gc?

Added.

> *) So, wait. SBS::interpreter_write_barrier_impl caller-saves registers when they are not equal to
> dst. New code in SBS::interpreter_storeval_barrier just does it unconditionally. Is WB too cautious,
> or is SVB too lax about this?

Neither. The WB returns a value into the same register as the input value. We don't want to trash this when returning. The enqueueing barrier is a one-way street.

> *) I think with minimal changes, we can make ShenandoahStoreValEnqueueBarrier exclusive, which will
> make testing much easier (encoding this in TestSelectiveBarriers would be trivial). E.g. say:
>
> if (UseShenandoahGC) {
>   if (ShenandoahStoreValWriteBarrier || ShenandoahStoreValEnqueueBarrier) {
>     // perform WB
>   }
>   if (ShenandoahStoreValEnqueueBarrier) {
>     // enqueue
>   }
>   if (ShenandoahStoreValReadBarrier) {
>     // RB
>   }
> }

Done. Although it turned out to be not so minimal. Needed to add checks all over the place.
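The flag combination proposed in the quoted snippet can be written out as a small, testable function. This is a sketch under assumptions, not the actual barrier code: StoreValFlags and storeval_actions are hypothetical names, and the returned strings merely label which barrier would be emitted for a given flag setting.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bundle of the flags named in the review comment.
struct StoreValFlags {
  bool use_shenandoah_gc;
  bool write_barrier;    // ShenandoahStoreValWriteBarrier
  bool enqueue_barrier;  // ShenandoahStoreValEnqueueBarrier
  bool read_barrier;     // ShenandoahStoreValReadBarrier
};

// Returns the barrier steps, in order, that the proposed logic would perform.
std::vector<std::string> storeval_actions(const StoreValFlags& f) {
  std::vector<std::string> actions;
  if (!f.use_shenandoah_gc) {
    return actions;
  }
  // The enqueue barrier implies a write barrier first, so it can be enabled alone.
  if (f.write_barrier || f.enqueue_barrier) {
    actions.push_back("WB");
  }
  if (f.enqueue_barrier) {
    actions.push_back("enqueue");
  }
  if (f.read_barrier) {
    actions.push_back("RB");
  }
  return actions;
}
```

Under these assumptions, enabling only the enqueue flag yields the WB step followed by the enqueue step, which is what makes the flag usable on its own in a selective-barriers test.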
> *) Minor nit: please indent second arguments like this:
>
> FLAG_SET_DEFAULT(UseShenandoahMatrix,              false);
> FLAG_SET_DEFAULT(ShenandoahSATBBarrier,            false);
> FLAG_SET_DEFAULT(ShenandoahConditionalSATBBarrier, false);
> FLAG_SET_DEFAULT(ShenandoahStoreValReadBarrier,    false);
> FLAG_SET_DEFAULT(ShenandoahStoreValWriteBarrier,   true);
> FLAG_SET_DEFAULT(ShenandoahStoreValEnqueueBarrier, true);
> FLAG_SET_DEFAULT(ShenandoahKeepAliveBarrier,       false);
> FLAG_SET_DEFAULT(ShenandoahAsmWB,                  true);
> FLAG_SET_DEFAULT(ShenandoahBarriersForConst,       true);
> FLAG_SET_DEFAULT(ShenandoahWBWithMemBar,           false);
> FLAG_SET_DEFAULT(ShenandoahWriteBarrierRB,         false);

Done.

> *) shenandoahOopClosures.hpp, indenting is a bit off here:
>
> 240 _thread(Thread::current()), _queue(q) {}
>
> ...
>
> 273 virtual bool do_metadata() { return true; }

Fixed.

> *) I wonder if we want to pull out ShenandoahWBWithMemBar changes into a separate changeset? This
> looks potentially backportable, and usable outside of Traversal GC.

Already done and pushed.

Also, I have added tests that exercise traversal heuristics just as we do for other heuristics. This turned up a number of bugs and improvements that I fixed:

- when growing the heap, we must make sure that the TAMS points to end for the new regions, otherwise we'd treat them as implicitly marked.
- added periodic GC
- folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.
- I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated live data. I excluded conc-traversal from that check.
This could probably be fixed by doing the proper concurrency membars, but do we care? For assertion code?
- interesting bug: in mark-compact, we first check for stuff-in-progress, and turn it off. When checking for marking-in-progress first, we turn that off first and also turn off SATB. Notice the overlap of MARKING with TRAVERSAL. We then go on to check for TRAVERSAL, see that it's also ON, turn it off, which also turns off SATB again, and trip an assert because it checks the correct SATB active state. Reordering the checks fixes this.
- I had a little index-out-of-bounds in humongous-checking code. Trivially fixed by bounds-checking.
- Updated patch to match current head (some conflicts with degen)

Differential:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.04.diff/
Full:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.04/

Testing: hotspot_gc_shenandoah passes

Roman

From shade at redhat.com Tue Jan 23 10:36:34 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 23 Jan 2018 11:36:34 +0100
Subject: Bug: -XX:-ShenandoahWriteBarrierMemBar crashes XmlTransform
In-Reply-To:
References: <3c4a3faf-63a4-fa32-3f89-6718c4a6d459@redhat.com>
Message-ID: <33a5abbc-5547-909f-f7af-a9754163a52e@redhat.com>

On 01/22/2018 11:16 PM, Roman Kennke wrote:
> Interestingly, I don't see it with the traversal patch. So maybe something in it fixes it, or the
> different graph shapes generated by traversal doesn't trigger it. Maybe try with the latest patch
> from the 'Traversal GC' thread?

Actually it fails with Traversal GC patch too, although much less often (intermittently). I see that Traversal GC disables some WB-related optimizations with do_evac flags, but it seems the graph is still incorrect and it fails.
# Internal Error (/home/shade/trunks/shenandoah-jdk10/src/hotspot/share/opto/loopopts.cpp:1537), pid=61675, tid=61700 # Error: assert(b->is_Bool()) failed V [libjvm.so+0x1169dc6] PhaseIdealLoop::clone_iff(PhiNode*, IdealLoopTree*)+0x86 V [libjvm.so+0x116e10c] PhaseIdealLoop::clone_loop(IdealLoopTree*, Node_List&, int, PhaseIdealLoop::CloneLoopMode, Node*)+0x10ec V [libjvm.so+0x11444bc] PhaseIdealLoop::create_slow_version_of_loop(IdealLoopTree*, Node_List&, int, PhaseIdealLoop::CloneLoopMode)+0xcac V [libjvm.so+0x1149735] PhaseIdealLoop::do_unswitching(IdealLoopTree*, Node_List&, bool)+0x125 V [libjvm.so+0x113f263] IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x163 V [libjvm.so+0x113f176] IdealLoopTree::iteration_split(PhaseIdealLoop*, Node_List&)+0x76 Anyhow, it should be fixed before Traversal GC arrives, because the ShWBMemBar should be independently backportable. -Aleksey From shade at redhat.com Tue Jan 23 11:05:15 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 23 Jan 2018 12:05:15 +0100 Subject: RFR: Use properly-scoped FormatBuffers instead of err_msg when message is retained Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/formatbuffers/webrev.01/ It seems we cannot use err_msg the way we do it now, when its result is retained for later. In this case, we have to use the properly-scoped FormatBuffer that has clear lifetime. 
Otherwise, on some platforms and compilers we have this:

[0.620s][info][gc] GC(0) Pause Init Mark 5.032ms
[0.661s][info][gc] GC(0) \u0008 1M->1M(7912M) 40.084ms
[0.708s][info][gc] GC(0) \u0008 47.164ms

Testing: hotspot_gc_shenandoah, eyeballing the GC logs on failing configs

Thanks,
-Aleksey

From rkennke at redhat.com Tue Jan 23 11:23:40 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 12:23:40 +0100
Subject: RFR: Use properly-scoped FormatBuffers instead of err_msg when message is retained
In-Reply-To:
References:
Message-ID: <1AA42139-11CC-490C-B1D8-550CBBDD5D26@redhat.com>

Yes please

On 23 January 2018 12:05:15 CET, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/formatbuffers/webrev.01/
>
> It seems we cannot use err_msg the way we do it now, when its result is retained for later. In this
> case, we have to use the properly-scoped FormatBuffer that has clear lifetime. Otherwise, on some
> platforms and compilers we have this:
>
> [0.620s][info][gc] GC(0) Pause Init Mark 5.032ms
> [0.661s][info][gc] GC(0) \u0008 1M->1M(7912M) 40.084ms
> [0.708s][info][gc] GC(0) \u0008 47.164ms
>
> Testing: hotspot_gc_shenandoah, eyeballing the GC logs on failing configs
>
> Thanks,
> -Aleksey

--
This message was sent from my Android device with K-9 Mail.

From ashipile at redhat.com Tue Jan 23 11:39:30 2018
From: ashipile at redhat.com (ashipile at redhat.com)
Date: Tue, 23 Jan 2018 11:39:30 +0000
Subject: hg: shenandoah/jdk10: Use properly-scoped FormatBuffers instead of err_msg when message is retained
Message-ID: <201801231139.w0NBdVOk011467@aojmv0008.oracle.com>

Changeset: 6b22dfb1ca65
Author: shade
Date: 2018-01-23 11:56 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/6b22dfb1ca65

Use properly-scoped FormatBuffers instead of err_msg when message is retained

!
src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp

From shade at redhat.com Tue Jan 23 16:06:20 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 23 Jan 2018 17:06:20 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID:

On 01/22/2018 11:17 PM, Roman Kennke wrote:
> - when growing the heap, we must make sure that the TAMS points to end for the new regions,
> otherwise we'd treat them as implicitly marked.

Is this the source of this spooky change?

   inline ShenandoahHeapRegion* get(size_t i) const {
-    assert (i < _active_end, "sanity");
+    assert (i < _reserved_end, "sanity");
     return _regions[i];
   }

get() is supposed to only return added regions. I think if you return something farther than the _active_end, you read garbage.

> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating
> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.

So this is the source for extension of RootProcessor and Thread methods? I am a bit uneasy about this (mostly because it raises backporting questions). I'd rather include this into RootProcessor right away, and assert nothing passes non-NULL ThreadClosure there. Then, sh/jdk8u, sh/jdk9 and sh/jdk10 versions would agree on the shape of RootProcessor methods and the calls to it, while sh/jdk10 would call RootProcessor with non-NULL ThreadClosure, and it would *also* implement the relevant parts in Thread.

> - I tripped an assert in SHR::increase_live_data().
I think the reason is that we have a race
> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated
> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
> proper concurrency membars, but do we care? For assertion code?

But wait, this change means "s" is greater than max_jint on some paths during Traversal?!

   inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
-    assert (s <= (size_t)max_jint, "sanity");
+    assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
     increase_live_data_words((int)s);
   }

Also, I am confused where Traversal calls increase_live_data_words(size_t), because both call sites
are already protected:

   if (!sh->is_concurrent_traversal_in_progress()) {
     r->increase_live_data_words(used_words);
   }

...

   if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
     r->increase_live_data_words(word_size);
   }

> Full:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.04/

Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out, and drop all
C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on optimization
story in subsequent changesets. This way we could integrate Traversal GC, and not risk immediate
regression in non-Traversal code.

This is done, along with other minor touchups here (apply over webrev.04):
http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch

Thanks,
-Aleksey

From rkennke at redhat.com Tue Jan 23 17:23:49 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 18:23:49 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID:

On 23 January 2018 17:06:20 CET, Aleksey Shipilev wrote:
>On 01/22/2018 11:17 PM, Roman Kennke wrote:
>> - when growing the heap, we must make sure that the TAMS points to end for the new regions,
>> otherwise we'd treat them implicitly marked.
>
>Is this the source of this spooky change?
>
>   inline ShenandoahHeapRegion* get(size_t i) const {
>-    assert (i < _active_end, "sanity");
>+    assert (i < _reserved_end, "sanity");
>     return _regions[i];
>   }
>
>get() is supposed to only return added regions. I think if you return something farther than
>_active_end, you read garbage.

I think all regions are initialized, but no memory allocated?

>> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating
>> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
>> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
>> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.
>
>So this is the source for extension of RootProcessor and Thread methods?

Yes.

> I am a bit uneasy about this (mostly because it raises backporting questions). I'd rather
>include this into RootProcessor right away, and assert nothing passes non-NULL ThreadClosure there.

OK, can break that out of the patch.

>Then, sh/jdk8u, sh/jdk9 and sh/jdk10 versions would agree on the shape of RootProcessor methods and
>the calls to it, while sh/jdk10 would call RootProcessor with non-NULL ThreadClosure, and it
>would *also* implement the relevant parts in Thread.

Makes sense.

>> - I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race
>> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated
>> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
>> proper concurrency membars, but do we care? For assertion code?
>
>But wait, this change means "s" is greater than max_jint on some paths during Traversal?!
>
>   inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
>-    assert (s <= (size_t)max_jint, "sanity");
>+    assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
>     increase_live_data_words((int)s);
>   }

Gah. No. Will fix it. Stay tuned for updated patch.

>
>Also, I am confused where Traversal calls increase_live_data_words(size_t), because both call sites
>are already protected:
>
>   if (!sh->is_concurrent_traversal_in_progress()) {
>     r->increase_live_data_words(used_words);
>   }
>
>...
>
>   if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>     r->increase_live_data_words(word_size);
>   }
>
>> Full:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.04/
>
>Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out, and drop all
>C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on optimization
>story in subsequent changesets. This way we could integrate Traversal GC, and not risk immediate
>regression in non-Traversal code.
>
>This is done, along with other minor touchups here (apply over webrev.04):
>http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch
>
>Thanks,
>-Aleksey

--
Sent from my Android device with K-9 Mail.

From rkennke at redhat.com Tue Jan 23 20:24:03 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 21:24:03 +0100
Subject: RFR: Add ShenandoahRootProcessor API to report threads while scanning roots
Message-ID: <430edf0d-8135-577b-2beb-629ac078d64e@redhat.com>

As discussed in the Traversal GC thread, this breaks out the ShenandoahRootProcessor API to report threads while scanning roots. It is not implemented here, and only asserts that the ThreadClosure* is NULL. All call-sites are updated to pass NULL.
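
The API shape described in this paragraph can be sketched roughly as follows. This is only an illustration under assumptions: every type and method name here (RootProcessorSketch, process_all_roots, CountingClosure, and the simplified OopClosure/ThreadClosure stand-ins) is hypothetical and is not taken from the webrev or from HotSpot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-ins for the HotSpot closure types.
struct OopClosure {
  virtual void do_oop(void** p) = 0;
  virtual ~OopClosure() {}
};
struct Thread { int id; };
struct ThreadClosure {
  virtual void do_thread(Thread* t) = 0;
  virtual ~ThreadClosure() {}
};

// Hypothetical root processor: the extra thread_cl parameter fixes the call
// shape for all branches, but thread reporting is not implemented yet, so
// every caller has to pass a null pointer for now.
class RootProcessorSketch {
public:
  void process_all_roots(OopClosure* oops, ThreadClosure* thread_cl) {
    assert(thread_cl == nullptr && "thread reporting is not implemented here");
    (void)thread_cl;
    void* fake_root = nullptr;   // a single made-up root, for illustration only
    oops->do_oop(&fake_root);
    _roots_visited++;
  }
  size_t roots_visited() const { return _roots_visited; }
private:
  size_t _roots_visited = 0;
};

// Trivial closure that counts how many roots it was shown.
struct CountingClosure : OopClosure {
  int count = 0;
  void do_oop(void**) override { count++; }
};
```

With this shape, a branch that later implements thread reporting only has to drop the assert and start calling thread_cl, while the call sites keep the same signature everywhere.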
The idea is to make backporting easier/less conflict-prone. http://cr.openjdk.java.net/~rkennke/root-proc-threads/webrev.00/ Test: hotspot_gc_shenandoah Ok? From shade at redhat.com Tue Jan 23 20:32:28 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 23 Jan 2018 21:32:28 +0100 Subject: RFR: Add ShenandoahRootProcessor API to report threads while scanning roots In-Reply-To: <430edf0d-8135-577b-2beb-629ac078d64e@redhat.com> References: <430edf0d-8135-577b-2beb-629ac078d64e@redhat.com> Message-ID: On 01/23/2018 09:24 PM, Roman Kennke wrote: > As discussed in the Traversal GC thread, this breaks out the ShenandoahRootProcessor API to report > threads while scanning roots. It is not implemented here, and only asserts that the ThreadClosure* > is NULL. All call-sites are updated to pass NULL. > > The idea is to make backporting easier/less conflict-prone. > > http://cr.openjdk.java.net/~rkennke/root-proc-threads/webrev.00/ > > Test: hotspot_gc_shenandoah > > Ok? OK! -Aleksey From roman at kennke.org Tue Jan 23 20:38:24 2018 From: roman at kennke.org (roman at kennke.org) Date: Tue, 23 Jan 2018 20:38:24 +0000 Subject: hg: shenandoah/jdk10: Add ShenandoahRootProcessor API to report threads while scanning roots Message-ID: <201801232038.w0NKcOWu006989@aojmv0008.oracle.com> Changeset: bd01b07ba0d7 Author: rkennke Date: 2018-01-23 21:20 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/bd01b07ba0d7 Add ShenandoahRootProcessor API to report threads while scanning roots ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp ! src/hotspot/share/gc/shenandoah/shenandoahPartialGC.cpp ! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp ! 
src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.hpp

From rkennke at redhat.com Tue Jan 23 20:41:10 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 23 Jan 2018 21:41:10 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID:

On 23.01.2018 at 17:06, Aleksey Shipilev wrote:
> On 01/22/2018 11:17 PM, Roman Kennke wrote:
>> - when growing the heap, we must make sure that the TAMS points to end for the new regions,
>> otherwise we'd treat them implicitly marked.
>
> Is this the source of this spooky change?
>
>    inline ShenandoahHeapRegion* get(size_t i) const {
> -    assert (i < _active_end, "sanity");
> +    assert (i < _reserved_end, "sanity");
>      return _regions[i];
>    }
>
> get() is supposed to only return added regions. I think if you return something farther than
> _active_end, you read garbage.

All SHR are created and added to the regions list at the start. Which means iterating to active_end actually does what I wanted. Reverted that change.

>> - folded 'SATB' queue processing into thread stack scanning. The problem here is that iterating
>> the threads 2x is cumbersome because of the claiming protocol: we need to fire the task/workers
>> 2x: once for the SATB queues, once for the thread scanning. I folded it into one pass. This
>> required a (trivial) extension in the upstream parallel thread scanning/iteration protocol.
>
> So this is the source for extension of RootProcessor and Thread methods? I am a bit uneasy about
> this (mostly because it raises backporting questions). I'd rather include this into RootProcessor
> right away, and assert nothing passes non-NULL ThreadClosure there.
Then, sh/jdk8u, sh/jdk9 and
> sh/jdk10 versions would agree on the shape of RootProcessor methods and the calls to it, while
> sh/jdk10 would call RootProcessor with non-NULL ThreadClosure, and it would *also* implement the
> relevant parts in Thread.

Done in separate patch.

>> - I tripped an assert in SHR::increase_live_data(). I think the reason is that we have a race
>> here: a GC thread might not yet see the updated SHR::_top but already accounts for the updated
>> live data. I excluded conc-traversal from that check. This could probably be fixed by doing the
>> proper concurrency membars, but do we care? For assertion code?
>
> But wait, this change means "s" is greater than max_jint on some paths during Traversal?!
>
>    inline void ShenandoahHeapRegion::increase_live_data_words(size_t s) {
> -    assert (s <= (size_t)max_jint, "sanity");
> +    assert (s <= (size_t)max_jint || _heap->is_concurrent_traversal_in_progress(), "sanity");
>      increase_live_data_words((int)s);
>    }

I removed the change to that assert. We only need the other one.

> Also, I am confused where Traversal calls increase_live_data_words(size_t), because both call sites
> are already protected:
>
>    if (!sh->is_concurrent_traversal_in_progress()) {
>      r->increase_live_data_words(used_words);
>    }
>
> ...
>
>    if (!ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>      r->increase_live_data_words(word_size);
>    }

It's called from Traversal's own code.

> Since we have "adaptive" failures with C2 and/or -ShWBMemBar, I propose we chicken out, and drop all
> C2 changes (apart from the actual enqueue_barrier) from this change, then follow up on optimization
> story in subsequent changesets. This way we could integrate Traversal GC, and not risk immediate
> regression in non-Traversal code.
>
> This is done, along with other minor touchups here (apply over webrev.04):
> http://cr.openjdk.java.net/~shade/shenandoah/traversal-shade-updates-1.patch

Cool, thanks.
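
The concern about the relaxed assert in increase_live_data_words(size_t) quoted above can be illustrated with a small sketch: the range check is what makes the following (int) cast safe, so letting larger values through would hand a value to the cast that an int cannot represent. The names below (RegionSketch, increase_live_data_words_checked) are hypothetical and only mimic the shape of the quoted code; the check returns false instead of asserting so that it can be exercised.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical stand-in for a heap region's live-data counter.
struct RegionSketch {
  int live_words = 0;

  void increase_live_data_words(int s) { live_words += s; }

  // Mirrors the size_t overload discussed above: the value is narrowed to int,
  // so it must first be checked against the int range. The code under review
  // asserts here; this sketch reports failure through the return value.
  bool increase_live_data_words_checked(size_t s) {
    if (s > (size_t)INT_MAX) {
      return false;  // not representable as int, the cast would not preserve it
    }
    increase_live_data_words((int)s);
    return true;
  }
};
```

Keeping the check unconditional, as the reply above does by dropping the change to that assert, means no code path can reach the narrowing cast with an out-of-range value.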
Differential patch:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
Full patch, including your changes:
http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/

(give it some seconds to fully upload)

Roman

From shade at redhat.com Wed Jan 24 10:11:11 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 11:11:11 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID:

On 01/23/2018 09:41 PM, Roman Kennke wrote:
> Differential patch:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
> Full patch, including your changes:
> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/

Okay! This looks safe enough to push.

I have a minor question about why this is needed:

    case _degenerated_outside_cycle:
      if (shenandoahPolicy()->can_do_traversal_gc()) {
        // Not possible to degenerate from here, upgrade to Full GC right away.
        cancel_concgc(GCCause::_allocation_failure);
        op_degenerated_fail();
        return;
      }

Aren't we good with the usual Degenerated GC cycle here?

-Aleksey

From shade at redhat.com Wed Jan 24 10:26:22 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 11:26:22 +0100
Subject: RFR: Degenerated GC: shortcut cycles, upgrade futile cycles
Message-ID: <3283e088-cc9e-fe58-ae0e-10e40d41538b@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc-shortcuts/webrev.01/

This makes Degenerated GCs much less painful: they shortcut like concurrent cycle does, and they do not try to do back-to-back degens when memory is not reclaimed.
Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey

From shade at redhat.com Wed Jan 24 11:01:23 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 12:01:23 +0100
Subject: RFR: Log concurrent mark that updates references
Message-ID:

http://cr.openjdk.java.net/~shade/shenandoah/mark-message-ur/webrev.01/

Small follow-up, we can actually print if we are running CM-with-UR or not.

Testing: hotspot_fast_gc_shenandoah, eyeballing logs

Thanks,
-Aleksey

From rkennke at redhat.com Wed Jan 24 11:12:38 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:12:38 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID:

On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>> Differential patch:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>> Full patch, including your changes:
>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>
> Okay! This looks safe enough to push.
>
> I have a minor question about why this is needed:
>
>     case _degenerated_outside_cycle:
>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>         // Not possible to degenerate from here, upgrade to Full GC right away.
>         cancel_concgc(GCCause::_allocation_failure);
>         op_degenerated_fail();
>         return;
>       }
>
> Aren't we good with the usual Degenerated GC cycle here?
>
> -Aleksey
>

The problem is that degen_outside_cycles goes into normal marking, and something's not up for that. I'm hitting asserts when I go there.

To be honest, I am also not happy to have all this heuristics-specific code/branches all over the place. Could this stuff be abstracted into heuristics API? I.e. driver thread calls into heuristics to do stuff (e.g.
normal-degen, degen-outside-cycle, but also other stuff that is currently sprinkled over different places), and heuristics calls the right thing to take care of it?

Roman

From rkennke at redhat.com Wed Jan 24 11:14:16 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:14:16 +0100
Subject: RFR: Degenerated GC: shortcut cycles, upgrade futile cycles
In-Reply-To: <3283e088-cc9e-fe58-ae0e-10e40d41538b@redhat.com>
References: <3283e088-cc9e-fe58-ae0e-10e40d41538b@redhat.com>
Message-ID:

On 24.01.2018 at 11:26, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/degenerated-gc-shortcuts/webrev.01/
>
> This makes Degenerated GCs much less painful: they shortcut like concurrent cycle does, and they do
> not try to do back-to-back degens when memory is not reclaimed.
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey
>

Ok

From rkennke at redhat.com Wed Jan 24 11:14:25 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:14:25 +0100
Subject: RFR: Log concurrent mark that updates references
In-Reply-To:
References:
Message-ID:

On 24.01.2018 at 12:01, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/mark-message-ur/webrev.01/
>
> Small follow-up, we can actually print if we are running CM-with-UR or not.
>
> Testing: hotspot_fast_gc_shenandoah, eyeballing logs
>
> Thanks,
> -Aleksey
>

Ok

From shade at redhat.com Wed Jan 24 11:14:47 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 12:14:47 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID:

On 01/24/2018 12:12 PM, Roman Kennke wrote:
> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>> Differential patch:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>> Full patch, including your changes:
>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>
>> Okay! This looks safe enough to push.
>>
>> I have a minor question about why this is needed:
>>
>>     case _degenerated_outside_cycle:
>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>         cancel_concgc(GCCause::_allocation_failure);
>>         op_degenerated_fail();
>>         return;
>>       }
>>
>> Aren't we good with the usual Degenerated GC cycle here?
>>
>
> The problem is that degen_outside_cycles goes into normal marking, and something's not up for that.
> I'm hitting asserts when I go there.

That probably indicates a bug? Traversal GC ought to leave the heap in the state that is ready for the usual concurrent cycle, no?

> To be honest, I am also not happy to have all this heuristics-specific code/branches all over the
> place. Could this stuff be abstracted into heuristics API? I.e. driver thread calls into heuristics
> to do stuff (e.g. normal-degen, degen-outside-cycle, but also other stuff that is currently
> sprinkled over different places), and heuristics calls the right thing to take care of it?
Baby steps...

-Aleksey

From rkennke at redhat.com Wed Jan 24 11:17:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 24 Jan 2018 12:17:05 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To:
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com>
Message-ID: <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com>

On 24.01.2018 at 12:14, Aleksey Shipilev wrote:
> On 01/24/2018 12:12 PM, Roman Kennke wrote:
>> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>>> Differential patch:
>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>>> Full patch, including your changes:
>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>>
>>> Okay! This looks safe enough to push.
>>>
>>> I have a minor question about why this is needed:
>>>
>>>     case _degenerated_outside_cycle:
>>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>>         cancel_concgc(GCCause::_allocation_failure);
>>>         op_degenerated_fail();
>>>         return;
>>>       }
>>>
>>> Aren't we good with the usual Degenerated GC cycle here?
>>>
>>
>> The problem is that degen_outside_cycles goes into normal marking, and something's not up for that.
>> I'm hitting asserts when I go there.
>
> That probably indicates a bug? Traversal GC ought to leave the heap in the state that is ready
> for the usual concurrent cycle, no?

In the middle of traversal GC? I don't know... Also, cannot, in any case, expect *concurrent* cycle to work: we don't have the barriers for that. Theoretically, we could do STW normal cycle, but what would be the point? I'd rather have a STW degen traversal pickup.
>> To be honest, I am also not happy to have all this heuristics-specific code/branches all over the
>> place. Could this stuff be abstracted into heuristics API? I.e. driver thread calls into heuristics
>> to do stuff (e.g. normal-degen, degen-outside-cycle, but also other stuff that is currently
>> sprinkled over different places), and heuristics calls the right thing to take care of it?
>
> Baby steps...

;-)

From shade at redhat.com Wed Jan 24 11:19:29 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 24 Jan 2018 12:19:29 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com>
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com> <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com>
Message-ID: <407a3385-6981-549f-6aff-f0362cb5d3f3@redhat.com>

On 01/24/2018 12:17 PM, Roman Kennke wrote:
> On 24.01.2018 at 12:14, Aleksey Shipilev wrote:
>> On 01/24/2018 12:12 PM, Roman Kennke wrote:
>>> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>>>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>>>> Differential patch:
>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>>>> Full patch, including your changes:
>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>>>
>>>> Okay! This looks safe enough to push.
>>>>
>>>> I have a minor question about why this is needed:
>>>>
>>>>     case _degenerated_outside_cycle:
>>>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>>>         cancel_concgc(GCCause::_allocation_failure);
>>>>         op_degenerated_fail();
>>>>         return;
>>>>       }
>>>>
>>>> Aren't we good with the usual Degenerated GC cycle here?
>>>>
>>>
>>> The problem is that degen_outside_cycles goes into normal marking, and something's not up for that.
>>> I'm hitting asserts when I go there.
>>
>> That probably indicates a bug? Traversal GC ought to leave the heap in the state that is ready
>> for the usual concurrent cycle, no?
>
> In the middle of traversal GC? I don't know... Also, cannot, in any case, expect *concurrent* cycle
> to work: we don't have the barriers for that. Theoretically, we could do STW normal cycle, but what
> would be the point? I'd rather have a STW degen traversal pickup.

"outside cycle" means you are out of Traversal GC already -- that means outside the *complete* cycle. So, here is where Traversal differs from Partial? Partial may be followed by the normal concurrent cycle, and Traversal can only run Traversals?

-Aleksey

From roman at kennke.org Wed Jan 24 11:21:29 2018
From: roman at kennke.org (Roman Kennke)
Date: Wed, 24 Jan 2018 12:21:29 +0100
Subject: RFR: Traveral GC heuristics
In-Reply-To: <407a3385-6981-549f-6aff-f0362cb5d3f3@redhat.com>
References: <9e18ba8d-593f-57a7-12cc-99cffe164a88@redhat.com> <46b64ff3-d2df-b2da-0805-5fa7cf8593f0@redhat.com> <5d9ea546-927b-28eb-0dd8-1ee8f4192862@redhat.com> <75c6db4a-3236-dcd8-a4f4-67b78103129c@redhat.com> <7273cfa6-7b01-af08-db88-43724cd4570f@redhat.com> <407a3385-6981-549f-6aff-f0362cb5d3f3@redhat.com>
Message-ID:

On 24 January 2018 12:19:29 CET, Aleksey Shipilev wrote:
>On 01/24/2018 12:17 PM, Roman Kennke wrote:
>> On 24.01.2018 at 12:14, Aleksey Shipilev wrote:
>>> On 01/24/2018 12:12 PM, Roman Kennke wrote:
>>>> On 24.01.2018 at 11:11, Aleksey Shipilev wrote:
>>>>> On 01/23/2018 09:41 PM, Roman Kennke wrote:
>>>>>> Differential patch:
>>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05.diff/
>>>>>> Full patch, including your changes:
>>>>>> http://cr.openjdk.java.net/~rkennke/traversal/webrev.05/
>>>>>
>>>>> Okay! This looks safe enough to push.
>>>>>
>>>>> I have a minor question about why this is needed:
>>>>>
>>>>>     case _degenerated_outside_cycle:
>>>>>       if (shenandoahPolicy()->can_do_traversal_gc()) {
>>>>>         // Not possible to degenerate from here, upgrade to Full GC right away.
>>>>>         cancel_concgc(GCCause::_allocation_failure);
>>>>>         op_degenerated_fail();
>>>>>         return;
>>>>>       }
>>>>>
>>>>> Aren't we good with the usual Degenerated GC cycle here?
>>>>>
>>>>
>>>> The problem is that degen_outside_cycles goes into normal marking, and something's not up for that.
>>>> I'm hitting asserts when I go there.
>>>
>>> That probably indicates a bug? Traversal GC ought to leave the heap in the state that is ready
>>> for the usual concurrent cycle, no?
>>
>> In the middle of traversal GC? I don't know... Also, cannot, in any case, expect *concurrent* cycle
>> to work: we don't have the barriers for that. Theoretically, we could do STW normal cycle, but what
>> would be the point? I'd rather have a STW degen traversal pickup.
>
>"outside cycle" means you are out of Traversal GC already -- that means outside the *complete*
>cycle. So, here is where Traversal differs from Partial? Partial may be followed by the normal
>concurrent cycle, and Traversal can only run Traversals?

Yes, exactly. Traversal is not a minor GC like partial would be. It is much like the normal concurrent GC.

--
Sent from my Android device with K-9 Mail.

From roman at kennke.org Wed Jan 24 13:03:20 2018
From: roman at kennke.org (roman at kennke.org)
Date: Wed, 24 Jan 2018 13:03:20 +0000
Subject: hg: shenandoah/jdk10: Traversal GC heuristics
Message-ID: <201801241303.w0OD3K3C000435@aojmv0008.oracle.com>

Changeset: 36640d8dec5f
Author: rkennke
Date: 2018-01-24 13:57 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/36640d8dec5f

Traversal GC heuristics

! make/hotspot/lib/JvmOverrideFiles.gmk
!
src/hotspot/cpu/x86/c1_Runtime1_x86.cpp ! src/hotspot/cpu/x86/macroAssembler_x86.cpp ! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp ! src/hotspot/cpu/x86/stubGenerator_x86_64.cpp ! src/hotspot/cpu/x86/templateTable_x86.cpp ! src/hotspot/share/c1/c1_LIR.hpp ! src/hotspot/share/c1/c1_LIRGenerator.cpp ! src/hotspot/share/gc/shared/barrierSet.hpp ! src/hotspot/share/gc/shared/gcCause.cpp ! src/hotspot/share/gc/shared/gcCause.hpp ! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp ! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp ! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp ! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.hpp ! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.cpp ! src/hotspot/share/gc/shenandoah/shenandoahPhaseTimings.hpp ! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.cpp ! src/hotspot/share/gc/shenandoah/shenandoahRootProcessor.hpp + src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp + src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.hpp + src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp ! src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp ! src/hotspot/share/gc/shenandoah/shenandoahVerifier.hpp ! 
src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.cpp ! src/hotspot/share/gc/shenandoah/shenandoahWorkerPolicy.hpp ! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp ! src/hotspot/share/gc/shenandoah/shenandoah_specialized_oop_closures.hpp ! src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.cpp ! src/hotspot/share/gc/shenandoah/vm_operations_shenandoah.hpp ! src/hotspot/share/opto/graphKit.cpp ! src/hotspot/share/opto/graphKit.hpp ! src/hotspot/share/runtime/sharedRuntime.cpp ! src/hotspot/share/runtime/thread.cpp ! src/hotspot/share/runtime/thread.hpp ! src/hotspot/share/runtime/vm_operations.hpp ! test/hotspot/jtreg/gc/shenandoah/LotsOfCycles.java ! test/hotspot/jtreg/gc/shenandoah/ShenandoahStrDedupStress.java ! test/hotspot/jtreg/gc/shenandoah/TestGCThreadGroups.java ! test/hotspot/jtreg/gc/shenandoah/TestPeriodicGC.java ! test/hotspot/jtreg/gc/shenandoah/TestRegionSampling.java ! test/hotspot/jtreg/gc/shenandoah/TestSelectiveBarrierFlags.java ! test/hotspot/jtreg/gc/shenandoah/TestShenandoahStrDedup.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocHumongousFragment.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocIntArrays.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjectArrays.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/AllocObjects.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/HeapUncommit.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/RetainObjects.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/SieveObjects.java ! test/hotspot/jtreg/gc/shenandoah/acceptance/StringInternCleanup.java ! test/hotspot/jtreg/gc/shenandoah/options/TestHeuristicsUnlock.java ! test/hotspot/jtreg/gc/stress/gcbasher/TestGCBasherWithShenandoah.java ! 
test/hotspot/jtreg/gc/stress/gcold/TestGCOldWithShenandoah.java From rkennke at redhat.com Wed Jan 24 14:12:05 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 24 Jan 2018 15:12:05 +0100 Subject: RFR: Relax assert in SBS::is_safe() Message-ID: <45c52e7d-8251-def1-3277-50494d7ed94e@redhat.com> With traversal I am hitting the assert in SBS::is_safe() (through weakref discovery) because GC got cancelled and the obj is not in to-space. It is not a problem with conc-mark because there we don't evac during marking. The fix is to fall-through the in_cset() check when GC got cancelled, and check if there is an actual copy. http://cr.openjdk.java.net/~rkennke/fixissafe/webrev.00/ Testing: hotspot_gc_shenandoah passes the occasional above failure now. Good? Roman From shade at redhat.com Wed Jan 24 14:18:07 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 24 Jan 2018 15:18:07 +0100 Subject: RFR: Relax assert in SBS::is_safe() In-Reply-To: <45c52e7d-8251-def1-3277-50494d7ed94e@redhat.com> References: <45c52e7d-8251-def1-3277-50494d7ed94e@redhat.com> Message-ID: <51e3ff18-cada-b81f-bffd-0c10b5e90e96@redhat.com> On 01/24/2018 03:12 PM, Roman Kennke wrote: > With traversal I am hitting the assert in SBS::is_safe() (through weakref discovery) because GC got > cancelled and the obj is not in to-space. It is not a problem with conc-mark because there we don't > evac during marking. > > The fix is to fall-through the in_cset() check when GC got cancelled, and check if there is an > actual copy. > > http://cr.openjdk.java.net/~rkennke/fixissafe/webrev.00/ Makes sense. 
Thanks, -Aleksey From roman at kennke.org Wed Jan 24 14:26:14 2018 From: roman at kennke.org (roman at kennke.org) Date: Wed, 24 Jan 2018 14:26:14 +0000 Subject: hg: shenandoah/jdk10: Relax assert in SBS::is_safe() Message-ID: <201801241426.w0OEQE2i029361@aojmv0008.oracle.com> Changeset: 3a6457fecc72 Author: rkennke Date: 2018-01-24 15:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/3a6457fecc72 Relax assert in SBS::is_safe() ! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp From ashipile at redhat.com Wed Jan 24 14:35:13 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 24 Jan 2018 14:35:13 +0000 Subject: hg: shenandoah/jdk10: 2 new changesets Message-ID: <201801241435.w0OEZDwH002674@aojmv0008.oracle.com> Changeset: 15261c4a6adf Author: shade Date: 2018-01-24 15:30 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/15261c4a6adf Degenerated GC: shortcut cycles, upgrade futile cycles ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp Changeset: 351efe4f6d40 Author: shade Date: 2018-01-24 15:30 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/351efe4f6d40 Log concurrent mark that updates references ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! 
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp From shade at redhat.com Wed Jan 24 16:31:53 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 24 Jan 2018 17:31:53 +0100 Subject: RFR: Fix Traversal GC regression Message-ID: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com> After Traversal GC commit, normal cycle on Compiler.compiler fails within: V [libjvm.so+0x153c713] oop ShenandoahHeap::evac_update_oop_ref(unsigned int*, bool&)+0x333 V [libjvm.so+0x1539082] ShenandoahBarrierSet::write_ref_array(HeapWord*, unsigned long)+0x852 V [libjvm.so+0x133be02] void ObjArrayKlass::do_copy(arrayOop, unsigned int*, arrayOop, unsigned int*, int, Thread*)+0x142 V [libjvm.so+0x133932b] ObjArrayKlass::copy_array(arrayOop, int, arrayOop, int, int, Thread*)+0x72b V [libjvm.so+0xf23325] JVM_ArrayCopy+0x1e5 The troubling bit is why do we even get here: inline void do_oop_work(T* p) { oop o; if (STOREVAL_WRITE_BARRIER) { bool evac; o = _heap->evac_update_oop_ref(p, evac); <--- ???? if ((ALWAYS_ENQUEUE || evac) && !oopDesc::is_null(o)) { ShenandoahBarrierSet::enqueue(o); } } else { o = _heap->maybe_update_oop_ref(p); } if (UPDATE_MATRIX && !oopDesc::is_null(o)) { _heap->connection_matrix()->set_connected(p, o); } } It happens because the condition in selector is wrong: http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.01/ (Note the symmetry against the branch at L223. 
Testing: failing Compiler.compiler -Aleksey From shade at redhat.com Wed Jan 24 16:42:56 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 24 Jan 2018 17:42:56 +0100 Subject: RFR: Unsafe comparison in ShenandoahHeap::evac_update_oop_ref Message-ID: Traversal GC fails in ShenandoahHeap::evac_update_oop_ref when -XX:+VerifyStrictOopOperations is enabled, because we need: $ hg qdiff diff -r 21c595539121 src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan 24 17:32:27 2018 +0100 +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan 24 17:40:35 2018 +0100 @@ -151,7 +151,7 @@ forwarded_oop = evacuate_object(heap_oop, Thread::current(), evac); } oop prev = atomic_compare_exchange_oop(forwarded_oop, p, heap_oop); - if (prev == heap_oop) { + if (oopDesc::unsafe_equals(prev, heap_oop)) { return forwarded_oop; } else { return NULL; This actually affects partial too, which call this method in SVWB. Testing: failing test Thanks, -Aleksey From zgu at redhat.com Wed Jan 24 16:45:26 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 24 Jan 2018 11:45:26 -0500 Subject: RFR: Unsafe comparison in ShenandoahHeap::evac_update_oop_ref In-Reply-To: References: Message-ID: Looks good. 
-Zhengyu On 01/24/2018 11:42 AM, Aleksey Shipilev wrote: > Traversal GC fails in ShenandoahHeap::evac_update_oop_ref when -XX:+VerifyStrictOopOperations is > enabled, because we need: > > $ hg qdiff > diff -r 21c595539121 src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan 24 17:32:27 2018 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan 24 17:40:35 2018 +0100 > @@ -151,7 +151,7 @@ > forwarded_oop = evacuate_object(heap_oop, Thread::current(), evac); > } > oop prev = atomic_compare_exchange_oop(forwarded_oop, p, heap_oop); > - if (prev == heap_oop) { > + if (oopDesc::unsafe_equals(prev, heap_oop)) { > return forwarded_oop; > } else { > return NULL; > > This actually affects partial too, which call this method in SVWB. > > Testing: failing test > > Thanks, > -Aleksey > From shade at redhat.com Wed Jan 24 17:03:14 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 24 Jan 2018 18:03:14 +0100 Subject: RFR: Fix and rewrite update-refs barrier selector In-Reply-To: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com> References: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com> Message-ID: <029d10b3-0e87-b192-c29d-f6aa2a86b6d4@redhat.com> On 01/24/2018 05:31 PM, Aleksey Shipilev wrote: > After Traversal GC commit, normal cycle on Compiler.compiler fails within: > > V [libjvm.so+0x153c713] oop ShenandoahHeap::evac_update_oop_ref(unsigned int*, > bool&)+0x333 > V [libjvm.so+0x1539082] ShenandoahBarrierSet::write_ref_array(HeapWord*, unsigned long)+0x852 > V [libjvm.so+0x133be02] void ObjArrayKlass::do_copy(arrayOop, unsigned int*, > arrayOop, unsigned int*, int, Thread*)+0x142 > V [libjvm.so+0x133932b] ObjArrayKlass::copy_array(arrayOop, int, arrayOop, int, int, Thread*)+0x72b > V [libjvm.so+0xf23325] JVM_ArrayCopy+0x1e5 > > The troubling bit is why do we even get here: > > inline void do_oop_work(T* p) { > oop o; > if (STOREVAL_WRITE_BARRIER) { > bool 
evac; > o = _heap->evac_update_oop_ref(p, evac); <--- ???? > if ((ALWAYS_ENQUEUE || evac) && !oopDesc::is_null(o)) { > ShenandoahBarrierSet::enqueue(o); > } > } else { > o = _heap->maybe_update_oop_ref(p); > } > if (UPDATE_MATRIX && !oopDesc::is_null(o)) { > _heap->connection_matrix()->set_connected(p, o); > } > } > > It happens because the condition in selector is wrong: > http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.01/ Actually, let's rewrite the damn fragile thing: http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.02/ -Aleksey From shade at redhat.com Wed Jan 24 18:31:58 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 24 Jan 2018 19:31:58 +0100 Subject: RFR: VerifyJCStressTest should test all heuristics Message-ID: <8b4d3794-a0b3-4467-559e-ad55570572d8@redhat.com> http://cr.openjdk.java.net/~shade/shenandoah/verify-jcstress-all/webrev.01/ We have missed unsafe oopDesc operation with traversal heuristics, because no test validates it. Extended VerifyJCStressTest with all heuristics. (Passive excludes -XX:+ShVerifyOptoBarriers, because barrier config is odd there). Testing: hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Wed Jan 24 20:54:27 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 24 Jan 2018 21:54:27 +0100 Subject: RFR: VerifyJCStressTest should test all heuristics In-Reply-To: <8b4d3794-a0b3-4467-559e-ad55570572d8@redhat.com> References: <8b4d3794-a0b3-4467-559e-ad55570572d8@redhat.com> Message-ID: Am 24.01.2018 um 19:31 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/verify-jcstress-all/webrev.01/ > > We have missed unsafe oopDesc operation with traversal heuristics, because no test validates it. > Extended VerifyJCStressTest with all heuristics. (Passive excludes -XX:+ShVerifyOptoBarriers, > because barrier config is odd there). 
> > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > Yup From rkennke at redhat.com Wed Jan 24 20:55:17 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 24 Jan 2018 21:55:17 +0100 Subject: RFR: Fix and rewrite update-refs barrier selector In-Reply-To: <029d10b3-0e87-b192-c29d-f6aa2a86b6d4@redhat.com> References: <67cd6e2f-7c1f-1281-ac35-1d2c0651274f@redhat.com> <029d10b3-0e87-b192-c29d-f6aa2a86b6d4@redhat.com> Message-ID: Good. Sorry for breaking it. Thanks for fixing! On Wed, Jan 24, 2018 at 6:03 PM, Aleksey Shipilev wrote: > On 01/24/2018 05:31 PM, Aleksey Shipilev wrote: > > After Traversal GC commit, normal cycle on Compiler.compiler fails > within: > > > > V [libjvm.so+0x153c713] oop ShenandoahHeap::evac_update_oop_ref int>(unsigned int*, > > bool&)+0x333 > > V [libjvm.so+0x1539082] ShenandoahBarrierSet::write_ref_array(HeapWord*, > unsigned long)+0x852 > > V [libjvm.so+0x133be02] void ObjArrayKlass::do_copy int>(arrayOop, unsigned int*, > > arrayOop, unsigned int*, int, Thread*)+0x142 > > V [libjvm.so+0x133932b] ObjArrayKlass::copy_array(arrayOop, int, > arrayOop, int, int, Thread*)+0x72b > > V [libjvm.so+0xf23325] JVM_ArrayCopy+0x1e5 > > > > The troubling bit is why do we even get here: > > > > inline void do_oop_work(T* p) { > > oop o; > > if (STOREVAL_WRITE_BARRIER) { > > bool evac; > > o = _heap->evac_update_oop_ref(p, evac); <--- ???? 
> > if ((ALWAYS_ENQUEUE || evac) && !oopDesc::is_null(o)) { > > ShenandoahBarrierSet::enqueue(o); > > } > > } else { > > o = _heap->maybe_update_oop_ref(p); > > } > > if (UPDATE_MATRIX && !oopDesc::is_null(o)) { > > _heap->connection_matrix()->set_connected(p, o); > > } > > } > > > > It happens because the condition in selector is wrong: > > http://cr.openjdk.java.net/~shade/shenandoah/traversal- > regr-1/webrev.01/ > > Actually, let's rewrite the damn fragile thing: > http://cr.openjdk.java.net/~shade/shenandoah/traversal-regr-1/webrev.02/ > > -Aleksey > > > From rkennke at redhat.com Wed Jan 24 20:55:47 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 24 Jan 2018 21:55:47 +0100 Subject: RFR: Unsafe comparison in ShenandoahHeap::evac_update_oop_ref In-Reply-To: References: Message-ID: Ugh. Please push it. Thanks for fixing. On Wed, Jan 24, 2018 at 5:42 PM, Aleksey Shipilev wrote: > Traversal GC fails in ShenandoahHeap::evac_update_oop_ref when > -XX:+VerifyStrictOopOperations is > enabled, because we need: > > $ hg qdiff > diff -r 21c595539121 src/hotspot/share/gc/shenandoah/shenandoahHeap. > inline.hpp > --- a/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan > 24 17:32:27 2018 +0100 > +++ b/src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Wed Jan > 24 17:40:35 2018 +0100 > @@ -151,7 +151,7 @@ > forwarded_oop = evacuate_object(heap_oop, Thread::current(), > evac); > } > oop prev = atomic_compare_exchange_oop(forwarded_oop, p, heap_oop); > - if (prev == heap_oop) { > + if (oopDesc::unsafe_equals(prev, heap_oop)) { > return forwarded_oop; > } else { > return NULL; > > This actually affects partial too, which call this method in SVWB. 
> > Testing: failing test > > Thanks, > -Aleksey > > From ashipile at redhat.com Wed Jan 24 21:01:02 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 24 Jan 2018 21:01:02 +0000 Subject: hg: shenandoah/jdk10: 3 new changesets Message-ID: <201801242101.w0OL12wP018895@aojmv0008.oracle.com> Changeset: d8a9b5bfb1bd Author: shade Date: 2018-01-24 18:02 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/d8a9b5bfb1bd Fix and rewrite update-refs barrier selector ! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp Changeset: 8437e22953c0 Author: shade Date: 2018-01-24 18:03 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/8437e22953c0 Unsafe comparison in ShenandoahHeap::evac_update_oop_ref ! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp Changeset: 30e8ba6e2794 Author: shade Date: 2018-01-24 19:14 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/30e8ba6e2794 VerifyJCStressTest should test all heuristics ! test/hotspot/jtreg/gc/shenandoah/acceptance/VerifyJCStressTest.java From shade at redhat.com Thu Jan 25 10:27:18 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 25 Jan 2018 11:27:18 +0100 Subject: RFR: ShBS::interpreter_storeval_barrier signature fix and cleanup Message-ID: <39e49d42-8d3d-b59e-b7d3-4edf8020dae1@redhat.com> http://cr.openjdk.java.net/~shade/shenandoah/shbs-storeval-fix/webrev.01/ sh/jdk10 aarch64 build fails with: /pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp:110:6: error: prototype for ‘void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, Register)’ does not match any in class ‘ShenandoahBarrierSet’
void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler* masm, Register dst) { ^~~~~~~~~~~~~~~~~~~~ /pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp:129:8: error: candidate is: virtual void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, Register, Register, Register) void interpreter_storeval_barrier(MacroAssembler* masm, Register dst, Register tmp, Register thread); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is because the argument lists for interpreter_storeval_barrier are messed up. Testing: hotspot_fast_gc_shenandoah, builds on x86_64 and aarch64 Thanks, -Aleksey From rkennke at redhat.com Thu Jan 25 12:29:09 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 25 Jan 2018 13:29:09 +0100 Subject: RFR: ShBS::interpreter_storeval_barrier signature fix and cleanup In-Reply-To: <39e49d42-8d3d-b59e-b7d3-4edf8020dae1@redhat.com> References: <39e49d42-8d3d-b59e-b7d3-4edf8020dae1@redhat.com> Message-ID: Oops. I forgot to check aarch64 when doing traversal. Sorry. The patch is fine. Thanks for fixing it! Cheers, Roman Am 25. Januar 2018 11:27:18 MEZ schrieb Aleksey Shipilev : >http://cr.openjdk.java.net/~shade/shenandoah/shbs-storeval-fix/webrev.01/ > >sh/jdk10 aarch64 build fails with: > >/pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp:110:6: >error: prototype for ‘void >ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, >Register)’ does not match any in class ‘ShenandoahBarrierSet’
>void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler* >masm, Register dst) { > ^~~~~~~~~~~~~~~~~~~~ > >/pool/buildbot/slaves/sobornost/shenandoah-jdk10/build/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp:129:8: >error: candidate is: virtual void >ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, >Register, Register, Register) >void interpreter_storeval_barrier(MacroAssembler* masm, Register dst, >Register tmp, Register thread); > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ > >This is because the argument lists for interpreter_storeval_barrier are >messed up. > >Testing: hotspot_fast_gc_shenandoah, builds on x86_64 and aarch64 > >Thanks, >-Aleksey -- This message was sent from my Android device with K-9 Mail. From ashipile at redhat.com Thu Jan 25 14:28:29 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Thu, 25 Jan 2018 14:28:29 +0000 Subject: hg: shenandoah/jdk10: ShBS::interpreter_storeval_barrier signature fix and cleanup Message-ID: <201801251428.w0PESTXn015379@aojmv0008.oracle.com> Changeset: 6183a72bd5c2 Author: shade Date: 2018-01-25 11:24 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/6183a72bd5c2 ShBS::interpreter_storeval_barrier signature fix and cleanup ! src/hotspot/cpu/aarch64/shenandoahBarrierSet_aarch64.cpp ! src/hotspot/cpu/aarch64/templateTable_aarch64.cpp ! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp ! src/hotspot/cpu/x86/templateTable_x86.cpp ! src/hotspot/share/gc/shared/barrierSet.hpp ! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.hpp From zgu at redhat.com Thu Jan 25 17:15:03 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 25 Jan 2018 12:15:03 -0500 Subject: RFR: Hole in CAS barrier when using traversal heuristics Message-ID: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> I am not complete sure this is right fix. There is hole in CAS barrier when using traversal heuristics. E.g.
Unsafe_CompareAndSetObject() evacuates target and exchange object, but not the field, so it may hit assertion in ShenandoahBarrier::enqueue(). I could not come up a reliable reproducer, but I have seen this a few time with specjvm ScimarkLU with options: "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal -Xlog:gc+stats" Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/ Thanks, -Zhengyu From rkennke at redhat.com Thu Jan 25 17:26:37 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 25 Jan 2018 18:26:37 +0100 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> Message-ID: <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> Am 25.01.2018 um 18:15 schrieb Zhengyu Gu: > I am not complete sure this is right fix. There is hole in CAS barrier > when using traversal heuristics. > > E.g. Unsafe_CompareAndSetObject() evacuates target and exchange object, > but not the field, so it may hit assertion in ShenandoahBarrier::enqueue(). > > I could not come up a reliable reproducer, but I have seen this a few > time with specjvm ScimarkLU with options: > > "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions > -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal > -Xlog:gc+stats" > > Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/ > > > Thanks, > > -Zhengyu Hi Zhengyu, I am not sure what you mean by 'evacuates target and exchange object, but not the field' .. clearly the target object needs to be evacuated, because we only write to to-space (write-barrier). Also, the exchange object needs to be evacuated, to ensure we end up only with to-space references in fields (storeval-barrier). What do you mean by evacuation of 'the field' ? 
The target field is part of the target object. The issue here seems to be that the usual 'pre-barrier' (i.e. SATB-barrier) should not be called at all. However, since we do set the MARKING bit, we still get into this code path. We might just want to check for traversal-in-progress and return right at the start of the method. Roman From zgu at redhat.com Thu Jan 25 17:35:01 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 25 Jan 2018 12:35:01 -0500 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> Message-ID: On 01/25/2018 12:26 PM, Roman Kennke wrote: > Am 25.01.2018 um 18:15 schrieb Zhengyu Gu: >> I am not complete sure this is right fix. There is hole in CAS barrier >> when using traversal heuristics. >> >> E.g. Unsafe_CompareAndSetObject() evacuates target and exchange >> object, but not the field, so it may hit assertion in >> ShenandoahBarrier::enqueue(). >> >> I could not come up a reliable reproducer, but I have seen this a few >> time with specjvm ScimarkLU with options: >> >> "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions >> -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal >> -Xlog:gc+stats" >> >> Webrev: >> http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/ >> >> >> Thanks, >> >> -Zhengyu > > Hi Zhengyu, > > I am not sure what you mean by 'evacuates target and exchange object, > but not the field' .. clearly the target object needs to be evacuated, > because we only write to to-space (write-barrier). Also, the exchange > object needs to be evacuated, to ensure we end up only with to-space > references in fields (storeval-barrier). What do you mean by evacuation > of 'the field' ? The target field is part of the target object. 
There is an example: http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020 the addr points a field in object, that might not be evacuated and I think you do have to enqueue it, as the object may be gray. Thanks, -Zhengyu > > The issue here seems to be that the usual 'pre-barrier' (i.e. > SATB-barrier) should not be called at all. However, since we do set the > MARKING bit, we still get into this code path. We might just want to > check for traversal-in-progress and return right at the start of the > method. > > Roman From shade at redhat.com Thu Jan 25 17:41:30 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Thu, 25 Jan 2018 18:41:30 +0100 Subject: RFR: Fix 32-bit build by ifdef-ing non-implemented store-val barrier Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/build/storeval-i586/webrev.01/ x86_32 build is broken because Traversal GC references 64-bit registers: /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp: In member function ‘virtual void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, Register, Register)’: /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:167:13: error: ‘c_rarg1’ was not declared in this scope __ push(c_rarg1); ^~~~~~~ /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:171:41: error: ‘r15_thread’ was not declared in this scope __ g1_write_barrier_pre(noreg, dst, r15_thread, tmp, true, false); ^~~~~~~~~~ The way out is to ifdef the barrier, like we did with the interpreter_write_barrier_impl a few blocks above.
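The ifdef approach described above can be illustrated with a minimal, standalone sketch. This is not the code from the webrev: the function names are hypothetical, and the use of `_LP64` as the guard macro is an assumption about how the 64-bit-only path would be selected.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a platform-specific barrier emitter. The real
// function emits assembly through MacroAssembler and is not reproduced here.
// Returns a label for the path that was compiled in.
inline std::string storeval_barrier_path() {
#ifdef _LP64
    // 64-bit only: code that touches c_rarg1 or r15_thread would live here,
    // since the 32-bit build reports those names as undeclared.
    return "implemented";
#else
    // 32-bit: the barrier is not implemented, so nothing in this branch may
    // reference 64-bit-only registers.
    return "unimplemented";
#endif
}

// True when this translation unit was compiled with the 64-bit path.
inline bool storeval_barrier_available() {
    return storeval_barrier_path() == "implemented";
}
```

Guarding the whole body keeps the declaration identical on both platforms, so callers compile unchanged and only the implementation differs.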
Testing: failing build -Aleksey From rkennke at redhat.com Thu Jan 25 17:51:13 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 25 Jan 2018 18:51:13 +0100 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> Message-ID: <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> Am 25.01.2018 um 18:35 schrieb Zhengyu Gu: > > > On 01/25/2018 12:26 PM, Roman Kennke wrote: >> Am 25.01.2018 um 18:15 schrieb Zhengyu Gu: >>> I am not complete sure this is right fix. There is hole in CAS >>> barrier when using traversal heuristics. >>> >>> E.g. Unsafe_CompareAndSetObject() evacuates target and exchange >>> object, but not the field, so it may hit assertion in >>> ShenandoahBarrier::enqueue(). >>> >>> I could not come up a reliable reproducer, but I have seen this a few >>> time with specjvm ScimarkLU with options: >>> >>> "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions >>> -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal >>> -Xlog:gc+stats" >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zgu/shenandoah/cas_traversal/webrev.00/ >>> >>> >>> Thanks, >>> >>> -Zhengyu >> >> Hi Zhengyu, >> >> I am not sure what you mean by 'evacuates target and exchange object, >> but not the field' .. clearly the target object needs to be evacuated, >> because we only write to to-space (write-barrier). Also, the exchange >> object needs to be evacuated, to ensure we end up only with to-space >> references in fields (storeval-barrier). What do you mean by >> evacuation of 'the field' ? The target field is part of the target >> object. > > There is an example: > http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020 > > > the addr points a field in object, that might not be evacuated and I > think you do have to enqueue it, as the object may be gray. 
addr is a field in p, which is in to-space by the WB a few lines above. This should be good. x is the exchange value, also evacuated by the storeval barrier. So all should be fine. What error/assert/crash are you seeing? Is it something in SBS::is_safe()? Then it may already be fixed by my subsequent changeset? It may happen that traversal GC gets cancelled, and then we hit an overly strict assert like that. Roman From rkennke at redhat.com Thu Jan 25 17:51:33 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 25 Jan 2018 18:51:33 +0100 Subject: RFR: Fix 32-bit build by ifdef-ing non-implemented store-val barrier In-Reply-To: References: Message-ID: Am 25.01.2018 um 18:41 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/build/storeval-i586/webrev.01/ > > x86_32 build is broken because Traversal GC references 64-bit registers: > > /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp: In member function > ‘virtual void ShenandoahBarrierSet::interpreter_storeval_barrier(MacroAssembler*, Register, Register)’: > /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:167:13: error: > ‘c_rarg1’ was not declared in this scope > __ push(c_rarg1); > ^~~~~~~ > /home/shade/shenandoah-jdk10/src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp:171:41: error: > ‘r15_thread’ was not declared in this scope > __ g1_write_barrier_pre(noreg, dst, r15_thread, tmp, true, false); > ^~~~~~~~~~ > > The way out is to ifdef the barrier, like we did with the interpreter_write_barrier_impl a few > blocks above. > > Testing: failing build > > -Aleksey > Oh my. Yes, please push. Thanks for fixing it!
Roman From ashipile at redhat.com Thu Jan 25 18:05:21 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Thu, 25 Jan 2018 18:05:21 +0000 Subject: hg: shenandoah/jdk10: Fix 32-bit build by ifdef-ing non-implemented storeval barrier Message-ID: <201801251805.w0PI5Msh003251@aojmv0008.oracle.com> Changeset: 3c12448ec444 Author: shade Date: 2018-01-25 18:44 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/3c12448ec444 Fix 32-bit build by ifdef-ing non-implemented storeval barrier ! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp From zgu at redhat.com Thu Jan 25 19:05:38 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 25 Jan 2018 14:05:38 -0500 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> Message-ID: >>> >>> Hi Zhengyu, >>> >>> I am not sure what you mean by 'evacuates target and exchange object, >>> but not the field' .. clearly the target object needs to be >>> evacuated, because we only write to to-space (write-barrier). Also, >>> the exchange object needs to be evacuated, to ensure we end up only >>> with to-space references in fields (storeval-barrier). What do you >>> mean by evacuation of 'the field' ? The target field is part of the >>> target object. >> >> There is an example: >> http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020 >> >> >> the addr points a field in object, that might not be evacuated and I >> think you do have to enqueue it, as the object may be gray. > > addr is a field in p, which is in to-space by the WB a few lines above. > This should be good. x is the exchange value, also evacuated by the > storeval barrier. So all should be fine. Yes, p and x are fine, but the field (e.g. 
an object) to be swapped out, may still in cset, and it is enqueued by oopDesc::atomic_compare_exchange_oop() (http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407), then hit assertion failure here: http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464 hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w BTW, what's reason it has to be to-space object? can it be evacuated during processing SATB buffers, or by storeval_barrier()? -Zhengyu > > What error/assert/crash are you seeing? Is it something in > SBS::is_safe()? Then it may already be fixed by my subsequent changeset? > > It may happen that traversal GC gets cancelled, and then we hit an > overly strict assert like that. > > Roman From rkennke at redhat.com Thu Jan 25 19:29:25 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 25 Jan 2018 20:29:25 +0100 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> Message-ID: <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> Am 25.01.2018 um 20:05 schrieb Zhengyu Gu: > >>>> >>>> Hi Zhengyu, >>>> >>>> I am not sure what you mean by 'evacuates target and exchange object, >>>> but not the field' .. clearly the target object needs to be >>>> evacuated, because we only write to to-space (write-barrier). Also, >>>> the exchange object needs to be evacuated, to ensure we end up only >>>> with to-space references in fields (storeval-barrier). What do you >>>> mean by evacuation of 'the field' ? The target field is part of the >>>> target object. 
>>> >>> There is an example: >>> http://hg.openjdk.java.net/shenandoah/jdk10/file/6183a72bd5c2/src/hotspot/share/prims/unsafe.cpp#l1020 >>> >>> >>> the addr points a field in object, that might not be evacuated and I >>> think you do have to enqueue it, as the object may be gray. >> >> addr is a field in p, which is in to-space by the WB a few lines >> above. This should be good. x is the exchange value, also evacuated by >> the storeval barrier. So all should be fine. > > Yes, p and x are fine, but the field (e.g. an object) to be swapped out, > may still in cset, No. The field is not an object. The field is a reference, and belongs to p, and points to another object (e.g. x). It is not an object by itself and thus cannot be evacuated or such. > and it is enqueued by > oopDesc::atomic_compare_exchange_oop() > (http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407), > then hit assertion failure here: > > http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464 > > > hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w Hmm, ok. This is a problem in oopDesc::atomic_compare_exchange_oop(). It calls write_ref_field_pre(), which it shouldn't do. By our design, it should call storeval_barrier() instead, which does the right thing. However, this is going to change once we merge from upstream... > BTW, what's reason it has to be to-space object? can it be evacuated > during processing SATB buffers, or by storeval_barrier()? We have two reasons for forcing to-space objects: - We must only ever write to to-space objects for consistency - We must only ever store to-space objects into fields, because the GC threads that concurrently update fields may already have visited it. If Java threads were writing from-space objects we may end up with pointers/fields to from-space objects after GC.
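The two reasons above can be sketched as a toy, single-threaded model. Everything here (`Obj`, `ToyHeap`, `to_space`, `store`) is hypothetical and only illustrates the invariant; it is not HotSpot code and it ignores concurrency, CAS on the forwarding pointer, and SATB queues.

```cpp
#include <cassert>
#include <deque>

// Toy object: one forwarding pointer, one reference field.
struct Obj {
    Obj* fwd = nullptr;     // points to itself until the object is evacuated
    bool in_cset = false;   // true if the object sits in the collection set (from-space)
    Obj* field = nullptr;   // a single reference field
};

struct ToyHeap {
    std::deque<Obj> objects;  // deque keeps element addresses stable on emplace_back

    Obj* allocate(bool in_cset) {
        objects.emplace_back();
        Obj* o = &objects.back();
        o->fwd = o;
        o->in_cset = in_cset;
        return o;
    }

    // Return the to-space version of o, evacuating it if it is still in the cset.
    Obj* to_space(Obj* o) {
        if (o == nullptr) return nullptr;
        if (o->fwd != o) return o->fwd;   // already evacuated
        if (!o->in_cset) return o;        // not in the cset, nothing to do
        Obj* copy = allocate(false);
        copy->field = o->field;
        o->fwd = copy;
        return copy;
    }

    // Reference store that upholds both rules from the text above.
    void store(Obj* target, Obj* value) {
        Obj* t = to_space(target);  // rule 1: only write to to-space objects
        Obj* v = to_space(value);   // rule 2: only store to-space references
        t->field = v;
    }
};
```

In this model a store through `store()` can never leave a from-space reference in a to-space object, which is the property a concurrent reference updater would rely on.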
Roman From zgu at redhat.com Thu Jan 25 19:55:06 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 25 Jan 2018 14:55:06 -0500 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> Message-ID: <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com> >> BTW, what's reason it has to be to-space object? can it be evacuated >> during processing SATB buffers, or by storeval_barrier()? Sorry, I am not clear on my question, which should be: why should only enqueue to-space object during conc-traversal gc? > > We have two reasons for forcing to-space objects: > - We must only ever write to to-space objects for consistency > - We must only ever store to-space objects into fields, because the GC > threads that concurrently update fields may already have visited it. If > Java threads were writing from-space objects we may end up with > pointers/fields to from-space objects after GC. I understand above reasons. But these do not apply to object to be enqueued to satisfy SATB protocol, since we do not write or update this object, but just to make sure it should be marked. If I understand correctly, this object has to be in to-space at the end of GC cycle with traversal gc. however, I don't see why it has to be to-space object to be enqueued, can it be evacuated when it is processed? BTW, do you want to take over this one? 
Thanks, -Zhengyu > > Roman > From zgu at redhat.com Thu Jan 25 20:02:44 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Thu, 25 Jan 2018 15:02:44 -0500 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> Message-ID: <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com> > > No. The field is not an object. The field is a reference, and belongs to > p, and points to another object (e.g. x). It is not an object by itself > and thus cannot be evacuated or such. Sorry, my bad writing, it is a reference to an object that may still be in from-space. > > and it is enqueued by >> oopDesc::atomic_compare_exchange_oop() >> (http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407), >> then hit assertion failure here: >> >> http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464 >> >> >> hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w > > Hmm, ok. This is a problem in oopDesc::atomic_compare_exchange_oop(). > It calls write_ref_field_pre(), which it shouldn't do. By our design, it > should call storeval_barrier() instead, which does the right thing. > However, this is going to change once we merge from upstream... Sorry, call storeval_barrier() on what? My understanding is that you have to apply the SATB barrier to the swapped-out *old* value, which is what write_ref_field_pre() does, no? Thanks, -Zhengyu > >> BTW, what's reason it has to be to-space object? can it be evacuated >> during processing SATB buffers, or by storeval_barrier()?
> > We have two reasons for forcing to-space objects: > - We must only ever write to to-space objects for consistency > - We must only ever store to-space objects into fields, because the GC > threads that concurrently update fields may already have visited it. If > Java threads were writing from-space objects we may end up with > pointers/fields to from-space objects after GC. > > Roman > From rkennke at redhat.com Thu Jan 25 20:06:32 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 25 Jan 2018 21:06:32 +0100 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com> References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com> Message-ID: <2378a348-ff67-24f7-6995-848e2dd15a3a@redhat.com> Am 25.01.2018 um 20:55 schrieb Zhengyu Gu: >>> BTW, what's reason it has to be to-space object? can it be evacuated >>> during processing SATB buffers, or by storeval_barrier()? > Sorry, I am not clear on my question, which should be: why should only > enqueue to-space object during conc-traversal gc? > >> >> We have two reasons for forcing to-space objects: >> - We must only ever write to to-space objects for consistency >> - We must only ever store to-space objects into fields, because the GC >> threads that concurrently update fields may already have visited it. >> If Java threads were writing from-space objects we may end up with >> pointers/fields to from-space objects after GC. > > I understand above reasons. > > But these do not apply to object to be enqueued to satisfy SATB > protocol, since we do not write or update this object, but just to make > sure it should be marked. If I understand correctly, this object has to > be in to-space at the end of GC cycle with traversal gc. 
however, I > don't see why it has to be to-space object to be enqueued, can it be > evacuated when it is processed? The storeval barrier has two purposes: one is to ensure consistency vs. 'update-refs' (traversal updates references). The other is to ensure consistency vs traversal of the heap (e.g. 'marking'). If it needs to write-barrier the object anyway (to ensure update-refs consistency), then we can just as well make this an invariant. Then we can avoid reading the fwd ptrs in the GC thread. > BTW, do you want to take over this one? Ok, can do it. Do you happen to have a reproducer? Roman From rkennke at redhat.com Thu Jan 25 20:09:39 2018 From: rkennke at redhat.com (Roman Kennke) Date: Thu, 25 Jan 2018 21:09:39 +0100 Subject: RFR: Hole in CAS barrier when using traversal heuristics In-Reply-To: <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com> References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com> Message-ID: <2bd76355-c636-dfc4-e1a1-031640999c66@redhat.com> Am 25.01.2018 um 21:02 schrieb Zhengyu Gu: >> >> No. The field is not an object. The field is a reference, and belongs >> to p, and points to another object (e.g. x). It is not an object by >> itself and thus cannot be evacuated or such. > Sorry, my bad writing, it is a reference to an object that may still in > from-space. Ok. Yes. >> ? and it is enqueued by >>> oopDesc::atomic_compare_exchange_oop() >>> (http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/oops/oop.inline.hpp#l407), >>> then hit assertion failure here: >>> >>> http://hg.openjdk.java.net/shenandoah/jdk10/file/3c12448ec444/src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp#l464 >>> >>> >>> hs_err: https://paste.fedoraproject.org/paste/isPRNXG6PqaPgx9a18IA4w >> >> Hmm, ok. 
This is a problem in
>> oopDesc::atomic_compare_exchange_oop(). It calls
>> write_ref_field_pre(), which it shouldn't do. By our design, it should
>> call storeval_barrier() instead, which does the right thing. However,
>> this is going to change once we merge from upstream...
>
> Sorry, call storeval_barrier() on what? My understanding is that you
> have to apply the SATB barrier to the swapped-out *old* value, which is what
> write_ref_field_pre() does, no?

We have a little naming problem here. While we're using G1's SATB buffer to enqueue objects, the traversal GC algorithm is *not* SATB-based. It is incremental-update-based, which is kind of the opposite of SATB (one could call it 'snapshot-at-the-end' (-of-traversal)). Instead of enqueueing the previous values on stores, it enqueues the *new* values on stores. This is why the storeval barrier can do both enqueue (for i-u) and WB (for update-refs-consistency) in one swoop.

I hope this clarifies it.

Roman

From zgu at redhat.com Thu Jan 25 20:18:02 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 15:18:02 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <2378a348-ff67-24f7-6995-848e2dd15a3a@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> <09b2edec-6113-e0c5-c2ba-297eecd39cda@redhat.com> <2378a348-ff67-24f7-6995-848e2dd15a3a@redhat.com>
Message-ID:

>
> Ok, can do it. Do you happen to have a reproducer?

Not a simple reproducer. I got this by running the ScimarkLU benchmark; it may take a few runs.
${JAVA_HOME}/bin/java -jar jmh-specjvm2016.jar ScimarkLU --jvmArgs "-Xmx1g -Xms1g -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:ShenandoahGCHeuristics=traversal -Xlog:gc+stats" -f 1

Thanks,
-Zhengyu

>
> Roman

From zgu at redhat.com Thu Jan 25 20:19:42 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Thu, 25 Jan 2018 15:19:42 -0500
Subject: RFR: Hole in CAS barrier when using traversal heuristics
In-Reply-To: <2bd76355-c636-dfc4-e1a1-031640999c66@redhat.com>
References: <1163aca7-397b-0fe7-1f48-0eae2662ef4c@redhat.com> <9c60eedc-e502-a9e0-c8ec-5409ba7c05e8@redhat.com> <59070a56-11c2-b75b-6eae-21cdea01974e@redhat.com> <60ffe9ce-8ff7-b358-a8c0-c8b26329b48d@redhat.com> <5886ba13-5457-358f-071b-8082ea3742bc@redhat.com> <2bd76355-c636-dfc4-e1a1-031640999c66@redhat.com>
Message-ID:

>
> We have a little naming problem here. While we're using G1's SATB buffer
> to enqueue objects, the traversal GC algorithm is *not* SATB-based. It
> is incremental-update-based, which is kind of the opposite of SATB (one
> could call it 'snapshot-at-the-end' (-of-traversal)). Instead of enqueueing
> the previous values on stores, it enqueues the *new* values on stores.
> This is why the storeval barrier can do both enqueue (for i-u) and WB
> (for update-refs-consistency) in one swoop.
>
> I hope this clarifies it.

Okay, I guess I have to catch up on this :-(

Thanks,
-Zhengyu

>
> Roman

From rkennke at redhat.com Fri Jan 26 10:43:43 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 26 Jan 2018 11:43:43 +0100
Subject: RFR: Don't enter SATB pre-barrier when in traversal.
Message-ID: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com>

This is the fix for the problem that Zhengyu found. It's another side effect of turning on MARKING during traversal. We need to ensure that we do not enter the SATB pre-barrier during traversal.

http://cr.openjdk.java.net/~rkennke/traversal-no-pre-barrier/webrev.00/

Passes hotspot_gc_shenandoah

Good?
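
The distinction drawn in this thread between an SATB pre-barrier (enqueue the *old* value) and the incremental-update style used by traversal (enqueue the *new* value) can be sketched roughly as below. This is only an illustration under stated assumptions: `Obj`, `MarkQueue`, `satb_store` and `incremental_update_store` are hypothetical stand-ins, not HotSpot or Shenandoah code, and the real storeval barrier additionally performs a write barrier on the stored value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; not HotSpot types.
struct Obj { int id; };

struct MarkQueue {
  std::vector<Obj*> entries;
  // Null references carry nothing to mark, so they are skipped.
  void enqueue(Obj* o) { if (o != nullptr) entries.push_back(o); }
};

// SATB-style store: remember the value that is about to be overwritten,
// so everything reachable at the start of marking still gets marked.
void satb_store(Obj** field, Obj* new_val, MarkQueue& q) {
  q.enqueue(*field);   // old value
  *field = new_val;
}

// Incremental-update-style store: remember the value being stored,
// so everything reachable at the end of traversal gets visited.
void incremental_update_store(Obj** field, Obj* new_val, MarkQueue& q) {
  q.enqueue(new_val);  // new value
  *field = new_val;
}
```

With the same store of `b` over `a`, the first variant queues `a` and the second queues `b`, which is why applying the SATB pre-barrier during traversal enqueues the wrong object.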
From shade at redhat.com Fri Jan 26 11:17:51 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 12:17:51 +0100
Subject: RFR: Don't enter SATB pre-barrier when in traversal.
In-Reply-To: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com>
References: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com>
Message-ID: <577cbe94-e051-0387-6c91-00d066cded4f@redhat.com>

On 01/26/2018 11:43 AM, Roman Kennke wrote:
> This is the fix for the problem that Zhengyu found. It's another side effect of turning on MARKING
> during traversal. We need to ensure that we do not enter the SATB pre-barrier during traversal.
>
> http://cr.openjdk.java.net/~rkennke/traversal-no-pre-barrier/webrev.00/

Okay.

So maybe it is wrong to turn on MARKING during Traversal GC? Let GC state MARKING only mean regular concurrent marking cycle?

Thanks,
-Aleksey

From shade at redhat.com Fri Jan 26 12:00:04 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 13:00:04 +0100
Subject: RFR: [9] Bulk backports to sh/jdk9
Message-ID: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180126/webrev.01/

Changes include:

8735773ec619: Single thread-local GC state flag for all barriers
544322604347: ShConcurrentThread races with set_gc_state_bit
dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
d55c6d5216d1: Common TLS access to GC state, where possible
1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
fd9724b26fdd: Refactor allocation failure and explicit GC handling
e5398dce6e7b: Make concurrent precleaning log message optional again
26b9048c042a: Make degenerated update-refs use region-set cursor to hand over work
1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
12654193e434: Demote warning message about OOM-during-evac to informational
67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag selections
ecb87af5e0d8:
Implement flag to generate write-barriers without membars.
820129a799b1: Allocation failure injection machinery
b8c39bdc0dac: Log message on ref processing, class unload, update refs for mark events
45d471869b73: Degenerated GC
15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles
bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while scanning roots
3a6457fecc72: Relax assert in SBS::is_safe()
30e8ba6e2794: VerifyJCStressTest should test all heuristics
6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and cleanup
3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval barrier

Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm

Thanks,
-Aleksey

From rkennke at redhat.com Fri Jan 26 11:53:33 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Fri, 26 Jan 2018 12:53:33 +0100
Subject: RFR: Don't enter SATB pre-barrier when in traversal.
In-Reply-To: <577cbe94-e051-0387-6c91-00d066cded4f@redhat.com>
References: <8dbb6ada-3e0c-b4fe-9b1f-e03af6a7580e@redhat.com> <577cbe94-e051-0387-6c91-00d066cded4f@redhat.com>
Message-ID:

On 26 January 2018 12:17:51 CET, Aleksey Shipilev wrote:
>On 01/26/2018 11:43 AM, Roman Kennke wrote:
>> This is the fix for the problem that Zhengyu found. It's another side effect of turning on MARKING
>> during traversal. We need to ensure that we do not enter the SATB pre-barrier during traversal.
>>
>> http://cr.openjdk.java.net/~rkennke/traversal-no-pre-barrier/webrev.00/
>
>Okay.
>
>So maybe it is wrong to turn on MARKING during Traversal GC? Let GC state MARKING only mean regular
>concurrent marking cycle?

Yes, I think so. Major GC phases (concmark, evac, uprefs, partial and traversal) should not overlap, and barrier code should positively select what it wants, and not exclude what it doesn't want.

Let me rewrite this stuff.

Roman

--
This message was sent from my Android device with K-9 Mail.
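
A minimal sketch of what "positively select" could look like once the major phases are mutually exclusive bits in one state word. The phase names MARKING, EVACUATION, PARTIAL, TRAVERSAL and HAS_FORWARDED appear in this archive; the bit values, the `UPDATEREFS` name, the `GCStateWord` type and both helper functions are assumptions made for illustration, not the actual Shenandoah definitions.

```cpp
#include <cassert>
#include <cstdint>

// Bit values are assumptions for illustration; UPDATEREFS is an assumed name.
enum GCState : uint8_t {
  HAS_FORWARDED = 1 << 0,  // may overlap with the phase bits below
  MARKING       = 1 << 1,
  EVACUATION    = 1 << 2,
  UPDATEREFS    = 1 << 3,
  PARTIAL       = 1 << 4,
  TRAVERSAL     = 1 << 5
};

// Hypothetical holder for the gc-state bits, addressed by mask.
struct GCStateWord {
  uint8_t bits = 0;

  // Set or clear every bit in 'mask' in one step.
  void set_mask(uint8_t mask, bool value) {
    bits = value ? static_cast<uint8_t>(bits | mask)
                 : static_cast<uint8_t>(bits & ~mask);
  }

  // True if any bit in 'mask' is currently set.
  bool is_set(uint8_t mask) const { return (bits & mask) != 0; }
};

// Each barrier names the phases it wants instead of excluding the others.
bool needs_satb_pre_barrier(const GCStateWord& s) {
  return s.is_set(MARKING);
}

bool needs_write_barrier(const GCStateWord& s) {
  return s.is_set(EVACUATION | PARTIAL | TRAVERSAL);
}
```

Because TRAVERSAL no longer implies MARKING in this scheme, the SATB pre-barrier check needs no "and not traversal" clause.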
From shade at redhat.com Fri Jan 26 16:48:29 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 17:48:29 +0100
Subject: RFR: [8u] Critical backports to sh/jdk8u
Message-ID: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180126-crit/webrev.01/

We do not have much time to complete bulk backports, so let us backport only the critical bug/perf/test fixes:

dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
30e8ba6e2794: VerifyJCStressTest should test all heuristics
820129a799b1: Allocation failure injection machinery

Let's do these right now. We shall backport other things as time allows.

Testing: hotspot_gc_shenandoah {fastdebug|release}

Thanks,
-Aleksey

From shade at redhat.com Fri Jan 26 17:00:20 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 18:00:20 +0100
Subject: RFR: Conditionalize PerfDataMemorySize on enabled heap sampling
Message-ID:

Saves some memory when sampling is not enabled (default case), and boosts it when we deal with lots of Shenandoah sampling data:

diff -r 3c12448ec444 src/hotspot/share/runtime/arguments.cpp
--- a/src/hotspot/share/runtime/arguments.cpp Thu Jan 25 18:44:13 2018 +0100
+++ b/src/hotspot/share/runtime/arguments.cpp Fri Jan 26 17:58:21 2018 +0100
@@ -2035,8 +2035,10 @@
     FLAG_SET_DEFAULT(ParallelRefProcEnabled, true);
   }

-  if (FLAG_IS_DEFAULT(PerfDataMemorySize)) {
-    FLAG_SET_DEFAULT(PerfDataMemorySize, 512*K);
+  if (ShenandoahRegionSampling && FLAG_IS_DEFAULT(PerfDataMemorySize)) {
+    // When sampling is enabled, max out the PerfData memory to get more
+    // Shenandoah data in, including Matrix.
+    FLAG_SET_DEFAULT(PerfDataMemorySize, 2048*K);
   }

 #ifdef COMPILER2

Testing: hotspot_gc_shenandoah

Thanks,
-Aleksey

From cflood at redhat.com Fri Jan 26 17:03:40 2018
From: cflood at redhat.com (Christine Flood)
Date: Fri, 26 Jan 2018 12:03:40 -0500
Subject: RFR: Conditionalize PerfDataMemorySize on enabled heap sampling
In-Reply-To:
References:
Message-ID:

Looks good,

Thanks!

Christine

On Fri, Jan 26, 2018 at 12:00 PM, Aleksey Shipilev wrote:
> Saves some memory when sampling is not enabled (default case), and boosts it when we deal with lots
> of Shenandoah sampling data:
>
> diff -r 3c12448ec444 src/hotspot/share/runtime/arguments.cpp
> --- a/src/hotspot/share/runtime/arguments.cpp Thu Jan 25 18:44:13 2018 +0100
> +++ b/src/hotspot/share/runtime/arguments.cpp Fri Jan 26 17:58:21 2018 +0100
> @@ -2035,8 +2035,10 @@
>      FLAG_SET_DEFAULT(ParallelRefProcEnabled, true);
>    }
>
> -  if (FLAG_IS_DEFAULT(PerfDataMemorySize)) {
> -    FLAG_SET_DEFAULT(PerfDataMemorySize, 512*K);
> +  if (ShenandoahRegionSampling && FLAG_IS_DEFAULT(PerfDataMemorySize)) {
> +    // When sampling is enabled, max out the PerfData memory to get more
> +    // Shenandoah data in, including Matrix.
> +    FLAG_SET_DEFAULT(PerfDataMemorySize, 2048*K);
>    }
>
>  #ifdef COMPILER2
>
> Testing: hotspot_gc_shenandoah
>
> Thanks,
> -Aleksey
>

From zgu at redhat.com Fri Jan 26 17:10:20 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Fri, 26 Jan 2018 12:10:20 -0500
Subject: RFR: [8u] Critical backports to sh/jdk8u
In-Reply-To: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
References: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
Message-ID: <82090529-a7db-d446-a773-399de7e36ff1@redhat.com>

Backport looks good.
-Zhengyu

On 01/26/2018 11:48 AM, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180126-crit/webrev.01/
>
> We do not have much time to complete bulk backports, so let us backport only the critical
> bug/perf/test fixes:
>
> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
> 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
> 30e8ba6e2794: VerifyJCStressTest should test all heuristics
> 820129a799b1: Allocation failure injection machinery
>
> Let's do these right now. We shall backport other things as time allows.
>
> Testing: hotspot_gc_shenandoah {fastdebug|release}
>
> Thanks,
> -Aleksey
>

From cflood at redhat.com Fri Jan 26 17:37:35 2018
From: cflood at redhat.com (Christine Flood)
Date: Fri, 26 Jan 2018 12:37:35 -0500
Subject: RFR: [8u] Critical backports to sh/jdk8u
In-Reply-To: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
References: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
Message-ID:

 void ShenandoahHeap::try_inject_alloc_failure() {
+  if (ShenandoahAllocFailureALot && !cancelled_concgc() && ((os::random() % 1000) > 950)) {
+    _inject_alloc_failure.set();
+    Thread::current()->_ParkEvent->park(1);
+    if (cancelled_concgc()) {
+      log_info(gc)("Allocation failure was successfully injected");
+    }
+  }
+ }

Is it possible that there is a race and we get to the test for cancelled_concgc before it actually gets set?

Is there any reason not to try JCStressTests with frequent allocation failures?

I'm just curious. I don't see any reason to stop the patch moving forward.
Christine

On Fri, Jan 26, 2018 at 11:48 AM, Aleksey Shipilev wrote:
> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk8u-20180126-crit/webrev.01/
>
> We do not have much time to complete bulk backports, so let us backport only the critical
> bug/perf/test fixes:
>
> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently
> 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase
> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used
> 30e8ba6e2794: VerifyJCStressTest should test all heuristics
> 820129a799b1: Allocation failure injection machinery
>
> Let's do these right now. We shall backport other things as time allows.
>
> Testing: hotspot_gc_shenandoah {fastdebug|release}
>
> Thanks,
> -Aleksey
>

From shade at redhat.com Fri Jan 26 17:40:30 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Fri, 26 Jan 2018 18:40:30 +0100
Subject: RFR: [8u] Critical backports to sh/jdk8u
In-Reply-To:
References: <8a69240e-8838-4726-ef8c-a14d7befd0b3@redhat.com>
Message-ID:

On 01/26/2018 06:37 PM, Christine Flood wrote:
>  void ShenandoahHeap::try_inject_alloc_failure() {
> +  if (ShenandoahAllocFailureALot && !cancelled_concgc() && ((os::random() % 1000) > 950)) {
> +    _inject_alloc_failure.set();
> +    Thread::current()->_ParkEvent->park(1);
> +    if (cancelled_concgc()) {
> +      log_info(gc)("Allocation failure was successfully injected");
> +    }
> +  }
> + }
>
> Is it possible that there is a race and we get to the test for
> cancelled_concgc before it actually gets set?

Yes, but we don't care. This is mostly to observe that some thread has reacted to this failure within the time set; it is a debugging measure.

> Is there any reason not to try JCStressTests with frequent allocation
> failures?

JCStressTests are mostly for testing internal compiler/oop verification, and are supposed to be fast. Other tests figure out what happens on alloc failures.
-Aleksey From ashipile at redhat.com Fri Jan 26 17:45:08 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 26 Jan 2018 17:45:08 +0000 Subject: hg: shenandoah/jdk10: Conditionalize PerfDataMemorySize on enabled heap sampling Message-ID: <201801261745.w0QHj8Qe001342@aojmv0008.oracle.com> Changeset: 16198c705496 Author: shade Date: 2018-01-26 17:56 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/16198c705496 Conditionalize PerfDataMemorySize on enabled heap sampling ! src/hotspot/share/runtime/arguments.cpp From zgu at redhat.com Fri Jan 26 17:54:03 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Fri, 26 Jan 2018 12:54:03 -0500 Subject: RFR: Missing cancelled concgc check results assertion failure Message-ID: <028cd960-5aaa-7e5f-6137-2deaee117166@redhat.com> Diving to weak reference work without checking cancelled concgc, results assertion failure of not emptied task queues. http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/hs_err.txt Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/webrev.00/ Thanks, -Zhengyu From ashipile at redhat.com Fri Jan 26 18:04:24 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Fri, 26 Jan 2018 18:04:24 +0000 Subject: hg: shenandoah/jdk8u/hotspot: 5 new changesets Message-ID: <201801261804.w0QI4OY8009963@aojmv0008.oracle.com> Changeset: daa774ac0d72 Author: shade Date: 2018-01-22 12:04 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/daa774ac0d72 Do not put down update-refs-in-progress flag concurrently ! src/share/vm/gc_implementation/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahMarkCompact.cpp Changeset: 04b591f74de6 Author: rkennke Date: 2018-01-17 15:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/04b591f74de6 Defer cleaning of system dictionary and friends to parallel cleaning phase ! 
src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp Changeset: 5ae425989ac9 Author: zgu Date: 2018-01-18 08:23 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/5ae425989ac9 Bitmap size might not be page aligned when large page is used ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp Changeset: 08632a44a72e Author: shade Date: 2018-01-24 19:14 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/08632a44a72e VerifyJCStressTest should test all heuristics ! test/gc/shenandoah/acceptance/VerifyJCStressTest.java Changeset: 0020bc4708fc Author: shade Date: 2018-01-19 18:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk8u/hotspot/rev/0020bc4708fc Allocation failure injection machinery ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc_implementation/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc_implementation/shenandoah/shenandoah_globals.hpp ! test/gc/shenandoah/LotsOfCycles.java ! test/gc/shenandoah/acceptance/AllocIntArrays.java ! test/gc/shenandoah/acceptance/AllocObjectArrays.java ! test/gc/shenandoah/acceptance/AllocObjects.java ! test/gc/shenandoah/acceptance/RetainObjects.java ! test/gc/shenandoah/acceptance/SieveObjects.java From rkennke at redhat.com Tue Jan 30 09:54:18 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 30 Jan 2018 10:54:18 +0100 Subject: RFR: Make major GC phases exclusive from each other Message-ID: Currently, partial and traversal use overlapping GC phase bits: partial also activates evac, traversal activates everything. This causes a little mess when selecting barriers, as observed by Zhengyu last week. This patch makes all the major phases exclusive (marking, evac, update-refs, partial and traversal). Barriers are always included, and never excluded. This seems cleaner and easier to understand to me. The state bit for 'has-forwarded' is still overlapping. Not sure what to do with that. 
Bits in the gc-state bitmask are now addressed via mask, and not via position. This allows to check for groups of phases in one check. E.g. write-barriers are now checking for EVACUATION | PARTIAL | TRAVERSAL Passes hotspot_gc_shenandoah http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.00/ Ok? Some observations while I did this: - ShenandoahConditionalSATBBarrier can now be greatly simplified or even eliminated - Partial can use machinery from Traversal for speed boost: e.g. ShenandoahEnqueueBarrier - Traversal still has a liveness accounting problem ... all of which I will address in followup patches Roman From shade at redhat.com Tue Jan 30 10:07:49 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 30 Jan 2018 11:07:49 +0100 Subject: RFR: Make major GC phases exclusive from each other In-Reply-To: References: Message-ID: On 01/30/2018 10:54 AM, Roman Kennke wrote: > This patch makes all the major phases exclusive (marking, evac, > update-refs, partial and traversal). Barriers are always included, and > never excluded. This seems cleaner and easier to understand to me. Yup. Definitely looks better. > The state bit for 'has-forwarded' is still overlapping. Not sure what to do > with that. Nothing, it should be like that by design. > http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.00/ *) The change in shenandoahBarrierSet.cpp is not needed anymore, as the two bits are now exclusive? 283 if (_heap->is_concurrent_mark_in_progress() && ! _heap->is_concurrent_traversal_in_progress()) { *) set_gc_state_bit is now misnomer, I think: it is set_gc_state_mask? *) It also seems possible to put the mask exactly once now? 
Instead of: + set_gc_state_bit(TRAVERSAL, in_progress); + set_gc_state_bit(HAS_FORWARDED, in_progress); Do: set_gc_state_bit(HAS_FORWARDED | TRAVERSAL, in_progress); Thanks, -Aleksey From rkennke at redhat.com Tue Jan 30 10:51:40 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 30 Jan 2018 11:51:40 +0100 Subject: RFR: Make major GC phases exclusive from each other In-Reply-To: References: Message-ID: On Tue, Jan 30, 2018 at 11:07 AM, Aleksey Shipilev wrote: > On 01/30/2018 10:54 AM, Roman Kennke wrote: > > This patch makes all the major phases exclusive (marking, evac, > > update-refs, partial and traversal). Barriers are always included, and > > never excluded. This seems cleaner and easier to understand to me. > > Yup. Definitely looks better. > > > The state bit for 'has-forwarded' is still overlapping. Not sure what to > do > > with that. > > Nothing, it should be like that by design. > > Ok, good. > > http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.00/ > > *) The change in shenandoahBarrierSet.cpp is not needed anymore, as the > two bits are now exclusive? > > 283 if (_heap->is_concurrent_mark_in_progress() && ! > _heap->is_concurrent_traversal_in_progress()) { > > Right. Good catch. > *) set_gc_state_bit is now misnomer, I think: it is set_gc_state_mask? > > Fixed. > *) It also seems possible to put the mask exactly once now? > > Instead of: > > + set_gc_state_bit(TRAVERSAL, in_progress); > + set_gc_state_bit(HAS_FORWARDED, in_progress); > > Do: > > set_gc_state_bit(HAS_FORWARDED | TRAVERSAL, in_progress); > > Fixed. Differential: http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01.diff/ Full: http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01/ Ok now? 
Roman From shade at redhat.com Tue Jan 30 10:55:58 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 30 Jan 2018 11:55:58 +0100 Subject: RFR: Make major GC phases exclusive from each other In-Reply-To: References: Message-ID: <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com> On 01/30/2018 11:51 AM, Roman Kennke wrote: > Full: > http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01/ Good. I'd still do a few tune-ups: *) Keep the asserts in set/unset in ShenandoahSharedBitmap that the incoming value fits into byte. *) Arguments should be "mask", not "bit" void ShenandoahHeap::set_gc_state_mask_concurrently(uint bit, bool value) { void ShenandoahHeap::set_gc_state_mask(uint bit, bool value) { No need for re-review. Thanks, -Aleksey From roman at kennke.org Tue Jan 30 11:25:04 2018 From: roman at kennke.org (roman at kennke.org) Date: Tue, 30 Jan 2018 11:25:04 +0000 Subject: hg: shenandoah/jdk10: Make major GC phases exclusive from each other Message-ID: <201801301125.w0UBP4ch005284@aojmv0008.oracle.com> Changeset: dd1b2cd3c66e Author: rkennke Date: 2018-01-30 12:20 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/dd1b2cd3c66e Make major GC phases exclusive from each other ! src/hotspot/cpu/x86/c1_Runtime1_x86.cpp ! src/hotspot/cpu/x86/macroAssembler_x86.cpp ! src/hotspot/cpu/x86/shenandoahBarrierSet_x86.cpp ! src/hotspot/share/gc/shenandoah/shenandoahBarrierSet.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoahSharedVariables.hpp ! 
src/hotspot/share/opto/shenandoahSupport.cpp

From rkennke at redhat.com Tue Jan 30 11:21:49 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Tue, 30 Jan 2018 12:21:49 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>
References: <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>
Message-ID:

On Tue, Jan 30, 2018 at 11:55 AM, Aleksey Shipilev wrote:
> On 01/30/2018 11:51 AM, Roman Kennke wrote:
> > Full:
> > http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.01/
>
> Good. I'd still do a few tune-ups:
>
> *) Keep the asserts in set/unset in ShenandoahSharedBitmap that the
> incoming value fits into byte.
>
> *) Arguments should be "mask", not "bit"
>   void ShenandoahHeap::set_gc_state_mask_concurrently(uint bit, bool value) {
>   void ShenandoahHeap::set_gc_state_mask(uint bit, bool value) {
>
> No need for re-review.
>

Ok. I'm pushing:

Differential:
http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02.diff/
Full:
http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02/

Thanks,
Roman

From shade at redhat.com Tue Jan 30 15:07:16 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 30 Jan 2018 16:07:16 +0100
Subject: Idea: aliased heap for checking to-space invariant
Message-ID: <06d4e36d-262c-07f4-c131-5eccc63aa7f2@redhat.com>

So I have been walking and muttering to myself how we cannot mprotect(PROT_READ) the collection set, because we have to accept the fwdptr update in the same page. We used to mprotect the cset for verification, but that code basically mprotect(PROT_WRITE)-ed the page when the fwdptr write had faulted, restarted the fwdptr update, and accepted everything else after that too. Thus it became too racy to be useful. This was the reason for us to ditch that verification part, and instead rely on explicit ShenandoahStoreCheck machinery.

Then it hit me: memory protection is enforced on virtual pages, not on physical pages, which means we can use the aliased heap to accept the fwdptr stores, while the normal heap cset is protected from writes! I.e. have the normal heap WRITE|READ as usual, have the alias heap WRITE|READ as usual, then when the cset is selected, WRITE-protect the cset and watch out for failures. The fwdptr updates from WB code should instead go via the aliased heap, which is WRITE-enabled.

This gives us several advantages:
 *) We capture all bad writes mechanically, instead of hoping we covered all ShStoreCheck cases
 *) The upstream exposure in .ad and platform-specific macro-assemblers goes away
 *) Roman's work on aliased heaps is not in vain :)
 *) We don't end up with the mess of "differently-shaped" pointers to both normal and aliased heap, because we never leak aliased heap pointers anywhere: we just use that as the location for the fwdptr CAS.

We can (and probably should) only enable this for verification, so we don't have any ill effects for non-verifying modes (which would just do the same thing they do today).
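
The core mechanism of the idea, two virtual views of the same physical pages with only one of them write-protected, can be sketched with plain POSIX calls. This is a rough illustration under assumptions, not Shenandoah code: `AliasedRegion` and the helper names are hypothetical, a temporary file stands in for whatever backing the heap would really use, and a plain store replaces the fwdptr CAS mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical pair of views onto the same physical pages.
struct AliasedRegion {
  char*  normal = nullptr;  // view used for ordinary heap accesses
  char*  alias  = nullptr;  // view reserved for forwarding-pointer updates
  size_t size   = 0;
};

// Map the same backing file twice with MAP_SHARED, so both views see
// the same memory. Returns false if any step fails.
bool map_aliased(AliasedRegion& r, size_t size) {
  FILE* backing = tmpfile();
  if (backing == nullptr) return false;
  int fd = fileno(backing);
  if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
    fclose(backing);
    return false;
  }
  void* a = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  void* b = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  fclose(backing);  // existing mappings stay valid after the file is closed
  if (a == MAP_FAILED || b == MAP_FAILED) {
    if (a != MAP_FAILED) munmap(a, size);
    if (b != MAP_FAILED) munmap(b, size);
    return false;
  }
  r.normal = static_cast<char*>(a);
  r.alias  = static_cast<char*>(b);
  r.size   = size;
  return true;
}

// Write-protect only the normal view; stray writes through it now fault.
bool write_protect_normal_view(AliasedRegion& r) {
  return mprotect(r.normal, r.size, PROT_READ) == 0;
}

// The forwarding-pointer update goes through the still-writable alias.
void store_fwdptr_via_alias(AliasedRegion& r, size_t offset, long value) {
  *reinterpret_cast<long*>(r.alias + offset) = value;
}

// Reads through the protected view observe the update.
long load_via_normal(const AliasedRegion& r, size_t offset) {
  return *reinterpret_cast<const long*>(r.normal + offset);
}
```

After `write_protect_normal_view`, any store through `normal` raises SIGSEGV, while `store_fwdptr_via_alias` still succeeds and is visible through `normal`, which is the property the verification scheme would rely on.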
Thanks, -Aleksey From shade at redhat.com Tue Jan 30 18:25:25 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 30 Jan 2018 19:25:25 +0100 Subject: RFR: [9] Bulk backports to sh/jdk9 In-Reply-To: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com> References: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com> Message-ID: On 01/26/2018 01:00 PM, Aleksey Shipilev wrote: > http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180126/webrev.01/ > > Changes include: > > 8735773ec619: Single thread-local GC state flag for all barriers > 544322604347: ShConcurrentThread races with set_gc_state_bit > dc779781dd5e: Do not put down update-refs-in-progress flag concurrently > d55c6d5216d1: Common TLS access to GC state, where possible > 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase > fd9724b26fdd: Refactor allocation failure and explicit GC handling > e5398dce6e7b: Make concurrent precleaning log message optional again > 26b9048c042a: Make degenerated update-refs use region-set cursor to hand over work > 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used > 12654193e434: Demote warning message about OOM-during-evac to informational > 67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag selections > ecb87af5e0d8: Implement flag to generate write-barriers without membars. 
> 820129a799b1: Allocation failure injection machinery > b8c39bdc0dac: Log message on ref processing, class unload, update refs for mark events > 45d471869b73: Degenerated GC > 15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles > bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while scanning roots > 3a6457fecc72: Relax assert in SBS::is_safe() > 30e8ba6e2794: VerifyJCStressTest should test all heuristics > 6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and cleanup > 3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval barrier > > Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm Ping. -Aleksey From shade at redhat.com Tue Jan 30 18:26:03 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Tue, 30 Jan 2018 19:26:03 +0100 Subject: RFR: Missing cancelled concgc check results assertion failure In-Reply-To: <028cd960-5aaa-7e5f-6137-2deaee117166@redhat.com> References: <028cd960-5aaa-7e5f-6137-2deaee117166@redhat.com> Message-ID: <7f7f5173-669f-225d-1e8d-e7cd1f70703b@redhat.com> On 01/26/2018 06:54 PM, Zhengyu Gu wrote: > Diving to weak reference work without checking cancelled concgc, results assertion failure of not > emptied task queues. > > http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/hs_err.txt > > Webrev: > http://cr.openjdk.java.net/~zgu/shenandoah/tq_cancelled_gc/webrev.00/ Sounds reasonable to me. -Aleksey From rkennke at redhat.com Tue Jan 30 18:46:17 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 30 Jan 2018 19:46:17 +0100 Subject: Idea: aliased heap for checking to-space invariant In-Reply-To: <06d4e36d-262c-07f4-c131-5eccc63aa7f2@redhat.com> References: <06d4e36d-262c-07f4-c131-5eccc63aa7f2@redhat.com> Message-ID: Am 30.01.2018 um 16:07 schrieb Aleksey Shipilev: > So I have been walking and muttering to myself how we cannot mprotect(PROT_READ) the collection set, > because we have to accept the fwdptr update in the same page. 
We used to mprotect cset for > verification, but that code basically mprotect(PROT_WRITE)-ed the page when fwdptr write had > faulted, restarted the fwdptr update, accepting everything else after that too. Thus it became > too racy to be useful. This was the reason for us to ditch that verification part, and instead rely > on explicit ShenandoahStoreCheck machinery. > > Then it hit me: the memory protection is enforced on virtual pages, not on physical pages, which > means we can use the aliased heap to accept the fwdptr stores, while normal heap cset is protected > from writes! I.e. have the normal heap WRITE|READ as usual, have the alias heap WRITE|READ as usual, > then when cset is selected WRITE-protect the cset, and watch out for failures. The fwdptr updates > from WB code should instead go via the aliased heap that is WRITE-enabled. > > This gives us several advantages: > *) We capture all bad writes mechanically, instead of hoping we covered all ShStoreCheck cases > *) The upstream exposure in .ad and platform-specific macro-assemblers goes away > *) Roman's work on aliased heaps is not in vain :) > *) We don't end up in the mess with "differently-shaped" pointers to both normal and aliased heap, > because we never leak aliased heap pointers anywhere: we just use that as the location for the > fwdptr CAS. > > We can (and probably should) only enable this for verification, so we don't have any ill effects for > non-verification modes (which would just do the same thing they do today). > This sounds like a great idea! I think it would work. If we are going to introduce the machinery for multi-mapping (and we might eventually get it through ZGC anyway), we might want to think about finishing off the safe-oom-during-evac issue. Your inspiration inspired me: what if we create the 2nd mapping only on demand? I.e.
when we hit oom-during-evac, we create the 2nd mapping, use it to safely flip the blocking bit in the forwarding pointer, and when the whole oom-during-evac sequence is over, we unmap the 2nd mapping. This way we'd keep memory counters in the kernel normal until we hit the very unlikely error case, and then only for a limited time. WDYT? Roman From zgu at redhat.com Tue Jan 30 19:07:23 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 30 Jan 2018 14:07:23 -0500 Subject: RFR: [9] Bulk backports to sh/jdk9 In-Reply-To: References: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com> Message-ID: <4ac5ddf7-ede8-8ca6-79f1-ebe907ef50a8@redhat.com> Looks ok to me. -Zhengyu On 01/30/2018 01:25 PM, Aleksey Shipilev wrote: > On 01/26/2018 01:00 PM, Aleksey Shipilev wrote: >> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180126/webrev.01/ >> >> Changes include: >> >> 8735773ec619: Single thread-local GC state flag for all barriers >> 544322604347: ShConcurrentThread races with set_gc_state_bit >> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently >> d55c6d5216d1: Common TLS access to GC state, where possible >> 1d1238a0603b: Defer cleaning of system dictionary and friends to parallel cleaning phase >> fd9724b26fdd: Refactor allocation failure and explicit GC handling >> e5398dce6e7b: Make concurrent precleaning log message optional again >> 26b9048c042a: Make degenerated update-refs use region-set cursor to hand over work >> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is used >> 12654193e434: Demote warning message about OOM-during-evac to informational >> 67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag selections >> ecb87af5e0d8: Implement flag to generate write-barriers without membars. 
>> 820129a799b1: Allocation failure injection machinery >> b8c39bdc0dac: Log message on ref processing, class unload, update refs for mark events >> 45d471869b73: Degenerated GC >> 15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles >> bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while scanning roots >> 3a6457fecc72: Relax assert in SBS::is_safe() >> 30e8ba6e2794: VerifyJCStressTest should test all heuristics >> 6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and cleanup >> 3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval barrier >> >> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm > > Ping. > > -Aleksey > > From zgu at redhat.com Tue Jan 30 19:40:06 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 30 Jan 2018 14:40:06 -0500 Subject: RFR: String deduplication for traversal GC Message-ID: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> Please review the implementation of string deduplication for traversal GC. Webrev: http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ Test: hotspot_gc_shenandoah (fastdebug + release) specJVM with -XX:+UseStringDeduplication (fastdebug) Thanks, -Zhengyu From rkennke at redhat.com Tue Jan 30 19:46:16 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 30 Jan 2018 20:46:16 +0100 Subject: RFR: String deduplication for traversal GC In-Reply-To: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> Message-ID: Am 30.01.2018 um 20:40 schrieb Zhengyu Gu: > Please review the implementation of string deduplication for traversal GC. > > > Webrev: > http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ > > > Test: > > ? hotspot_gc_shenandoah (fastdebug + release) > ? 
specJVM with -XX:+UseStringDeduplication (fastdebug) > > > Thanks, > > -Zhengyu I wonder if it should be possible to make the closure templated instead of making multiple explicit classes, like this: template class ShenandoahTraversalSuperClosure .. { .. template void work(T* p); } and then something like: template class ShenandoahTraversalDedupClosure : public ShenandoahTraversalSuperClosure { I am not totally sure about how to stitch it together, but something like this should work? Or maybe it's not worth all the hassle. ? (Infact, I suspect something like the above would be possible for the metadata flag too...) Roman From rkennke at redhat.com Tue Jan 30 19:48:00 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 30 Jan 2018 20:48:00 +0100 Subject: RFR: String deduplication for traversal GC In-Reply-To: References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> Message-ID: Am 30.01.2018 um 20:46 schrieb Roman Kennke: > Am 30.01.2018 um 20:40 schrieb Zhengyu Gu: >> Please review the implementation of string deduplication for traversal >> GC. >> >> >> Webrev: >> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ >> >> >> Test: >> >> ?? hotspot_gc_shenandoah (fastdebug + release) >> ?? specJVM with -XX:+UseStringDeduplication (fastdebug) >> >> >> Thanks, >> >> -Zhengyu > > I wonder if it should be possible to make the closure templated instead > of making multiple explicit classes, like this: > > template > class ShenandoahTraversalSuperClosure .. { > > .. > ?? template > ?? void work(T* p); > } > > and then something like: > > template > class ShenandoahTraversalDedupClosure : public > ShenandoahTraversalSuperClosure { > > I am not totally sure about how to stitch it together, but something > like this should work? Or maybe it's not worth all the hassle. ? > > (Infact, I suspect something like the above would be possible for the > metadata flag too...) 
> > Roman > Ah, one weirdo in this scheme is the definition of work(), which would look something like: template template inline void ShenandoahTraversalSuperClosure::work(T* p) { Roman From zgu at redhat.com Tue Jan 30 20:07:20 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 30 Jan 2018 15:07:20 -0500 Subject: RFR: String deduplication for traversal GC In-Reply-To: References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> Message-ID: <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com> On 01/30/2018 02:48 PM, Roman Kennke wrote: > Am 30.01.2018 um 20:46 schrieb Roman Kennke: >> Am 30.01.2018 um 20:40 schrieb Zhengyu Gu: >>> Please review the implementation of string deduplication for >>> traversal GC. >>> >>> >>> Webrev: >>> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ >>> >>> >>> Test: >>> >>> hotspot_gc_shenandoah (fastdebug + release) >>> specJVM with -XX:+UseStringDeduplication (fastdebug) >>> >>> >>> Thanks, >>> >>> -Zhengyu >> >> I wonder if it should be possible to make the closure templated >> instead of making multiple explicit classes, like this: >> >> template >> class ShenandoahTraversalSuperClosure .. { >> >> .. >> template >> void work(T* p); >> } >> >> and then something like: >> >> template >> class ShenandoahTraversalDedupClosure : public >> ShenandoahTraversalSuperClosure { >> >> I am not totally sure about how to stitch it together, but something >> like this should work? Or maybe it's not worth all the hassle. ? >> >> (Infact, I suspect something like the above would be possible for the >> metadata flag too...) >> >> Roman >> > > > Ah, one weirdo in this scheme is the definition of work(), which would > look something like: > > template > template > inline void ShenandoahTraversalSuperClosure::work(T* p) { What's advantage of this style? 
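For readers following the template discussion: the archive has stripped the template parameter lists from the quoted snippets, so the exact parameters Roman had in mind are not recoverable here. The sketch below only illustrates the general C++ pattern being discussed, a class template with a member function template defined out of class, which needs two template headers. All names (TraversalSuperClosureSketch, STRING_DEDUP, count_dedup_candidates) are made up for the example and are not the Shenandoah classes.

```cpp
#include <cassert>

// Hypothetical stand-in for a closure family parameterized on a
// compile-time flag; the real parameter list is not known from the archive.
template <bool STRING_DEDUP>
class TraversalSuperClosureSketch {
public:
  int visited = 0;
  int dedup_candidates = 0;

  // Member function template, declared here and defined out of class.
  template <class T>
  inline void work(T* p);
};

// Out-of-class definition: the first template header belongs to the class
// template, the second to the member template.
template <bool STRING_DEDUP>
template <class T>
inline void TraversalSuperClosureSketch<STRING_DEDUP>::work(T* p) {
  if (p == nullptr) return;
  visited++;
  if (STRING_DEDUP) {
    // Only the dedup-enabled instantiation does this extra bookkeeping.
    dedup_candidates++;
  }
}

// Thin named subclasses select the flag instead of duplicating the body.
class TraversalClosureSketch : public TraversalSuperClosureSketch<false> {};
class TraversalDedupClosureSketch : public TraversalSuperClosureSketch<true> {};

// Small driver: visit two locations and report the dedup bookkeeping.
static int count_dedup_candidates(bool dedup) {
  int a = 1;
  long b = 2;
  if (dedup) {
    TraversalDedupClosureSketch cl;
    cl.work(&a);
    cl.work(&b);
    return cl.dedup_candidates;
  }
  TraversalClosureSketch cl;
  cl.work(&a);
  cl.work(&b);
  return cl.dedup_candidates;
}
```

The trade-off raised in the thread is visible even at this size: the shared body is written once, at the cost of the doubled template header on every out-of-class definition.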
Thanks, -Zhengyu > > Roman From rkennke at redhat.com Tue Jan 30 20:08:37 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 30 Jan 2018 21:08:37 +0100 Subject: RFR: String deduplication for traversal GC In-Reply-To: <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com> References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com> Message-ID: Less clutter/boilerplate code? ;-) I guess it's mostly a matter of taste. Am 30. Januar 2018 21:07:20 MEZ schrieb Zhengyu Gu : > > >On 01/30/2018 02:48 PM, Roman Kennke wrote: >> Am 30.01.2018 um 20:46 schrieb Roman Kennke: >>> Am 30.01.2018 um 20:40 schrieb Zhengyu Gu: >>>> Please review the implementation of string deduplication for >>>> traversal GC. >>>> >>>> >>>> Webrev: >>>> >http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ >>>> >>>> >>>> Test: >>>> >>>> hotspot_gc_shenandoah (fastdebug + release) >>>> specJVM with -XX:+UseStringDeduplication (fastdebug) >>>> >>>> >>>> Thanks, >>>> >>>> -Zhengyu >>> >>> I wonder if it should be possible to make the closure templated >>> instead of making multiple explicit classes, like this: >>> >>> template >>> class ShenandoahTraversalSuperClosure .. { >>> >>> .. >>> template >>> void work(T* p); >>> } >>> >>> and then something like: >>> >>> template >>> class ShenandoahTraversalDedupClosure : public >>> ShenandoahTraversalSuperClosure { >>> >>> I am not totally sure about how to stitch it together, but something > >>> like this should work? Or maybe it's not worth all the hassle. ? >>> >>> (Infact, I suspect something like the above would be possible for >the >>> metadata flag too...) >>> >>> Roman >>> >> >> >> Ah, one weirdo in this scheme is the definition of work(), which >would >> look something like: >> >> template >> template >> inline void ShenandoahTraversalSuperClosure::work(T* p) { > >What's advantage of this style?
> >Thanks, > >-Zhengyu > > >> >> Roman From zgu at redhat.com Tue Jan 30 20:12:22 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 30 Jan 2018 15:12:22 -0500 Subject: RFR: String deduplication for traversal GC In-Reply-To: References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com> Message-ID: <415b37b8-3d93-db29-8237-a2a28bc9a7e7@redhat.com> Let's follow existing style. If we decide to change, we should change them all together. Thanks, -Zhengyu On 01/30/2018 03:08 PM, Roman Kennke wrote: > Less clutter/boilerplate code? ;-) I guess it's mostly a matter oft taste.. > > Am 30. Januar 2018 21:07:20 MEZ schrieb Zhengyu Gu : >> >> >> On 01/30/2018 02:48 PM, Roman Kennke wrote: >>> Am 30.01.2018 um 20:46 schrieb Roman Kennke: >>>> Am 30.01.2018 um 20:40 schrieb Zhengyu Gu: >>>>> Please review the implementation of string deduplication for >>>>> traversal GC. >>>>> >>>>> >>>>> Webrev: >>>>> >> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ >>>>> >>>>> >>>>> Test: >>>>> >>>>> hotspot_gc_shenandoah (fastdebug + release) >>>>> specJVM with -XX:+UseStringDeduplication (fastdebug) >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> -Zhengyu >>>> >>>> I wonder if it should be possible to make the closure templated >>>> instead of making multiple explicit classes, like this: >>>> >>>> template >>>> class ShenandoahTraversalSuperClosure .. { >>>> >>>> .. >>>> template >>>> void work(T* p); >>>> } >>>> >>>> and then something like: >>>> >>>> template >>>> class ShenandoahTraversalDedupClosure : public >>>> ShenandoahTraversalSuperClosure { >>>> >>>> I am not totally sure about how to stitch it together, but something >> >>>> like this should work? Or maybe it's not worth all the hassle. ? >>>> >>>> (Infact, I suspect something like the above would be possible for >> the >>>> metadata flag too...)
>>>> >>>> Roman >>>> >>> >>> >>> Ah, one weirdo in this scheme is the definition of work(), which >> would >>> look something like: >>> >>> template >>> template >>> inline void ShenandoahTraversalSuperClosure::work(T* p) { >> >> What's advantage of this style? >> >> Thanks, >> >> -Zhengyu >> >> >>> >>> Roman > From zgu at redhat.com Tue Jan 30 20:19:37 2018 From: zgu at redhat.com (Zhengyu Gu) Date: Tue, 30 Jan 2018 15:19:37 -0500 Subject: RFR: String deduplication for traversal GC In-Reply-To: References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> Message-ID: <3a707760-906c-2f2d-6cd3-7e238269b36b@redhat.com> On 01/30/2018 02:46 PM, Roman Kennke wrote: > Am 30.01.2018 um 20:40 schrieb Zhengyu Gu: >> Please review the implementation of string deduplication for traversal >> GC. >> >> >> Webrev: >> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ >> >> >> Test: >> >> hotspot_gc_shenandoah (fastdebug + release) >> specJVM with -XX:+UseStringDeduplication (fastdebug) >> >> >> Thanks, >> >> -Zhengyu > > I wonder if it should be possible to make the closure templated instead > of making multiple explicit classes, like this: > > template > class ShenandoahTraversalSuperClosure .. { > > .. > template > void work(T* p); > } > > and then something like: I had a version like this for early dedup closures, but changed to current style based on shade's comments. -Zhengyu > > template > class ShenandoahTraversalDedupClosure : public > ShenandoahTraversalSuperClosure { > > I am not totally sure about how to stitch it together, but something > like this should work? Or maybe it's not worth all the hassle. ? > > (Infact, I suspect something like the above would be possible for the > metadata flag too...) 
> > Roman From rkennke at redhat.com Tue Jan 30 21:21:46 2018 From: rkennke at redhat.com (Roman Kennke) Date: Tue, 30 Jan 2018 22:21:46 +0100 Subject: RFR: String deduplication for traversal GC In-Reply-To: <415b37b8-3d93-db29-8237-a2a28bc9a7e7@redhat.com> References: <4fdbbdc7-65bc-23a5-3b65-054b1c5ec28d@redhat.com> <837ac0a0-08c7-e7c8-8ebd-a611ab50c06e@redhat.com> <415b37b8-3d93-db29-8237-a2a28bc9a7e7@redhat.com> Message-ID: <00D13E3E-5B3D-408B-BF5E-19BD77B6423D@redhat.com> Yes, that is fine. The rest of patch looks good to me too. Am 30. Januar 2018 21:12:22 MEZ schrieb Zhengyu Gu : >Let's follow existing style. If we decide to change, we should change >them all together. > >Thanks, > >-Zhengyu > >On 01/30/2018 03:08 PM, Roman Kennke wrote: >> Less clutter/boilerplate code? ;-) I guess it's mostly a matter oft >taste.. >> >> Am 30. Januar 2018 21:07:20 MEZ schrieb Zhengyu Gu : >>> >>> >>> On 01/30/2018 02:48 PM, Roman Kennke wrote: >>>> Am 30.01.2018 um 20:46 schrieb Roman Kennke: >>>>> Am 30.01.2018 um 20:40 schrieb Zhengyu Gu: >>>>>> Please review the implementation of string deduplication for >>>>>> traversal GC. >>>>>> >>>>>> >>>>>> Webrev: >>>>>> >>> >http://cr.openjdk.java.net/~zgu/shenandoah/traversal_dedup/webrev.00/ >>>>>> >>>>>> >>>>>> Test: >>>>>> >>>>>> hotspot_gc_shenandoah (fastdebug + release) >>>>>> specJVM with -XX:+UseStringDeduplication (fastdebug) >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -Zhengyu >>>>> >>>>> I wonder if it should be possible to make the closure templated >>>>> instead of making multiple explicit classes, like this: >>>>> >>>>> template >>>>> class ShenandoahTraversalSuperClosure .. { >>>>> >>>>> .. >>>>> template >>>>> void work(T* p); >>>>> } >>>>> >>>>> and then something like: >>>>> >>>>> template >>>>> class ShenandoahTraversalDedupClosure : public >>>>> ShenandoahTraversalSuperClosure { >>>>> >>>>> I am not totally sure about how to stitch it together, but >something >>> >>>>> like this should work? 
Or maybe it's not worth all the hassle. >>>>> >>>>> (Infact, I suspect something like the above would be possible for >>> the >>>>> metadata flag too...) >>>>> >>>>> Roman >>>>> >>>> >>>> >>>> Ah, one weirdo in this scheme is the definition of work(), which >>> would >>>> look something like: >>>> >>>> template >>>> template >>>> inline void ShenandoahTraversalSuperClosure::work(T* p) { >>> >>> What's advantage of this style? >>> >>> Thanks, >>> >>> -Zhengyu >>> >>> >>>> >>>> Roman >> From rkennke at redhat.com Wed Jan 31 10:09:22 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 31 Jan 2018 11:09:22 +0100 Subject: RFR: [9] Bulk backports to sh/jdk9 In-Reply-To: <4ac5ddf7-ede8-8ca6-79f1-ebe907ef50a8@redhat.com> References: <819f0073-b0b9-0708-cec0-d0b55452c0eb@redhat.com> <4ac5ddf7-ede8-8ca6-79f1-ebe907ef50a8@redhat.com> Message-ID: Looks good to me too On Tue, Jan 30, 2018 at 8:07 PM, Zhengyu Gu wrote: > Looks ok to me.
> > -Zhengyu > > > On 01/30/2018 01:25 PM, Aleksey Shipilev wrote: > >> On 01/26/2018 01:00 PM, Aleksey Shipilev wrote: >> >>> http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9- >>> 20180126/webrev.01/ >>> >>> Changes include: >>> >>> 8735773ec619: Single thread-local GC state flag for all barriers >>> 544322604347: ShConcurrentThread races with set_gc_state_bit >>> dc779781dd5e: Do not put down update-refs-in-progress flag concurrently >>> d55c6d5216d1: Common TLS access to GC state, where possible >>> 1d1238a0603b: Defer cleaning of system dictionary and friends to >>> parallel cleaning phase >>> fd9724b26fdd: Refactor allocation failure and explicit GC handling >>> e5398dce6e7b: Make concurrent precleaning log message optional again >>> 26b9048c042a: Make degenerated update-refs use region-set cursor to hand >>> over work >>> 1a6a9f288dd2: Bitmap size might not be page aligned when large page is >>> used >>> 12654193e434: Demote warning message about OOM-during-evac to >>> informational >>> 67294a38c0c7: TestSelectiveBarrierFlags should accept multi-element flag >>> selections >>> ecb87af5e0d8: Implement flag to generate write-barriers without membars. >>> 820129a799b1: Allocation failure injection machinery >>> b8c39bdc0dac: Log message on ref processing, class unload, update refs >>> for mark events >>> 45d471869b73: Degenerated GC >>> 15261c4a6adf: Degenerated GC: shortcut cycles, upgrade futile cycles >>> bd01b07ba0d7: Add ShenandoahRootProcessor API to report threads while >>> scanning roots >>> 3a6457fecc72: Relax assert in SBS::is_safe() >>> 30e8ba6e2794: VerifyJCStressTest should test all heuristics >>> 6183a72bd5c2: ShBS::interpreter_storeval_barrier signature fix and >>> cleanup >>> 3c12448ec444: Fix 32-bit build by ifdef-ing non-implemented storeval >>> barrier >>> >>> Testing: hotspot_gc_shenandoah {fastdebug|release}, specjvm >>> >> >> Ping. 
>> >> -Aleksey >> >> >> From shade at redhat.com Wed Jan 31 11:28:58 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 31 Jan 2018 12:28:58 +0100 Subject: RFR: Single GCTimer shared by all operations Message-ID: http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/ Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for legacy reasons?). All other operations run with GCTimer from ShHeap. Degenerated GC is peculiar: it may start as the usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to misbehave with asserts like: # assert(_phases->length() <= 1000) failed: Too many recored phases? The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from ShHeap. Testing: hotspot_gc_shenandoah Thanks, -Aleksey From rkennke at redhat.com Wed Jan 31 11:40:17 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 31 Jan 2018 12:40:17 +0100 Subject: RFR: Single GCTimer shared by all operations In-Reply-To: References: Message-ID: Am 31.01.2018 um 12:28 schrieb Aleksey Shipilev: > http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/ > > Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for > legacy reasons?). All other operations run with GCTimer from ShHeap. Degenerated GC is peculiar: it > may start as the usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to > misbehave with asserts like: > > # assert(_phases->length() <= 1000) failed: Too many recored phases? > > The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from > ShHeap. > > Testing: hotspot_gc_shenandoah > > Thanks, > -Aleksey > Seems good. Is there a difference between STWGCTimer and ConcurrentGCTimer that might trip us up?
Roman From shade at redhat.com Wed Jan 31 11:42:49 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 31 Jan 2018 12:42:49 +0100 Subject: RFR: Single GCTimer shared by all operations In-Reply-To: References: Message-ID: On 01/31/2018 12:40 PM, Roman Kennke wrote: > Am 31.01.2018 um 12:28 schrieb Aleksey Shipilev: >> http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/ >> >> Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for >> legacy reasons?). All other operations run with GCTimer from ShHeap. Degenerated GC is peculiar: it >> may start as the usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to >> misbehave with asserts like: >> >> # assert(_phases->length() <= 1000) failed: Too many recored phases? >> >> The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from >> ShHeap. >> >> Testing: hotspot_gc_shenandoah >> >> Thanks, >> -Aleksey >> > > Seems good. Is there a difference between STWGCTimer and ConcurrentGCTimer that might trip us up? I don't think so. The bigger question is whether that fixes the failures that you see in tests, because I cannot reproduce the failure on my local machine. -Aleksey From rkennke at redhat.com Wed Jan 31 12:12:46 2018 From: rkennke at redhat.com (Roman Kennke) Date: Wed, 31 Jan 2018 13:12:46 +0100 Subject: RFR: Single GCTimer shared by all operations In-Reply-To: References: Message-ID: <164c6ad9-82c3-a836-9362-af8112a8cab5@redhat.com> Am 31.01.2018 um 12:42 schrieb Aleksey Shipilev: > On 01/31/2018 12:40 PM, Roman Kennke wrote: >> Am 31.01.2018 um 12:28 schrieb Aleksey Shipilev: >>> http://cr.openjdk.java.net/~shade/shenandoah/single-gc-timer/webrev.01/ >>> >>> Degenerated GC exposed a wrinkle in our GCTimer handling. Full GC has a separate GCTimer (for >>> legacy reasons?). All other operations run with GCTimer from ShHeap.
Degenerated GC is peculiar: it >>> may start as the usual operation, but then *continue* as upgraded to Full GC. GCTimers then start to >>> misbehave with asserts like: >>> >>> # assert(_phases->length() <= 1000) failed: Too many recored phases? >>> >>> The solution/cleanup is to use a single GCTimer, basically letting Full GC use the GCTimer from >>> ShHeap. >>> >>> Testing: hotspot_gc_shenandoah >>> >>> Thanks, >>> -Aleksey >>> >> >> Seems good. Is there a difference between STWGCTimer and ConcurrentGCTimer that might trip us up? > > I don't think so. > > The bigger question is whether that fixes the failures that you see in tests, because I cannot reproduce the > failure on my local machine. > > -Aleksey > Seems good in the few runs I could make. Let's push it and do more testing once it's in. Roman From zgu at redhat.com Wed Jan 31 13:24:51 2018 From: zgu at redhat.com (zgu at redhat.com) Date: Wed, 31 Jan 2018 13:24:51 +0000 Subject: hg: shenandoah/jdk10: String deduplication for traversal GC Message-ID: <201801311324.w0VDOpgc017315@aojmv0008.oracle.com> Changeset: 6657b88f3a63 Author: zgu Date: 2018-01-31 08:19 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/6657b88f3a63 String deduplication for traversal GC ! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.hpp ! src/hotspot/share/gc/shenandoah/shenandoahOopClosures.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp ! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.hpp ! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.inline.hpp ! src/hotspot/share/gc/shenandoah/shenandoah_specialized_oop_closures.hpp !
test/hotspot/jtreg/gc/shenandoah/ShenandoahStrDedupStress.java From zgu at redhat.com Wed Jan 31 13:26:33 2018 From: zgu at redhat.com (zgu at redhat.com) Date: Wed, 31 Jan 2018 13:26:33 +0000 Subject: hg: shenandoah/jdk10: Missing cancelled concgc check results assertion failure Message-ID: <201801311326.w0VDQXHD017862@aojmv0008.oracle.com> Changeset: 3ef7ac462979 Author: zgu Date: 2018-01-31 08:22 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/3ef7ac462979 Missing cancelled concgc check results assertion failure ! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp From ashipile at redhat.com Wed Jan 31 13:49:48 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 31 Jan 2018 13:49:48 +0000 Subject: hg: shenandoah/jdk10: Single GCTimer shared by all operations Message-ID: <201801311349.w0VDnm35025225@aojmv0008.oracle.com> Changeset: 4050463704a4 Author: shade Date: 2018-01-31 12:29 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/4050463704a4 Single GCTimer shared by all operations ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/hotspot/share/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp ! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.cpp ! src/hotspot/share/gc/shenandoah/shenandoahMarkCompact.hpp ! src/hotspot/share/gc/shenandoah/shenandoahUtils.cpp ! src/hotspot/share/gc/shenandoah/shenandoahUtils.hpp From ashipile at redhat.com Wed Jan 31 15:26:42 2018 From: ashipile at redhat.com (ashipile at redhat.com) Date: Wed, 31 Jan 2018 15:26:42 +0000 Subject: hg: shenandoah/jdk9/hotspot: 21 new changesets Message-ID: <201801311526.w0VFQhEN028358@aojmv0008.oracle.com> Changeset: 489bec20624c Author: shade Date: 2018-01-15 12:19 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/489bec20624c [backport] Single thread-local GC state flag for all barriers ! src/cpu/aarch64/vm/c1_Runtime1_aarch64.cpp ! 
src/cpu/aarch64/vm/macroAssembler_aarch64.cpp ! src/cpu/aarch64/vm/shenandoahBarrierSet_aarch64.cpp ! src/cpu/x86/vm/c1_Runtime1_x86.cpp ! src/cpu/x86/vm/macroAssembler_x86.cpp ! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp ! src/cpu/x86/vm/x86_64.ad ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/shenandoahSharedVariables.hpp ! src/share/vm/gc/shenandoah/shenandoahVerifier.cpp ! src/share/vm/opto/cfgnode.hpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/ifnode.cpp ! src/share/vm/opto/memnode.hpp ! src/share/vm/opto/node.hpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/runtime/thread.cpp ! src/share/vm/runtime/thread.hpp ! src/share/vm/runtime/thread.inline.hpp Changeset: 447b871ee85b Author: shade Date: 2018-01-16 20:23 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/447b871ee85b [backport] ShConcurrentThread races with set_gc_state_bit ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp Changeset: f667c875b72d Author: shade Date: 2018-01-22 12:04 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/f667c875b72d [backport] Do not put down update-refs-in-progress flag concurrently ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! 
src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp Changeset: ba8a39b9672d Author: shade Date: 2018-01-15 12:32 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/ba8a39b9672d [backport] Common TLS access to GC state, where possible ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/graphKit.cpp ! src/share/vm/opto/loopnode.cpp ! src/share/vm/opto/loopnode.hpp ! src/share/vm/opto/shenandoahSupport.cpp ! src/share/vm/opto/shenandoahSupport.hpp + test/gc/shenandoah/compiler/TestCommonGCLoads.java Changeset: 2ed987e64f80 Author: rkennke Date: 2018-01-17 15:33 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/2ed987e64f80 [backport] Defer cleaning of system dictionary and friends to parallel cleaning phase ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: 4c58342d9fc1 Author: shade Date: 2018-01-17 15:37 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/4c58342d9fc1 [backport] Refactor allocation failure and explicit GC handling ! src/share/vm/gc/shared/gcCause.cpp ! src/share/vm/gc/shared/gcCause.hpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.inline.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp Changeset: 417fb8d6c4d0 Author: shade Date: 2018-01-22 10:10 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/417fb8d6c4d0 [backport] Make concurrent precleaning log message optional again ! 
src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: d6298f7d7545 Author: shade Date: 2018-01-17 16:08 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/d6298f7d7545 [backport] Make degenerated update-refs use region-set cursor to hand over work ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: 260edcc9f8a2 Author: zgu Date: 2018-01-18 08:23 -0500 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/260edcc9f8a2 [backport] Bitmap size might not be page aligned when large page is used ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: fd14b29d82d7 Author: shade Date: 2018-01-19 11:52 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fd14b29d82d7 [backport] Demote warning message about OOM-during-evac to informational ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp Changeset: 939b89fc6bd3 Author: shade Date: 2018-01-19 16:27 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/939b89fc6bd3 [backport] TestSelectiveBarrierFlags should accept multi-element flag selections ! test/gc/shenandoah/TestSelectiveBarrierFlags.java Changeset: 18f77577944a Author: rkennke Date: 2018-01-19 18:40 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/18f77577944a [backport] Implement flag to generate write-barriers without membars. ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/opto/compile.cpp ! src/share/vm/opto/shenandoahSupport.cpp Changeset: 882e15472997 Author: shade Date: 2018-01-19 18:49 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/882e15472997 [backport] Allocation failure injection machinery ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! test/gc/shenandoah/LotsOfCycles.java ! test/gc/shenandoah/acceptance/AllocIntArrays.java ! test/gc/shenandoah/acceptance/AllocObjectArrays.java ! test/gc/shenandoah/acceptance/AllocObjects.java ! 
test/gc/shenandoah/acceptance/RetainObjects.java ! test/gc/shenandoah/acceptance/SieveObjects.java ! test/gc/stress/TestGCOldWithShenandoah.java ! test/gc/stress/gcbasher/TestGCBasherWithShenandoah.java ! test/gc/stress/gclocker/TestGCLockerWithShenandoah.java Changeset: 93865bd554e1 Author: shade Date: 2018-01-22 10:47 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/93865bd554e1 [backport] Log message on ref processing, class unload, update refs for mark events ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp Changeset: 5cfc9680da7d Author: shade Date: 2018-01-22 12:52 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/5cfc9680da7d [backport] Degenerated GC ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahCollectorPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.cpp ! src/share/vm/gc/shenandoah/shenandoahConcurrentThread.hpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.hpp ! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.cpp ! src/share/vm/gc/shenandoah/shenandoahPhaseTimings.hpp ! src/share/vm/gc/shenandoah/shenandoahUtils.hpp ! src/share/vm/gc/shenandoah/shenandoahVerifier.cpp ! src/share/vm/gc/shenandoah/shenandoahVerifier.hpp ! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.cpp ! src/share/vm/gc/shenandoah/shenandoahWorkerPolicy.hpp ! src/share/vm/gc/shenandoah/shenandoah_globals.hpp ! src/share/vm/gc/shenandoah/vm_operations_shenandoah.cpp ! src/share/vm/gc/shenandoah/vm_operations_shenandoah.hpp ! src/share/vm/runtime/vm_operations.hpp Changeset: 9240f42fb9d1 Author: shade Date: 2018-01-24 15:30 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/9240f42fb9d1 [backport] Degenerated GC: shortcut cycles, upgrade futile cycles ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! 
src/share/vm/gc/shenandoah/shenandoahHeap.hpp Changeset: fd4837b82b06 Author: rkennke Date: 2018-01-23 21:20 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/fd4837b82b06 [backport] Add ShenandoahRootProcessor API to report threads while scanning roots ! src/share/vm/gc/shenandoah/shenandoahConcurrentMark.cpp ! src/share/vm/gc/shenandoah/shenandoahHeap.cpp ! src/share/vm/gc/shenandoah/shenandoahMarkCompact.cpp ! src/share/vm/gc/shenandoah/shenandoahRootProcessor.cpp ! src/share/vm/gc/shenandoah/shenandoahRootProcessor.hpp Changeset: bfa5f2485433 Author: rkennke Date: 2018-01-24 15:09 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/bfa5f2485433 [backport] Relax assert in SBS::is_safe() ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.cpp Changeset: 1be91cb7a447 Author: shade Date: 2018-01-24 19:14 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/1be91cb7a447 [backport] VerifyJCStressTest should test all heuristics ! test/gc/shenandoah/acceptance/VerifyJCStressTest.java Changeset: 4c7ca6405439 Author: shade Date: 2018-01-25 11:24 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/4c7ca6405439 [backport] ShBS::interpreter_storeval_barrier signature fix and cleanup ! src/cpu/aarch64/vm/shenandoahBarrierSet_aarch64.cpp ! src/cpu/aarch64/vm/templateTable_aarch64.cpp ! src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp ! src/cpu/x86/vm/templateTable_x86.cpp ! src/share/vm/gc/shared/barrierSet.hpp ! src/share/vm/gc/shenandoah/shenandoahBarrierSet.hpp Changeset: a5e7ea380dc5 Author: shade Date: 2018-01-25 18:44 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk9/hotspot/rev/a5e7ea380dc5 [backport] Fix 32-bit build by ifdef-ing non-implemented storeval barrier ! 
src/cpu/x86/vm/shenandoahBarrierSet_x86.cpp

From rwestrel at redhat.com Wed Jan 31 15:31:43 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 31 Jan 2018 16:31:43 +0100
Subject: RFR: backport of 8191887
Message-ID:

I hit 8191887 when running specjvm with Shenandoah. This was fixed
upstream so I propose we cherry pick it. The fix doesn't apply
cleanly so here it is on top of the current shenandoah repo:

http://cr.openjdk.java.net/~roland/shenandoah/8191887/webrev.00/

Roland.

From rwestrel at redhat.com Wed Jan 31 15:34:08 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 31 Jan 2018 16:34:08 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
Message-ID:

http://cr.openjdk.java.net/~roland/shenandoah/loop_unswitching%2b-ShenandoahWriteBarrierMemBar/webrev.00/

This fixes:

http://mail.openjdk.java.net/pipermail/shenandoah-dev/2018-January/004738.html

Part of the logic required for this to work (the code added by the
patch) also got lost at some point.

Roland.

From shade at redhat.com Wed Jan 31 15:43:26 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 16:43:26 +0100
Subject: RFR: backport of 8191887
In-Reply-To:
References:
Message-ID: <4b18ee64-8e92-0d72-fa4c-002fbdf48d9f@redhat.com>

On 01/31/2018 04:31 PM, Roland Westrelin wrote:
> I hit 8191887 when running specjvm with Shenandoah. This was fixed
> upstream so I propose we cherry pick it. The fix doesn't apply
> cleanly so here it is on top of the current shenandoah repo:
>
> http://cr.openjdk.java.net/~roland/shenandoah/8191887/webrev.00/

Yes please, anything that helps resolve conflicts during the merges is cool.
Please push it as: "Cherry-pick 8191887: assert(b->is_Bool()) in PhaseIdealLoop::clone_iff() due to Opaque4 node" -Aleksey From shade at redhat.com Wed Jan 31 15:45:31 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 31 Jan 2018 16:45:31 +0100 Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar In-Reply-To: References: Message-ID: <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com> On 01/31/2018 04:34 PM, Roland Westrelin wrote: > http://cr.openjdk.java.net/~roland/shenandoah/loop_unswitching%2b-ShenandoahWriteBarrierMemBar/webrev.00/ Looks good to me! 4059 Node* load = iff->in(1)->in(1)->in(1)->in(1); Do we care anywhere else about deeper chain from that iff to the actual load? I.e. no other code is broken due to the new graph shape? > Part of the logic required for this to work (the code added by the > patch) also got lost at some point. That makes sense. -Aleksey From rwestrel at redhat.com Wed Jan 31 16:03:16 2018 From: rwestrel at redhat.com (rwestrel at redhat.com) Date: Wed, 31 Jan 2018 16:03:16 +0000 Subject: hg: shenandoah/jdk10: Cherry-pick 8191887: assert(b->is_Bool()) in PhaseIdealLoop::clone_iff() due to Opaque4 node Message-ID: <201801311603.w0VG3Hcu012590@aojmv0008.oracle.com> Changeset: e3d076dce734 Author: roland Date: 2018-01-31 16:26 +0100 URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/e3d076dce734 Cherry-pick 8191887: assert(b->is_Bool()) in PhaseIdealLoop::clone_iff() due to Opaque4 node Summary: add special handling for graph shape If->Opaque4->Bool->CmpP Reviewed-by: kvn ! src/hotspot/share/opto/loopnode.hpp ! 
src/hotspot/share/opto/loopopts.cpp
+ test/hotspot/jtreg/compiler/unsafe/TestLoopUnswitching.java

From rwestrel at redhat.com Wed Jan 31 16:04:36 2018
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 31 Jan 2018 17:04:36 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
In-Reply-To: <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
References: <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
Message-ID:

> Do we care anywhere else about deeper chain from that iff to the actual load? I.e. no other code is
> broken due to the new graph shape?

Not as far as I can tell.

Roland.

From shade at redhat.com Wed Jan 31 16:05:21 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 17:05:21 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
In-Reply-To:
References: <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
Message-ID: <2b5cfaa0-5809-9732-bcbf-f7ab746f1d9e@redhat.com>

On 01/31/2018 05:04 PM, Roland Westrelin wrote:
>> Do we care anywhere else about deeper chain from that iff to the actual load? I.e. no other code is
>> broken due to the new graph shape?
>
> Not as far as I can tell.

All good then.

-Aleksey

From rwestrel at redhat.com Wed Jan 31 16:17:31 2018
From: rwestrel at redhat.com (rwestrel at redhat.com)
Date: Wed, 31 Jan 2018 16:17:31 +0000
Subject: hg: shenandoah/jdk10: fix -ShenandoahWriteBarrierMemBar and loop unswitching
Message-ID: <201801311617.w0VGHVsL017798@aojmv0008.oracle.com>

Changeset: af9272163588
Author: roland
Date: 2018-01-31 16:17 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/af9272163588

fix -ShenandoahWriteBarrierMemBar and loop unswitching

!
src/hotspot/share/opto/shenandoahSupport.cpp

From shade at redhat.com Wed Jan 31 16:49:21 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 17:49:21 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To:
References: <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com>
Message-ID: <675232a7-9eeb-8ff4-5c6e-038d7b137857@redhat.com>

On 01/30/2018 12:21 PM, Roman Kennke wrote:
> Differential:
> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02.diff/
>
> Full:
> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02/

Oh wait, where are AArch64 parts?

-Aleksey

From shade at redhat.com Wed Jan 31 17:37:11 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 18:37:11 +0100
Subject: RFR [9] 2018-02-01: Bulk backports to sh/jdk9
Message-ID: <5ab32b43-3d02-60f7-4a2e-4f611b1b8f4b@redhat.com>

http://cr.openjdk.java.net/~shade/shenandoah/backports/jdk9-20180201/webrev.01/

This backports the follow-up bugfixes we have recently found to sh/jdk9:

16198c705496: [backport] Conditionalize PerfDataMemorySize on enabled heap sampling
dd1b2cd3c66e: [backport] Make major GC phases exclusive from each other
4050463704a4: [backport] Single GCTimer shared by all operations
af9272163588: [backport] fix -ShenandoahWriteBarrierMemBar and loop unswitching

sh/jdk10 nightly is running with them now to verify separately.

Testing: hotspot_gc_shenandoah {fastdebug|release}

Thanks,
-Aleksey

From rkennke at redhat.com Wed Jan 31 17:39:05 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 18:39:05 +0100
Subject: RFR: Make major GC phases exclusive from each other
In-Reply-To: <675232a7-9eeb-8ff4-5c6e-038d7b137857@redhat.com>
References: <743dbd83-cc87-bd83-e4c6-6f28c9d3338f@redhat.com> <675232a7-9eeb-8ff4-5c6e-038d7b137857@redhat.com>
Message-ID:

Will do them as soon as possible.

On 31 January 2018 17:49:21 CET, Aleksey Shipilev wrote:
>On 01/30/2018 12:21 PM, Roman Kennke wrote:
>> Differential:
>> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02.diff/
>>
>> Full:
>> http://cr.openjdk.java.net/~rkennke/exclusive-gc-phases/webrev.02/
>
>Oh wait, where are AArch64 parts?
>
>-Aleksey

--
This message was sent from my Android device with K-9 Mail.

From rkennke at redhat.com Wed Jan 31 19:14:54 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 20:14:54 +0100
Subject: RFR: fix loop unswitching with -XX:-ShenandoahWriteBarrierMemBar
In-Reply-To:
References: <689fd86b-a6e2-fefd-29ac-548cc5e1cce8@redhat.com>
Message-ID:

On 31.01.2018 at 17:04, Roland Westrelin wrote:
>
>> Do we care anywhere else about deeper chain from that iff to the actual load? I.e. no other code is
>> broken due to the new graph shape?
>
> Not as far as I can tell.
>
> Roland.
>

With the patch, -XX:-ShenandoahWriteBarrierMemBar does not crash anymore, but it's significantly
slower than with membar... like 75% slower. Which seems illogical.

Tried with:

-XX:+UseShenandoahGC -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -Xms4g -Xmx4g -XX:ShenandoahGCHeuristics=traversal -XX:ShenandoahFreeThreshold=17 -XX:-ShenandoahWriteBarrierMemBar

on specjvm compiler.

Roman

From rkennke at redhat.com Wed Jan 31 19:59:26 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 20:59:26 +0100
Subject: RFR: Don't treat allocation regions implicitely live during traversal GC
Message-ID:

Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
inconsequential: we are not treating alloc regions live during the cycle. This means that all the
allocated garbage will have to pass through one complete cycle to count its liveness, and then
another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
cycle.

This gives some application a huge boost.
E.g. compiler.compiler goes from 180ops/m to around 205ops/m in my tests. http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/ Ok? Roman From shade at redhat.com Wed Jan 31 20:03:50 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 31 Jan 2018 21:03:50 +0100 Subject: RFR: Don't treat allocation regions implicitely live during traversal GC In-Reply-To: References: Message-ID: On 01/31/2018 08:59 PM, Roman Kennke wrote: > Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems > inconsequential: we are not treating alloc regions live during the cycle. This means that all the > allocated garbage will have to pass through one complete cycle to count its liveness, and then > another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next > cycle. > > This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around > 205ops/m in my tests. > > http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/ Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g. "adaptive"? -Aleksey From shade at redhat.com Wed Jan 31 20:15:10 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 31 Jan 2018 21:15:10 +0100 Subject: RFR: Don't treat allocation regions implicitely live during traversal GC In-Reply-To: References: Message-ID: <263fedae-8aa3-ca77-3c4b-38710184f97b@redhat.com> On 01/31/2018 09:03 PM, Aleksey Shipilev wrote: > On 01/31/2018 08:59 PM, Roman Kennke wrote: >> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems >> inconsequential: we are not treating alloc regions live during the cycle. This means that all the >> allocated garbage will have to pass through one complete cycle to count its liveness, and then >> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next >> cycle. 
>>
>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>> 205ops/m in my tests.
>>
>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>
> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
> "adaptive"?

Yes, it does break half of hotspot_gc_shenandoah. Should be e.g.:

  bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
    ShenandoahHeap* heap = ShenandoahHeap::heap();
    if (heap->shenandoahPolicy()->can_do_traversal_gc()) {
      if (heap->is_concurrent_traversal_in_progress()) {
        return false;
      }
      switch (type) {
        case ShenandoahHeap::_alloc_tlab:
        case ShenandoahHeap::_alloc_shared:
          return false;
        case ShenandoahHeap::_alloc_gclab:
        case ShenandoahHeap::_alloc_shared_gc:
          return true;
        default:
          ShouldNotReachHere();
      }
    }
    return true;
  }

-Aleksey

From rkennke at redhat.com Wed Jan 31 20:15:48 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 21:15:48 +0100
Subject: RFR: Don't treat allocation regions implicitely live during traversal GC
In-Reply-To:
References:
Message-ID:

On 31.01.2018 at 21:03, Aleksey Shipilev wrote:
> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>> cycle.
>>
>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>> 205ops/m in my tests.
>>
>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>
> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
> "adaptive"?
>

Grr.
See this is what happens when you want to rush out a change when brain is pudding ;-) In my
tests I guarded this by UseNewCode...

http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/

Better?

Roman

From shade at redhat.com Wed Jan 31 20:19:15 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 21:19:15 +0100
Subject: RFR: Don't treat allocation regions implicitely live during traversal GC
In-Reply-To:
References:
Message-ID: <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>

On 01/31/2018 09:15 PM, Roman Kennke wrote:
> On 31.01.2018 at 21:03, Aleksey Shipilev wrote:
>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>> cycle.
>>>
>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>>> 205ops/m in my tests.
>>>
>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>>
>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
>> "adaptive"?
>>
>
> Grr. See this is what happens when you want to rush out a change when brain is pudding ;-) In my
> tests I guarded this by UseNewCode...
>
> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/
>
> Better?

Yes, that seems okay.
I'd still suggest a switch to guard from accidental enums:

  bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
    if (ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
      return false;
    }
    switch (type) {
      case ShenandoahHeap::_alloc_tlab:
      case ShenandoahHeap::_alloc_shared:
        return ShenandoahAllocImplicitLive;
      case ShenandoahHeap::_alloc_gclab:
      case ShenandoahHeap::_alloc_shared_gc:
        return true;
      default:
        ShouldNotReachHere();
        return true;
    }
  }

-Aleksey

From rkennke at redhat.com Wed Jan 31 20:19:34 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 21:19:34 +0100
Subject: RFR: Don't treat allocation regions implicitely live during traversal GC
In-Reply-To: <263fedae-8aa3-ca77-3c4b-38710184f97b@redhat.com>
References: <263fedae-8aa3-ca77-3c4b-38710184f97b@redhat.com>
Message-ID: <8dee2e31-d368-c788-16ec-863270ea9c7e@redhat.com>

On 31.01.2018 at 21:15, Aleksey Shipilev wrote:
> On 01/31/2018 09:03 PM, Aleksey Shipilev wrote:
>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>> cycle.
>>>
>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>>> 205ops/m in my tests.
>>>
>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>>
>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
>> "adaptive"?
>
> Yes, it does break half of hotspot_gc_shenandoah.
> Should be e.g.:
>
> bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
>   ShenandoahHeap* heap = ShenandoahHeap::heap();
>   if (heap->shenandoahPolicy()->can_do_traversal_gc()) {
>     if (heap->is_concurrent_traversal_in_progress()) {
>       return false;
>     }
>     switch (type) {
>       case ShenandoahHeap::_alloc_tlab:
>       case ShenandoahHeap::_alloc_shared:
>         return false;
>       case ShenandoahHeap::_alloc_gclab:
>       case ShenandoahHeap::_alloc_shared_gc:
>         return true;
>       default:
>         ShouldNotReachHere();
>     }
>   }
>   return true;
> }
>
> -Aleksey
>

This seems even better. I'm going to push this then?

http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.02/

Good?

Roman

From shade at redhat.com Wed Jan 31 20:20:39 2018
From: shade at redhat.com (Aleksey Shipilev)
Date: Wed, 31 Jan 2018 21:20:39 +0100
Subject: RFR: Don't treat allocation regions implicitely live during traversal GC
In-Reply-To: <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>
References: <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>
Message-ID:

On 01/31/2018 09:19 PM, Aleksey Shipilev wrote:
> On 01/31/2018 09:15 PM, Roman Kennke wrote:
>> On 31.01.2018 at 21:03, Aleksey Shipilev wrote:
>>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>>> cycle.
>>>>
>>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around
>>>> 205ops/m in my tests.
>>>>
>>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/
>>>
>>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g.
>>> "adaptive"?
>>>
>>
>> Grr. See this is what happens when you want to rush out a change when brain is pudding ;-) In my
>> tests I guarded this by UseNewCode...
>>
>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/
>>
>> Better?
>
> Yes, that seems okay. I'd still suggest a switch to guard from accidental enums:
>
> bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const {
>   if (ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) {
>     return false;
>   }
>   switch (type) {
>     case ShenandoahHeap::_alloc_tlab:
>     case ShenandoahHeap::_alloc_shared:
>       return ShenandoahAllocImplicitLive;
>     case ShenandoahHeap::_alloc_gclab:
>     case ShenandoahHeap::_alloc_shared_gc:
>       return true;
>     default:
>       ShouldNotReachHere();
>       return true;
>   }
> }
>
> -Aleksey

I like the ShenandoahAllocImplicitLive flag better, because it avoids the v-call to
collectionPolicy() on allocation path. And it reduces coupling between components.

-Aleksey

From rkennke at redhat.com Wed Jan 31 20:26:52 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 21:26:52 +0100
Subject: RFR: Don't treat allocation regions implicitely live during traversal GC
In-Reply-To:
References: <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com>
Message-ID:

On 31.01.2018 at 21:20, Aleksey Shipilev wrote:
> On 01/31/2018 09:19 PM, Aleksey Shipilev wrote:
>> On 01/31/2018 09:15 PM, Roman Kennke wrote:
>>> On 31.01.2018 at 21:03, Aleksey Shipilev wrote:
>>>> On 01/31/2018 08:59 PM, Roman Kennke wrote:
>>>>> Until now, we treated allocation regions from between GC cycles all live in traversal GC. This seems
>>>>> inconsequential: we are not treating alloc regions live during the cycle. This means that all the
>>>>> allocated garbage will have to pass through one complete cycle to count its liveness, and then
>>>>> another cycle to clear it up. This patch changes this to traverse+clear alloc regions in the next
>>>>> cycle.
>>>>> >>>>> This gives some application a huge boost. E.g. compiler.compiler goes from 180ops/m to around >>>>> 205ops/m in my tests. >>>>> >>>>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.00/ >>>> >>>> Um. Doesn't it break non-Traversal GCs? Shared/TLAB allocs would not be counted as live with e.g. >>>> "adaptive"? >>>> >>> >>> Grr. See this is what happens when you want to rush out a change when brain is pudding ;-) In my >>> tests I guarded this by UseNewCode... >>> >>> http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.01/ >>> >>> Better? >> >> Yes, that seems okay. I'd still suggest a switch to guard from accidental enums: >> >> bool ShenandoahFreeSet::implicit_live(ShenandoahHeap::AllocType type) const { >> if (ShenandoahHeap::heap()->is_concurrent_traversal_in_progress()) { >> return false; >> } >> switch (type) { >> case ShenandoahHeap::_alloc_tlab: >> case ShenandoahHeap::_alloc_shared: >> return ShenandoahAllocImplicitLive; >> case ShenandoahHeap::_alloc_gclab: >> case ShenandoahHeap::_alloc_shared_gc: >> return true; >> default: >> ShouldNotReachHere(); >> return true; >> } >> } >> >> -Aleksey > > I like the ShenandoahAllocImplicitLive flag better, because it avoids the v-call to > collectionPolicy() on allocation path. And it reduces coupling between components. > > -Aleksey > > > Ok. Then this: http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.03 ? Roman From shade at redhat.com Wed Jan 31 20:28:38 2018 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 31 Jan 2018 21:28:38 +0100 Subject: RFR: Don't treat allocation regions implicitely live during traversal GC In-Reply-To: References: <8d9eb1aa-c67c-1aac-a99e-9ea8757da364@redhat.com> Message-ID: <5acd32d8-04a1-4b3b-51b2-72fa59089aae@redhat.com> On 01/31/2018 09:26 PM, Roman Kennke wrote: > http://cr.openjdk.java.net/~rkennke/better-traversal-heuristics/webrev.03 Yes, seems good. 
-Aleksey

From roman at kennke.org Wed Jan 31 20:37:53 2018
From: roman at kennke.org (roman at kennke.org)
Date: Wed, 31 Jan 2018 20:37:53 +0000
Subject: hg: shenandoah/jdk10: Don't treat allocation regions implicitely live during traversal GC
Message-ID: <201801312037.w0VKbsx0024851@aojmv0008.oracle.com>

Changeset: 207591c5122b
Author: rkennke
Date: 2018-01-31 21:14 +0100
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/207591c5122b

Don't treat allocation regions implicitely live during traversal GC

! src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.cpp
! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.cpp
! src/hotspot/share/gc/shenandoah/shenandoahFreeSet.hpp
! src/hotspot/share/gc/shenandoah/shenandoah_globals.hpp

From zgu at redhat.com Wed Jan 31 20:45:06 2018
From: zgu at redhat.com (Zhengyu Gu)
Date: Wed, 31 Jan 2018 15:45:06 -0500
Subject: RFR: More cancelled concgc check and bailout
Message-ID:

More cancelled concgc check and bailout. With this patch, traversal GC
passed specJVM with string deduplication on.

Webrev:
http://cr.openjdk.java.net/~zgu/shenandoah/traversal_cancelled_gc/webrev.00/

Test:
  hotspot_gc_shenandoah (fastdebug + release)

Thanks,

-Zhengyu

From rkennke at redhat.com Wed Jan 31 21:41:43 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 22:41:43 +0100
Subject: RFR: More cancelled concgc check and bailout
In-Reply-To:
References:
Message-ID: <8d99af81-652b-958e-6d3b-7411bf6af76d@redhat.com>

> More cancelled concgc check and bailout. With this patch, traversal GC
> passed specJVM with string deduplication on.
>
> Webrev:
> http://cr.openjdk.java.net/~zgu/shenandoah/traversal_cancelled_gc/webrev.00/
>
> Test:
>   hotspot_gc_shenandoah (fastdebug + release)
>
> Thanks,
>
> -Zhengyu
>

Looks good to me. Thanks!
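
For readers following along: the webrev above is not reproduced in this archive, so the following is only a generic sketch of what a "cancelled check and bailout" can look like, assuming a worker that polls a shared flag before each unit of work. CancelFlag and drain_queue are hypothetical names for illustration, not Shenandoah code.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for the collector's "GC was cancelled" state.
struct CancelFlag {
  std::atomic<bool> cancelled{false};
  void cancel() { cancelled.store(true, std::memory_order_release); }
  bool is_cancelled() const { return cancelled.load(std::memory_order_acquire); }
};

// Processes queued work items, checking for cancellation before each one.
// Returns the number of items processed; unprocessed items stay in the queue.
std::size_t drain_queue(std::deque<int>& queue, const CancelFlag& flag) {
  std::size_t processed = 0;
  while (!queue.empty()) {
    if (flag.is_cancelled()) {
      break;  // bail out early; the caller decides what to do with the rest
    }
    queue.pop_front();  // "process" the item (a no-op in this sketch)
    ++processed;
  }
  return processed;
}
```

The point of checking inside the loop rather than only at entry is that a cancellation raised mid-phase is noticed at the next item instead of after the whole queue has been drained.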

From rkennke at redhat.com Wed Jan 31 21:58:44 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 22:58:44 +0100
Subject: RFR: Don't count evacs double in traversal GC
Message-ID: <1c2666da-2b69-11eb-f722-75891c90374d@redhat.com>

I think this improved liveness work just led me to find the liveness
accounting bug that I have observed occasionally. It seems we are
counting evacs double: once when allocating the gclab/shared-gc, and
once by the usual GC mechanics: we evac cset objects, then push them to
the queue, and when it's popped, we count liveness for the object,
regardless in which region it is. Let's never count any liveness on
allocation, and do GC traversal count it. This is more precise (not
counting any GCLAB waste).

http://cr.openjdk.java.net/~rkennke/traversal-liveness-accounting/webrev.00/

Test: hotspot_gc_shenandoah

Ok?

Roman

From zgu at redhat.com Wed Jan 31 22:08:15 2018
From: zgu at redhat.com (zgu at redhat.com)
Date: Wed, 31 Jan 2018 22:08:15 +0000
Subject: hg: shenandoah/jdk10: Cancelled congc check and bailout to avoid assertion failure
Message-ID: <201801312208.w0VM8Ff0025280@aojmv0008.oracle.com>

Changeset: 29e22a0191fa
Author: zgu
Date: 2018-01-31 17:03 -0500
URL: http://hg.openjdk.java.net/shenandoah/jdk10/rev/29e22a0191fa

Cancelled congc check and bailout to avoid assertion failure

! src/hotspot/share/gc/shenandoah/shenandoahTraversalGC.cpp

From rkennke at redhat.com Wed Jan 31 22:18:39 2018
From: rkennke at redhat.com (Roman Kennke)
Date: Wed, 31 Jan 2018 23:18:39 +0100
Subject: RFR: Don't count evacs double in traversal GC
In-Reply-To: <1c2666da-2b69-11eb-f722-75891c90374d@redhat.com>
References: <1c2666da-2b69-11eb-f722-75891c90374d@redhat.com>
Message-ID: <055cde79-e44f-261c-e858-271aa2a83244@redhat.com>

On 31.01.2018 at 22:58, Roman Kennke wrote:
> I think this improved liveness work just led me to find the liveness
> accounting bug that I have observed occasionally.
It seems we are > counting evacs double: once when allocating the gclab/shared-gc, and > once by the usual GC mechanics: we evac cset objects, then push them to > the queue, and when it's popped, we count liveness for the object, > regardless in which region it is. Let's never count any liveness on > allocation, and do GC traversal count it. This is more precise (not > counting any GCLAB waste). > > http://cr.openjdk.java.net/~rkennke/traversal-liveness-accounting/webrev.00/ > > > Test: hotspot_gc_shenandoah > > Ok? > > Roman Ok, this is probably nonsense. The traversal-in-progress check should already have caught this. However, outside of the traversal phase, there are no evacs either, so the whole switch-block is useless. The patch should still be useful as a cleanup and simplification. Ok? Roman
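
The argument in the last message (the traversal-in-progress check already covers the GC-side allocations, so the switch adds nothing) can be checked on a small model. The sketch below is not HotSpot code: HeapModel, AllocType and the function names are hypothetical stand-ins, and it applies only under the assumption stated in the message, namely that gclab/shared-gc allocations happen only while traversal is in progress. Whether the actual webrev ends up in exactly this flag-only form is not shown in the thread.

```cpp
#include <initializer_list>

// Simplified model; all names here are hypothetical stand-ins.
enum class AllocType { tlab, shared, gclab, shared_gc };

struct HeapModel {
  bool traversal_in_progress;  // models is_concurrent_traversal_in_progress()
  bool alloc_implicit_live;    // models the ShenandoahAllocImplicitLive flag
};

// Switch-based variant, following the shape proposed earlier in the thread.
inline bool implicit_live_switch(const HeapModel& heap, AllocType type) {
  if (heap.traversal_in_progress) {
    return false;
  }
  switch (type) {
    case AllocType::tlab:
    case AllocType::shared:
      return heap.alloc_implicit_live;
    case AllocType::gclab:
    case AllocType::shared_gc:
      return true;
  }
  return true;  // not reached for valid enum values
}

// Flag-only variant: ignores the allocation type entirely.
inline bool implicit_live_flag_only(const HeapModel& heap) {
  return !heap.traversal_in_progress && heap.alloc_implicit_live;
}

// Assumption from the message: GC-side allocations (gclab, shared_gc)
// only occur while traversal is in progress.
inline bool state_is_possible(const HeapModel& heap, AllocType type) {
  const bool gc_alloc =
      (type == AllocType::gclab || type == AllocType::shared_gc);
  return !gc_alloc || heap.traversal_in_progress;
}

// Returns true if both variants agree on every state the assumption allows.
inline bool variants_agree_on_possible_states() {
  const AllocType types[] = {AllocType::tlab, AllocType::shared,
                             AllocType::gclab, AllocType::shared_gc};
  for (bool traversal : {false, true}) {
    for (bool flag : {false, true}) {
      for (AllocType type : types) {
        const HeapModel heap{traversal, flag};
        if (!state_is_possible(heap, type)) {
          continue;  // excluded by the assumption above
        }
        if (implicit_live_switch(heap, type) != implicit_live_flag_only(heap)) {
          return false;
        }
      }
    }
  }
  return true;
}
```

The two variants only differ for a GC-side allocation made while no traversal is running, which is exactly the state the message says does not occur in traversal mode. For the other heuristics that state is reachable, so the model says nothing about them.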